Re: [PR bootstrap/56750] implement --disable-stage1-static-libs

2018-02-08 Thread Aldy Hernandez
On Thu, Feb 8, 2018 at 6:00 PM, Jeff Law  wrote:
> On 02/08/2018 04:42 AM, Aldy Hernandez wrote:

> You are a brave soul.

I would've preferred to close as WONTFIX from the gun, but no one
commented, so  I tried my hand at a patch :).

>
> Every time I look at 56750 I shake my head and say it's not worth the
> pain to untangle (sorry Mike).
>
> There's no single solution here that will satisfy everyone that I'm
> aware of.   Given that there is a workaround that I think will work for
> the problems Mike is trying to address, I think this is a WONTFIX.
> Another configury option just adds more complexity here that we're not
> likely to test and is likely to bitrot over time.
>
> Go ahead with WONTFIX and if Mike complains, point him at me :-)

No complaints from me.  The less overlapping options the better.

Thanks for looking at this.
Aldy


[PATCH v3] Disable reg offset in quad-word store for Falkor

2018-02-08 Thread Siddhesh Poyarekar
Hi,

Here's v3 of the patch to disable register offset addressing mode for
stores of 128-bit values on Falkor because they're very costly.
Following Kyrill's suggestion, I compared the codegen for a53 and
found that the codegen was quite different.  Jim's original patch is
the most minimal compromise for this and is also a cleaner temporary
fix before I attempt to split address costs into loads and stores for
gcc9.

So v3 is essentially a very slightly cleaned up version of v1 again,
this time with confirmation that there are no codegen changes in
CPU2017 on non-Falkor builds; only the codegen for -mcpu=falkor is
different.



On Falkor, because of an idiosyncrasy of how the pipelines are
designed, a quad-word store using a reg+reg addressing mode is almost
twice as slow as an add followed by a quad-word store with a single
reg addressing mode.  So we get better performance if we disallow
addressing modes using register offsets with quad-word stores.  This
is the most minimal change for gcc8; I volunteer to make a more
lasting change for gcc9, where I will split the addressing mode costs
into loads and stores wherever possible and needed.
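
For illustration only (this is not from the patch or its testcase), the
kind of access affected is a 128-bit store whose address would naturally
use a register-offset (reg+reg) mode; with the new tuning flag,
-mcpu=falkor is expected to emit an add followed by a base-register-only
store instead:

/* Hypothetical example; assumes an aarch64 target with NEON.  */
#include <arm_neon.h>

void
store_q (int32_t *base, long i, int32x4_t v)
{
  /* Address is base + i*4; without the tuning flag this can become a
     "str qN, [x0, x1, lsl 2]" style reg+reg quad-word store.  */
  vst1q_s32 (base + i, v);
}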

This patch improves fpspeed by 0.17% and intspeed by 0.62% in CPU2017,
with xalancbmk_s (3.84%), wrf_s (1.46%) and mcf_s (1.62%) being the
biggest winners.  There were no regressions beyond 0.4%.

2018-xx-xx  Jim Wilson  
Kugan Vivenakandarajah  
Siddhesh Poyarekar  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_movti_target_operand_p):
New.
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<mode>): Use Utf.
* config/aarch64/aarch64-tuning-flags.def
(SLOW_REGOFFSET_QUADWORD_STORE): New.
* config/aarch64/aarch64.c (qdf24xx_tunings): Add
SLOW_REGOFFSET_QUADWORD_STORE to tuning flags.
(aarch64_movti_target_operand_p): New.
* config/aarch64/aarch64.md (movti_aarch64): Use Utf.
(movtf_aarch64): Likewise.
* config/aarch64/constraints.md (Utf): New.

gcc/testsuite
* gcc.target/aarch64/pr82533.c: New test case.
---
 gcc/config/aarch64/aarch64-protos.h |  1 +
 gcc/config/aarch64/aarch64-simd.md  |  4 ++--
 gcc/config/aarch64/aarch64-tuning-flags.def |  4 
 gcc/config/aarch64/aarch64.c| 14 +-
 gcc/config/aarch64/aarch64.md   |  8 
 gcc/config/aarch64/constraints.md   |  6 ++
 gcc/testsuite/gcc.target/aarch64/pr82533.c  | 11 +++
 7 files changed, 41 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr82533.c

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index cda2895d28e..5a0323deb1e 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -433,6 +433,7 @@ bool aarch64_simd_mem_operand_p (rtx);
 bool aarch64_sve_ld1r_operand_p (rtx);
 bool aarch64_sve_ldr_operand_p (rtx);
 bool aarch64_sve_struct_memory_operand_p (rtx);
+bool aarch64_movti_target_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool);
 rtx aarch64_tls_get_addr (void);
 tree aarch64_fold_builtin (tree, int, tree *, bool);
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 3d1f6a01cb7..f7daac3e28d 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -131,9 +131,9 @@
 
 (define_insn "*aarch64_simd_mov<mode>"
   [(set (match_operand:VQ 0 "nonimmediate_operand"
-   "=w, Umq,  m,  w, ?r, ?w, ?r, w")
+   "=w, Umq, Utf,  w, ?r, ?w, ?r, w")
(match_operand:VQ 1 "general_operand"
-   "m,  Dz, w,  w,  w,  r,  r, Dn"))]
+   "m,  Dz,w,  w,  w,  r,  r, Dn"))]
   "TARGET_SIMD
&& (register_operand (operands[0], <MODE>mode)
|| aarch64_simd_reg_or_zero (operands[1], <MODE>mode))"
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index ea9ead234cb..04baf5b6de6 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -41,4 +41,8 @@ AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", 
SLOW_UNALIGNED_LDPW)
are not considered cheap.  */
 AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
 
+/* Don't use a register offset in a memory address for a quad-word store.  */
+AARCH64_EXTRA_TUNING_OPTION ("slow_regoffset_quadword_store",
+SLOW_REGOFFSET_QUADWORD_STORE)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 228fd1b908d..c0a05598415 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -875,7 +875,7 @@ static const struct tune_params qdf24xx_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   

[PATCH] Avoid BSWAP with floating point modes on rs6000 (PR target/84226)

2018-02-08 Thread Jakub Jelinek
Hi!

BSWAP is documented as:

@item (bswap:@var{m} @var{x})
Represents the value @var{x} with the order of bytes reversed, carried out
in mode @var{m}, which must be a fixed-point machine mode.
The mode of @var{x} must be @var{m} or @code{VOIDmode}.

"Fixed-point" is used widely in rtl.texi and is very confusing now that
we have FIXED_POINT_TYPE types; I assume it talks about integral modes
or, because it is also used with vector modes, about INTEGRAL_MODE_P
modes.  My understanding is that bswap on a vector integral mode is
meant to be a bswap of each element individually.

The rs6000 backend uses bswap not just on scalar integral modes and
vector integral modes, but also on V4SF and V2DF, which ICEs in
simplify-rtx.c, where we don't expect bswap to be used on SF/DFmode
(a vector bswap is handled as a bswap of each element).
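
For illustration only (this is not the pr84226.c testcase added by the
patch), the kind of source that reaches these patterns is a byte
reverse of a floating point vector, e.g. via the Power9 vec_revb
built-in, assuming it accepts vector double:

/* Hypothetical sketch; assumes -mcpu=power9 and that vec_revb is
   available for vector double.  This goes through the xxbrd expander.  */
#include <altivec.h>

vector double
revb_v2df (vector double x)
{
  return vec_revb (x);
}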

The following patch adjusts the rs6000 backend to use well-defined
bswaps on the corresponding integral modes instead (and also does what
we did in the i386 backend years ago: avoid a subreg on the lhs, because
it breaks combine's attempts to optimize it).

Or do we want to change the documentation, simplify-rtx.c and whatever
else in the middle-end to also make floating point bswaps well defined?
And how exactly: as a bswap of the underlying bits, i.e. for
simplify-rtx.c as a subreg of the constant to the corresponding integral
mode, a bswap in that mode, and a subreg back?  IMHO changing the rs6000
backend is easier, and defining what exactly a floating point bswap
means may be hard to understand.

Bootstrapped/regtested on powerpc64{,le}-linux, on powerpc64-linux including
-m64/-m32, ok for trunk?

2018-02-09  Jakub Jelinek  

PR target/84226
* config/rs6000/vsx.md (p9_xxbrq_v16qi): Change input operand
constraint from =wa to wa.  Avoid a subreg on the output operand,
instead use a pseudo and subreg it in a move.
(p9_xxbrd_<mode>): Changed to ...
(p9_xxbrd_v2di): ... this insn, without VSX_D iterator.
(p9_xxbrd_v2df): New expander.
(p9_xxbrw_<mode>): Changed to ...
(p9_xxbrw_v4si): ... this insn, without VSX_W iterator.
(p9_xxbrw_v4sf): New expander.

* gcc.target/powerpc/pr84226.c: New test.

--- gcc/config/rs6000/vsx.md.jj 2018-01-22 23:57:21.299779544 +0100
+++ gcc/config/rs6000/vsx.md  2018-02-08 17:21:13.197642776 +0100
@@ -5311,35 +5311,60 @@ (define_insn "p9_xxbrq_v1ti"
 
 (define_expand "p9_xxbrq_v16qi"
   [(use (match_operand:V16QI 0 "vsx_register_operand" "=wa"))
-   (use (match_operand:V16QI 1 "vsx_register_operand" "=wa"))]
+   (use (match_operand:V16QI 1 "vsx_register_operand" "wa"))]
   "TARGET_P9_VECTOR"
 {
-  rtx op0 = gen_lowpart (V1TImode, operands[0]);
+  rtx op0 = gen_reg_rtx (V1TImode);
   rtx op1 = gen_lowpart (V1TImode, operands[1]);
   emit_insn (gen_p9_xxbrq_v1ti (op0, op1));
+  emit_move_insn (operands[0], gen_lowpart (V16QImode, op0));
   DONE;
 })
 
 ;; Swap all bytes in each 64-bit element
-(define_insn "p9_xxbrd_<mode>"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
-   (bswap:VSX_D (match_operand:VSX_D 1 "vsx_register_operand" "wa")))]
+(define_insn "p9_xxbrd_v2di"
+  [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa")
+   (bswap:V2DI (match_operand:V2DI 1 "vsx_register_operand" "wa")))]
   "TARGET_P9_VECTOR"
   "xxbrd %x0,%x1"
   [(set_attr "type" "vecperm")])
 
+(define_expand "p9_xxbrd_v2df"
+  [(use (match_operand:V2DF 0 "vsx_register_operand" "=wa"))
+   (use (match_operand:V2DF 1 "vsx_register_operand" "wa"))]
+  "TARGET_P9_VECTOR"
+{
+  rtx op0 = gen_reg_rtx (V2DImode);
+  rtx op1 = gen_lowpart (V2DImode, operands[1]);
+  emit_insn (gen_p9_xxbrd_v2di (op0, op1));
+  emit_move_insn (operands[0], gen_lowpart (V2DFmode, op0));
+  DONE;
+})
+
 ;; Swap all bytes in each 32-bit element
-(define_insn "p9_xxbrw_<mode>"
-  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
-   (bswap:VSX_W (match_operand:VSX_W 1 "vsx_register_operand" "wa")))]
+(define_insn "p9_xxbrw_v4si"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa")
+   (bswap:V4SI (match_operand:V4SI 1 "vsx_register_operand" "wa")))]
   "TARGET_P9_VECTOR"
   "xxbrw %x0,%x1"
   [(set_attr "type" "vecperm")])
 
+(define_expand "p9_xxbrw_v4sf"
+  [(use (match_operand:V4SF 0 "vsx_register_operand" "=wa"))
+   (use (match_operand:V4SF 1 "vsx_register_operand" "wa"))]
+  "TARGET_P9_VECTOR"
+{
+  rtx op0 = gen_reg_rtx (V4SImode);
+  rtx op1 = gen_lowpart (V4SImode, operands[1]);
+  emit_insn (gen_p9_xxbrw_v4si (op0, op1));
+  emit_move_insn (operands[0], gen_lowpart (V4SFmode, op0));
+  DONE;
+})
+
 ;; Swap all bytes in each element of vector
 (define_expand "revb_<mode>"
-  [(set (match_operand:VEC_REVB 0 "vsx_register_operand")
-   (bswap:VEC_REVB (match_operand:VEC_REVB 1 "vsx_register_operand")))]
+  [(use (match_operand:VEC_REVB 0 "vsx_register_operand"))
+   (use (match_operand:VEC_REVB 1 "vsx_register_operand"))]
   ""
 {
   if (TARGET_P9_VECTOR)
--- 

Re: [PATCH] Fix driver -fsanitize= handling with -static (PR sanitizer/84285)

2018-02-08 Thread Richard Biener
On February 9, 2018 7:27:57 AM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>When linking with -static-lib{a,t,l,ub}san -fsanitize=whatever, we link
>-l{a,t,l,ub}san statically into the binary and add the set of libraries
>needed by the lib{a,t,l,ub}san.a afterwards (-lpthread, -ldl etc.).
>When doing -static -fsanitize=whatever link, we link the sanitizer
>library
>statically too due to -static, so we need to also ensure we append
>the libraries needed by that.
>
>Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
>for
>trunk?

OK. 

Richard. 

>2018-02-09  Jakub Jelinek  
>
>   PR sanitizer/84285
>   * gcc.c (STATIC_LIBASAN_LIBS, STATIC_LIBTSAN_LIBS,
>   STATIC_LIBLSAN_LIBS, STATIC_LIBUBSAN_LIBS): Handle -static like
>   -static-lib*san.
>
>--- gcc/gcc.c.jj   2018-01-09 09:01:39.017042679 +0100
>+++ gcc/gcc.c  2018-02-08 15:56:13.361836160 +0100
>@@ -684,7 +684,7 @@ proper position among the other output f
> 
> #ifndef LIBASAN_SPEC
> #define STATIC_LIBASAN_LIBS \
>-  " %{static-libasan:%:include(libsanitizer.spec)%(link_libasan)}"
>+  "
>%{static-libasan|static:%:include(libsanitizer.spec)%(link_libasan)}"
> #ifdef LIBASAN_EARLY_SPEC
> #define LIBASAN_SPEC STATIC_LIBASAN_LIBS
> #elif defined(HAVE_LD_STATIC_DYNAMIC)
>@@ -702,7 +702,7 @@ proper position among the other output f
> 
> #ifndef LIBTSAN_SPEC
> #define STATIC_LIBTSAN_LIBS \
>-  " %{static-libtsan:%:include(libsanitizer.spec)%(link_libtsan)}"
>+  "
>%{static-libtsan|static:%:include(libsanitizer.spec)%(link_libtsan)}"
> #ifdef LIBTSAN_EARLY_SPEC
> #define LIBTSAN_SPEC STATIC_LIBTSAN_LIBS
> #elif defined(HAVE_LD_STATIC_DYNAMIC)
>@@ -720,7 +720,7 @@ proper position among the other output f
> 
> #ifndef LIBLSAN_SPEC
> #define STATIC_LIBLSAN_LIBS \
>-  " %{static-liblsan:%:include(libsanitizer.spec)%(link_liblsan)}"
>+  "
>%{static-liblsan|static:%:include(libsanitizer.spec)%(link_liblsan)}"
> #ifdef LIBLSAN_EARLY_SPEC
> #define LIBLSAN_SPEC STATIC_LIBLSAN_LIBS
> #elif defined(HAVE_LD_STATIC_DYNAMIC)
>@@ -738,7 +738,7 @@ proper position among the other output f
> 
> #ifndef LIBUBSAN_SPEC
> #define STATIC_LIBUBSAN_LIBS \
>-  " %{static-libubsan:%:include(libsanitizer.spec)%(link_libubsan)}"
>+  "
>%{static-libubsan|static:%:include(libsanitizer.spec)%(link_libubsan)}"
> #ifdef HAVE_LD_STATIC_DYNAMIC
> #define LIBUBSAN_SPEC "%{static-libubsan:" LD_STATIC_OPTION \
>"} -lubsan %{static-libubsan:" LD_DYNAMIC_OPTION "}" \
>
>   Jakub



Re: [PATCH] Tweak ssa-dom-cse-2.c testcase (PR tree-optimization/84232)

2018-02-08 Thread Richard Biener
On February 9, 2018 7:31:33 AM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>As mentioned in the PR, DOM/SLP can only handle the case when the
>stores
>of the vector are in the same chunks as the reads from it, so the
>testcase
>has lots of xfails for targets where this doesn't happen.
>On x86 in the generic and many other tunings the vector is stored in
>the
>same chunks as read, but when testing e.g. with -march=silvermont, the
>test
>fails.  Until DOM/SLP is extended to handle this and all the xfails can
>be
>removed, this patch forces -mtune=generic so that the test doesn't fail
>with some tunings.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. 

Richard. 

>2018-02-09  Jakub Jelinek  
>
>   PR tree-optimization/84232
>   * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add -mtune=generic on x86.
>
>--- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c.jj   2018-01-30
>12:30:26.394360763 +0100
>+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c  2018-02-08
>16:21:41.506236052 +0100
>@@ -2,8 +2,10 @@
>/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized
>--param sra-max-scalarization-size-Ospeed=32" } */
>/* System Z needs hardware vector support for this to work (the
>optimization
>gets too complex without it.
>-   { dg-additional-options "-march=z13" { target { s390x-*-* } } } */
>-
>+   { dg-additional-options "-march=z13" { target s390x-*-* } } */
>+/* Use generic tuning on x86 for the same reasons as why alpha,
>powerpc etc. are
>+   xfailed below.
>+   { dg-additional-options "-mtune=generic" { target i?86-*-*
>x86_64-*-* } } */
> 
> int
> foo ()
>
>   Jakub



[PATCH] Tweak ssa-dom-cse-2.c testcase (PR tree-optimization/84232)

2018-02-08 Thread Jakub Jelinek
Hi!

As mentioned in the PR, DOM/SLP can only handle the case when the stores
of the vector are in the same chunks as the reads from it, so the testcase
has lots of xfails for targets where this doesn't happen.
On x86 in the generic and many other tunings the vector is stored in the
same chunks as read, but when testing e.g. with -march=silvermont, the test
fails.  Until DOM/SLP is extended to handle this and all the xfails can be
removed, this patch forces -mtune=generic so that the test doesn't fail
with some tunings.
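
As a hedged sketch (not the actual contents of ssa-dom-cse-2.c), the
pattern at issue looks like this: the initializing loop may be
vectorized into wide stores, and DOM can only fold the scalar read at
the end when those store chunks line up with the read:

int
foo (void)
{
  int a[8];
  for (int i = 0; i < 8; i++)
    a[i] = i;   /* may become vector stores; chunking depends on tuning  */
  return a[3];  /* DOM tries to CSE this load to the constant 3  */
}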

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-02-09  Jakub Jelinek  

PR tree-optimization/84232
* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add -mtune=generic on x86.

--- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c.jj  2018-01-30 12:30:26.394360763 +0100
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c   2018-02-08 16:21:41.506236052 +0100
@@ -2,8 +2,10 @@
 /* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param 
sra-max-scalarization-size-Ospeed=32" } */
 /* System Z needs hardware vector support for this to work (the optimization
gets too complex without it.
-   { dg-additional-options "-march=z13" { target { s390x-*-* } } } */
-
+   { dg-additional-options "-march=z13" { target s390x-*-* } } */
+/* Use generic tuning on x86 for the same reasons as why alpha, powerpc etc. 
are
+   xfailed below.
+   { dg-additional-options "-mtune=generic" { target i?86-*-* x86_64-*-* } } */
 
 int
 foo ()

Jakub


[PATCH] Fix driver -fsanitize= handling with -static (PR sanitizer/84285)

2018-02-08 Thread Jakub Jelinek
Hi!

When linking with -static-lib{a,t,l,ub}san -fsanitize=whatever, we link
-l{a,t,l,ub}san statically into the binary and add the set of libraries
needed by the lib{a,t,l,ub}san.a afterwards (-lpthread, -ldl etc.).
When doing -static -fsanitize=whatever link, we link the sanitizer library
statically too due to -static, so we need to also ensure we append
the libraries needed by that.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-02-09  Jakub Jelinek  

PR sanitizer/84285
* gcc.c (STATIC_LIBASAN_LIBS, STATIC_LIBTSAN_LIBS,
STATIC_LIBLSAN_LIBS, STATIC_LIBUBSAN_LIBS): Handle -static like
-static-lib*san.

--- gcc/gcc.c.jj  2018-01-09 09:01:39.017042679 +0100
+++ gcc/gcc.c   2018-02-08 15:56:13.361836160 +0100
@@ -684,7 +684,7 @@ proper position among the other output f
 
 #ifndef LIBASAN_SPEC
 #define STATIC_LIBASAN_LIBS \
-  " %{static-libasan:%:include(libsanitizer.spec)%(link_libasan)}"
+  " %{static-libasan|static:%:include(libsanitizer.spec)%(link_libasan)}"
 #ifdef LIBASAN_EARLY_SPEC
 #define LIBASAN_SPEC STATIC_LIBASAN_LIBS
 #elif defined(HAVE_LD_STATIC_DYNAMIC)
@@ -702,7 +702,7 @@ proper position among the other output f
 
 #ifndef LIBTSAN_SPEC
 #define STATIC_LIBTSAN_LIBS \
-  " %{static-libtsan:%:include(libsanitizer.spec)%(link_libtsan)}"
+  " %{static-libtsan|static:%:include(libsanitizer.spec)%(link_libtsan)}"
 #ifdef LIBTSAN_EARLY_SPEC
 #define LIBTSAN_SPEC STATIC_LIBTSAN_LIBS
 #elif defined(HAVE_LD_STATIC_DYNAMIC)
@@ -720,7 +720,7 @@ proper position among the other output f
 
 #ifndef LIBLSAN_SPEC
 #define STATIC_LIBLSAN_LIBS \
-  " %{static-liblsan:%:include(libsanitizer.spec)%(link_liblsan)}"
+  " %{static-liblsan|static:%:include(libsanitizer.spec)%(link_liblsan)}"
 #ifdef LIBLSAN_EARLY_SPEC
 #define LIBLSAN_SPEC STATIC_LIBLSAN_LIBS
 #elif defined(HAVE_LD_STATIC_DYNAMIC)
@@ -738,7 +738,7 @@ proper position among the other output f
 
 #ifndef LIBUBSAN_SPEC
 #define STATIC_LIBUBSAN_LIBS \
-  " %{static-libubsan:%:include(libsanitizer.spec)%(link_libubsan)}"
+  " %{static-libubsan|static:%:include(libsanitizer.spec)%(link_libubsan)}"
 #ifdef HAVE_LD_STATIC_DYNAMIC
 #define LIBUBSAN_SPEC "%{static-libubsan:" LD_STATIC_OPTION \
 "} -lubsan %{static-libubsan:" LD_DYNAMIC_OPTION "}" \

Jakub


[PATCH] Fix handling of variable length fields in structures (PR c/82210)

2018-02-08 Thread Jakub Jelinek
Hi!

When placing a variable length field into a structure, we need to update
rli->offset_align for the next field.  We do:
rli->offset_align = MIN (rli->offset_align, desired_align);
which updates it according to the start of that VLA field.  The problem is
that if the field doesn't have a size that is a multiple of this alignment,
rli->offset_align will not properly reflect the alignment of the end of that
field.  E.g. on the testcase, we have a VLA array aligned as a whole (the
field itself) to 16 bytes / 128 bits, so rli->offset_align remains 128.
The array has element size 2 bytes / 16 bits, times a function argument,
so the end of the field is in the worst case aligned just to 16 bits; if we
keep rli->offset_align as 128 for the next field, then DECL_OFFSET_ALIGN is
too large.  DECL_FIELD_OFFSET is documented as:
/* In a FIELD_DECL, this is the field position, counting in bytes, of the
   DECL_OFFSET_ALIGN-bit-sized word containing the bit closest to the beginning
   of the structure.  */
and when gimplifying COMPONENT_REFs with that field we do:

  tree offset = unshare_expr (component_ref_field_offset (t));
  tree field = TREE_OPERAND (t, 1);
  tree factor
    = size_int (DECL_OFFSET_ALIGN (field) / BITS_PER_UNIT);

  /* Divide the offset by its alignment.  */
  offset = size_binop_loc (loc, EXACT_DIV_EXPR, offset, factor);

and later on multiply it again by DECL_OFFSET_ALIGN.  The EXACT_DIV_EXPR
isn't exact.

Fixed by lowering rli->offset_align if the size isn't a multiple of the
alignment.  We don't have a multiple_of_p variant that would compute the
highest power-of-two number the expression is known to be a multiple of,
so I'm just checking the most common case, where the size is a multiple
of the starting alignment, and otherwise computing it very
conservatively.  This will be lower than necessary, say for
  __attribute__((aligned (16))) short field[2 * size];
- just 16 bits instead of 32.  In theory we could do a binary search on
the power-of-two numbers between the high initial rli->offset_align for
which the first multiple_of_p check failed and the conservative guess,
to improve it.  If you think it is worth it, I can code it up.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-02-09  Jakub Jelinek  

PR c/82210
* stor-layout.c (place_field): For variable length fields, adjust
offset_align afterwards not just based on the field's alignment,
but also on the size.

* gcc.c-torture/execute/pr82210.c: New test.

--- gcc/stor-layout.c.jj  2018-01-16 16:07:57.0 +0100
+++ gcc/stor-layout.c   2018-02-08 13:48:32.380582662 +0100
@@ -1622,6 +1622,30 @@ place_field (record_layout_info rli, tre
= size_binop (PLUS_EXPR, rli->offset, DECL_SIZE_UNIT (field));
   rli->bitpos = bitsize_zero_node;
   rli->offset_align = MIN (rli->offset_align, desired_align);
+
+  if (!multiple_of_p (bitsizetype, DECL_SIZE (field),
+ bitsize_int (rli->offset_align)))
+   {
+ tree type = strip_array_types (TREE_TYPE (field));
+ /* The above adjusts offset_align just based on the start of the
+field.  The field might not have a size that is a multiple of
+that offset_align though.  If the field is an array of fixed
+sized elements, assume there can be any multiple of those
+sizes.  If it is a variable length aggregate or array of
+variable length aggregates, assume worst that the end is
+just BITS_PER_UNIT aligned.  */
+ if (TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
+   {
+ if (TREE_INT_CST_LOW (TYPE_SIZE (type)))
+   {
+ unsigned HOST_WIDE_INT sz
+   = least_bit_hwi (TREE_INT_CST_LOW (TYPE_SIZE (type)));
+ rli->offset_align = MIN (rli->offset_align, sz);
+   }
+   }
+ else
+   rli->offset_align = MIN (rli->offset_align, BITS_PER_UNIT);
+   }
 }
   else if (targetm.ms_bitfield_layout_p (rli->t))
 {
--- gcc/testsuite/gcc.c-torture/execute/pr82210.c.jj  2018-02-08 13:59:37.247901958 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr82210.c   2018-02-08 13:59:14.185912469 +0100
@@ -0,0 +1,26 @@
+/* PR c/82210 */
+
+void
+foo (int size)
+{
+  int i;
+  struct S {
+__attribute__((aligned (16))) struct T { short c; } a[size];
+int b[size];
+  } s;
+
+  for (i = 0; i < size; i++)
+s.a[i].c = 0x1234;
+  for (i = 0; i < size; i++)
+s.b[i] = 0;
+  for (i = 0; i < size; i++)
+if (s.a[i].c != 0x1234 || s.b[i] != 0)
+  __builtin_abort ();
+}
+
+int
+main ()
+{
+  foo (15);
+  return 0;
+}

Jakub


PR84300, ICE in dwarf2cfi on ppc64le

2018-02-08 Thread Alan Modra
This PR is one of those with a really obvious cause, and fix.  There's
nothing in the unspec rtl to say the insn needs lr!
Bootstrapped and regression tested powerpc64le-linux.  OK?

PR target/84300
gcc/
* config/rs6000/rs6000.md (split_stack_return): Use LR.
gcc/testsuite/
* gcc.dg/pr84300.c: New.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 33f0d95..287461f 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -13359,7 +13359,8 @@ (define_insn "load_split_stack_limit_si"
 ;; Use r0 to stop regrename twiddling with lr restore insns emitted
 ;; after the call to __morestack.
 (define_insn "split_stack_return"
-  [(unspec_volatile [(use (reg:SI 0))] UNSPECV_SPLIT_STACK_RETURN)]
+  [(unspec_volatile [(use (reg:SI 0)) (use (reg:SI LR_REGNO))]
+   UNSPECV_SPLIT_STACK_RETURN)]
   ""
   "blr"
   [(set_attr "type" "jmpreg")])
diff --git a/gcc/testsuite/gcc.dg/pr84300.c b/gcc/testsuite/gcc.dg/pr84300.c
new file mode 100644
index 000..6016799
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr84300.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target split_stack } */
+/* { dg-options "-g -O2 -fsplit-stack -fno-omit-frame-pointer" } */
+
+void trap () { __builtin_trap (); }

-- 
Alan Modra
Australia Development Lab, IBM


Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers

2018-02-08 Thread Jeff Law
On 02/08/2018 08:53 PM, Alan Modra wrote:
> On Fri, Feb 09, 2018 at 01:21:27AM -0200, Alexandre Oliva wrote:
>> Here's what I checked in, right after the LVU patch.
>>
>> [IEPM] Introduce inline entry point markers
> 
> One of these two patches breaks ppc64le bootstrap with the assembler
> complaining "Error: view number mismatch" when compiling
> libdecnumber.
> 
I've just passed along a similar failure (.i, .s and command line
options) to Alex for ppc64 (be) building glibc.

Jeff


Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers

2018-02-08 Thread Alan Modra
On Fri, Feb 09, 2018 at 01:21:27AM -0200, Alexandre Oliva wrote:
> Here's what I checked in, right after the LVU patch.
> 
> [IEPM] Introduce inline entry point markers

One of these two patches breaks ppc64le bootstrap with the assembler
complaining "Error: view number mismatch" when compiling
libdecnumber.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [SFN+LVU+IEPM v4 7/9] [LVU] Introduce location views

2018-02-08 Thread Alexandre Oliva
On Feb  8, 2018, Jason Merrill  wrote:

> On Thu, Feb 8, 2018 at 7:56 AM, Alexandre Oliva  wrote:
>> On Feb  7, 2018, Jason Merrill  wrote:
>> 
>>> OK, that makes sense.  But I'm still uncomfortable with choosing an
>>> existing opcode for that purpose, which previously would have been
>>> chosen just for reasons of encoding complexity and size.
>> 
>> Well, there's a good reason we didn't used to output this opcode: it's
>> nearly always the case that you're better off using a special opcode or
>> DW_LNS_advance_pc, that encodes the offset as uleb128 instead of a fixed
>> size.  The only exceptions I can think of are offsets that have the most
>> significant bits set in the representable range for
>> DW_LNS_fixed_advance_pc (the uleb128 representation for
>> DW_LNS_advance_pc would end up taking an extra byte if insns don't get
>> more than byte alignment), and VLIW machines, in which the
>> DW_LNS_advance_pc operand needs to be multiplied by the ops-per-insns
>> (but also divided by the min-insn-length).  So, unless you're creating a
>> gap of 16KiB to 64KiB in the middle of a function on an ISA such as
>> x86*, that has insns as small as 1 byte, you'll only use
>> DW_LNS_fixed_advance_pc when the assembler can't encode uleb128 offsets,
>> as stated in the DWARF specs.

> Which is often true of non-gas assemblers, isn't it?

Uhh...  I don't know.  Anyway, without gas, we probably won't have view
support in .loc directives either, and then we'll have to emit the line
number program ourselves, in which case we could (if we have accurate
insn lengths) compute offsets and output special opcodes or
byte-expanded uleb128 literals, even if the assembler doesn't support
leb128.  Sure enough, if we don't have accurate insn lengths, we'll have
to resort to something else; for small increments without assembler
support for uleb128, DW_LNS_fixed_advance_pc would probably be a more
compact option than DW_LNS_set_address, if we're sure the increment is
small enough.

Even then, if we were to do something along these lines, when we know
the offset is nonzero (i.e., we want to reset the view), we could output
DW_LNS_fixed_advance_pc with the desired offset minus 1, followed by a
special opcode that advances PC (resetting the view count) and emitting
the line table entry, rather than using DW_LNS_fixed_advance_pc with the
offset, followed by DW_LNS_copy to output the line number table without
resetting the view.  Same size, with the choice of resetting or not the
view counter.  As for when we don't know whether the offset could be
zero, we have only one choice: not resetting the view counter.
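
As a hedged sketch of the size comparison (the special opcode value and
the 0x30-byte gap are made up; only DW_LNS_copy = 1 and
DW_LNS_fixed_advance_pc = 9, which takes a 2-byte operand, follow the
standard numbering, with little-endian encoding assumed):

/* (a) advance by gap - 1, then a special opcode that advances the PC by
   one and emits the row: the view counter is reset.  */
static const unsigned char adv_then_reset[] = {
  0x09, 0x2f, 0x00,   /* DW_LNS_fixed_advance_pc 0x2f  */
  0x0e                /* hypothetical special opcode: +1 PC, new row  */
};

/* (b) advance by the full gap, then DW_LNS_copy: same size, but the
   view counter is not reset.  */
static const unsigned char adv_then_copy[] = {
  0x09, 0x30, 0x00,   /* DW_LNS_fixed_advance_pc 0x30  */
  0x01                /* DW_LNS_copy  */
};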

> So, if I've got this right: The most conservative approach to updating
> the address is DW_LNE_set_address, but we definitely want that to
> reset the view because it's used for e.g. starting a new function.

Yup.

> And if it resets the view, we need to be careful not to use it more
> than once for the same address.

I guess it's ok if we do that e.g. once as the one-past-the-end of a
function, and then again as the entry of another function, but other
than that, yeah, it shouldn't happen.

  [...] 
> And since we don't know whether the increment will be zero, we
> don't want it to reset the view.

Yup

> OK, that makes sense.  Though I expect this will come up again when
> the DWARF committee looks at the proposal.

Most likely.  The reasoning that led me down this path was quite
intricate indeed.

Please let it be known that I'd be glad to join the committee's
conversation when they're to discuss this proposal, if that would be of
any help.

Thanks,

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [SFN+LVU+IEPM v4 9/9] [IEPM] Introduce inline entry point markers

2018-02-08 Thread Alexandre Oliva
On Jan 25, 2018, Alexandre Oliva  wrote:

> On Jan 24, 2018, Jakub Jelinek  wrote:
>> I think this asks for
>> if (flag_checking)
>> gcc_assert (block_within_block_p (block,
>> DECL_INITIAL (current_function_decl),
>> true));

> 'k, changed.

>> Otherwise the patch looks reasonable to me, but I think it depends on the
>> 7/9.

> Thanks, yeah, it very much does.  It *might* be possible to split out
> the dependency, but...  it would just take most of the LVU patch with it
> ;-)

Here's what I checked in, right after the LVU patch.

[IEPM] Introduce inline entry point markers

Output DW_AT_entry_pc based on markers.

Introduce DW_AT_GNU_entry_view as a DWARF extension.

If views are enabled and we're not in strict compliance mode, output
DW_AT_GNU_entry_view if it might be nonzero.

This patch depends on SFN and LVU patchsets, and on the IEPM patch that
introduces the inline_entry debug hook.

for  include/ChangeLog

* dwarf2.def (DW_AT_GNU_entry_view): New.

for  gcc/ChangeLog

* cfgexpand.c (expand_gimple_basic_block): Handle inline entry
markers.
* dwarf2out.c (dwarf2_debug_hooks): Enable inline_entry hook.
(BLOCK_INLINE_ENTRY_LABEL): New.
(dwarf2out_var_location): Disregard inline entry markers.
(inline_entry_data): New struct.
(inline_entry_data_hasher): New hashtable type.
(inline_entry_data_hasher::hash): New.
(inline_entry_data_hasher::equal): New.
(inline_entry_data_table): New variable.
(add_high_low_attributes): Add DW_AT_entry_pc and
DW_AT_GNU_entry_view attributes if a pending entry is found
in inline_entry_data_table.  Add old entry_pc attribute only
if debug nonbinding markers are disabled.
(gen_inlined_subroutine_die): Set BLOCK_DIE if nonbinding
markers are enabled.
(block_within_block_p, dwarf2out_inline_entry): New.
(dwarf2out_finish): Check that no entries remained in
inline_entry_data_table.
* final.c (reemit_insn_block_notes): Handle inline entry notes.
(final_scan_insn, notice_source_line): Likewise.
(rest_of_clean_state): Skip inline entry markers.
* gimple-pretty-print.c (dump_gimple_debug): Handle inline entry
markers.
* gimple.c (gimple_build_debug_inline_entry): New.
* gimple.h (enum gimple_debug_subcode): Add
GIMPLE_DEBUG_INLINE_ENTRY.
(gimple_build_debug_inline_entry): Declare.
(gimple_debug_inline_entry_p): New.
(gimple_debug_nonbind_marker_p): Adjust.
* insn-notes.def (INLINE_ENTRY): New.
* print-rtl.c (rtx_writer::print_rtx_operand_code_0): Handle
inline entry marker notes.
(print_insn): Likewise.
* rtl.h (NOTE_MARKER_P): Add INLINE_ENTRY support.
(INSN_DEBUG_MARKER_KIND): Likewise.
(GEN_RTX_DEBUG_MARKER_INLINE_ENTRY_PAT): New.
* tree-inline.c (expand_call_inline): Build and insert
debug_inline_entry stmt.
* tree-ssa-live.c (remove_unused_scope_block_p): Preserve
inline entry blocks early, if nonbind markers are enabled.
(dump_scope_block): Dump fragment info.
* var-tracking.c (reemit_marker_as_note): Handle inline entry note.
* doc/gimple.texi (gimple_debug_inline_entry_p): New.
(gimple_build_debug_inline_entry): New.
* doc/invoke.texi (gstatement-frontiers, gno-statement-frontiers):
Enable/disable inline entry points too.
* doc/rtl.texi (NOTE_INSN_INLINE_ENTRY): New.
(DEBUG_INSN): Describe inline entry markers.
---
 gcc/cfgexpand.c   |9 ++
 gcc/doc/gimple.texi   |   18 
 gcc/doc/rtl.texi  |   24 -
 gcc/dwarf2out.c   |  199 -
 gcc/final.c   |   26 ++
 gcc/gimple-pretty-print.c |   13 +++
 gcc/gimple.c  |   21 +
 gcc/gimple.h  |   18 
 gcc/insn-notes.def|4 +
 gcc/print-rtl.c   |5 +
 gcc/rtl.h |7 +-
 gcc/tree-inline.c |7 ++
 gcc/tree-ssa-live.c   |   27 +-
 gcc/var-tracking.c|1 
 include/dwarf2.def|1 
 15 files changed, 364 insertions(+), 16 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 2ee6fbac2e30..deab9296001a 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5731,6 +5731,15 @@ expand_gimple_basic_block (basic_block bb, bool 
disable_tail_calls)
goto delink_debug_stmt;
  else if (gimple_debug_begin_stmt_p (stmt))
val = GEN_RTX_DEBUG_MARKER_BEGIN_STMT_PAT ();
+ else if (gimple_debug_inline_entry_p (stmt))
+   {
+ tree block = gimple_block (stmt);
+
+ if (block)
+   val = GEN_RTX_DEBUG_MARKER_INLINE_ENTRY_PAT ();
+ else
+   goto delink_debug_stmt;
+

Re: [SFN+LVU+IEPM v4 7/9] [LVU] Introduce location views

2018-02-08 Thread Alexandre Oliva
On Feb  8, 2018, Jason Merrill  wrote:

> On 02/07/2018 02:36 AM, Alexandre Oliva wrote:
>> +/* Output symbol LAB1 as an unsigned LEB128 quantity.  */

> Let's mention here that the value of LAB1 must be an assemble-time
> constant (such as a view counter), since we can't have LEB128
> relocations.

/* Output symbol LAB1 as an unsigned LEB128 quantity.  LAB1 should be
   an assembler-computed constant, e.g. a view number, because we
   can't have relocations in LEB128 quantities.  */

> With that, the patch looks OK.

Thanks, here's what I checked in.


[LVU] Introduce location views

This patch introduces an option to enable the generation of location
views along with location lists.  The exact format depends on the
DWARF version: it can be a separate attribute (DW_AT_GNU_locviews) or
(DW_LLE_view_pair) entries in DWARF5+ loclists.

Line number tables are also affected.  If the assembler is found, at
compiler build time, to support .loc views, we use them and
assembler-computed view labels, otherwise we output compiler-generated
line number programs with conservatively-computed view labels.  In
either case, we output view information next to line number changes
when verbose assembly output is requested.

This patch requires an LVU patch that modifies the exported API of
final_scan_insn.  It also expects the entire SFN patchset to be
installed first, although SFN is not a requirement for LVU.

for  include/ChangeLog

* dwarf2.def (DW_AT_GNU_locviews): New.
* dwarf2.h (enum dwarf_location_list_entry_type): Add
DW_LLE_GNU_view_pair.
(DW_LLE_view_pair): Define.

for  gcc/ChangeLog

* common.opt (gvariable-location-views): New.
(gvariable-location-views=incompat5): New.
* config.in: Rebuilt.
* configure: Rebuilt.
* configure.ac: Test assembler for view support.
* dwarf2asm.c (dw2_asm_output_symname_uleb128): New.
* dwarf2asm.h (dw2_asm_output_symname_uleb128): Declare.
* dwarf2out.c (var_loc_view): New typedef.
(struct dw_loc_list_struct): Add vl_symbol, vbegin, vend.
(dwarf2out_locviews_in_attribute): New.
(dwarf2out_locviews_in_loclist): New.
(dw_val_equal_p): Compare val_view_list of dw_val_class_view_lists.
(enum dw_line_info_opcode): Add LI_adv_address.
(struct dw_line_info_table): Add view.
(RESET_NEXT_VIEW, RESETTING_VIEW_P): New macros.
(DWARF2_ASM_VIEW_DEBUG_INFO): Define default.
(zero_view_p): New variable.
(ZERO_VIEW_P): New macro.
(output_asm_line_debug_info): New.
(struct var_loc_node): Add view.
(add_AT_view_list, AT_loc_list): New.
(add_var_loc_to_decl): Add view param.  Test it against last.
(new_loc_list): Add view params.  Record them.
(AT_loc_list_ptr): Handle loc and view lists.
(view_list_to_loc_list_val_node): New.
(print_dw_val): Handle dw_val_class_view_list.
(size_of_die): Likewise.
(value_format): Likewise.
(loc_list_has_views): New.
(gen_llsym): Set vl_symbol too.
(maybe_gen_llsym, skip_loc_list_entry): New.
(dwarf2out_maybe_output_loclist_view_pair): New.
(output_loc_list): Output view list or entries too.
(output_view_list_offset): New.
(output_die): Handle dw_val_class_view_list.
(output_dwarf_version): New.
(output_compilation_unit_header): Use it.
(output_skeleton_debug_sections): Likewise.
(output_rnglists, output_line_info): Likewise.
(output_pubnames, output_aranges): Update version comments.
(output_one_line_info_table): Output view numbers in asm comments.
(dw_loc_list): Determine current endview, pass it to new_loc_list.
Call maybe_gen_llsym.
(loc_list_from_tree_1): Adjust.
(add_AT_location_description): Create view list attribute if
needed, check it's absent otherwise.
(convert_cfa_to_fb_loc_list): Adjust.
(maybe_emit_file): Call output_asm_line_debug_info for test.
(dwarf2out_var_location): Reset views as needed.  Precompute
add_var_loc_to_decl args.  Call get_attr_min_length only if we have the
attribute.  Set view.
(new_line_info_table): Reset next view.
(set_cur_line_info_table): Call output_asm_line_debug_info for test.
(dwarf2out_source_line): Likewise.  Output view resets and labels to
the assembler, or select appropriate line info opcodes.
(prune_unused_types_walk_attribs): Handle dw_val_class_view_list.
(optimize_string_length): Catch it.  Adjust.
(resolve_addr): Copy vl_symbol along with ll_symbol.  Handle
dw_val_class_view_list, and remove it if no longer needed.
(hash_loc_list): Hash view numbers.
(loc_list_hasher::equal): Compare them.
(optimize_location_lists): Check whether a view list symbol is
needed, and 

Re: [PATCH] correct -Wrestrict handling of arrays of arrays (PR 84095)

2018-02-08 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00076.html

On 02/01/2018 04:45 PM, Martin Sebor wrote:

The previous patch didn't resolve all the false positives
in the Linux kernel.  The attached is an update that fixes
the remaining one having to do with multidimensional array
members:

  struct S { char a[2][4]; };

  void f (struct S *p, int i)
  {
strcpy (p->a[0], "012");
strcpy (p->a[i] + 1, p->a[0]);   // false positive here
  }

In the process of fixing this I also made a couple of minor
restructuring changes to the builtin_memref constructor to
in order to make the code easier to follow: I broke it out
into a couple of helper functions and called those.

As with the first revision of the patch, this one is also
meant to be applied on top of

  https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01488.html

Sorry about the late churn.  Even though I tested the original
implementation with the Linux kernel, the bugs were only exposed by
non-default configurations that I didn't build.

Jakub, you had concerns about the code in the constructor
and about interpreting the offsets in the diagnostics.
I tried to address those in the patch.  Please review
the changes and let me know if you have any further comments.

Thanks
Martin

On 01/30/2018 04:19 PM, Martin Sebor wrote:

Testing GCC 8 with recent Linux kernel sources has uncovered
a bug in the handling of arrays of arrays by the -Wrestrict
checker where it fails to take references to different array
elements into consideration, issuing false positives.

The attached patch corrects this mistake.

In addition, to make warnings involving excessive offset bounds
more meaningful (less confusing), I've made a cosmetic change
to constrain them to the bounds of the accessed object.  I've
done this in response to multiple comments indicating that
the warnings are hard to interpret.  This change is meant to
be applied on top of the patch for bug 83698 (submitted mainly
to improve the readability of the offsets):

  https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01488.html

Martin






[C++] [PR84231] overload on cond_expr in template

2018-02-08 Thread Alexandre Oliva
A non-type-dependent COND_EXPR within a template is reconstructed with
the original operands, after one with non-dependent proxies is built to
determine its result type.  This is problematic because the operands of
a COND_EXPR determined to be an rvalue may have been converted to denote
their rvalue nature.  The reconstructed one, however, won't have such
conversions, so lvalue_kind may not recognize it as an rvalue, which may
lead to e.g. incorrect overload resolution decisions.

If we mistake such a COND_EXPR for an lvalue, overload resolution might
regard a conversion sequence that binds it to a non-const reference as
viable, and then select that over one that binds it to a const
reference.  Only after template substitution would we rebuild the
COND_EXPR, realize it is an rvalue, and conclude the reference binding
is ill-formed, but at that point we'd have long discarded any alternate
candidates we could have used.

This patch ensures that, if a non-type-dependent COND_EXPR is
recognized as an rvalue, so is the to-be-template-substituted one
created in its stead, so that overload resolution selects the correct
alternative.

Regstrapped on x86_64- and i686-linux-gnu.  Ok to install?

for  gcc/cp/ChangeLog

PR c++/84231
* typeck.c (build_x_conditional_expr): Make sure the
to-be-tsubsted expr is an rvalue when it should be.

for  gcc/testsuite/g++.dg/ChangeLog

PR c++/84231
* pr84231.C: New.
---
 gcc/cp/typeck.c|   16 +++-
 gcc/testsuite/g++.dg/pr84231.C |   29 +
 2 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/pr84231.C

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 83e767829986..25ac44e57772 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -6560,12 +6560,26 @@ build_x_conditional_expr (location_t loc, tree ifexp, 
tree op1, tree op2,
   expr = build_conditional_expr (loc, ifexp, op1, op2, complain);
   if (processing_template_decl && expr != error_mark_node)
 {
+  bool rval = !glvalue_p (expr);
   tree min = build_min_non_dep (COND_EXPR, expr,
orig_ifexp, orig_op1, orig_op2);
+  bool mrval = !glvalue_p (min);
   /* Remember that the result is an lvalue or xvalue.  */
-  if (glvalue_p (expr) && !glvalue_p (min))
+  if (!rval && mrval)
TREE_TYPE (min) = cp_build_reference_type (TREE_TYPE (min),
   !lvalue_p (expr));
+  else if (rval && !mrval)
+   {
+ /* If it was supposed to be an rvalue but it's not, adjust
+one of the operands so that any overload resolution
+taking this COND_EXPR as an operand makes the correct
+decisions.  See c++/84231.  */
+ TREE_OPERAND (min, 2) = build1_loc (loc, NON_LVALUE_EXPR,
+ TREE_TYPE (min),
+ TREE_OPERAND (min, 2));
+ EXPR_LOCATION_WRAPPER_P (TREE_OPERAND (min, 2)) = 1;
+ gcc_checking_assert (!glvalue_p (min));
+   }
   expr = convert_from_reference (min);
 }
   return expr;
diff --git a/gcc/testsuite/g++.dg/pr84231.C b/gcc/testsuite/g++.dg/pr84231.C
new file mode 100644
index ..de7c89a2ab69
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr84231.C
@@ -0,0 +1,29 @@
+// PR c++/84231 - overload resolution with cond_expr in a template
+
+// { dg-do compile }
+
+struct format {
  template <typename T> format& operator%(const T&) { return *this; }
  template <typename T> format& operator%(T&) { return *this; }
+};
+
+format f;
+
+template 
+void function_template(bool b)
+{
+  // Compiles OK with array lvalue:
+  f % (b ? "x" : "x");
+
  // Used to fail with pointer rvalue:
+  f % (b ? "" : "x");
+}
+
+void normal_function(bool b)
+{
+  // Both cases compile OK in non-template function:
+  f % (b ? "x" : "x");
+  f % (b ? "" : "x");
+
+  function_template(b);
+}

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: Please accept this commit for the trunk

2018-02-08 Thread Mike Stump
On Feb 8, 2018, at 12:36 PM, Segher Boessenkool  
wrote:
> 
> On Wed, Feb 07, 2018 at 03:52:27PM -0800, Mike Stump wrote:
>> I dusted the pointed-to patch off and checked it in.  Let us know how it goes.
> 
> I wanted to test this on the primary and secondary powerpc targets as
> well, but okay.

I reviewed it, and it seemed to only trigger for Darwin.  Certainly doesn't
hurt to run a regression test and ensure that is the case.

>> Does this resolve all of PR84113?  If so, I can push the bug along.
> 
> It makes bootstrap work.  We don't know if it is correct otherwise.

So, it would be nice if someone could run a regression test.  I'd do it with
the version just before the breakage, then drop in the patch and test again.
This minimizes all the other changes.

>> What PR was the attachment url from?
> 
> It is not from a PR, and it has never been sent to gcc-patches; it is
> from https://gcc.gnu.org/ml/gcc-testresults/2017-01/msg02971.html
> (attachment #2).

Ah, that explains it.

Sounds like 1, 3 and 4 also likely need to go in to make things nice.  If
someone could regression test and let us know, that's likely the gating factor.



Re: [PATCH] PowerPC PR target/84154, fix floating point to small integer conversion regression

2018-02-08 Thread Michael Meissner
On Thu, Feb 08, 2018 at 06:10:31PM -0500, Hans-Peter Nilsson wrote:
> On Wed, 7 Feb 2018, Segher Boessenkool wrote:
> > Hi Mike,
> >
> > On Tue, Feb 06, 2018 at 04:34:08PM -0500, Michael Meissner wrote:
> > > Here is the patch reworked.  It bootstraps on both little/big endian 
> > > power8,
> > > and all of the tests run.  Can I install this into trunk now, and into 
> > > GCC 7
> > > after a soak period (along with the previous patch)?
> >
> > > +;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR
> >
> > "If we have"?
> >
> > > +  [(set (match_operand:QHSI 0 "memory_operand" "=Z")
> > > + (any_fix:QHSI (match_operand:SFDF 1 "gpc_reg_operand" "wa")))
> > > +   (clobber (match_scratch:SI 2 "=wa"))]
> > > +  "((mode == SImode && TARGET_P8_VECTOR)
> > > +|| (mode != SImode && TARGET_P9_VECTOR))"
> >
> > This is the same as
> >
> >   "(mode == SImode && TARGET_P8_VECTOR) || TARGET_P9_VECTOR"
> 
> Umm, sorry for chiming in here with zero rs6000 knowledge and I
> might be missing something trivial but... wouldn't that misfire for
>  "mode == SImode && ! TARGET_P8_VECTOR && TARGET_P9_VECTOR" ?
> 
> (Is that invalid or not applicable or don't care or something?)

TARGET_P9_VECTOR requires TARGET_P8_VECTOR.

Basically, when we are converting SF/DFmode to SImode, we want to allow it on
ISA 2.07 (-mcpu=power8).  If we are converting SF/DFmode to HI/QImode, we
require ISA 3.0 (-mcpu=power9).

The reason is that we don't have 32-bit integer store and 32-bit integer
sign/zero-extended load instructions to all of the vector and floating point
registers until ISA 2.07.  Because of that, we don't allow SImode in the
vector and floating point registers before ISA 2.07.  On processors before
power8, we had to store a 64-bit integer on the stack and then load the
32-bit value into the GPR registers.

However, ISA 2.07 does not have instructions to store or load 8/16-bit values
that can be conveniently used.  ISA 3.0 added 8/16-bit store, 8/16-bit zero
extended load, and 8/16-bit sign extend instructions.  So in ISA 3.0, we allow
QI/HImode to go in vector registers.

In ISA 2.06 (-mcpu=power7) we had to use UNSPECs to hide the floating point
scalar to 32-bit signed/unsigned conversion instructions because we didn't
allow the base type in the registers.
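
As a hedged illustration (not from the patch or its testcases), the
affected conversions are float/double values converted and stored
directly as small integers:

/* Hypothetical sketch of the conversions being discussed.  */
void
convert_and_store (double d, float f, int *ip, short *sp, unsigned char *cp)
{
  *ip = (int) d;            /* DF -> SImode store: ISA 2.07 is enough  */
  *sp = (short) d;          /* DF -> HImode store: wants ISA 3.0  */
  *cp = (unsigned char) f;  /* SF -> QImode store: wants ISA 3.0  */
}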

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH, rs6000] PR84220 fix altivec_vec_sld and vec_sldw intrinsic definitions

2018-02-08 Thread Segher Boessenkool
Hi!

On Wed, Feb 07, 2018 at 09:14:59AM -0600, Will Schmidt wrote:
>   Our VEC_SLD definitions were mistakenly allowing the third argument to be
> of an invalid type, triggering an ICE (on invalid code) later in the build
> process.  This fixes those definitions.  The nearby VEC_SLDW definitions have
> the same issue, those have been fixed as part of this patch too.
> Testcases have been added to ensure we generate the 'invalid intrinsic'
> message as is appropriate, instead of ICEing.
> Giving proper credit, this was found by Peter Bergner while working on a
> different issue. :-)
> 
> Sniff-tests passed on P8.  Doing larger reg-test across power systems now.
> OK for trunk?
> And,.. do we want this one backported too?

> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index a68be51..26f9990 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -3654,39 +3654,39 @@ const struct altivec_builtin_types 
> altivec_overloaded_builtins[] = {
>{ ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_16QI,
>  RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, 
> RS6000_BTI_bool_V16QI },
>{ ALTIVEC_BUILTIN_VEC_SEL, ALTIVEC_BUILTIN_VSEL_16QI,
>  RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, 
> RS6000_BTI_unsigned_V16QI },
>{ ALTIVEC_BUILTIN_VEC_SLD, ALTIVEC_BUILTIN_VSLDOI_4SF,
> -RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_NOT_OPAQUE 
> },
> +RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_INTSI },

It isn't clear to me what RS6000_BTI_NOT_OPAQUE means...  rs6000-c.c says:

/* For arguments after the last, we have RS6000_BTI_NOT_OPAQUE in
   the opX fields.  */

(whatever that means!), and the following code seems to allow anything in
such args?  If you understand it, please update some comments somewhere?

>{ VSX_BUILTIN_VEC_XXPERMDI, VSX_BUILTIN_XXPERMDI_2DF,

XXPERMDI is the only other builtin that uses NOT_OPAQUE, does that suffer
from the same problem?  If so, you can completely delete NOT_OPAQUE it
seems?

So what is/was it for, that is what I wonder.

Your patch looks fine if you can clear that up :-)


Segher


Re: [PATCH] Fix var-tracking ICE introduced in poly_int conversion (PR debug/84252)

2018-02-08 Thread Jeff Law
On 02/07/2018 03:43 PM, Jakub Jelinek wrote:
> Hi!
> 
> As mentioned in the PR, the vt_get_decl_and_offset function verifies
> incoming PARALLEL is usable for tracking, but if it fails, we retry
> vt_get_decl_and_offset on DECL_RTL and there we check only that a memory
> isn't larger than 16 bytes (to make sure it doesn't have more than 16
> parts), but if DECL_RTL is a REG e.g. with OImode as in the testcase on
> aarch64, we just go on.  For the OImode argument passed in V4SImode v0
> and V4SImode v1 parts we are lucky and used to get away with it, as it only
> had 2 parts (and only dwarf2out threw it away as we can't really do vector
> debug info well right now), but generally if vt_get_decl_and_offset doesn't
> check the PARALLEL incoming, it is better to punt right away, we don't know
> if all PARALLEL operands are REGs, have REG_ATTRS, the same underlying decl
> and usable offsets, which we all rely on and only vt_get_decl_and_offset
> verifies.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux and Richard S. tested
> it on aarch64-linux, ok for trunk?
> 
> 2018-02-07  Jakub Jelinek  
> 
>   PR debug/84252
>   * var-tracking.c (vt_add_function_parameter): Punt for non-onepart
>   PARALLEL incoming that failed vt_get_decl_and_offset check.
> 
>   * gcc.target/aarch64/pr84252.c: New test.
OK.  And that's about all my brain can handle today.  I'm fried.
jeff


Re: [PATCH] Allow (again) const variables with zero (or no) initializers in .bss* named sections (PR middle-end/84237)

2018-02-08 Thread Jeff Law
On 02/06/2018 01:44 PM, Jakub Jelinek wrote:
> Hi!
> 
> Last year's bss_initializer_p change apparently broke xen.  While it is
> reasonable not to put const variables into .bss* sections by default,
> refusing to put them there when the user asks for it through the section
> attribute seems unnecessary.  If the user screws up, he gets section flag
> conflicts diagnosed; if not, such as in this case where there is a separate
> .bss.page_aligned.const section, it should work as before.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2018-02-06  Jakub Jelinek  
> 
>   PR middle-end/84237
>   * output.h (bss_initializer_p): Add NAMED argument, defaulted to false.
>   * varasm.c (bss_initializer_p): Add NAMED argument, if true, ignore
>   TREE_READONLY bit.
>   (get_variable_section): For decls in named .bss* sections pass true as
>   second argument to bss_initializer_p.
> 
>   * gcc.dg/pr84237.c: New test.
OK.
jeff


Re: [PATCH] PowerPC PR target/84154, fix floating point to small integer conversion regression

2018-02-08 Thread Hans-Peter Nilsson
On Wed, 7 Feb 2018, Segher Boessenkool wrote:
> Hi Mike,
>
> On Tue, Feb 06, 2018 at 04:34:08PM -0500, Michael Meissner wrote:
> > Here is the patch reworked.  It bootstraps on both little/big endian power8,
> > and all of the tests run.  Can I install this into trunk now, and into GCC 7
> > after a soak period (along with the previous patch)?
>
> > +;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR
>
> "If we have"?
>
> > +  [(set (match_operand:QHSI 0 "memory_operand" "=Z")
> > +   (any_fix:QHSI (match_operand:SFDF 1 "gpc_reg_operand" "wa")))
> > +   (clobber (match_scratch:SI 2 "=wa"))]
> > +  "((mode == SImode && TARGET_P8_VECTOR)
> > +|| (mode != SImode && TARGET_P9_VECTOR))"
>
> This is the same as
>
>   "(mode == SImode && TARGET_P8_VECTOR) || TARGET_P9_VECTOR"

Umm, sorry for chiming in here with zero rs6000 knowledge and I
might be missing something trivial but... wouldn't that misfire for
 "mode == SImode && ! TARGET_P8_VECTOR && TARGET_P9_VECTOR" ?

(Is that invalid or not applicable or don't care or something?)

brgds, H-P


Re: [PR bootstrap/56750] implement --disable-stage1-static-libs

2018-02-08 Thread Jeff Law
On 02/08/2018 04:42 AM, Aldy Hernandez wrote:
> In this PR, the reporter is complaining that forcing -static-libstdc++
> and -static-libgcc during stage1 will also force it down to all
> subdirectories (gdb for instance).
> 
> There is some back and forth in the PR whether this is good or not.  I'm
> indifferent, but an alternative is to provide a flag
> --disable-stage1-static-libs to disable this behavior.
> 
> Tested on an x86-64 Linux system with static libraries and verifying
> that with --disable-stage1-static-libs we get an xgcc linked against
> shared libraries of libstdc++ and libgcc:
> 
> $ ldd xgcc
> linux-vdso.so.1 (0x7ffe92084000)
> libstdc++.so.6 => /lib64/libstdc++.so.6 (0x7fec11a06000)
> libm.so.6 => /lib64/libm.so.6 (0x7fec116fd000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7fec114e6000)
> libc.so.6 => /lib64/libc.so.6 (0x7fec1112)
> /lib64/ld-linux-x86-64.so.2 (0x557117206000)
> 
> I also verified that without the flag or with
> --enable-stage1-static-libs we get no such shared libraries.
> 
> Again, I'm agnostic here.  We can just as easily close the PR and tell
> users to specify --with-stage1-libs to override the static linking, as
> I've mentioned in the PR.
> 
> OK for trunk?
> 
> curr.patch
> 
> 
> 
>   PR bootstrap/56750
>   * configure.ac (stage1-static-libs): New option.
>   * configure: Regenerate.
> 
> gcc/
> 
>   PR bootstrap/56750
>   * doc/install.texi (--enable-stage1-static-libs): New.
You are a brave soul.

Every time I look at 56750 I shake my head and say it's not worth the
pain to untangle (sorry Mike).

There's no single solution here that will satisfy everyone that I'm
aware of.   Given that there is a workaround that I think will work for
the problems Mike is trying to address, I think this is a WONTFIX.
Another configury option just adds more complexity here that we're not
likely to test and is likely to bitrot over time.

Go ahead with WONTFIX and if Mike complains, point him at me :-)

jeff



Re: [PATCH][i386][3/3] PR target/84164: Make *cmpqi_ext_ patterns accept more zero_extract modes

2018-02-08 Thread Uros Bizjak
On Thu, Feb 8, 2018 at 6:11 PM, Kyrill  Tkachov
 wrote:
> Hi all,
>
> This patch fixes some fallout in the i386 testsuite that occurs after the
> simplification in patch [1/3] [1].
> The gcc.target/i386/extract-2.c FAILs because it expects to match:
> (set (reg:CC 17 flags)
> (compare:CC (subreg:QI (zero_extract:SI (reg:HI 98)
> (const_int 8 [0x8])
> (const_int 8 [0x8])) 0)
> (const_int 4 [0x4])))
>
> which is the *cmpqi_ext_2 pattern in i386.md but with the new simplification
> the combine/simplify-rtx
> machinery produces:
> (set (reg:CC 17 flags)
> (compare:CC (subreg:QI (zero_extract:HI (reg:HI 98)
> (const_int 8 [0x8])
> (const_int 8 [0x8])) 0)
> (const_int 4 [0x4])))
>
> Notice that the zero_extract now has HImode like the register source rather
> than SImode.
> The existing *cmpqi_ext_ patterns however explicitly demand an SImode on
> the zero_extract.
> I'm not overly familiar with the i386 port but I think that's too
> restrictive.
> The RTL documentation says:
> For (zero_extract:m loc size pos) "The mode m is the same as the mode that
> would be used for loc if it were a register."
> I'm not sure if that means that the mode of the zero_extract and the source
> register must always match (as is the
> case after patch [1/3]) but in any case it shouldn't matter semantically
> since we're taking a QImode subreg of the whole
> thing anyway.
>
> So the proposed solution in this patch is to allow HI, SI and DImode
> zero_extracts in these patterns as these are the
> modes that the ext_register_operand predicate accepts, so that the patterns
> can match the new form above.
>
> With this patch the aforementioned test passes again and bootstrap and
> testing on x86_64-unknown-linux-gnu shows
> no regressions.
>
> Is this ok for trunk if the first patch is accepted?

Huh, there are many other zero-extract patterns besides cmpqi_ext_*
with QImode subreg of SImode zero_extract in i386.md, used to access
high QImode register of HImode pair. A quick grep shows these that
have _ext_ in their name:

(define_insn "*cmpqi_ext_1"
(define_insn "*cmpqi_ext_2"
(define_expand "cmpqi_ext_3"
(define_insn "*cmpqi_ext_3"
(define_insn "*cmpqi_ext_4"
(define_insn "addqi_ext_1"
(define_insn "*addqi_ext_2"
(define_expand "testqi_ext_1_ccno"
(define_insn "*testqi_ext_1"
(define_insn "*testqi_ext_2"
(define_insn_and_split "*testqi_ext_3"
(define_insn "andqi_ext_1"
(define_insn "*andqi_ext_1_cc"
(define_insn "*andqi_ext_2"
(define_insn "*qi_ext_1"
(define_insn "*qi_ext_2"
(define_expand "xorqi_ext_1_cc"
(define_insn "*xorqi_ext_1_cc"

There are also relevant splitters and peephole2 patterns.

IIRC, SImode zero_extract was enough to catch all high-register uses.
There will be a pattern explosion if we want to handle all other
integer modes here. However, I'm not a RTL expert, so someone will
have to say what is the correct RTX form here.

Uros.


Re: [PATCH, i386] PR target/83008: Fix for SKX cost model

2018-02-08 Thread Uros Bizjak
On Thu, Feb 8, 2018 at 12:48 PM, Shalnov, Sergey wrote:
> Hi,
> This patch contain cost model change for SKX and closes PR target/83008 
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008)
>
> It provides following performance scores in geomean:
> SPEC CPU2017 intrate +0.6%
> SPEC CPU2017 fprate +1.5%
> SPEC 2006 [int|fp] no changes out of noise
>
> I found a regression and solve it with 
> https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00320.html
>
> Could you please merge the patch (and patch in the link 
> https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00320.html)
> to the main trunk?
>
> Thank you
> Sergey
>
> 2018-02-06  Sergey Shalnov  
>
> gcc/
> PR target/83008
> * config/i386/x86-tune-costs.h (struct processor_costs): Fixed
> 256 and 512 aligned store costs and integer stores according PR83008
>
> gcc/testsuite/
> * gcc.target/i386/pr83008.c: New test.
>

Both patches approved and committed to mainline SVN.

Thanks,
Uros.


[PATCH/RFC] tree-if-conv.c: fix two ICEs seen with -fno-tree-forwprop (PR tree-optimization/84178)

2018-02-08 Thread David Malcolm
PR tree-optimization/84178 reports a couple of source files that ICE inside
ifcvt when compiled with -O3 -fno-tree-forwprop (trunk and gcc 7).

Both cases involve problems with ifcvt's per-BB gimplified predicates.

Testcase 1 fails this assertion within release_bb_predicate during cleanup:

283   if (flag_checking)
284 for (gimple_stmt_iterator i = gsi_start (stmts);
285  !gsi_end_p (i); gsi_next (&i))
286   gcc_assert (! gimple_use_ops (gsi_stmt (i)));

The testcase contains a division in the loop, which leads to
if_convertible_loop_p returning false (due to gimple_could_trap_p being true
for the division).  This happens *after* the per-BB gimplified predicates
have been created in predicate_bbs (loop).
Hence tree_if_conversion bails out to "cleanup", but the gimplified predicates
exist and make use of SSA names; for example this conjunction for two BB
conditions:

  _4 = h4.1_112 != 0;
  _175 = (signed char) _117;
  _176 = _175 >= 0;
  _174 = _4 & _176;

is using SSA names.

This assertion was added in r236498 (aka 
c3deca2519d97c55876869c57cf11ae1e5c6cf8b):

2016-05-20  Richard Biener  

* tree-if-conv.c (add_bb_predicate_gimplified_stmts): Use
gimple_seq_add_seq_without_update.
(release_bb_predicate): Assert we have no operands to free.
(if_convertible_loop_p_1): Calculate post dominators later.
Do not free BB predicates here.
(combine_blocks): Do not recompute BB predicates.
(version_loop_for_if_conversion): Save BB predicates around
loop versioning.

* gcc.dg/tree-ssa/ifc-cd.c: Adjust.

The following patch fixes this by removing the assertion, and reinstating the
cleanup of the operands.

Testcase 2 segfaults inside update_ssa when called from
version_loop_for_if_conversion when a loop is versioned.

During loop_version, some blocks are duplicated, and this can duplicate
SSA names, leading to the duplicated SSA names being added to
tree-into-ssa.c's old_ssa_names.

For example, _117 is an SSA name defined in one of these to-be-duplicated
blocks, and hence is added to old_ssa_names when duplicated.

One of the BBs has this gimplified predicate (again, these stmts are *not*
yet in a BB):
  _173 = h4.1_112 != 0;
  _171 = (signed char) _117;
  _172 = _171 >= 0;
  _170 = ~_172;
  _169 = _170 & _173;

Note the reference to SSA name _117 in the above.

When update_ssa runs later on in version_loop_for_if_conversion,
SSA name _117 is in the old_ssa_names bitmap, and thus has
prepare_use_sites_for called on it:

2771  /* If an old name is in NAMES_TO_RELEASE, we cannot remove it from
2772 OLD_SSA_NAMES, but we have to ignore its definition site.  */
2773  EXECUTE_IF_SET_IN_BITMAP (old_ssa_names, 0, i, sbi)
2774{
2775  if (names_to_release == NULL || !bitmap_bit_p (names_to_release, 
i))
2776prepare_def_site_for (ssa_name (i), insert_phi_p);
2777  prepare_use_sites_for (ssa_name (i), insert_phi_p);
2778}

prepare_use_sites_for iterates over the immediate users, which includes
the:
  _171 = (signed char) _117;
in the gimplified predicate.

This tried to call "mark_block_for_update" on the BB of this def_stmt,
leading to a read-through-NULL segfault, since this statement isn't in a
BB yet.

With the caveat that this is at the limit of my understanding of the
innards of gimple, I'm wondering how this ever works: we have gimplified
predicates that aren't in a BB yet, which typically refer to
SSA names in the CFG proper, and we're attempting non-trivial manipulations
of that CFG that can e.g. duplicate those SSA names.

The following patch fixes the 2nd ICE by inserting the gimplified predicates
earlier: immediately before the possible version_loop_for_if_conversion,
rather than within combine_blocks.  That way they're in their BBs before
we attempt any further manipulation of the CFG.

This fixes the ICE, though it introduces a regression of
  gcc.target/i386/avx2-vec-mask-bit-not.c
which no longer vectorizes for some reason (I haven't investigated
why yet).

Successfully bootstrapped on x86_64-pc-linux-gnu.

Thoughts?  Does this analysis sound sane?

Dave

gcc/ChangeLog:
PR tree-optimization/84178
* tree-if-conv.c (release_bb_predicate): Reinstate the
free_stmt_operands loop removed in r236498, removing
the assertion that the stmts have NULL use_ops.
(combine_blocks): Move the call to insert_gimplified_predicates...
(tree_if_conversion): ...to here, immediately before attempting
to version the loop.

gcc/testsuite/ChangeLog:
PR tree-optimization/84178
* gcc.c-torture/compile/pr84178-1.c: New test.
* gcc.c-torture/compile/pr84178-2.c: New test.
---
 gcc/testsuite/gcc.c-torture/compile/pr84178-1.c | 18 ++
 gcc/testsuite/gcc.c-torture/compile/pr84178-2.c | 18 ++
 gcc/tree-if-conv.c   

Re: [RFC PATCH] avoid applying attributes to explicit specializations (PR 83871)

2018-02-08 Thread Martin Sebor

__attribute__ ((nothrow))?  The patch includes a test case with
wrong-code due to inheriting the attribute.  With exception
specifications having to match between the primary and its
specializations it's the only way to make them different.
I've left this unchanged but let me know if I'm missing
something.


Yeah, I think you're right.  But I notice that the existing code (and
thus your patch) touches TREE_NOTHROW in two places, and the first one
seems wrong; we only want it inside the FUNCTION_DECL section.


I think I see what you mean.  I don't follow all the details
of all the code here but hopefully I got it right.


Rather than pass them down into register_specialization and
duplicate_decls, check_explicit_specialization could compare the
attribute list to the attributes on the template itself.


It took me a while to find DECL_TEMPLATE_RESULT.  Hopefully
that's the right way to get the primary from a TEMPLATE_DECL.


+  if (!merge_attr)
+{
+  /* Remove the two function attributes that are, in fact,
+ treated as (quasi) type attributes.  */
+  tree attrs = TYPE_ATTRIBUTES (newtype);
+  tree newattrs = remove_attribute ("nonnull", attrs);
+  newattrs = remove_attribute ("returns_nonnull", attrs);
+  if (newattrs != attrs)
+TYPE_ATTRIBUTES (newtype) = newattrs;
+}


Instead of this, we should avoid calling merge_types and just use
TREE_TYPE (newdecl) for newtype.


Ah, great, thanks.  That works and fixes the outstanding FAILs
in the tests.



 /* Merge the data types specified in the two decls.  */
-newtype = merge_types (TREE_TYPE (newdecl), TREE_TYPE (olddecl));
+newtype = TREE_TYPE (newdecl);


I meant to avoid merging only when !merge_attr; we should still merge if
merge_attr is true.


Doh!  Of course.  Silly mistake.  Sorry.


Attached is an updated patch.  It hasn't gone through full
testing yet but please let me know if you'd like me to make
some changes.



+  const char* const whitelist[] = {
+"error", "noreturn", "warning"
+  };


Why whitelist noreturn?  I would expect to want that to be consistent.


I expect noreturn to be used on a primary whose definition
is provided but that's not meant to be used the way the API
is otherwise expected to be.  As in:

  template 
  T [[noreturn]] foo () { throw "not implemented"; }

  template <> int foo();   // implemented elsewhere

Beyond that, noreturn can only be paired with a small number
of the attributes on the black list (just format and nonnull),
otherwise it or the other one is ignored (and -Wattributes
is issued).  I suppose noreturn would make sense together
with format on a template that formatted a message before
throwing or printing it and exiting.  But format is almost
never used in templates so it seems like a stretch.  If
you think it's important or if you have a use case in mind
that I'm not thinking of let me know.
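
Something along these lines, say (purely illustrative, not taken from
any of the tests in the patch):

  template <typename T>
  [[noreturn, gnu::format (printf, 1, 2)]]
  void fail (const char *fmt, ...);   // format a message, then throw

Both attributes would be meaningful on the primary there, but I haven't
seen that pattern in real code.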

Attached is the updated patch, this time bootstrapped and
regtested on x86_64-linux.

Martin
PR c++/83871 - wrong code for attribute const and pure on distinct template specializations
PR c++/83503 - [8 Regression] bogus -Wattributes for const and pure on function template specialization

gcc/ChangeLog:

	PR c++/83871
	* gcc/doc/invoke.texi (-Wmissing-attribute): New option.

gcc/c-family/ChangeLog:

	PR c++/83871
	* c.opt (-Wmissing-attribute): New option.

gcc/cp/ChangeLog:

	PR c++/83871
	PR c++/83503
	* cp-tree.h (warn_spec_missing_attributes): New function.
	((check_explicit_specialization): Add an argument.  Call the above
	function.
	* decl.c (duplicate_decls): Avoid applying primary function template's
	attributes to its explicit specializations.

gcc/testsuite/ChangeLog:

	PR c++/83871
	PR c++/83503
	* g++.dg/ext/attr-const-pure.C: New test.
	* g++.dg/ext/attr-malloc.C: New test.
	* g++.dg/ext/attr-nonnull.C: New test.
	* g++.dg/ext/attr-nothrow.C: New test.
	* g++.dg/ext/attr-nothrow-2.C: New test.
	* g++.dg/ext/attr-returns-nonnull.C: New test.
	* g++.dg/Wmissing-attribute.C: New test.

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 9c71726..a4d5e61 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -781,6 +781,11 @@ Wtemplates
 C++ ObjC++ Var(warn_templates) Warning
 Warn on primary template declaration.
 
+Wmissing-attribute
+C ObjC C++ ObjC++ Var(warn_missing_attribute) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
+Warn about declarations of entities that may be missing attributes
+that related entities have been declared with it.
+
 Wmissing-format-attribute
 C ObjC C++ ObjC++ Warning Alias(Wsuggest-attribute=format)
 ;
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index a53f4fd..87b7916 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6470,7 +6470,8 @@ extern void end_specialization			(void);
 extern void begin_explicit_instantiation	(void);
 extern void end_explicit_instantiation		(void);
 extern void check_unqualified_spec_or_inst	(tree, location_t);
-extern tree check_explicit_specialization	(tree, tree, 

Re: [PR tree-optimization/84047] missing -Warray-bounds on an out-of-bounds index

2018-02-08 Thread Martin Sebor

On 02/08/2018 03:38 AM, Richard Biener wrote:

On Thu, Feb 1, 2018 at 6:42 PM, Aldy Hernandez  wrote:

Since my patch isn't the easy one liner I wanted it to be, perhaps we
should concentrate on Martin's patch, which is more robust, and has
testcases to boot!  His patch from last week also fixes a couple other
PRs.

Richard, would this be acceptable?  That is, could you or Jakub review
Martin's all-encompassing patch?  If so, I'll drop mine.


Sorry, no - this one looks way too complicated.


Presumably the complication is in he loop that follows SSA_NAMEs
and the offsets in:

  const char *s0 = "12";
  const char *s1 = s0 + 1;
  const char *s2 = s1 + 1;
  const char *s3 = s2 + 1;

  int i = *s0 + *s1 + *s2 + *s3;

?

I don't know if this used to be diagnosed and is also part of
the regression.  If it isn't it could be removed for GCC 8 and
then added for GCC 9.  If this isn't your concern can you be
more specific about what is?

I should note (again) that this patch doesn't fix the whole
regression.  There are remaining cases (involving arrays) that
used to be handled but no longer are.  It's tedious (and hacky)
to limit the fix to just the subset of the regression while at
the same time preserving the pre-existing limitations (or bugs,
depending on one's point of view).


Also, could someone pontificate on whether we want to fix
-Warray-bounds regressions for this release cycle?


Remove bogus ones?  Yes.  Add "missing ones"?  No.


Can you please explain how to interpret the Target Milestone
then?  Why is it set to 6.5 when the bug is not meant to
be fixed?

If it's meant to be fixed in 6.5 (and presumably also 7.4) but
not in 8.1, do we expect to fix it in 8.2?  More to the point,
how can we tell which it is?

Thanks
Martin



Richard.


Thanks.

On Wed, Jan 31, 2018 at 6:05 AM, Richard Biener wrote:

On Tue, Jan 30, 2018 at 11:11 PM, Aldy Hernandez  wrote:

Hi!

[Note: Jakub has mentioned that missing -Warray-bounds regressions should be
punted to GCC 9.  I think this particular one is easy pickings, but if this
and/or the rest of the -Warray-bounds regressions should be marked as GCC 9
material, please let me know so we can adjust all relevant PRs.]

This is a -Warray-bounds regression that happens because the IL now has an
MEM_REF instead on ARRAY_REF.

Previously we had an ARRAY_REF we could diagnose:

  D.2720_5 = "12345678"[1073741824];

But now this is represented as:

  _1 = MEM[(const char *)"12345678" + 1073741824B];

I think we can just allow check_array_bounds() to handle MEM_REF's and
everything should just work.

The attached patch fixes both regressions mentioned in the PR.

Tested on x86-64 Linux.

OK?


This doesn't look correct.  You lump MEM_REF handling together with
ADDR_EXPR handling but for the above case you want to diagnose
_dereferences_ not address-taking.

For the dereference case you need to amend the ARRAY_REF case, for example
via

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 257181)
+++ gcc/tree-vrp.c  (working copy)
@@ -5012,6 +5012,13 @@ check_array_bounds (tree *tp, int *walk_
   if (TREE_CODE (t) == ARRAY_REF)
 vrp_prop->check_array_ref (location, t, false /*ignore_off_by_one*/);

+  else if (TREE_CODE (t) == MEM_REF
+  && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR
+  && TREE_CODE (TREE_OPERAND (TREE_OPERAND (t, 0), 0)) == STRING_CST)
+{
+  call factored part of check_array_ref passing in STRING_CST and offset
+}
+
   else if (TREE_CODE (t) == ADDR_EXPR)
 {
   vrp_prop->search_for_addr_array (t, location);

note your patch will fail to warn for "1"[1] because taking that
address is valid but not
dereferencing it.
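
To make the distinction concrete (an illustration only, not one of the
testcases in the PR):

  void
  f (void)
  {
    const char *s = "12345678";
    const char *p = &s[9];   /* one-past-the-end address: valid to form */
    char c = s[9];           /* out-of-bounds read: this is what should warn */
    (void) p; (void) c;
  }

only the read should get -Warray-bounds; the address computation must
stay quiet.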

Richard.




[PATCH, rs6000, committed] Fix PR81143

2018-02-08 Thread Peter Bergner
I have committed the following obvious testsuite patch to fix PR81143.
The "bug" is that __ORDER_LITTLE_ENDIAN__ is always defined for both
little and big endian compiles.  I checked and this is the only use
of this in the gcc.target/powerpc/ directory.

Peter

PR target/81143
* gcc.target/powerpc/pr79799-2.c: Use __LITTLE_ENDIAN__.

Index: gcc/testsuite/gcc.target/powerpc/pr79799-2.c
===
--- gcc/testsuite/gcc.target/powerpc/pr79799-2.c(revision 257503)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-2.c(revision 257504)
@@ -8,7 +8,7 @@
 /* Optimize x = vec_insert (vec_extract (v2, N), v1, M) for SFmode if N is the 
default
scalar position.  */
 
-#if __ORDER_LITTLE_ENDIAN__
+#if __LITTLE_ENDIAN__
 #define ELE 2
 #else
 #define ELE 1



Re: [C++ Patch] PR 83806 ("[6/7/8 Regression] Spurious -Wunused-but-set-parameter with nullptr")

2018-02-08 Thread Jason Merrill
On Thu, Feb 8, 2018 at 1:21 PM, Paolo Carlini  wrote:
> Hi,
>
> On 08/02/2018 18:38, Jason Merrill wrote:
>>
>> On Thu, Feb 8, 2018 at 6:22 AM, Paolo Carlini wrote:
>>>
>>> Hi,
>>>
>>> this one should be rather straightforward. As noticed by Jakub, we
>>> started
>>> emitting the spurious warning with the fix for c++/69257, which, among
>>> other
>>> things, fixed decay_conversion wrt mark_rvalue_use and mark_lvalue_use
>>> calls. In particular it removed the mark_rvalue_use call at the very
>>> beginning of the function, thus now a PARM_DECL with NULLPTR_TYPE as
>>> type,
>>> being handled specially at the beginning of the function, doesn't get the
>>> mark_rvalue_use treatment - which, for example, POINTER_TYPE now gets
>>> later.
>>> I'm finishing testing on x86_64-linux the below. Ok if it passes?
>>
>> A future -Wunused-but-set-variable might warn about the dead store to
>> exp; let's just discard the result of mark_rvalue_use.  OK with that
>> change.
>
> Agreed, thanks. By the way, maybe it's the right occasion to voice that I
> find myself often confused about this topic, that is which specific
> functions are modifying their arguments and there isn't a specific reason to
> assign the return value. I ask myself: why then we return something instead
> of void? Maybe just convenience while writing some expressions? Would make
> sense. Then, would it make sense to find a way to "mark" the functions which
> are modifying their arguments? Sometimes isn't immediately obvious because
> we are of course passing around pointers and using typedefs to obfuscate the
> whole thing. Maybe we should just put real C++ to good use ;)

Well, usually we want to assign the result of mark_rvalue_use, it's
just that in this case we're returning nullptr_node instead of exp.
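
Roughly, for the nullptr_t branch (an untested sketch only):

  if (NULLPTR_TYPE_P (type) && !TREE_SIDE_EFFECTS (exp))
    {
      /* Mark the use, but discard the result; we return
         nullptr_node rather than exp here.  */
      mark_rvalue_use (exp);
      return nullptr_node;
    }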

Jason


Re: Please accept this commit for the trunk

2018-02-08 Thread Segher Boessenkool
On Wed, Feb 07, 2018 at 03:52:27PM -0800, Mike Stump wrote:
> I dusted the pointed to patch off and check it in.  Let us know how it goes.

I wanted to test this on the primary and secondary powerpc targets as
well, but okay.

> Does this resolve all of PR84113?  If so, I can push the bug along.

It makes bootstrap work.  We don't know if it is correct otherwise.

> What PR was the attachment url from?

It is not from a PR, and it has never been sent to gcc-patches; it is
from https://gcc.gnu.org/ml/gcc-testresults/2017-01/msg02971.html
(attachment #2).

It is also PR80865 btw (I'll take care of it).

Thanks,


Segher


> 2018-02-07  Iain Sandoe  
> 
>   * config/rs6000/altivec.md (*restore_world): Remove LR use.
>   * config/rs6000/predicates.md (restore_world_operation): Adjust op
>   count, remove one USE.
> 
> Index: gcc/config/rs6000/altivec.md
> ===
> --- gcc/config/rs6000/altivec.md  (revision 257471)
> +++ gcc/config/rs6000/altivec.md  (working copy)
> @@ -419,7 +419,6 @@
>  (define_insn "*restore_world"
>   [(match_parallel 0 "restore_world_operation"
>[(return)
> -(use (reg:SI LR_REGNO))
> (use (match_operand:SI 1 "call_operand" "s"))
> (clobber (match_operand:SI 2 "gpc_reg_operand" "=r"))])]
>   "TARGET_MACHO && (DEFAULT_ABI == ABI_DARWIN) && TARGET_32BIT"
> Index: gcc/config/rs6000/predicates.md
> ===
> --- gcc/config/rs6000/predicates.md   (revision 257471)
> +++ gcc/config/rs6000/predicates.md   (working copy)
> @@ -1295,13 +1295,12 @@
>rtx elt;
>int count = XVECLEN (op, 0);
>  
> -  if (count != 59)
> +  if (count != 58)
>  return 0;
>  
>index = 0;
>if (GET_CODE (XVECEXP (op, 0, index++)) != RETURN
>|| GET_CODE (XVECEXP (op, 0, index++)) != USE
> -  || GET_CODE (XVECEXP (op, 0, index++)) != USE
>|| GET_CODE (XVECEXP (op, 0, index++)) != CLOBBER)
>  return 0;
>  



Re: [PATCH][AArch64][1/3] PR target/84164: Simplify subreg + redundant AND-immediate

2018-02-08 Thread Richard Sandiford
Thanks for doing this.

Kyrill  Tkachov  writes:
> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
> index 
> 2e7aa5c12952ab1a9b49b5adaf23710327e577d3..af06d7502cebac03cefc689b2646874b8397e767
>  100644
> --- a/gcc/simplify-rtx.c
> +++ b/gcc/simplify-rtx.c
> @@ -6474,6 +6474,18 @@ simplify_subreg (machine_mode outermode, rtx op,
>return NULL_RTX;
>  }
>  
> +  /* Simplify (subreg:QI (and:SI (reg:SI) (const_int 0xff)) 0)
> + into (subreg:QI (reg:SI) 0).  */
> +  scalar_int_mode int_outermode, int_innermode;
> +  if (!paradoxical_subreg_p (outermode, innermode)
> +  && is_a <scalar_int_mode> (outermode, &int_outermode)
> +  && is_a <scalar_int_mode> (innermode, &int_innermode)
> +  && GET_CODE (op) == AND && CONST_INT_P (XEXP (op, 1))
> +  && known_eq (subreg_lowpart_offset (outermode, innermode), byte)
> +  && (~INTVAL (XEXP (op, 1)) & GET_MODE_MASK (int_outermode)) == 0
> +  && validate_subreg (outermode, innermode, XEXP (op, 0), byte))
> +return gen_rtx_SUBREG (outermode, XEXP (op, 0), byte);
> +
>/* A SUBREG resulting from a zero extension may fold to zero if
>   it extracts higher bits that the ZERO_EXTEND's source bits.  */
>if (GET_CODE (op) == ZERO_EXTEND && SCALAR_INT_MODE_P (innermode))

I think it'd be better to do this in simplify_truncation (shared
by the subreg code and the TRUNCATE code).  The return would then
be simplify_gen_unary (TRUNCATE, ...), which will become a subreg
if TRULY_NOOP_TRUNCATION.
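
I.e. something like this (an untested sketch, assuming the usual
mode/op_mode naming for the outer and inner modes in that function):

  /* (truncate:M (and:N X C)) is (truncate:M X) when the low M bits
     of C are all set, since the AND cannot change them.  */
  if (GET_CODE (op) == AND
      && CONST_INT_P (XEXP (op, 1))
      && (~UINTVAL (XEXP (op, 1)) & GET_MODE_MASK (mode)) == 0)
    return simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0), op_mode);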

Thanks,
A different Richard


Re: [PATCH] S/390: Disable prediction of indirect branches

2018-02-08 Thread H.J. Lu
On Thu, Feb 8, 2018 at 11:57 AM, H.J. Lu  wrote:
> On Thu, Feb 8, 2018 at 4:17 AM, Andreas Krebbel wrote:
>> On 02/08/2018 12:33 PM, Richard Biener wrote:
>>> On Wed, Feb 7, 2018 at 1:01 PM, Andreas Krebbel wrote:
 This patch implements GCC support for mitigating vulnerability
 CVE-2017-5715 known as Spectre #2 on IBM Z.

 In order to disable prediction of indirect branches the implementation
 makes use of an IBM Z specific feature - the execute instruction.
 Performing an indirect branch via execute prevents the branch from
 being subject to dynamic branch prediction.

 The implementation tries to stay close to the x86 solution regarding
 user interface.

 x86 style options supported (without thunk-inline):

 -mindirect-branch=(keep|thunk|thunk-extern)
 -mfunction-return=(keep|thunk|thunk-extern)

 IBM Z specific options:

 -mindirect-branch-jump=(keep|thunk|thunk-extern|thunk-inline)
 -mindirect-branch-call=(keep|thunk|thunk-extern)
 -mfunction-return-reg=(keep|thunk|thunk-extern)
 -mfunction-return-mem=(keep|thunk|thunk-extern)

 These options allow us to enable/disable the branch conversion at a
 finer granularity.

 -mindirect-branch sets the value of -mindirect-branch-jump and
  -mindirect-branch-call.

 -mfunction-return sets the value of -mfunction-return-reg and
  -mfunction-return-mem.

 All these options are supported on GCC command line as well as
 function attributes.

 'thunk' triggers the generation of out of line thunks (expolines) and
 replaces the formerly indirect branch with a direct branch to the
 thunk.  Depending on the -march= setting two different types of thunks
 are generated.  With -march=z10 or higher exrl (execute relative long)
 is being used while targeting older machines makes use of larl/ex
 instead.  From a security perspective the exrl variant is preferable.

 'thunk-extern' does the branch replacement like 'thunk' but does not
 emit the thunks.

 'thunk-inline' is only available for indirect jumps.  It should be used
 in environments where correct CFI is important - known as user space.

 Additionally the patch introduces the -mindirect-branch-table option
 which generates tables pointing to the locations which have been
 modified.  This is supposed to allow reverting the changes without
 re-compilation in situations where it isn't required. The sections are
 split up into one section per option.

 I plan to commit the patch tomorrow.
>>>
>>> Do you also plan to backport this to the GCC 7 branch?
>>
>> Yes, I'm working on it.
>>
>
> This breaks glibc build:
>
> /export/gnu/import/git/toolchain/build/compilers/s390x-linux-gnu/glibc/s390x-linux-gnu/libc_pic.os:
> In function `__cmsg_nxthdr':
> /export/ssd/git/toolchain/build/compilers/s390x-linux-gnu/glibc-src/s390x-linux-gnu/socket/../sysdeps/unix/sysv/linux/cmsg_nxthdr.c:39:
> undefined reference to `__s390_indirect_jump_r1use_r14'
> /export/ssd/git/toolchain/build/compilers/s390x-linux-gnu/glibc-src/s390x-linux-gnu/socket/../sysdeps/unix/sysv/linux/cmsg_nxthdr.c:39:
> undefined reference to `__s390_indirect_jump_r1use_r14'
> collect2: error: ld returned 1 exit status
> make[4]: *** [../Makerules:765:
> /export/gnu/import/git/toolchain/build/compilers/s390x-linux-gnu/glibc/s390x-linux-gnu/libc.so]
> Error 1
> make[4]: Leaving directory
> '/export/ssd/git/toolchain/build/compilers/s390x-linux-gnu/glibc-src/s390x-linux-gnu/elf'
> make[3]: *** [Makefile:215: elf/subdir_lib] Error 2
> make[3]: Leaving directory
> '/export/ssd/git/toolchain/build/compilers/s390x-linux-gnu/glibc-src/s390x-linux-gnu'
> make[2]: *** [Makefile:9: all] Error 2
> make[2]: Leaving directory
> '/export/ssd/git/toolchain/build/compilers/s390x-linux-gnu/glibc/s390x-linux-gnu'

I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84295

-- 
H.J.


Re: [SFN+LVU+IEPM v4 7/9] [LVU] Introduce location views

2018-02-08 Thread Jason Merrill

On 02/07/2018 02:36 AM, Alexandre Oliva wrote:

+/* Output symbol LAB1 as an unsigned LEB128 quantity.  */


Let's mention here that the value of LAB1 must be an assemble-time 
constant (such as a view counter), since we can't have LEB128 relocations.


With that, the patch looks OK.

Jason


Re: [PATCH] S/390: Disable prediction of indirect branches

2018-02-08 Thread H.J. Lu
On Thu, Feb 8, 2018 at 4:17 AM, Andreas Krebbel wrote:
> On 02/08/2018 12:33 PM, Richard Biener wrote:
>> On Wed, Feb 7, 2018 at 1:01 PM, Andreas Krebbel wrote:
>>> This patch implements GCC support for mitigating vulnerability
>>> CVE-2017-5715 known as Spectre #2 on IBM Z.
>>>
>>> In order to disable prediction of indirect branches the implementation
>>> makes use of an IBM Z specific feature - the execute instruction.
>>> Performing an indirect branch via execute prevents the branch from
>>> being subject to dynamic branch prediction.
>>>
>>> The implementation tries to stay close to the x86 solution regarding
>>> user interface.
>>>
>>> x86 style options supported (without thunk-inline):
>>>
>>> -mindirect-branch=(keep|thunk|thunk-extern)
>>> -mfunction-return=(keep|thunk|thunk-extern)
>>>
>>> IBM Z specific options:
>>>
>>> -mindirect-branch-jump=(keep|thunk|thunk-extern|thunk-inline)
>>> -mindirect-branch-call=(keep|thunk|thunk-extern)
>>> -mfunction-return-reg=(keep|thunk|thunk-extern)
>>> -mfunction-return-mem=(keep|thunk|thunk-extern)
>>>
>>> These options allow us to enable/disable the branch conversion at a
>>> finer granularity.
>>>
>>> -mindirect-branch sets the value of -mindirect-branch-jump and
>>>  -mindirect-branch-call.
>>>
>>> -mfunction-return sets the value of -mfunction-return-reg and
>>>  -mfunction-return-mem.
>>>
>>> All these options are supported on GCC command line as well as
>>> function attributes.
>>>
>>> 'thunk' triggers the generation of out of line thunks (expolines) and
>>> replaces the formerly indirect branch with a direct branch to the
>>> thunk.  Depending on the -march= setting two different types of thunks
>>> are generated.  With -march=z10 or higher exrl (execute relative long)
>>> is being used while targeting older machines makes use of larl/ex
>>> instead.  From a security perspective the exrl variant is preferable.
>>>
>>> 'thunk-extern' does the branch replacement like 'thunk' but does not
>>> emit the thunks.
>>>
>>> 'thunk-inline' is only available for indirect jumps.  It should be used
>>> in environments where correct CFI is important - known as user space.
>>>
>>> Additionally the patch introduces the -mindirect-branch-table option
>>> which generates tables pointing to the locations which have been
>>> modified.  This is supposed to allow reverting the changes without
>>> re-compilation in situations where it isn't required. The sections are
>>> split up into one section per option.
>>>
>>> I plan to commit the patch tomorrow.
>>
>> Do you also plan to backport this to the GCC 7 branch?
>
> Yes, I'm working on it.
>

This breaks glibc build:

/export/gnu/import/git/toolchain/build/compilers/s390x-linux-gnu/glibc/s390x-linux-gnu/libc_pic.os:
In function `__cmsg_nxthdr':
/export/ssd/git/toolchain/build/compilers/s390x-linux-gnu/glibc-src/s390x-linux-gnu/socket/../sysdeps/unix/sysv/linux/cmsg_nxthdr.c:39:
undefined reference to `__s390_indirect_jump_r1use_r14'
/export/ssd/git/toolchain/build/compilers/s390x-linux-gnu/glibc-src/s390x-linux-gnu/socket/../sysdeps/unix/sysv/linux/cmsg_nxthdr.c:39:
undefined reference to `__s390_indirect_jump_r1use_r14'
collect2: error: ld returned 1 exit status
make[4]: *** [../Makerules:765:
/export/gnu/import/git/toolchain/build/compilers/s390x-linux-gnu/glibc/s390x-linux-gnu/libc.so]
Error 1
make[4]: Leaving directory
'/export/ssd/git/toolchain/build/compilers/s390x-linux-gnu/glibc-src/s390x-linux-gnu/elf'
make[3]: *** [Makefile:215: elf/subdir_lib] Error 2
make[3]: Leaving directory
'/export/ssd/git/toolchain/build/compilers/s390x-linux-gnu/glibc-src/s390x-linux-gnu'
make[2]: *** [Makefile:9: all] Error 2
make[2]: Leaving directory
'/export/ssd/git/toolchain/build/compilers/s390x-linux-gnu/glibc/s390x-linux-gnu'


-- 
H.J.


Re: [PING] [PATCH] [MSP430] PR79242: Implement Complex Partial Integers

2018-02-08 Thread DJ Delorie

The msp430-specific parts look OK to me, but obviously they're kinda
useless without the core changes :-)


Re: [PATCH] -Wformat: fix nonsensical "wide character" message (PR c/84258)

2018-02-08 Thread David Malcolm
On Thu, 2018-02-08 at 17:18 +, Joseph Myers wrote:
> On Wed, 7 Feb 2018, David Malcolm wrote:
> 
> > gcc/c-family/ChangeLog:
> > PR c/84258
> > * c-format.c (struct format_check_results): Add field
> > "number_non_char".
> > (check_format_info): Initialize it, and warn if encountered.
> > (check_format_arg): Distinguish between wide char and
> > everything else when detecting arrays of non-char.
> > 
> > gcc/testsuite/ChangeLog:
> > PR c/84258
> > * c-c++-common/Wformat-pr84258.c: New test.
> 
> OK.

Thanks.

Release managers: this isn't a regression, but seems relatively low-
risk.

Is this OK for trunk now, or should I queue this for next stage 1?

Context:
  https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00380.html

Dave


Merge from trunk to gccgo branch

2018-02-08 Thread Ian Lance Taylor
I merged trunk revision 257495 to the gccgo branch.

Ian


Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins

2018-02-08 Thread Peter Bergner
On 2/8/18 10:38 AM, Peter Bergner wrote:
>   * gcc.target/powerpc/builtins-1-be.c: Filter out gimple folding disabled
>   message.  Fix test for running in 32-bit mode.

As we talked about offline, here's a bigger change to builtins-1-be.c that
cleans up the test a little more, since we generate xxlor in more cases
than just the __builtin_vec_or() call, so this change adds the -dp option
and we match the pattern name to verify we are getting as many as we expect
from that and that alone.  This also splits the xxland and xxlandc into
their own matches, which match the source test cases use of vec_and() and
vec_andc().

Peter


Index: gcc/testsuite/gcc.target/powerpc/builtins-1-be.c
===
--- gcc/testsuite/gcc.target/powerpc/builtins-1-be.c(revision 257390)
+++ gcc/testsuite/gcc.target/powerpc/builtins-1-be.c(working copy)
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { powerpc64-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
-/* { dg-options "-mcpu=power8 -O0 -mno-fold-gimple" } */
+/* { dg-options "-mcpu=power8 -O0 -mno-fold-gimple -dp" } */
+/* { dg-prune-output "gimple folding of rs6000 builtins has been disabled." } 
*/
 
 /* Test that a number of newly added builtin overloads are accepted
by the compiler.  */
@@ -22,10 +23,10 @@
vec_ctfxvmuldp 
vec_cts xvcvdpsxds, vctsxs
vec_ctu   xvcvdpuxds, vctuxs
-   vec_div   divd, divdu
+   vec_div   divd, divdu | __divdi3(), __udivdi3()
vec_mergel vmrghb, vmrghh, xxmrghw
vec_mergeh  xxmrglw, vmrglh
-   vec_mul mulld
+   vec_mul mulld | mullw, mulhwu
vec_nor xxlnor
vec_or xxlor
vec_packsu vpksdus
@@ -36,34 +37,39 @@
vec_rsqrt  xvrsqrtesp
vec_rsqrte xvrsqrtesp  */
 
-/* { dg-final { scan-assembler-times "vcmpequd." 4 } } */
-/* { dg-final { scan-assembler-times "vcmpgtud." 8 } } */
-/* { dg-final { scan-assembler-times "xxland" 29 } } */
-/* { dg-final { scan-assembler-times "vclzb" 2 } } */
-/* { dg-final { scan-assembler-times "vclzb" 2 } } */
-/* { dg-final { scan-assembler-times "vclzw" 2 } } */
-/* { dg-final { scan-assembler-times "vclzh" 2 } } */
-/* { dg-final { scan-assembler-times "xvcpsgnsp" 1 } } */
-/* { dg-final { scan-assembler-times "xvmuldp" 6 } } */
-/* { dg-final { scan-assembler-times "xvcvdpsxds" 1 } } */
-/* { dg-final { scan-assembler-times "vctsxs" 1 } } */
-/* { dg-final { scan-assembler-times "xvcvdpuxds" 1 } } */
-/* { dg-final { scan-assembler-times "vctuxs" 1 } } */
-/* { dg-final { scan-assembler-times "divd" 4 } } */
-/* { dg-final { scan-assembler-times "divdu" 2 } } */
-/* { dg-final { scan-assembler-times "vmrghb" 0 } } */
-/* { dg-final { scan-assembler-times "vmrghh" 3 } } */
-/* { dg-final { scan-assembler-times "xxmrghw" 1 } } */
-/* { dg-final { scan-assembler-times "xxmrglw" 4 } } */
-/* { dg-final { scan-assembler-times "vmrglh" 4 } } */
-/* { dg-final { scan-assembler-times "mulld" 4 } } */
-/* { dg-final { scan-assembler-times "xxlnor" 19 } } */
-/* { dg-final { scan-assembler-times "xxlor" 14 } } */
-/* { dg-final { scan-assembler-times "vpksdus" 1 } } */
-/* { dg-final { scan-assembler-times "vperm" 2 } } */
-/* { dg-final { scan-assembler-times "xvrdpi" 1 } } */
-/* { dg-final { scan-assembler-times "xxsel" 6 } } */
-/* { dg-final { scan-assembler-times "xxlxor" 6 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequd\M\.} 4 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtud\M\.} 8 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M} 13 } } */
+/* { dg-final { scan-assembler-times {\mvclzb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvclzb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvclzw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvclzh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvcpsgnsp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvmuldp\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxvcvdpsxds\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvctsxs\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxvcvdpuxds\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvctuxs\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvmrghb\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mvmrghh\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxmrghw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxmrglw\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mvmrglh\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M[^\n]*\mboolv4si3_internal\M} 6 
} } */
+/* { dg-final { scan-assembler-times {\mvpksdus\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvperm\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvrdpi\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxsel\M} 6 } } */
+/* { dg-final { scan-assembler-times 

Re: [C++ Patch] PR 83806 ("[6/7/8 Regression] Spurious -Wunused-but-set-parameter with nullptr")

2018-02-08 Thread Paolo Carlini

Hi,

On 08/02/2018 18:38, Jason Merrill wrote:

On Thu, Feb 8, 2018 at 6:22 AM, Paolo Carlini  wrote:

Hi,

this one should be rather straightforward. As noticed by Jakub, we started
emitting the spurious warning with the fix for c++/69257, which, among other
things, fixed decay_conversion wrt mark_rvalue_use and mark_lvalue_use
calls. In particular it removed the mark_rvalue_use call at the very
beginning of the function, thus now a PARM_DECL with NULLPTR_TYPE as type,
being handled specially at the beginning of the function, doesn't get the
mark_rvalue_use treatment - which, for example, POINTER_TYPE now gets later.
I'm finishing testing on x86_64-linux the below. Ok if it passes?

A future -Wunused-but-set-variable might warn about the dead store to
exp; let's just discard the result of mark_rvalue_use.  OK with that
change.
Agreed, thanks. By the way, maybe it's the right occasion to voice that 
I find myself often confused about this topic, that is which specific 
functions are modifying their arguments and there isn't a specific 
reason to assign the return value. I ask myself: why then we return 
something instead of void? Maybe just convenience while writing some 
expressions? Would make sense. Then, would it make sense to find a way 
to "mark" the functions which are modifying their arguments? Sometimes 
isn't immediately obvious because we are of course passing around 
pointers and using typedefs to obfuscate the whole thing. Maybe we 
should just put real C++ to good use ;)

PS: sorry Jason, I have to re-send separately to the mailing list because
some HTML crept in again. Grrr.

Hmm, I thought the mailing lists had been adjusted to allow some HTML.
Yes, I noticed an exchange on gcc@ a while ago but didn't really follow 
the details. For your curiosity, the messages you got in your own 
mailbox definitely bounced with something like:


:
Invalid mime type "text/html" detected in message text or
attachment.  Please send plain text messages only.
See http://sourceware.org/lists.html#sourceware-list-info for more information.
contact gcc-patches-ow...@gcc.gnu.org if you have questions about this. (#5.7.2)

Paolo.



Re: PATCH to fix bogus warning with -Wstringop-truncation -g (PR tree-optimization/84228)

2018-02-08 Thread Martin Sebor

On 02/08/2018 07:39 AM, Richard Biener wrote:

On Thu, Feb 8, 2018 at 6:35 AM, Jeff Law  wrote:

On 02/06/2018 05:57 AM, Jakub Jelinek wrote:

On Tue, Feb 06, 2018 at 01:46:21PM +0100, Marek Polacek wrote:

--- gcc/testsuite/c-c++-common/Wstringop-truncation-3.c
+++ gcc/testsuite/c-c++-common/Wstringop-truncation-3.c
@@ -0,0 +1,20 @@
+/* PR tree-optimization/84228 */
+/* { dg-do compile } */
+/* { dg-options "-Wstringop-truncation -O2 -g" } */
+
+char *strncpy (char *, const char *, __SIZE_TYPE__);
+struct S
+{
+  char arr[64];
+};
+
+int
+foo (struct S *p1, const char *a)
+{
+  if (a)
+goto err;
+  strncpy (p1->arr, a, sizeof p1->arr); /* { dg-bogus "specified bound" } */


Just curious, what debug stmt is in between those?
Wouldn't it be better to force them a couple of debug stmts?
Say
  int b = 5, c = 6, d = 7;
at the start of the function and
  b = 8; c = 9; d = 10;
in between strncpy and the '\0' store?


+  p1->arr[3] = '\0';
+err:
+  return 0;
+}
diff --git gcc/tree-ssa-strlen.c gcc/tree-ssa-strlen.c
index c3cf432a921..f0f6535017b 100644
--- gcc/tree-ssa-strlen.c
+++ gcc/tree-ssa-strlen.c
@@ -1849,7 +1849,7 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree 
src, tree cnt)

   /* Look for dst[i] = '\0'; after the stxncpy() call and if found
  avoid the truncation warning.  */
-  gsi_next (&gsi);
+  gsi_next_nondebug (&gsi);
   gimple *next_stmt = gsi_stmt (gsi);

   if (!gsi_end_p (gsi) && is_gimple_assign (next_stmt))


Ok for trunk, though generally looking at just next stmt is very fragile, might 
be
better to look at the strncpy's vuse immediate uses if they are within the
same basic block and either don't alias with it, or are the store it is
looking for, or something similar.

Martin and I wandered down this approach a bit and ultimately decided
against it.  While yes it could avoid a false positive by looking at the
immediate uses, but I'm not sure avoiding the false positive in those
cases is actually good!

THe larger the separation between the strcpy and the truncation the more
likely it is that the code is wrong or at least poorly written and
deserves a developer looksie.


But it's the next _GIMPLE_ stmt it looks at.  Make it

char d[3];

  void f (const char *s, int x)
  {
char d[x];
__builtin_strncpy (d, s, sizeof d);
d[x-1] = 0;
  }

and I bet it will again warn since x-1 is a separate GIMPLE stmt.


It doesn't but only because it doesn't know how to handle VLAs.


The patch is of course ok but still...  simply looking at all
immediate uses of the VDEF of the strncpy call, stopping
at the single stmt with the "next" VDEF should be better
(we don't have a next_vdef () helper).


IIRC, one of the problems I ran into was how to handle code
like this:

  void f (const char *s)
  {
char d[8];
char *p = strncpy (d, s, sizeof d);

foo (p);

d[7] = 0;
  }

I.e., having to track all pointers to d between the call to
strncpy and the assignment of the nul and make sure none of
them ends up used in a string function.  It didn't seem
the additional complexity would have been worth the effort
(or the likely false negatives).

Martin



PING: [PATCH] i386: Add __x86_indirect_thunk_nt_reg for -fcf-protection -mcet

2018-02-08 Thread H.J. Lu
On Fri, Feb 2, 2018 at 8:54 AM, H.J. Lu  wrote:
> nocf_check attribute can be used with -fcf-protection -mcet to disable
> control-flow check by adding NOTRACK prefix before indirect branch.
> When -mindirect-branch=thunk-extern -mindirect-branch-register is added,
> indirect branch via register, "notrack call/jmp reg", is converted to
>
> call/jmp __x86_indirect_thunk_nt_reg
>
> When running on machines with CET enabled, __x86_indirect_thunk_nt_reg
> can be updated to
>
> notrack jmp reg
>
> at run-time to restore NOTRACK prefix in the original indirect branch.
>
> Since we don't support -mindirect-branch=thunk-extern, CET and MPX at
> the same time, -mindirect-branch=thunk-extern is disallowed with
> -fcf-protection=branch and -fcheck-pointer-bounds.
>
> Tested on i686 and x86-64.  OK for trunk?
>
> Thanks.
>
> H.J.
> ---
> gcc/
>
> PR target/84176
> * config/i386/i386.c (ix86_set_indirect_branch_type): Issue an
> error when -mindirect-branch=thunk-extern, -fcf-protection=branch
> and -fcheck-pointer-bounds are used together.
> (indirect_thunk_prefix): New enum.
> (indirect_thunk_need_prefix): New function.
> (indirect_thunk_name): Replace need_bnd_p with need_prefix.  Use
> "_nt" instead of "_bnd" for NOTRACK prefix.
> (output_indirect_thunk): Replace need_bnd_p with need_prefix.
> (output_indirect_thunk_function): Likewise.
> (): Likewise.
> (ix86_code_end): Update output_indirect_thunk_function calls.
> (ix86_output_indirect_branch_via_reg): Replace
> ix86_bnd_prefixed_insn_p with indirect_thunk_need_prefix.
> (ix86_output_indirect_branch_via_push): Likewise.
> (ix86_output_function_return): Likewise.
> * doc/invoke.texi: Document -mindirect-branch=thunk-extern is
> incompatible with -fcf-protection=branch and
> -fcheck-pointer-bounds.
>
> gcc/testsuite/
>
> PR target/84176
> * gcc.target/i386/indirect-thunk-11.c: New test.
> * gcc.target/i386/indirect-thunk-12.c: Likewise.
> * gcc.target/i386/indirect-thunk-attr-12.c: Likewise.
> * gcc.target/i386/indirect-thunk-attr-13.c: Likewise.
> * gcc.target/i386/indirect-thunk-attr-14.c: Likewise.
> * gcc.target/i386/indirect-thunk-attr-15.c: Likewise.
> * gcc.target/i386/indirect-thunk-attr-16.c: Likewise.
> * gcc.target/i386/indirect-thunk-extern-10.c: Likewise.
> * gcc.target/i386/indirect-thunk-extern-8.c: Likewise.
> * gcc.target/i386/indirect-thunk-extern-9.c: Likewise.

Hi Jan,

Can you review it:

https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00113.html

Thanks.


-- 
H.J.


PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-08 Thread H.J. Lu
On Sun, Jan 28, 2018 at 11:56 AM, H.J. Lu  wrote:
> On Sat, Jan 27, 2018 at 2:12 PM, H.J. Lu  wrote:
>> For
>>
>> ---
>> struct C {
>>   virtual ~C();
>>   virtual void f();
>> };
>>
>> void
>> f (C *p)
>> {
>>   p->f();
>>   p->f();
>> }
>> ---
>>
>> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
>>
>> _Z1fP1C:
>> .LFB0:
>> .cfi_startproc
>> pushq   %rbx
>> .cfi_def_cfa_offset 16
>> .cfi_offset 3, -16
>> movq(%rdi), %rax
>> movq%rdi, %rbx
>> jmp .LIND1
>> .LIND0:
>> pushq   16(%rax)
>> jmp __x86_indirect_thunk
>> .LIND1:
>> call.LIND0
>> movq(%rbx), %rax
>> movq%rbx, %rdi
>> popq%rbx
>> .cfi_def_cfa_offset 8
>> movq16(%rax), %rax
>> jmp __x86_indirect_thunk_rax
>> .cfi_endproc
>>
>> x86-64 is supposed to have asynchronous unwind tables by default, but
>> there is nothing that reflects the change in the (relative) frame
>> address after .LIND0.  That region really has to be moved outside of
>> the .cfi_startproc/.cfi_endproc bracket.
>>
>> This patch adds TARGET_INDIRECT_BRANCH_REGISTER to force indirect
>> branch via register when -mindirect-branch=thunk-extern is used.  Now,
>> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
>>
>> _Z1fP1C:
>> .LFB0:
>> .cfi_startproc
>> pushq   %rbx
>> .cfi_def_cfa_offset 16
>> .cfi_offset 3, -16
>> movq(%rdi), %rax
>> movq%rdi, %rbx
>> movq16(%rax), %rax
>> call__x86_indirect_thunk_rax
>> movq(%rbx), %rax
>> movq%rbx, %rdi
>> popq%rbx
>> .cfi_def_cfa_offset 8
>> movq16(%rax), %rax
>> jmp __x86_indirect_thunk_rax
>> .cfi_endproc
>>
>> Now "-mindirect-branch=thunk-extern" is equivalent to
>> "-mindirect-branch=thunk-extern -mindirect-branch-register", which is
>> used by Linux kernel.
>>
>> Tested on i686 and x86-64.  OK for trunk?
>>
>> Thanks.
>>
>> H.J.
>> 
>> gcc/
>>
>> PR target/84039
>> * config/i386/constraints.md (Bs): Replace
>> ix86_indirect_branch_register with
>> TARGET_INDIRECT_BRANCH_REGISTER.
>> (Bw): Likewise.
>> * config/i386/i386.md (indirect_jump): Likewise.
>> (tablejump): Likewise.
>> (*sibcall_memory): Likewise.
>> (*sibcall_value_memory): Likewise.
>> Peepholes of indirect call and jump via memory: Likewise.
>> * config/i386/i386.opt: Likewise.
>> * config/i386/predicates.md (indirect_branch_operand): Likewise.
>> (GOT_memory_operand): Likewise.
>> (call_insn_operand): Likewise.
>> (sibcall_insn_operand): Likewise.
>> (GOT32_symbol_operand): Likewise.
>> * config/i386/i386.h (TARGET_INDIRECT_BRANCH_REGISTER): New.
>>
>
> Here is the updated patch  to disallow *sibcall_GOT_32 and 
> *sibcall_value_GOT_32
> for TARGET_INDIRECT_BRANCH_REGISTER.
>
> Tested on i686 and x86-64.  OK for trunk?
>

Hi Jan,

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02233.html

Is OK for trunk?

Thanks.

-- 
H.J.


Fwd: Please accept this commit for the trunk

2018-02-08 Thread Douglas Mencken
> Does this resolve all of PR84113?  If so, I can push the bug along

Yep, as I wrote in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84113#c35

> What PR was the attachment url from?

It took me about one minute to figure out and find the bug with the patch
mentioned by Segher in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84113#c25

Please test https://gcc.gnu.org/ml/gcc-testresults/2017-01/txtnZhWiDkC4z.txtg
 


On Wed, Feb 7, 2018 at 6:52 PM, Mike Stump  wrote:

> On Feb 5, 2018, at 8:42 AM, Douglas Mencken  wrote:
> >
> > I’m about
> >
> > “ [PATCH 2/4] [Darwin,PPC] Remove uses of LR in
> > restore_world ” https://gcc.gnu.org/bugzilla/attachment.cgi?id=42304
> >
> > look at bug #84113 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84113
> for
> > more info
> >
> > “ One important question ’s yet: Why this patch has been ignored despite
> > it’s been made just in time? ”
>
> I dusted the pointed to patch off and check it in.  Let us know how it
> goes.
>
> Does this resolve all of PR84113?  If so, I can push the bug along.
>
> What PR was the attachment url from?
>
> Thanks for your help.
>
>


Re: [C++ Patch] PR 83806 ("[6/7/8 Regression] Spurious -Wunused-but-set-parameter with nullptr")

2018-02-08 Thread Jason Merrill
On Thu, Feb 8, 2018 at 6:22 AM, Paolo Carlini  wrote:
> Hi,
>
> this one should be rather straightforward. As noticed by Jakub, we started
> emitting the spurious warning with the fix for c++/69257, which, among other
> things, fixed decay_conversion wrt mark_rvalue_use and mark_lvalue_use
> calls. In particular it removed the mark_rvalue_use call at the very
> beginning of the function, thus now a PARM_DECL with NULLPTR_TYPE as type,
> being handled specially at the beginning of the function, doesn't get the
> mark_rvalue_use treatment - which, for example, POINTER_TYPE now gets later.
> I'm finishing testing on x86_64-linux the below. Ok if it passes?

A future -Wunused-but-set-variable might warn about the dead store to
exp; let's just discard the result of mark_rvalue_use.  OK with that
change.

> PS: sorry Jason, I have to re-send separately to the mailing list because
> some HTML crept in again. Grrr.

Hmm, I thought the mailing lists had been adjusted to allow some HTML.

Jason


Re: [C++ Patch] Use INDIRECT_REF_P in a few more places

2018-02-08 Thread Jason Merrill
OK.

On Thu, Feb 8, 2018 at 4:19 AM, Paolo Carlini  wrote:
> Hi,
>
> yesterday I noticed these and I regression tested the change together with
> my fix for 83204. Can definitely wait, but seems very safe...
>
> Cheers, Paolo.
>
> ///
>


Re: [PATCH/RFC] Fix ICE in find_taken_edge_computed_goto (PR 84136)

2018-02-08 Thread Joseph Myers
On Wed, 7 Feb 2018, Jeff Law wrote:

> Ideally we'd tighten the extension's language so that we could issue an
> error out of the front-end.

It seems to me to be the sort of thing that's only undefined at execution 
time - it's perfectly valid to store the address of a label in a global, 
and later jump to it if you haven't left the function execution within 
which you took the address.  I.e., if you detect this case and wish to 
diagnose it, it should be a warning plus generating a call to 
__builtin_trap, not an error.
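
For instance (a GNU C sketch with made-up names, only to show why this
is an execution-time matter rather than something the front end can
reject):

  static void *resume;

  void
  f (void)
  {
    resume = &&done;   /* storing the label's address in a global: fine */
    goto *resume;      /* jumping to it within the same invocation: fine */
   done:;
  }

  void
  g (void)
  {
    goto *resume;      /* undefined here: the activation of f that took
                          the address is no longer live */
  }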

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] -Wformat: fix nonsensical "wide character" message (PR c/84258)

2018-02-08 Thread Joseph Myers
On Wed, 7 Feb 2018, David Malcolm wrote:

> gcc/c-family/ChangeLog:
>   PR c/84258
>   * c-format.c (struct format_check_results): Add field
>   "number_non_char".
>   (check_format_info): Initialize it, and warn if encountered.
>   (check_format_arg): Distinguish between wide char and
>   everything else when detecting arrays of non-char.
> 
> gcc/testsuite/ChangeLog:
>   PR c/84258
>   * c-c++-common/Wformat-pr84258.c: New test.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH][i386][3/3] PR target/84164: Make *cmpqi_ext_ patterns accept more zero_extract modes

2018-02-08 Thread Kyrill Tkachov

Hi all,

This patch fixes some fallout in the i386 testsuite that occurs after the 
simplification in patch [1/3] [1].
The gcc.target/i386/extract-2.c FAILs because it expects to match:
(set (reg:CC 17 flags)
(compare:CC (subreg:QI (zero_extract:SI (reg:HI 98)
(const_int 8 [0x8])
(const_int 8 [0x8])) 0)
(const_int 4 [0x4])))

which is the *cmpqi_ext_2 pattern in i386.md but with the new simplification 
the combine/simplify-rtx
machinery produces:
(set (reg:CC 17 flags)
(compare:CC (subreg:QI (zero_extract:HI (reg:HI 98)
(const_int 8 [0x8])
(const_int 8 [0x8])) 0)
(const_int 4 [0x4])))

Notice that the zero_extract now has HImode like the register source rather 
than SImode.
The existing *cmpqi_ext_ patterns however explicitly demand an SImode on the 
zero_extract.
I'm not overly familiar with the i386 port but I think that's too restrictive.
The RTL documentation says:
For (zero_extract:m loc size pos) "The mode m is the same as the mode that would be 
used for loc if it were a register."
I'm not sure if that means that the mode of the zero_extract and the source 
register must always match (as is the
case after patch [1/3]) but in any case it shouldn't matter semantically since 
we're taking a QImode subreg of the whole
thing anyway.

So the proposed solution in this patch is to allow HI, SI and DImode 
zero_extracts in these patterns as these are the
modes that the ext_register_operand predicate accepts, so that the patterns can 
match the new form above.

With this patch the aforementioned test passes again and bootstrap and testing 
on x86_64-unknown-linux-gnu shows
no regressions.

Is this ok for trunk if the first patch is accepted?

Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00443.html

2018-02-07  Kyrylo Tkachov  

PR target/84164
* config/i386/i386.md (*cmpqi_ext_1): Rename to...
(*cmpqi_ext_1): ... This.  Use SWI248 mode iterator
for zero_extract.
(*cmpqi_ext_2): Rename to...
(*cmpqi_ext_2): ... This.  Use SWI248 mode iterator
for zero_extract.
(*cmpqi_ext_3): Rename to...
(*cmpqi_ext_3): ... This.  Use SWI248 mode iterator
for zero_extract.
(*cmpqi_ext_4): Rename to...
(*cmpqi_ext_4): ... This.  Use SWI248 mode iterator
for zero_extract.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a4832bf696f321e8ee5aad71fa946ca198d9d689..ced9a3e823ae6c4586be510a782d354f4d364daa 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1328,12 +1328,12 @@ (define_insn "*cmp_minus_1"
   [(set_attr "type" "icmp")
(set_attr "mode" "")])
 
-(define_insn "*cmpqi_ext_1"
+(define_insn "*cmpqi_ext_1"
   [(set (reg FLAGS_REG)
 	(compare
 	  (match_operand:QI 0 "nonimmediate_operand" "QBc,m")
 	  (subreg:QI
-	(zero_extract:SI
+	(zero_extract:SWI248
 	  (match_operand 1 "ext_register_operand" "Q,Q")
 	  (const_int 8)
 	  (const_int 8)) 0)))]
@@ -1343,11 +1343,11 @@ (define_insn "*cmpqi_ext_1"
(set_attr "type" "icmp")
(set_attr "mode" "QI")])
 
-(define_insn "*cmpqi_ext_2"
+(define_insn "*cmpqi_ext_2"
   [(set (reg FLAGS_REG)
 	(compare
 	  (subreg:QI
-	(zero_extract:SI
+	(zero_extract:SWI248
 	  (match_operand 0 "ext_register_operand" "Q")
 	  (const_int 8)
 	  (const_int 8)) 0)
@@ -1368,11 +1368,11 @@ (define_expand "cmpqi_ext_3"
 	  (const_int 8)) 0)
 	  (match_operand:QI 1 "const_int_operand")))])
 
-(define_insn "*cmpqi_ext_3"
+(define_insn "*cmpqi_ext_3"
   [(set (reg FLAGS_REG)
 	(compare
 	  (subreg:QI
-	(zero_extract:SI
+	(zero_extract:SWI248
 	  (match_operand 0 "ext_register_operand" "Q,Q")
 	  (const_int 8)
 	  (const_int 8)) 0)
@@ -1383,16 +1383,16 @@ (define_insn "*cmpqi_ext_3"
(set_attr "type" "icmp")
(set_attr "mode" "QI")])
 
-(define_insn "*cmpqi_ext_4"
+(define_insn "*cmpqi_ext_4"
   [(set (reg FLAGS_REG)
 	(compare
 	  (subreg:QI
-	(zero_extract:SI
+	(zero_extract:SWI248
 	  (match_operand 0 "ext_register_operand" "Q")
 	  (const_int 8)
 	  (const_int 8)) 0)
 	  (subreg:QI
-	(zero_extract:SI
+	(zero_extract:SWI248
 	  (match_operand 1 "ext_register_operand" "Q")
 	  (const_int 8)
 	  (const_int 8)) 0)))]


[PATCH][AArch64][1/3] PR target/84164: Simplify subreg + redundant AND-immediate (was: PR target/84164: Relax predicate in *aarch64__reg_di3_mask2)

2018-02-08 Thread Kyrill Tkachov


On 06/02/18 11:32, Richard Earnshaw (lists) wrote:

On 02/02/18 15:43, Kyrill Tkachov wrote:

Hi Richard,

On 02/02/18 15:25, Richard Earnshaw (lists) wrote:

On 02/02/18 15:10, Kyrill Tkachov wrote:

Hi all,

In this [8 Regression] PR we ICE because we can't recognise the insn:
(insn 59 58 43 7 (set (reg:DI 124)
  (rotatert:DI (reg:DI 125 [ c ])
  (subreg:QI (and:SI (reg:SI 128)
  (const_int 65535 [0xffff])) 0)))

Aren't we heading off down the wrong path here?

(subreg:QI (and:SI (reg:SI 128) (const_int 65535 [0xffff])) 0))

can be simplified to

(subreg:QI (reg:SI 128) 0)

since the AND operation is redundant as we're only looking at the bottom
8 bits.

I have tried implementing such a transformation in combine [1]
but it was not clear that it was universally beneficial.
See the linked thread and PR 70119 for the discussion (the thread
continues into the next month).

Is that really the same thing?  The example there was using a mask that
was narrower than the subreg and thus not redundant.  The case here is
where the mask is completely redundant because we are only looking at
the bottom 8 bits of the result (which are not changed by the AND
operation).


I think you're right Richard, we can teach simplify-rtx to handle this.
This would make the fix a bit more involved.  The attached patch implements it.
It adds a simplification rule to simplify_subreg to collapse subregs
of AND-immediate operations when the mask covers the mode mask.
This allows us to simplify (subreg:QI (and:SI (reg:SI) (const_int 0xff)) 0)
into (subreg:QI (reg:SI) 0).
We cannot completely remove the creation of SUBREG-AND-immediate RTXes in the
aarch64 splitters because there are cases where we still need that construct,
in particular when the mask covers just the mode size, for example:
(subreg:QI (and:SI (reg:SI) (const_int 31 [0x1F])) 0).
In this case we cannot simplify this to (subreg:QI (reg:SI) 0) in the midend
and we cannot rely on the aarch64-specific behaviour of the integer LSR, LSL, 
ROR
instructions to truncate the register shift amount by the mode width because
these patterns may match the integer SISD alternatives (USHR and friends) that 
don't
perform an implicit truncation.
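
For intuition, a small self-contained C sketch (an illustration, not part of
the patch) of the bit-level argument: an AND whose mask covers the whole of
the narrow mode cannot change the lowpart, while a narrower shift-amount mask
can:

#include <stdint.h>
#include <assert.h>

int
main (void)
{
  uint32_t x = 0xdeadbeefu;

  /* Mask covers every bit of the QImode result: the AND cannot change the
     bits the lowpart subreg selects, so
     (subreg:QI (and:SI x 0xff) 0) == (subreg:QI x 0).  */
  uint8_t with_and    = (uint8_t) (x & 0xffu);
  uint8_t without_and = (uint8_t) x;
  assert (with_and == without_and);

  /* A narrower mask such as 0x1f (a shift-amount mask) can change the low
     byte (it does for this value), which is why the aarch64 splitters must
     keep the SUBREG-AND-immediate construct in that case.  */
  uint8_t masked = (uint8_t) (x & 0x1fu);
  assert (masked != without_and);
  return 0;
}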

This patch passes bootstrap and test on arm-none-linux-gnueabihf and 
aarch64-none-linux-gnu.
There is a regression on aarch64 in the gcc.target/aarch64/bfxil_1.c testcase 
that I address
with a separate patch. There is also an i386 regression that I address 
separately too.

Is this version preferable? I'll ping the midend maintainers for the 
simplify-rtx.c change if so.
Thanks,
Kyrill

2018-02-08  Kyrylo Tkachov  

PR target/84164
* simplify-rtx.c (simplify_subreg): Simplify subreg of masking
operation when mask covers the outermode bits.
* config/aarch64/aarch64.md (*aarch64_reg_3_neg_mask2):
Use simplify_gen_subreg when creating a SUBREG.
(*aarch64_reg_3_minus_mask): Likewise.
(*aarch64__reg_di3_mask2): Use const_int_operand predicate
for operand 3.

2018-02-08  Kyrylo Tkachov  

PR target/84164
* gcc.c-torture/compile/pr84164.c: New test.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 0d13c356d2a9b86c701c15b818e95df1a00abfc5..d9b4a405c9a1e5c09278b2329e5d73f12b0b963d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4278,8 +4278,10 @@ (define_insn_and_split "*aarch64_reg_3_neg_mask2"
 emit_insn (gen_negsi2 (tmp, operands[2]));
 
 rtx and_op = gen_rtx_AND (SImode, tmp, operands[3]);
-rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
- SUBREG_BYTE (operands[4]));
+rtx subreg_tmp = simplify_gen_subreg (GET_MODE (operands[4]), and_op,
+	   SImode, SUBREG_BYTE (operands[4]));
+
+gcc_assert (subreg_tmp);
 emit_insn (gen_3 (operands[0], operands[1], subreg_tmp));
 DONE;
   }
@@ -4305,9 +4307,10 @@ (define_insn_and_split "*aarch64_reg_3_minus_mask"
 emit_insn (gen_negsi2 (tmp, operands[3]));
 
 rtx and_op = gen_rtx_AND (SImode, tmp, operands[4]);
-rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[5]), and_op,
- SUBREG_BYTE (operands[5]));
+rtx subreg_tmp = simplify_gen_subreg (GET_MODE (operands[5]), and_op,
+	   SImode, SUBREG_BYTE (operands[5]));
 
+gcc_assert (subreg_tmp);
 emit_insn (gen_ashl3 (operands[0], operands[1], subreg_tmp));
 DONE;
   }
@@ -4318,9 +4321,9 @@ (define_insn "*aarch64__reg_di3_mask2"
 	(SHIFT:DI
 	  (match_operand:DI 1 "register_operand" "r")
 	  (match_operator 4 "subreg_lowpart_operator"
-	   [(and:SI (match_operand:SI 2 "register_operand" "r")
-		 (match_operand 3 "aarch64_shift_imm_di" "Usd"))])))]
-  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (DImode)-1)) == 0)"
+	[(and:SI (match_operand:SI 2 "register_operand" "r")
+		 (match_operand 3 "const_int_operand" "n"))])))]
+  "((~INTVAL (operands[3]) & 

[PATCH][AArch64][2/3] PR target/84164: Add ZERO_EXTRACT + LSHIFTRT pattern for BFXIL instruction

2018-02-08 Thread Kyrill Tkachov

Hi all,

This is a followup to the other PR target/84164 patch [1] that fixes the 
testsuite regression
gcc.target/aarch64/bfxil_1.c.
The regression is that with the new subreg+masking simplification we no longer 
match the
pattern for BFXIL that has the form:
(set (zero_extract:DI (reg/v:DI 76 [ a ])
(const_int 8 [0x8])
(const_int 0 [0]))
(zero_extract:DI (reg/v:DI 76 [ a ])
(const_int 8 [0x8])
(const_int 16 [0x10])))

This is now instead represented as:
(set (zero_extract:DI (reg/v:DI 93 [ a ])
(const_int 8 [0x8])
(const_int 0 [0]))
(lshiftrt:DI (reg/v:DI 93 [ a ])
(const_int 16 [0x10])))

As far as I can see the two are equivalent semantically and the LSHIFTRT form 
is a bit
simpler, so I think the simplified form is valid, but we have no pattern to 
match it.
This patch adds that pattern to catch this form as well.
This fixes the aforementioned regression and bootstrap and testing on 
aarch64-none-linux-gnu
shows no problem.
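
For intuition about the equivalence, a small self-contained C sketch (not part
of the patch): the zero_extract destination of width 8 at position 0 (the
BFXIL insert) only consumes the low 8 bits of the source value, and in those
bits zero_extract (a, 8, 16) and lshiftrt (a, 16) agree:

#include <stdint.h>
#include <assert.h>

/* Insert VAL's low 8 bits into D's low 8 bits, leaving the rest of D
   unchanged: the effect of the zero_extract destination / BFXIL.  */
static uint64_t
insert_low8 (uint64_t d, uint64_t val)
{
  return (d & ~(uint64_t) 0xff) | (val & 0xff);
}

int
main (void)
{
  uint64_t a = 0x1122334455667788ULL, d = 0xdeadbeefdeadbeefULL;
  uint64_t via_zero_extract = insert_low8 (d, (a >> 16) & 0xff); /* zero_extract (a, 8, 16) */
  uint64_t via_lshiftrt     = insert_low8 (d, a >> 16);          /* lshiftrt (a, 16) */
  assert (via_zero_extract == via_lshiftrt);
  return 0;
}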

Is this ok for trunk if the first patch goes in?
Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00443.html

2018-02-08  Kyrylo Tkachov  

PR target/84164
* config/aarch64/aarch64.md (*extr_insv_lower_reg_lshiftrt):
New pattern.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 62a4f8262a316087894aeb555c609fbe75885203..2c6343a363c161ed503ca1ddd752e21b5c941eb8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4803,6 +4803,19 @@ (define_insn "*extr_insv_lower_reg"
   [(set_attr "type" "bfm")]
 )
 
+(define_insn "*extr_insv_lower_reg_lshiftrt"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
+			  (match_operand 1 "const_int_operand" "n")
+			  (const_int 0))
+	(lshiftrt:GPI (match_operand:GPI 2 "register_operand" "r")
+			(match_operand 3 "const_int_operand" "n")))]
+  "!(UINTVAL (operands[1]) == 0
+ || (UINTVAL (operands[3]) + UINTVAL (operands[1])
+	 > GET_MODE_BITSIZE (mode)))"
+  "bfxil\\t%0, %2, %3, %1"
+  [(set_attr "type" "bfm")]
+)
+
 (define_insn "*_shft_"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(ashift:GPI (ANY_EXTEND:GPI


[Committed] S/390: Disable branch prediction for indirect branches

2018-02-08 Thread Andreas Krebbel
gcc/ChangeLog:

2018-02-08  Andreas Krebbel  

Backport from mainline
2018-02-08  Andreas Krebbel  

* config/s390/s390-opts.h (enum indirect_branch): Define.
* config/s390/s390-protos.h (s390_return_addr_from_memory)
(s390_indirect_branch_via_thunk)
(s390_indirect_branch_via_inline_thunk): Add function prototypes.
(enum s390_indirect_branch_type): Define.
* config/s390/s390.c (struct s390_frame_layout, struct
machine_function): Remove.
(indirect_branch_prez10thunk_mask, indirect_branch_z10thunk_mask)
(indirect_branch_table_label_no, indirect_branch_table_name):
Define variables.
(INDIRECT_BRANCH_NUM_OPTIONS): Define macro.
(enum s390_indirect_branch_option): Define.
(s390_return_addr_from_memory): New function.
(s390_handle_string_attribute): New function.
(s390_attribute_table): Add new attribute handler.
(s390_execute_label): Handle UNSPEC_EXECUTE_JUMP patterns.
(s390_indirect_branch_via_thunk): New function.
(s390_indirect_branch_via_inline_thunk): New function.
(s390_function_ok_for_sibcall): When jumping via thunk disallow
sibling call optimization for non z10 compiles.
(s390_emit_call): Force indirect branch target to be a single
register.  Add r1 clobber for non-z10 compiles.
(s390_emit_epilogue): Emit return jump via return_use expander.
(s390_reorg): Handle JUMP_INSNs as execute targets.
(s390_option_override_internal): Perform validity checks for the
new command line options.
(s390_indirect_branch_attrvalue): New function.
(s390_indirect_branch_settings): New function.
(s390_set_current_function): Invoke s390_indirect_branch_settings.
(s390_output_indirect_thunk_function):  New function.
(s390_code_end): Implement target hook.
(s390_case_values_threshold): Implement target hook.
(TARGET_ASM_CODE_END, TARGET_CASE_VALUES_THRESHOLD): Define target
macros.
* config/s390/s390.h (struct s390_frame_layout)
(struct machine_function): Move here from s390.c.
(TARGET_INDIRECT_BRANCH_NOBP_RET)
(TARGET_INDIRECT_BRANCH_NOBP_JUMP)
(TARGET_INDIRECT_BRANCH_NOBP_JUMP_THUNK)
(TARGET_INDIRECT_BRANCH_NOBP_JUMP_INLINE_THUNK)
(TARGET_INDIRECT_BRANCH_NOBP_CALL)
(TARGET_DEFAULT_INDIRECT_BRANCH_TABLE)
(TARGET_INDIRECT_BRANCH_THUNK_NAME_EXRL)
(TARGET_INDIRECT_BRANCH_THUNK_NAME_EX)
(TARGET_INDIRECT_BRANCH_TABLE): Define macros.
* config/s390/s390.md (UNSPEC_EXECUTE_JUMP)
(INDIRECT_BRANCH_THUNK_REGNUM): Define constants.
(mnemonic attribute): Add values which aren't recognized
automatically.
("*cjump_long", "*icjump_long", "*basr", "*basr_r"): Disable
pattern for branch conversion.  Fix mnemonic attribute.
("*c", "*sibcall_br", "*sibcall_value_br", "*return"): Emit
indirect branch via thunk if requested.
("indirect_jump", ""): Expand patterns for branch conversion.
("*indirect_jump"): Disable for branch conversion using out of
line thunks.
("indirect_jump_via_thunk_z10")
("indirect_jump_via_thunk")
("indirect_jump_via_inlinethunk_z10")
("indirect_jump_via_inlinethunk", "*casesi_jump")
("casesi_jump_via_thunk_z10", "casesi_jump_via_thunk")
("casesi_jump_via_inlinethunk_z10")
("casesi_jump_via_inlinethunk", "*basr_via_thunk_z10")
("*basr_via_thunk", "*basr_r_via_thunk_z10")
("*basr_r_via_thunk", "return_prez10"): New pattern.
("*indirect2_jump"): Disable for branch conversion.
("casesi_jump"): Turn into expander and expand patterns for branch
conversion.
("return_use"): New expander.
("*return"): Emit return via thunk and rename it to ...
("*return"): ... this one.
* config/s390/s390.opt: Add new options and an enum for the
option values.

gcc/testsuite/ChangeLog:

2018-02-08  Andreas Krebbel  

Backport from mainline
2018-02-08  Andreas Krebbel  

* gcc.target/s390/nobp-function-pointer-attr.c: New test.
* gcc.target/s390/nobp-function-pointer-nothunk.c: New test.
* gcc.target/s390/nobp-function-pointer-z10.c: New test.
* gcc.target/s390/nobp-function-pointer-z900.c: New test.
* gcc.target/s390/nobp-indirect-jump-attr.c: New test.
* gcc.target/s390/nobp-indirect-jump-inline-attr.c: New test.
* gcc.target/s390/nobp-indirect-jump-inline-z10.c: New test.
* gcc.target/s390/nobp-indirect-jump-inline-z900.c: New test.
* gcc.target/s390/nobp-indirect-jump-nothunk.c: New test.
* gcc.target/s390/nobp-indirect-jump-z10.c: New test.

Re: RFA: Fix PR 68028: LTO error when compiling PowerPC binaries with single precision floating point

2018-02-08 Thread Segher Boessenkool
Hi Nick,

On Thu, Feb 08, 2018 at 03:49:38PM +, Nick Clifton wrote:
>   I should note that Richard Guenther feels that there is a better way
>   to solve the problem - by only initializing the values once - but I
>   still like my solution, so I am offering it here.

I thought you were going to do a patch like the following, to make the
e500 cores less special (they are not):


From 44c3b661ef75e57415b10ca2cec5ac6427bbed8f Mon Sep 17 00:00:00 2001
Message-Id: 
<44c3b661ef75e57415b10ca2cec5ac6427bbed8f.1518109128.git.seg...@kernel.crashing.org>
From: Segher Boessenkool 
Date: Thu, 8 Feb 2018 16:58:33 +
Subject: [PATCH] 68028

---
 gcc/config/rs6000/rs6000.c | 19 ---
 1 file changed, 19 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 5adccd2..73de617 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4809,25 +4809,6 @@ rs6000_option_override_internal (bool global_init_p)
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
 rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
 
-  /* For the E500 family of cores, reset the single/double FP flags to let us
- check that they remain constant across attributes or pragmas.  */
-
-  switch (rs6000_cpu)
-{
-case PROCESSOR_PPC8540:
-case PROCESSOR_PPC8548:
-case PROCESSOR_PPCE500MC:
-case PROCESSOR_PPCE500MC64:
-case PROCESSOR_PPCE5500:
-case PROCESSOR_PPCE6500:
-  rs6000_single_float = 0;
-  rs6000_double_float = 0;
-  break;
-
-default:
-  break;
-}
-
   if (main_target_opt)
 {
   if (main_target_opt->x_rs6000_single_float != rs6000_single_float)
-- 
1.8.3.1



[PING] [PATCH] [MSP430] PR79242: Implement Complex Partial Integers

2018-02-08 Thread Jozef Lawrynowicz
ping x1

Complex Partial Integers are unimplemented, resulting in an ICE when
attempting to use them. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79242
This results in GCC7/8 for msp430-elf failing to build.

typedef _Complex __int20 C;

C
foo (C x, C y)
{
  return x + y;
}

(Thanks Jakub - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79242#c2)

../../gcc/testsuite/gcc.target/msp430/pr79242.c: In function 'foo':
../../gcc/testsuite/gcc.target/msp430/pr79242.c:8:1: internal compiler
error: in make_decl_rtl, at varasm.c:1304
 foo (C x, C y)
 ^~~
0xc07b29 make_decl_rtl(tree_node*)
../../gcc/varasm.c:1303
0x67523c set_parm_rtl(tree_node*, rtx_def*)
../../gcc/cfgexpand.c:1274
0x79ffb9 expand_function_start(tree_node*)
../../gcc/function.c:5166
0x6800e1 execute
../../gcc/cfgexpand.c:6250

The attached patch defines a new complex mode for PARTIAL_INT.
You may notice that genmodes.c:complex_class returns MODE_COMPLEX_INT for
MODE_PARTIAL_INT rather than MODE_COMPLEX_PARTIAL_INT. I reviewed the uses of
MODE_COMPLEX_INT and it doesn't look like a Complex Partial Int requires any
different behaviour to MODE_COMPLEX_INT.

msp430_hard_regno_nregs now returns 2 for CPSImode, but I feel like this may be
better handled in the front-end. PSImode is already defined to only use 1
register, so for a CPSI shouldn't the front-end be able to work out that
double the number of registers is required? Thoughts?
Without the definition for CPSI in msp430_hard_regno_nregs,
rtlanal.c:subreg_get_info thinks that a CPSI requires 4 registers of size 2,
instead of 2 registers of size 4.

Successfully bootstrapped and tested for c,c++,fortran,lto,objc on
x86_64-pc-linux-gnu with no regressions on gcc-7-branch.

With this patch gcc-7-branch now builds for msp430-elf. A further bug prevents
trunk from building for msp430-elf.

If the attached patch is acceptable, I would appreciate if someone would commit
it for me (to trunk and gcc-7-branch), as I do not have write access.
From 31d8554ebb6afeb2d8f235cf3d3c262236aa5e32 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Fri, 12 Jan 2018 13:23:40 +
Subject: [PATCH] Add support for Complex Partial Integers - CPSImode

2018-01-XX  Jozef Lawrynowicz 

gcc/
  PR target/79242
  * machmode.def: Define a complex mode for PARTIAL_INT.
  * genmodes.c (complex_class): Return MODE_COMPLEX_INT for
MODE_PARTIAL_INT.
  * doc/rtl.texi: Document CPSImode.
  * config/msp430/msp430.c (msp430_hard_regno_nregs): Add CPSImode
handling.
(msp430_hard_regno_nregs_with_padding): Likewise.

gcc/testsuite/
  PR target/79242
  * gcc.target/msp430/pr79242.c: New test.
---
 gcc/config/msp430/msp430.c|  4 
 gcc/doc/rtl.texi  |  5 +++--
 gcc/genmodes.c|  1 +
 gcc/machmode.def  |  1 +
 gcc/testsuite/gcc.target/msp430/pr79242.c | 11 +++
 5 files changed, 20 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/msp430/pr79242.c

diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index 710a97b..c1f0d5b 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -905,6 +905,8 @@ msp430_hard_regno_nregs (int regno ATTRIBUTE_UNUSED,
 {
   if (mode == PSImode && msp430x)
 return 1;
+  if (mode == CPSImode && msp430x)
+return 2;
   return ((GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1)
 	  / UNITS_PER_WORD);
 }
@@ -927,6 +929,8 @@ msp430_hard_regno_nregs_with_padding (int regno ATTRIBUTE_UNUSED,
 {
   if (mode == PSImode)
 return 2;
+  if (mode == CPSImode)
+return 4;
   return msp430_hard_regno_nregs (regno, mode);
 }
 
diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index b02e5a1..ebe2a63 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -1291,10 +1291,11 @@ point values.  The floating point values are in @code{QFmode},
 @findex CDImode
 @findex CTImode
 @findex COImode
-@item CQImode, CHImode, CSImode, CDImode, CTImode, COImode
+@findex CPSImode
+@item CQImode, CHImode, CSImode, CDImode, CTImode, COImode, CPSImode
 These modes stand for a complex number represented as a pair of integer
 values.  The integer values are in @code{QImode}, @code{HImode},
-@code{SImode}, @code{DImode}, @code{TImode}, and @code{OImode},
+@code{SImode}, @code{DImode}, @code{TImode}, @code{OImode}, and @code{PSImode},
 respectively.
 
 @findex BND32mode
diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index e56c08b..2af6556 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -116,6 +116,7 @@ complex_class (enum mode_class c)
   switch (c)
 {
 case MODE_INT: return MODE_COMPLEX_INT;
+case MODE_PARTIAL_INT: return MODE_COMPLEX_INT;
 case MODE_FLOAT: return MODE_COMPLEX_FLOAT;
 default:
   error ("no complex class for class %s", mode_class_names[c]);
diff --git a/gcc/machmode.def b/gcc/machmode.def
index afe6851..6c84488 100644
--- a/gcc/machmode.def
+++ 

Re: [PATCH, rs6000] Fix PR83926, ICE using __builtin_vsx_{div,udiv,mul}_2di builtins

2018-02-08 Thread Peter Bergner
On 2/6/18 10:36 AM, Peter Bergner wrote:
> On 2/6/18 10:20 AM, David Edelsohn wrote:
>> Do the gen_XXXdi3 calls work if you use SDI iterator instead of GPR
>> iterator, as Segher suggested?
> 
> Well it works _if_ we use the first patch that changes the gen_*
> patterns.  If we go this route, I agree we should use the SDI
> iterator instead of GPR.

Actually, my bad.  While bootstrapping this on a BE system, we get an
error when we attempt a 64-bit multiply in 32-bit mode.  In this case,
the gen_muldi3() pattern calls expand_mult(DImode, ...), the automatic
expand machinery notices that gen_muldi3() now allows DImode in the
!TARGET_POWERPC64 case and calls gen_muldi3() again to emit the multiply,
and we go into infinite recursion.  We don't have that problem in the
div/udiv case, because we call out to the lib routines, so no recursion.
Given this, I think we should probably go with the patch that modifies
vsx.md and guards the calls to gen_{div,udiv,mul}di3() with a TARGET_POWERPC64
test.

For completeness, that patch again is below with one testsuite addition.
The builtins-1-be.c test case must never have been tested in 32-bit mode,
since it was always ICEing from the beginning.  I've fixed it to run in
both 32-bit and 64-bit modes and in 32-bit mode, it now correctly scans
for the 64-bit div/udiv/mul cases this patch generates.

Again, this passed bootstrap and regtesting on powerpc64le-linux as well
as on powerpc64-linux and running the testsuite in both 32-bit and 64-bit
modes.  Ok for trunk?

Peter


gcc/
PR target/83926
* config/rs6000/vsx.md (vsx_mul_v2di): Handle generating a 64-bit
multiply in 32-bit mode.
(vsx_div_v2di): Handle generating a 64-bit signed divide in 32-bit mode.
(vsx_udiv_v2di): Handle generating a 64-bit unsigned divide in 32-bit
mode.

gcc/testsuite/
PR target/83926
* gcc.target/powerpc/pr83926.c: New test.
* gcc.target/powerpc/builtins-1-be.c: Filter out gimple folding disabled
message.  Fix test for running in 32-bit mode.

Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 257390)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -1650,10 +1650,22 @@
   rtx op5 = gen_reg_rtx (DImode);
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  emit_insn (gen_muldi3 (op5, op3, op4));
+  if (TARGET_POWERPC64)
+emit_insn (gen_muldi3 (op5, op3, op4));
+  else
+{
+  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+  emit_move_insn (op5, ret);
+}
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  emit_insn (gen_muldi3 (op3, op3, op4));
+  if (TARGET_POWERPC64)
+emit_insn (gen_muldi3 (op3, op3, op4));
+  else
+{
+  rtx ret = expand_mult (DImode, op3, op4, NULL, 0, false);
+  emit_move_insn (op3, ret);
+}
   emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }"
@@ -1688,10 +1700,30 @@
   rtx op5 = gen_reg_rtx (DImode);
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  emit_insn (gen_divdi3 (op5, op3, op4));
+  if (TARGET_POWERPC64)
+emit_insn (gen_divdi3 (op5, op3, op4));
+  else
+{
+  rtx libfunc = optab_libfunc (sdiv_optab, DImode);
+  rtx target = emit_library_call_value (libfunc,
+   op5, LCT_NORMAL, DImode,
+   op3, DImode,
+   op4, DImode);
+  emit_move_insn (op5, target);
+}
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1)));
-  emit_insn (gen_divdi3 (op3, op3, op4));
+  if (TARGET_POWERPC64)
+emit_insn (gen_divdi3 (op3, op3, op4));
+  else
+{
+  rtx libfunc = optab_libfunc (sdiv_optab, DImode);
+  rtx target = emit_library_call_value (libfunc,
+   op3, LCT_NORMAL, DImode,
+   op3, DImode,
+   op4, DImode);
+  emit_move_insn (op3, target);
+}
   emit_insn (gen_vsx_concat_v2di (op0, op5, op3));
   DONE;
 }"
@@ -1716,10 +1748,30 @@
   rtx op5 = gen_reg_rtx (DImode);
   emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0)));
   emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0)));
-  emit_insn (gen_udivdi3 (op5, op3, op4));
+  if (TARGET_POWERPC64)
+emit_insn (gen_udivdi3 (op5, op3, op4));
+  else
+{
+  rtx libfunc = optab_libfunc (udiv_optab, DImode);
+  rtx target = emit_library_call_value (libfunc,
+   op5, LCT_NORMAL, DImode,
+   op3, DImode,
+   

Re: PATCH to fix ICE with -Wstringop-overflow and VLA (PR tree-optimization/84238)

2018-02-08 Thread Jakub Jelinek
On Tue, Feb 06, 2018 at 05:14:16PM +0100, Marek Polacek wrote:
> Here we ICE because get_range_strlen's result might not be an array of
> two integer constants -- for a VLA the array might contain a non-constant.
> So beef up the check before converting to wide_int.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2018-02-06  Marek Polacek  
> 
>   PR tree-optimization/84238
>   * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Verify the result of
>   get_range_strlen.
> 
>   * gcc.dg/Wstringop-overflow-3.c: New test.

Ok, thanks.

Jakub


Re: C++ PATCH to fix ICE with vector expr folding (PR c++/83659)

2018-02-08 Thread Jakub Jelinek
On Wed, Feb 07, 2018 at 04:21:43PM -0500, Jason Merrill wrote:
> On Wed, Feb 7, 2018 at 4:14 PM, Jakub Jelinek  wrote:
> > On Wed, Feb 07, 2018 at 03:52:39PM -0500, Jason Merrill wrote:
> >> > E.g. the constexpr function uses 
> >> > same_type_ignoring_top_level_qualifiers_p
> >> > instead of == type comparisons, the COMPONENT_REF stuff, ...
> >>
> >> > For poly_* stuff, I think Richard S. wants to introduce it into the FEs 
> >> > at
> >> > some point, but I could be wrong; certainly it hasn't been done yet and
> >> > generally, poly*int seems to be a nightmare to deal with.
> >>
> >> Yes, I understand how we got to this point, but having the functions
> >> diverge because of this guideline seems like a mistake.  And there
> >> seem to be two ways to avoid the divergence: make an exception to the
> >> guideline, or move the function.
> >
> > Functionally, I think the following patch should turn fold_indirect_ref_1
> > to be equivalent to the patched constexpr.c version (with the known
> > documented differences), so if this is the obstackle for the acceptance
> > of the patch, I can test this.
> >
> > Otherwise, I must say I have no idea how to share the code,
> > same_type_ignoring_qualifiers is only a C++ FE function, so the middle-end
> > can't use it even conditionally, and similarly with the TBAA issues.
> 
> Again, can we make an exception and use poly_int in this function
> because it's mirroring a middle-end function?

So like this, if it passes bootstrap/regtest?  It is kind of a bidirectional
merge of changes between the 2 functions, except for intentional differences
(e.g. the same_type_ignoring_top_level_qualifiers_p vs. ==, in_gimple_form
stuff in fold-const.c, the C++ specific empty class etc. handling in
constexpr.c etc.).

2018-01-26  Marek Polacek  
Jakub Jelinek  

PR c++/83659
* fold-const.c (fold_indirect_ref_1): Use VECTOR_TYPE_P macro.
Formatting fixes.  Verify first that tree_fits_poly_int64_p (op01).
Sync some changes from cxx_fold_indirect_ref.

* constexpr.c (cxx_fold_indirect_ref): Sync some changes from
fold_indirect_ref_1, including poly_*int64.  Verify first that
tree_fits_poly_int64_p (op01).  Formatting fixes.

* g++.dg/torture/pr83659.C: New test.

--- gcc/fold-const.c.jj 2018-01-26 12:43:23.140922419 +0100
+++ gcc/fold-const.c2018-02-08 12:43:50.654727317 +0100
@@ -14115,6 +14115,7 @@ fold_indirect_ref_1 (location_t loc, tre
 {
   tree op = TREE_OPERAND (sub, 0);
   tree optype = TREE_TYPE (op);
+
   /* *_DECL -> to the value of the const decl.  */
   if (TREE_CODE (op) == CONST_DECL)
return DECL_INITIAL (op);
@@ -14148,12 +14149,13 @@ fold_indirect_ref_1 (location_t loc, tre
   && type == TREE_TYPE (optype))
return fold_build1_loc (loc, REALPART_EXPR, type, op);
   /* *(foo *) => BIT_FIELD_REF */
-  else if (TREE_CODE (optype) == VECTOR_TYPE
+  else if (VECTOR_TYPE_P (optype)
   && type == TREE_TYPE (optype))
{
  tree part_width = TYPE_SIZE (type);
  tree index = bitsize_int (0);
- return fold_build3_loc (loc, BIT_FIELD_REF, type, op, part_width, 
index);
+ return fold_build3_loc (loc, BIT_FIELD_REF, type, op, part_width,
+ index);
}
 }
 
@@ -14171,8 +14173,17 @@ fold_indirect_ref_1 (location_t loc, tre
  op00type = TREE_TYPE (op00);
 
  /* ((foo*))[1] => BIT_FIELD_REF */
- if (TREE_CODE (op00type) == VECTOR_TYPE
- && type == TREE_TYPE (op00type))
+ if (VECTOR_TYPE_P (op00type)
+ && type == TREE_TYPE (op00type)
+ /* POINTER_PLUS_EXPR second operand is sizetype, unsigned,
+but we want to treat offsets with MSB set as negative.
+For the code below negative offsets are invalid and
+TYPE_SIZE of the element is something unsigned, so
+check whether op01 fits into poly_int64, which implies
+it is from 0 to INTTYPE_MAXIMUM (HOST_WIDE_INT), and
+then just use poly_uint64 because we want to treat the
+value as unsigned.  */
+ && tree_fits_poly_int64_p (op01))
{
  tree part_width = TYPE_SIZE (type);
  poly_uint64 max_offset
@@ -14199,16 +14210,16 @@ fold_indirect_ref_1 (location_t loc, tre
   && type == TREE_TYPE (op00type))
{
  tree type_domain = TYPE_DOMAIN (op00type);
- tree min = size_zero_node;
+ tree min_val = size_zero_node;
  if (type_domain && TYPE_MIN_VALUE (type_domain))
-   min = TYPE_MIN_VALUE (type_domain);
+   min_val = TYPE_MIN_VALUE (type_domain);
  offset_int off = wi::to_offset (op01);
  offset_int 

Re: RFA: Sanitize deprecation messages (PR 84195)

2018-02-08 Thread Martin Sebor

On 02/08/2018 04:04 AM, Nick Clifton wrote:

Hi David,


+ /* PR 84195: Replace control characters in the message with their
+escaped equivalents.  Allow newlines if -fmessage-length has
+been set to a non-zero value.


I'm not quite sure why we allow newlines in this case, sorry.


Because the documentation for -fmessage-length says:

  Try to format error messages so that they fit on lines
  of about N characters.  If N is zero, then no
  line-wrapping is done; each error message appears on a
  single line.  This is the default for all front ends.

So with a non-zero message length, multi-line messages are allowed.

At least that was my understanding of the option.


It would be helpful to mention this somehow in the documentation
of each of the #-directives (i.e., that control characters are
escaped including newlines, subject to -fmessage-length).
Unless you want to handle that as part of the patch, I'll see
about submitting a docs-only change for the affected bits once
they have been committed.

Martin



Thanks for the patch review.  I will get onto fixing the points you
raised today.

Cheers
  Nick





[PR c++/84263] GC ICE with decltype

2018-02-08 Thread Nathan Sidwell
This patch fixes 84263, a GC breakage entirely unconnected with the 
patch of mine that exposed it.  I guess I perturbed the memory layout 
sufficiently to tickle it -- on a 32bit host.


The underlying problem is that decltype parsing has to squirrel away 
some data in the same manner as template-ids.  But it was failing to 
push and pop the deferring access stack.  That meant it stashed its own 
deferred accesses, but they also remained on the stack.  A subsequent 
deferred access could reallocate the vector of accesses there, if a 
subsequent access needed deferring.  That reallocation proceeds by 
gc_alloc/gc_free.  The gc_free frees the original set, which is what we 
had stashed away.  All is well until we try and GC things, with that 
token still live.  Boom!


Fixed by pushing and popping the access stack.  I also reorganized the 
code to use the modern construct of 'if ... else', rather than the 
assembler-level use of a goto!


Martin, I've bootstrapped this on an x86_64 native setup, but i686 
native is tricky.  Are you able to give this a spin?  I modified the 
testcase slightly -- replacing an 'int' with '__SIZE_TYPE__'.


nathan

--
Nathan Sidwell
2018-02-08  Nathan Sidwell  

	PR c++/84263
	* parser.c (cp_parser_decltype): Push and pop
	deferring_access_checks.  Reorganize to avoid goto.

	* g++.dg/parse/pr84263.C: New.

Index: cp/parser.c
===
--- cp/parser.c	(revision 257496)
+++ cp/parser.c	(working copy)
@@ -14049,12 +14049,7 @@ cp_parser_decltype_expr (cp_parser *pars
 static tree
 cp_parser_decltype (cp_parser *parser)
 {
-  tree expr;
   bool id_expression_or_member_access_p = false;
-  const char *saved_message;
-  bool saved_integral_constant_expression_p;
-  bool saved_non_integral_constant_expression_p;
-  bool saved_greater_than_is_operator_p;
   cp_token *start_token = cp_lexer_peek_token (parser->lexer);
 
   if (start_token->type == CPP_DECLTYPE)
@@ -14073,77 +14068,83 @@ cp_parser_decltype (cp_parser *parser)
   if (!parens.require_open (parser))
 return error_mark_node;
 
-  /* decltype (auto) */
+  push_deferring_access_checks (dk_deferred);
+
+  tree expr = NULL_TREE;
+  
   if (cxx_dialect >= cxx14
   && cp_lexer_next_token_is_keyword (parser->lexer, RID_AUTO))
+/* decltype (auto) */
+cp_lexer_consume_token (parser->lexer);
+  else
 {
-  cp_lexer_consume_token (parser->lexer);
-  if (!parens.require_close (parser))
-	return error_mark_node;
-  expr = make_decltype_auto ();
-  AUTO_IS_DECLTYPE (expr) = true;
-  goto rewrite;
-}
+  /* decltype (expression)  */
 
-  /* Types cannot be defined in a `decltype' expression.  Save away the
- old message.  */
-  saved_message = parser->type_definition_forbidden_message;
-
-  /* And create the new one.  */
-  parser->type_definition_forbidden_message
-= G_("types may not be defined in % expressions");
-
-  /* The restrictions on constant-expressions do not apply inside
- decltype expressions.  */
-  saved_integral_constant_expression_p
-= parser->integral_constant_expression_p;
-  saved_non_integral_constant_expression_p
-= parser->non_integral_constant_expression_p;
-  parser->integral_constant_expression_p = false;
-
-  /* Within a parenthesized expression, a `>' token is always
- the greater-than operator.  */
-  saved_greater_than_is_operator_p
-= parser->greater_than_is_operator_p;
-  parser->greater_than_is_operator_p = true;
-
-  /* Do not actually evaluate the expression.  */
-  ++cp_unevaluated_operand;
-
-  /* Do not warn about problems with the expression.  */
-  ++c_inhibit_evaluation_warnings;
-
-  expr = cp_parser_decltype_expr (parser, id_expression_or_member_access_p);
-
-  /* Go back to evaluating expressions.  */
-  --cp_unevaluated_operand;
-  --c_inhibit_evaluation_warnings;
-
-  /* The `>' token might be the end of a template-id or
- template-parameter-list now.  */
-  parser->greater_than_is_operator_p
-= saved_greater_than_is_operator_p;
-
-  /* Restore the old message and the integral constant expression
- flags.  */
-  parser->type_definition_forbidden_message = saved_message;
-  parser->integral_constant_expression_p
-= saved_integral_constant_expression_p;
-  parser->non_integral_constant_expression_p
-= saved_non_integral_constant_expression_p;
+  /* Types cannot be defined in a `decltype' expression.  Save away the
+	 old message and set the new one.  */
+  const char *saved_message = parser->type_definition_forbidden_message;
+  parser->type_definition_forbidden_message
+	= G_("types may not be defined in % expressions");
+
+  /* The restrictions on constant-expressions do not apply inside
+	 decltype expressions.  */
+  bool saved_integral_constant_expression_p
+	= parser->integral_constant_expression_p;
+  bool saved_non_integral_constant_expression_p
+	= 

[C++ PATCH] initializer_list diagnostic

2018-02-08 Thread Nathan Sidwell
when working on 84263 I noticed this initializer_list diagnostic wasn't 
correctly formatted.  Fixing thusly.


nathan
--
Nathan Sidwell
2018-02-08  Nathan Sidwell  

	* class.c (finish_struct): Fix std::initializer_list diagnostic
	formatting.

	* g++.dg/cpp0x/initlist93.C: Adjust diagnostic.

Index: cp/class.c
===
--- cp/class.c	(revision 257482)
+++ cp/class.c	(working copy)
@@ -7062,7 +7062,7 @@ finish_struct (tree t, tree attributes)
   /* People keep complaining that the compiler crashes on an invalid
 	 definition of initializer_list, so I guess we should explicitly
 	 reject it.  What the compiler internals care about is that it's a
-	 template and has a pointer field followed by an integer field.  */
+	 template and has a pointer field followed by size_type field.  */
   bool ok = false;
   if (processing_template_decl)
 	{
@@ -7075,9 +7075,8 @@ finish_struct (tree t, tree attributes)
 	}
 	}
   if (!ok)
-	fatal_error (input_location,
-		 "definition of std::initializer_list does not match "
-		 "#include <initializer_list>");
+	fatal_error (input_location, "definition of %qD does not match "
+	fatal_error (input_location, "definition of %qD does not match "
+		 "%<#include <initializer_list>%>", TYPE_NAME (t));
 }
 
   input_location = saved_loc;
Index: testsuite/g++.dg/cpp0x/initlist93.C
===
--- testsuite/g++.dg/cpp0x/initlist93.C	(revision 257482)
+++ testsuite/g++.dg/cpp0x/initlist93.C	(working copy)
@@ -3,7 +3,7 @@
 
 namespace std
 {
-template  class initializer_list // { dg-error "definition of std::initializer_list does not match" }
+template  class initializer_list // { dg-error "definition of .*std::initializer_list.* does not match" }
 {
   int *_M_array;
   int _M_len;


Re: [SFN+LVU+IEPM v4 7/9] [LVU] Introduce location views

2018-02-08 Thread Jason Merrill
On Thu, Feb 8, 2018 at 7:56 AM, Alexandre Oliva  wrote:
> On Feb  7, 2018, Jason Merrill  wrote:
>
>> OK, that makes sense.  But I'm still uncomfortable with choosing an
>> existing opcode for that purpose, which previously would have been
>> chosen just for reasons of encoding complexity and size.
>
> Well, there's a good reason we didn't used to output this opcode: it's
> nearly always the case that you're better off using a special opcode or
> DW_LNS_advance_pc, that encodes the offset as uleb128 instead of a fixed
> size.  The only exceptions I can think of are offsets that have the most
> significant bits set in the representable range for
> DW_LNS_fixed_advance_pc (the uleb128 representation for
> DW_LNS_advance_pc would end up taking an extra byte if insns don't get
> more than byte alignment), and VLIW machines, in which the
> DW_LNS_advance_pc operand needs to be multiplied by the ops-per-insns
> (but also divided by the min-insn-length).  So, unless you're creating a
> gap of 16KiB to 64KiB in the middle of a function on an ISA such as
> x86*, that has insns as small as 1 byte, you'll only use
> DW_LNS_fixed_advance_pc when the assembler can't encode uleb128 offsets,
> as stated in the DWARF specs.

Which is often true of non-gas assemblers, isn't it?

> Well, now there's one more case for using
> it, and it's a rare one as well.  I didn't think it made sense to add
> yet another opcode with a fixed-size operand for that.

So, if I've got this right: The most conservative approach to updating
the address is DW_LNE_set_address, but we definitely want that to
reset the view because it's used for e.g. starting a new function.
And if it resets the view, we need to be careful not to use it more
than once for the same address.  Special opcodes are only generated by
the assembler from .loc directives, so we use symbolic views and let
the assembler decide whether or not they reset the view.  If we don't
have that assembler support, and don't want to use DW_LNE_set_address,
that leaves DW_LNS_advance_pc and DW_LNS_fixed_advance_pc.  The former
could be used by the assembler in cases when special opcodes can't
express the necessary change compactly, whereas the latter won't be,
so it's a better choice for this situation.  And since we don't know
whether the increment will be zero, we don't want it to reset the
view.

OK, that makes sense.  Though I expect this will come up again when
the DWARF committee looks at the proposal.

> But then again, even it was an opcode used more often, it wouldn't be a
> significant problem to assign it the special behavior of not resetting
> the view counter.  Views don't have to be reset for the whole thing to
> work, we just need some means for the compiler (who doesn't know all
> offsets) and debug info consumers (who do) to keep view numbers in sync.
> A single opcode for the compiler to signal to the consumer that it
> wasn't sure a smallish offset would be nonzero is enough, and
> DW_LNS_fixed_advance_pc provides us with just what we need, without any
> complications of having to compute a special opcode, or compute a
> compiler-unknown offset times min-insn-length and have that encoded in
> uleb128.
>
>> Thanks, it would be good to have this overview in a comment somewhere.
>
> You meant just these two paragraphs (like below), or the whole thing?

I meant just the two paragraphs, thanks.

Jason


RFA: Fix PR 68028: LTO error when compiling PowerPC binaries with single precision floating point

2018-02-08 Thread Nick Clifton
Hi Segher,

  OK, here is an official submission of my patch to fix PR 68028.

  I should note that Richard Guenther feels that there is a better way
  to solve the problem - by only initializing the values once - but I
  still like my solution, so I am offering it here.

  OK to apply ?

Cheers
  Nick

gcc/ChangeLog
2018-02-08  Nick Clifton  

* config/rs6000/rs6000.c (rs6000_option_override_internal):
In LTO mode prefer function attributes over command line -mcpu
setting.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 257282)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -4834,12 +4834,25 @@
 
   if (main_target_opt)
 {
-  if (main_target_opt->x_rs6000_single_float != rs6000_single_float)
-   error ("target attribute or pragma changes single precision floating "
-  "point");
-  if (main_target_opt->x_rs6000_double_float != rs6000_double_float)
-   error ("target attribute or pragma changes double precision floating "
-  "point");
+  /* PR 68028: In LTO mode the -mcpu value is passed in as a function
+attribute rather than on the command line.  So instead of checking
+for discrepancies, we enforce the choice determined by the
+attributes.  */
+if (in_lto_p)
+  {
+rs6000_single_float = main_target_opt->x_rs6000_single_float;
+rs6000_double_float = main_target_opt->x_rs6000_double_float;
+  }
+/* There could be an 'else' statement here but it is hardly worth
+   it as the compiler will make the optimization anyway, and this
+   way we avoid indenting the code unnecessarily.  */
+
+if (main_target_opt->x_rs6000_single_float != rs6000_single_float)
+  error ("target attribute or pragma changes single precision floating 
"
+ "point");
+if (main_target_opt->x_rs6000_double_float != rs6000_double_float)
+  error ("target attribute or pragma changes double precision floating 
"
+ "point");
 }
 
   rs6000_always_hint = (rs6000_tune != PROCESSOR_POWER4


libgo patch committed: Get missing function names from symbol table

2018-02-08 Thread Ian Lance Taylor
This libgo patch changes the traceback code to get missing function
name from the symbol table.  If we trace back through code that has no
debug info, as when calling through C code compiled with -g0, we won't
have a function name.  Try to fetch the function name using the symbol
table.

Adding the test case revealed that gotest failed to use the gccgo tag
when matching files, so add that.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 257493)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-2aa95f1499cf931ef8e95c7958463829276a0f2c
+7e94bac5676afc8188677c98ecb263c78c1a7f8d
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/crash_gccgo_test.go
===
--- libgo/go/runtime/crash_gccgo_test.go(nonexistent)
+++ libgo/go/runtime/crash_gccgo_test.go(working copy)
@@ -0,0 +1,59 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build cgo,gccgo
+
+package runtime_test
+
+import (
+   "bytes"
+   "fmt"
+   "internal/testenv"
+   "os"
+   "os/exec"
+   "strings"
+   "testing"
+)
+
+func TestGccgoCrashTraceback(t *testing.T) {
+   t.Parallel()
+   got := runTestProg(t, "testprogcgo", "CrashTracebackGccgo")
+   ok := true
+   for i := 1; i <= 3; i++ {
+   if !strings.Contains(got, fmt.Sprintf("CFunction%d", i)) {
+   t.Errorf("missing C function CFunction%d", i)
+   ok = false
+   }
+   }
+   if !ok {
+   t.Log(got)
+   }
+}
+
+func TestGccgoCrashTracebackNodebug(t *testing.T) {
+   testenv.MustHaveGoBuild(t)
+   if os.Getenv("CC") == "" {
+   t.Skip("no compiler in environment")
+   }
+
+   cc := strings.Fields(os.Getenv("CC"))
+   cc = append(cc, "-x", "c++", "-")
+   out, _ := exec.Command(cc[0], cc[1:]...).CombinedOutput()
+   if bytes.Contains(out, []byte("error trying to exec 'cc1plus'")) {
+   t.Skip("no C++ compiler")
+   }
+   os.Setenv("CXX", os.Getenv("CC"))
+
+   got := runTestProg(t, "testprogcxx", "CrashTracebackNodebug")
+   ok := true
+   for i := 1; i <= 3; i++ {
+   if !strings.Contains(got, fmt.Sprintf("cxxFunction%d", i)) {
+   t.Errorf("missing C++ function cxxFunction%d", i)
+   ok = false
+   }
+   }
+   if !ok {
+   t.Log(got)
+   }
+}
Index: libgo/go/runtime/testdata/testprogcgo/traceback_gccgo.go
===
--- libgo/go/runtime/testdata/testprogcgo/traceback_gccgo.go(nonexistent)
+++ libgo/go/runtime/testdata/testprogcgo/traceback_gccgo.go(working copy)
@@ -0,0 +1,40 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build gccgo
+
+package main
+
+// This program will crash.
+// We want the stack trace to include the C functions.
+
+/*
+#cgo CFLAGS: -g -O0
+
+#include 
+
+char *p;
+
+static int CFunction3(void) {
+   *p = 0;
+   return 0;
+}
+
+static int CFunction2(void) {
+   return CFunction3();
+}
+
+static int CFunction1(void) {
+   return CFunction2();
+}
+*/
+import "C"
+
+func init() {
+   register("CrashTracebackGccgo", CrashTracebackGccgo)
+}
+
+func CrashTracebackGccgo() {
+   C.CFunction1()
+}
Index: libgo/go/runtime/testdata/testprogcxx/main.go
===
--- libgo/go/runtime/testdata/testprogcxx/main.go   (nonexistent)
+++ libgo/go/runtime/testdata/testprogcxx/main.go   (working copy)
@@ -0,0 +1,35 @@
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package main
+
+import "os"
+
+var cmds = map[string]func(){}
+
+func register(name string, f func()) {
+   if cmds[name] != nil {
+   panic("duplicate registration: " + name)
+   }
+   cmds[name] = f
+}
+
+func registerInit(name string, f func()) {
+   if len(os.Args) >= 2 && os.Args[1] == name {
+   f()
+   }
+}
+
+func main() {
+   if len(os.Args) < 2 {
+   println("usage: " + os.Args[0] + " name-of-test")
+   return
+   }
+   f := cmds[os.Args[1]]
+   if f == nil {
+   println("unknown function: " + os.Args[1])
+   return
+   }
+   f()
+}
Index: 

gotools patch committed: Add options to permit tests to use C++

2018-02-08 Thread Ian Lance Taylor
I have committed this patch to gotools to add compiler options to
allow the compilers under test to pick up the C++ library.  This
permits tests to use C++ (linked into Go programs) which I used for a
test that I will commit shortly.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian


2018-02-08  Ian Lance Taylor  

* Makefile.am (check-gccgo, check-gcc): Add options to pick up
target libstdc++, to permit tests that use C++.
* Makefile.in: Rebuild.
Index: Makefile.am
===
--- Makefile.am (revision 257493)
+++ Makefile.am (working copy)
@@ -179,22 +179,22 @@ check-head:
@echo >> gotools.head
 
 # check-gccgo is a little shell script that executes gccgo with the
-# options to pick up the newly built libgo.
+# options to pick up the newly built libgo and libstdc++.
 check-gccgo: Makefile
rm -f $@ $@.tmp
echo "#!/bin/sh" > $@.tmp
abs_libgodir=`cd $(libgodir) && $(PWD_COMMAND)`; \
-   echo "$(GOCOMPILE)" '"$$@"' "-I $${abs_libgodir} -L $${abs_libgodir} -L 
$${abs_libgodir}/.libs" >> $@.tmp
+   echo "$(GOCOMPILE)" '"$$@"' "-I $${abs_libgodir} -L $${abs_libgodir} -L 
$${abs_libgodir}/.libs -B$${abs_libgodir}/../libstdc++-v3/src/.libs 
-B$${abs_libgodir}/../libstdc++-v3/libsupc++/.libs" >> $@.tmp
chmod +x $@.tmp
mv -f $@.tmp $@
 
 # check-gcc is a little shell script that executes the newly built gcc
-# with the options to pick up the newly built libgo.
+# with the options to pick up the newly built libgo and libstdc++.
 check-gcc: Makefile
rm -f $@ $@.tmp
echo "#!/bin/sh" > $@.tmp
abs_libgodir=`cd $(libgodir) && $(PWD_COMMAND)`; \
-   echo "$(GCC_FOR_TARGET)" '"$$@"' "-L $${abs_libgodir} -L 
$${abs_libgodir}/.libs" >> $@.tmp
+   echo "$(GCC_FOR_TARGET)" '"$$@"' "-L $${abs_libgodir} -L 
$${abs_libgodir}/.libs -B$${abs_libgodir}/../libstdc++-v3/src/.libs 
-B$${abs_libgodir}/../libstdc++-v3/libsupc++/.libs" >> $@.tmp
chmod +x $@.tmp
mv -f $@.tmp $@
 


Re: [RFC][PATCH] Stabilize a few qsort comparison functions

2018-02-08 Thread Martin Jambor
On Wed, Feb 07 2018, Franz Sirl wrote:
> Hi,
>
> this is the result of an attempt to minimize the differences between the
> compile results of a Linux-based and a Cygwin64-based powerpc-eabi cross
> toolchain.
> The method used was:
>
>      - find the -fverbose-asm assembler files that differ
>      - compile that file again on both platforms with
>     -O2 -g3 -fdump-tree-all-all -fdump-rtl-all -fdump-noaddr
>      - look for the first dump file with differences and check that pass
>    for qsort's
>      - stabilize the compare functions
>
> With some help on IRC to better understand the passes and some serious
> debugging of GCC I came up with this patch. On the tested codebase the
> differences in the assembler sources are now down to 0.
> If the various pass maintainers have better ideas on how to stabilize
> the compare functions, I'll be happy to verify them on the codebase.
> For the SRA patch I already have an alternate version with an additional
> ID member.
>
> Comments?

As I said on IRC, if you find this useful, I'm fine with having the SRA
hunk (but note that I cannot approve it).  In any event, IMHO this is
stage 1 material.

Thanks,

Martin


>
> Bootstrapped on linux-x86_64, no testsuite regressions.
>
> Franz Sirl
>
>
> 2018-02-07  Franz Sirl 
>
>      * ira-build.c (object_range_compare_func): Stabilize sort.
>      * tree-sra.c (compare_access_positions): Likewise.
>      * varasm.c (output_object_block_compare): Likewise.
>      * tree-ssa-loop-ivopts.c (group_compare_offset): Likewise.
>      (struct iv_common_cand): New member.
>      (record_common_cand): Initialize new member.
>      (common_cand_cmp): Use new member to stabilize sort.
>      * tree-vrp.c (struct assert_locus): New member.
>      (register_new_assert_for): Initialize new member.
>      (compare_assert_loc): Use new member to stabilize sort.
>


libgo patch committed: Update to 1.10rc2

2018-02-08 Thread Ian Lance Taylor
I committed a patch to update libgo to 1.10rc2.  This includes a fix
for a security problem in `go get` based on GCC and linker plugins.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 257463)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-cdc28627b7abfd73f5d552813db8eb4293b823b0
+2aa95f1499cf931ef8e95c7958463829276a0f2c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/MERGE
===
--- libgo/MERGE (revision 257415)
+++ libgo/MERGE (working copy)
@@ -1,4 +1,4 @@
-5348aed83e39bd1d450d92d7f627e994c2db6ebf
+20e228f2fdb44350c858de941dff4aea9f3127b8
 
 The first line of this file holds the git revision number of the
 last merge done from the master library sources.
Index: libgo/VERSION
===
--- libgo/VERSION   (revision 257415)
+++ libgo/VERSION   (working copy)
@@ -1 +1 @@
-go1.10rc1
+go1.10rc2
Index: libgo/go/cmd/cgo/doc.go
===
--- libgo/go/cmd/cgo/doc.go (revision 257415)
+++ libgo/go/cmd/cgo/doc.go (working copy)
@@ -45,8 +45,8 @@ For example:
// #include 
import "C"
 
-Alternatively, CPPFLAGS and LDFLAGS may be obtained via the pkg-config
-tool using a '#cgo pkg-config:' directive followed by the package names.
+Alternatively, CPPFLAGS and LDFLAGS may be obtained via the pkg-config tool
+using a '#cgo pkg-config:' directive followed by the package names.
 For example:
 
// #cgo pkg-config: png cairo
@@ -55,11 +55,21 @@ For example:
 
 The default pkg-config tool may be changed by setting the PKG_CONFIG 
environment variable.
 
+For security reasons, only a limited set of flags are allowed, notably -D, -I, 
and -l.
+To allow additional flags, set CGO_CFLAGS_ALLOW to a regular expression
+matching the new flags. To disallow flags that would otherwise be allowed,
+set CGO_CFLAGS_DISALLOW to a regular expression matching arguments
+that must be disallowed. In both cases the regular expression must match
+a full argument: to allow -mfoo=bar, use CGO_CFLAGS_ALLOW='-mfoo.*',
+not just CGO_CFLAGS_ALLOW='-mfoo'. Similarly named variables control
+the allowed CPPFLAGS, CXXFLAGS, FFLAGS, and LDFLAGS.
+
 When building, the CGO_CFLAGS, CGO_CPPFLAGS, CGO_CXXFLAGS, CGO_FFLAGS and
 CGO_LDFLAGS environment variables are added to the flags derived from
 these directives. Package-specific flags should be set using the
 directives, not the environment variables, so that builds work in
-unmodified environments.
+unmodified environments. Flags obtained from environment variables
+are not subject to the security limitations described above.
 
 All the cgo CPPFLAGS and CFLAGS directives in a package are concatenated and
 used to compile C files in that package. All the CPPFLAGS and CXXFLAGS
Index: libgo/go/cmd/cgo/gcc.go
===
--- libgo/go/cmd/cgo/gcc.go (revision 257415)
+++ libgo/go/cmd/cgo/gcc.go (working copy)
@@ -2345,12 +2345,6 @@ func (c *typeConv) FuncArg(dtype dwarf.T
break
}
 
-   // If we already know the typedef for t just use that.
-   // See issue 19832.
-   if def := typedef[t.Go.(*ast.Ident).Name]; def != nil {
-   break
-   }
-
t = c.Type(ptr, pos)
if t == nil {
return nil
Index: libgo/go/cmd/go/alldocs.go
===
--- libgo/go/cmd/go/alldocs.go  (revision 257415)
+++ libgo/go/cmd/go/alldocs.go  (working copy)
@@ -1227,17 +1227,26 @@
 // CGO_CFLAGS
 // Flags that cgo will pass to the compiler when compiling
 // C code.
-// CGO_CPPFLAGS
-// Flags that cgo will pass to the compiler when compiling
-// C or C++ code.
-// CGO_CXXFLAGS
-// Flags that cgo will pass to the compiler when compiling
-// C++ code.
-// CGO_FFLAGS
-// Flags that cgo will pass to the compiler when compiling
-// Fortran code.
-// CGO_LDFLAGS
-// Flags that cgo will pass to the compiler when linking.
+// CGO_CFLAGS_ALLOW
+// A regular expression specifying additional flags to allow
+// to appear in #cgo CFLAGS source code directives.
+// Does not apply to the CGO_CFLAGS environment variable.
+// CGO_CFLAGS_DISALLOW
+// A regular expression specifying flags that must be disallowed
+// from 

Re: Another fix for single-element permutes (PR 84265)

2018-02-08 Thread Richard Biener
On Thu, Feb 8, 2018 at 2:29 PM, Richard Sandiford
 wrote:
> PR83753 was about a case in which we ended up trying to "vectorise"
> a group of loads or stores using single-element vectors.  The problem
> was that we were classifying the load or store as VMAT_CONTIGUOUS_PERMUTE
> rather than VMAT_CONTIGUOUS, even though it doesn't make sense to permute
> a single-element vector.
>
> In that PR it was enough to change get_group_load_store_type,
> because vectorisation ended up being unprofitable and so we didn't
> take things further.  But when vectorisation is profitable, the same
> fix is needed in vectorizable_load and vectorizable_store.
>
> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
> OK to install?

OK.

Richard.

> Richard
>
>
> 2018-02-08  Richard Sandiford  
>
> gcc/
> PR tree-optimization/84265
> * tree-vect-stmts.c (vectorizable_store): Don't treat
> VMAT_CONTIGUOUS accesses as grouped.
> (vectorizable_load): Likewise.
>
> gcc/testsuite/
> PR tree-optimization/84265
> * gcc.dg/vect/pr84265.c: New test.
>
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2018-01-30 09:45:27.710764075 +
> +++ gcc/tree-vect-stmts.c   2018-02-08 13:26:39.242566948 +
> @@ -6214,7 +6214,8 @@ vectorizable_store (gimple *stmt, gimple
>  }
>
>grouped_store = (STMT_VINFO_GROUPED_ACCESS (stmt_info)
> -  && memory_access_type != VMAT_GATHER_SCATTER);
> +  && memory_access_type != VMAT_GATHER_SCATTER
> +  && (slp || memory_access_type != VMAT_CONTIGUOUS));
>if (grouped_store)
>  {
>first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
> @@ -7696,7 +7697,8 @@ vectorizable_load (gimple *stmt, gimple_
>return true;
>  }
>
> -  if (memory_access_type == VMAT_GATHER_SCATTER)
> +  if (memory_access_type == VMAT_GATHER_SCATTER
> +  || (!slp && memory_access_type == VMAT_CONTIGUOUS))
>  grouped_load = false;
>
>if (grouped_load)
> Index: gcc/testsuite/gcc.dg/vect/pr84265.c
> ===
> --- /dev/null   2018-02-08 11:17:10.862716283 +
> +++ gcc/testsuite/gcc.dg/vect/pr84265.c 2018-02-08 13:26:39.240567025 +
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +
> +struct a
> +{
> +  unsigned long b;
> +  unsigned long c;
> +  int d;
> +  int *e;
> +  char f;
> +};
> +
> +struct
> +{
> +  int g;
> +  struct a h[];
> +} i;
> +
> +int j, k;
> +void l ()
> +{
> +  for (; k; k++)
> +j += (int) (i.h[k].c - i.h[k].b);
> +}


Re: Use nonzero bits to refine range in split_constant_offset (PR 81635)

2018-02-08 Thread Richard Biener
On Thu, Feb 8, 2018 at 1:09 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Fri, Feb 2, 2018 at 3:12 PM, Richard Sandiford
>>  wrote:
>>> Index: gcc/tree-data-ref.c
>>> ===
>>> --- gcc/tree-data-ref.c 2018-02-02 14:03:53.964530009 +
>>> +++ gcc/tree-data-ref.c 2018-02-02 14:03:54.184521826 +
>>> @@ -721,7 +721,13 @@ split_constant_offset_1 (tree type, tree
>>> if (TREE_CODE (tmp_var) != SSA_NAME)
>>>   return false;
>>> wide_int var_min, var_max;
>>> -   if (get_range_info (tmp_var, _min, _max) != 
>>> VR_RANGE)
>>> +   value_range_type vr_type = get_range_info (tmp_var, 
>>> _min,
>>> +  _max);
>>> +   wide_int var_nonzero = get_nonzero_bits (tmp_var);
>>> +   signop sgn = TYPE_SIGN (itype);
>>> +   if (intersect_range_with_nonzero_bits (vr_type, _min,
>>> +  _max, 
>>> var_nonzero,
>>> +  sgn) != VR_RANGE)
>>
>> Above it looks like we could go from VR_RANGE to VR_UNDEFINED.
>> I'm not sure if the original range-info might be useful in this case -
>> if it may be
>> can we simply use only the range info if it was VR_RANGE?
>
> I think we only drop to VR_UNDEFINED if we have contradictory
> information: nonzero bits says some bits must be clear, but the range
> only contains values for which the bits are set.  In that case I think
> we should either be conservative and not use the information, or be
> aggressive and say that we have undefined behaviour, so overflow is OK.
>
> It seems a bit of a fudge to go back to the old range when we know it's
> false, and use it to allow the split some times and not others.

Fine.

> Thanks,
> Richard
>
>>
>> Ok otherwise.
>> Thanks,
>> Richard.
>>
>>>   return false;
>>>
>>> /* See whether the range of OP0 (i.e. TMP_VAR + TMP_OFF)
>>> @@ -729,7 +735,6 @@ split_constant_offset_1 (tree type, tree
>>>operations done in ITYPE.  The addition must overflow
>>>at both ends of the range or at neither.  */
>>> bool overflow[2];
>>> -   signop sgn = TYPE_SIGN (itype);
>>> unsigned int prec = TYPE_PRECISION (itype);
>>> wide_int woff = wi::to_wide (tmp_off, prec);
>>> wide_int op0_min = wi::add (var_min, woff, sgn, 
>>> [0]);
>>> Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-3.c
>>> ===
>>> --- /dev/null   2018-02-02 09:03:36.168354735 +
>>> +++ gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-3.c2018-02-02 
>>> 14:03:54.183521863 +
>>> @@ -0,0 +1,62 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-additional-options "-fno-tree-loop-vectorize" } */
>>> +/* { dg-require-effective-target vect_double } */
>>> +/* { dg-require-effective-target lp64 } */
>>> +
>>> +void
>>> +f1 (double *p, double *q, unsigned int n)
>>> +{
>>> +  p = (double *) __builtin_assume_aligned (p, sizeof (double) * 2);
>>> +  q = (double *) __builtin_assume_aligned (q, sizeof (double) * 2);
>>> +  for (unsigned int i = 0; i < n; i += 4)
>>> +{
>>> +  double a = q[i] + p[i];
>>> +  double b = q[i + 1] + p[i + 1];
>>> +  q[i] = a;
>>> +  q[i + 1] = b;
>>> +}
>>> +}
>>> +
>>> +void
>>> +f2 (double *p, double *q, unsigned int n)
>>> +{
>>> +  p = (double *) __builtin_assume_aligned (p, sizeof (double) * 2);
>>> +  q = (double *) __builtin_assume_aligned (q, sizeof (double) * 2);
>>> +  for (unsigned int i = 0; i < n; i += 2)
>>> +{
>>> +  double a = q[i] + p[i];
>>> +  double b = q[i + 1] + p[i + 1];
>>> +  q[i] = a;
>>> +  q[i + 1] = b;
>>> +}
>>> +}
>>> +
>>> +void
>>> +f3 (double *p, double *q, unsigned int n)
>>> +{
>>> +  p = (double *) __builtin_assume_aligned (p, sizeof (double) * 2);
>>> +  q = (double *) __builtin_assume_aligned (q, sizeof (double) * 2);
>>> +  for (unsigned int i = 0; i < n; i += 6)
>>> +{
>>> +  double a = q[i] + p[i];
>>> +  double b = q[i + 1] + p[i + 1];
>>> +  q[i] = a;
>>> +  q[i + 1] = b;
>>> +}
>>> +}
>>> +
>>> +void
>>> +f4 (double *p, double *q, unsigned int start, unsigned int n)
>>> +{
>>> +  p = (double *) __builtin_assume_aligned (p, sizeof (double) * 2);
>>> +  q = (double *) __builtin_assume_aligned (q, sizeof (double) * 2);
>>> +  for (unsigned int i = start & -2; i < n; i += 2)
>>> +{
>>> +  double a = q[i] + p[i];
>>> +  double b = q[i + 1] + p[i + 1];
>>> +  q[i] = a;
>>> +  q[i + 1] = b;
>>> +}
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump-times "basic block vectorized" 4 "slp1" } } 
>>> */
>>> Index: 

Re: C++ PATCH to fix ICE with vector expr folding (PR c++/83659)

2018-02-08 Thread Jason Merrill
On Thu, Feb 8, 2018 at 6:55 AM, Jakub Jelinek  wrote:
> On Wed, Feb 07, 2018 at 04:21:43PM -0500, Jason Merrill wrote:
>> On Wed, Feb 7, 2018 at 4:14 PM, Jakub Jelinek  wrote:
>> > On Wed, Feb 07, 2018 at 03:52:39PM -0500, Jason Merrill wrote:
>> >> > E.g. the constexpr function uses 
>> >> > same_type_ignoring_top_level_qualifiers_p
>> >> > instead of == type comparisons, the COMPONENT_REF stuff, ...
>> >>
>> >> > For poly_* stuff, I think Richard S. wants to introduce it into the FEs 
>> >> > at
>> >> > some point, but I could be wrong; certainly it hasn't been done yet and
>> >> > generally, poly*int seems to be a nightmare to deal with.
>> >>
>> >> Yes, I understand how we got to this point, but having the functions
>> >> diverge because of this guideline seems like a mistake.  And there
>> >> seem to be two ways to avoid the divergence: make an exception to the
>> >> guideline, or move the function.
>> >
>> > Functionally, I think the following patch should turn fold_indirect_ref_1
>> > to be equivalent to the patched constexpr.c version (with the known
>> > documented differences), so if this is the obstacle for the acceptance
>> > of the patch, I can test this.
>> >
>> > Otherwise, I must say I have no idea how to share the code,
>> > same_type_ignoring_qualifiers is only a C++ FE function, so the middle-end
>> > can't use it even conditionally, and similarly with the TBAA issues.
>>
>> Again, can we make an exception and use poly_int in this function
>> because it's mirroring a middle-end function?
>
> So like this if it passes bootstrap/regtest?  It is kind of bidirectional
> merge of changes between the 2 functions, except for intentional differences
> (e.g. the same_type_ignoring_top_level_qualifiers_p vs. ==, in_gimple_form
> stuff in fold-const.c, the C++ specific empty class etc. handling in
> constexpr.c etc.).

Looks good, thanks.

Jason


Re: PATCH to fix bogus warning with -Wstringop-truncation -g (PR tree-optimization/84228)

2018-02-08 Thread Richard Biener
On Thu, Feb 8, 2018 at 6:35 AM, Jeff Law  wrote:
> On 02/06/2018 05:57 AM, Jakub Jelinek wrote:
>> On Tue, Feb 06, 2018 at 01:46:21PM +0100, Marek Polacek wrote:
>>> --- gcc/testsuite/c-c++-common/Wstringop-truncation-3.c
>>> +++ gcc/testsuite/c-c++-common/Wstringop-truncation-3.c
>>> @@ -0,0 +1,20 @@
>>> +/* PR tree-optimization/84228 */
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-Wstringop-truncation -O2 -g" } */
>>> +
>>> +char *strncpy (char *, const char *, __SIZE_TYPE__);
>>> +struct S
>>> +{
>>> +  char arr[64];
>>> +};
>>> +
>>> +int
>>> +foo (struct S *p1, const char *a)
>>> +{
>>> +  if (a)
>>> +goto err;
>>> +  strncpy (p1->arr, a, sizeof p1->arr); /* { dg-bogus "specified bound" } 
>>> */
>>
>> Just curious, what debug stmt is in between those?
>> Wouldn't it be better to force them a couple of debug stmts?
>> Say
>>   int b = 5, c = 6, d = 7;
>> at the start of the function and
>>   b = 8; c = 9; d = 10;
>> in between strncpy and the '\0' store?
>>
>>> +  p1->arr[3] = '\0';
>>> +err:
>>> +  return 0;
>>> +}
>>> diff --git gcc/tree-ssa-strlen.c gcc/tree-ssa-strlen.c
>>> index c3cf432a921..f0f6535017b 100644
>>> --- gcc/tree-ssa-strlen.c
>>> +++ gcc/tree-ssa-strlen.c
>>> @@ -1849,7 +1849,7 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, 
>>> tree src, tree cnt)
>>>
>>>/* Look for dst[i] = '\0'; after the stxncpy() call and if found
>>>   avoid the truncation warning.  */
>>> -  gsi_next ();
>>> +  gsi_next_nondebug ();
>>>gimple *next_stmt = gsi_stmt (gsi);
>>>
>>>if (!gsi_end_p (gsi) && is_gimple_assign (next_stmt))
>>
>> Ok for trunk, though generally looking at just next stmt is very fragile, 
>> might be
>> better to look at the strncpy's vuse immediate uses if they are within the
>> same basic block and either don't alias with it, or are the store it is
>> looking for, or something similar.
> Martin and I wandered down this approach a bit and ultimately decided
> against it.  While yes, it could avoid a false positive by looking at the
> immediate uses, I'm not sure avoiding the false positive in those
> cases is actually good!
>
> The larger the separation between the strcpy and the truncation the more
> likely it is that the code is wrong or at least poorly written and
> deserves a developer looksie.

But it's the next _GIMPLE_ stmt it looks at.  Make it

char d[3];

  void f (const char *s, int x)
  {
char d[x];
__builtin_strncpy (d, s, sizeof d);
d[x-1] = 0;
  }

and I bet it will again warn since x-1 is a separate GIMPLE stmt.

The patch is of course ok but still...  simply looking at all
immediate uses of the VDEF of the strncpy call, stopping
at the single stmt with the "next" VDEF should be better
(we don't have a next_vdef () helper).
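
A rough sketch of that idea, purely illustrative and not an actual patch
(it assumes the walk would live in maybe_diag_stxncpy_trunc with STMT
being the strncpy call):

  /* Walk the immediate uses of the strncpy's VDEF and look for the
     terminating nul store in the same basic block, instead of only
     inspecting the textually next statement.  */
  tree vdef = gimple_vdef (stmt);
  imm_use_iterator iter;
  gimple *use_stmt;
  FOR_EACH_IMM_USE_STMT (use_stmt, iter, vdef)
    if (gimple_bb (use_stmt) == gimple_bb (stmt)
        && is_gimple_assign (use_stmt))
      {
        /* ... check whether USE_STMT is the dst[i] = '\0' store, or a
           store with the "next" VDEF that should end the walk ...  */
      }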

Richard.

> jeff


Re: [PATCH/RFC] Fix ICE in find_taken_edge_computed_goto (PR 84136)

2018-02-08 Thread Richard Biener
On Thu, Feb 8, 2018 at 6:04 AM, Jeff Law  wrote:
> On 02/02/2018 02:35 PM, David Malcolm wrote:
>> On Thu, 2018-02-01 at 12:05 +0100, Richard Biener wrote:
>>> On Wed, Jan 31, 2018 at 4:39 PM, David Malcolm 
>>> wrote:
 PR 84136 reports an ICE within sccvn_dom_walker when handling a
 C/C++ source file that overuses the labels-as-values extension.
 The code in question stores a jump label into a global, and then
 jumps to it from another function, which ICEs after inlining:

 void* a;

 void foo() {
   if ((a = &&l))
   return;

   l:;
 }

 int main() {
   foo();
   goto *a;

   return 0;
 }

 This appears to be far beyond what we claim to support in this
 extension - but we shouldn't ICE.

 What's happening is that, after inlining, we have usage of a *copy*
 of the label, which optimizes away the if-return logic, turning it
 into an infinite loop.

 On entry to the sccvn_dom_walker we have this gimple:

 main ()
 {
   void * a.0_1;

 <bb 2> [count: 0]:
   a = &l;

 <bb 3> [count: 0]:
 l:
   a.0_1 = a;
   goto a.0_1;
 }

 and:
   edge taken = find_taken_edge (bb, vn_valueize (val));
 reasonably valueizes the:
   goto a.0_1;
 after the:
   a = &l;
   a.0_1 = a;
 as if it were:
   goto *&l;

 find_taken_edge_computed_goto then has:

 2380  dest = label_to_block (val);
 2381  if (dest)
 2382    {
 2383      e = find_edge (bb, dest);
 2384      gcc_assert (e != NULL);
 2385    }

 which locates dest as a self-jump from block 3 back to itself.

 However, the find_edge call returns NULL - it has a predecessor
 edge
 from block 2, but no successor edges.

 Hence the assertion fails and we ICE.

 A successor edge from the computed goto could have been created by
 make_edges if the label stmt had been in the function, but
 make_edges
 only looks in the current function when handling computed gotos,
 and
 the label only appeared after inlining.

 The following patch removes the assertion, fixing the ICE.

 Successfully bootstrapped on x86_64-pc-linux-gnu.

 If that's option (a), there could be some other approaches:

 (b) convert the assertion into a warning/error/sorry, on the
 assumption that if we don't detect such an edge then the code is
 presumably abusing the labels-as-values feature
 (c) have make_edges detect such a problematic computed goto (maybe
 converting make_edges_bb's return value to an enum and adding a 4th
 value - though it's not clear what to do then with it)
 (d) detect this case on inlining and handle it somehow (e.g. adding
 edges for labels that have appeared since make_edges originally
 ran, for computed gotos that have no out-edges)
 (e) do nothing, keeping the assertion, and accept that this is going
 to fail on a non-release build
 (f) something else?

 Of the above, (d) seems to me to be the most robust solution, but I
 don't know how far we want to go "down the rabbit hole" of handling
 such uses of labels-as-values (beyond not ICE-ing on them).

 Thoughts?
>>>
>>> I think you can preserve the assert for ! DECL_NONLOCAL (val) thus
>>>
>>> gcc_assert (e != NULL || DECL_NONLOCAL (val));
>>>
>>> does the label in this case properly have DECL_NONLOCAL
>>> set?  Probably
>>> not given we shouldn't have duplicated it in this case.
>>
>> Indeed, the inlined copy of the label doesn't have DECL_NONLOCAL set:
>>
>> (gdb) p val->decl_common.nonlocal_flag
>> $5 = 0
>>
>>> So the issue is really
>>> that the FE doesn't set this bit for "escaped" labels... but I'm not
>>> sure how
>>> to easily constrain the extension here.
>>>
>>> The label should be FORCED_LABEL though so that's maybe a weaker
>>> check.
>>
>> It does have FORCED_LABEL set:
>>
>> (gdb) p val->base.side_effects_flag
>> $6 = 1
>>
>> ...though presumably that's going to be set for just about any label
>> that a computed goto jumps to?  Hence this is presumably of little
>> benefit for adjusting the assertion.
> Agreed.

So remove the assert and add a comment in its place explaining the
situation.
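
For reference, a minimal sketch of what that might look like, based on
the tree-cfg.c lines quoted earlier in the thread (the exact comment
wording is of course up to the committer):

  dest = label_to_block (val);
  if (dest)
    /* E can still be NULL here: if the destination label only became
       reachable from this computed goto after inlining, make_edges
       never created the edge, so callers see a NULL taken edge.  */
    e = find_edge (bb, dest);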

OK with that.
Richard.

> jeff


Re: [PATCH] Fix PR81038

2018-02-08 Thread Bill Schmidt

> On Feb 8, 2018, at 4:52 AM, Richard Biener  wrote:
> 
> On Sat, Feb 3, 2018 at 12:30 AM, Bill Schmidt
>  wrote:
>> Hi,
>> 
>> The test g++.dg/vect/slp-pr56812.cc is somewhat fragile and is currently 
>> failing
>> on several targets.  PR81038 notes that this began with r248678, which 
>> stopped
>> some inferior peeling solutions from preventing vectorization that could be 
>> done
>> without peeling.  I observed that for powerpc64le, r248677 vectorizes the 
>> code
>> during SLP, but r248678 vectorizes it during the loop vectorization pass.  
>> Which
>> pass does the vectorization is quite dependent on cost model, which for us 
>> is a
>> quite close decision.  In any case, the important thing is that the code is
>> vectorized, not which pass does it.
>> 
>> This patch prevents the test from flipping in and out of failure status 
>> depending
>> on which pass does the vectorization, by testing the final "optimized" dump 
>> for
>> the expected vectorized output instead of relying on a specific vectorization
>> pass dump.
>> 
>> By the way, the test case somehow had gotten DOS/Windows newlines into it, so
>> I removed those.  The ^M characters disappeared when I pasted into this 
>> mailer,
>> unfortunately.  Anyway, that's the reason for the full replacement of the 
>> file.
>> The only real changes are the dg-final directives and the documentation of 
>> the
>> expected output.
>> 
>> Verified on powerpc64le-unknown-linux-gnu.  Is this okay for trunk?
> 
> Hmm.  That removes the existing XFAIL.  Also wouldn't it be more elegant to do
> the following?  This makes the testcase pass on x86_64, thus committed ;)

Terrific, thanks!

Bill
> 
> Richard.
> 
> 2018-02-08  Richard Biener  
> 
>* g++.dg/vect/slp-pr56812.cc: Allow either basic-block or
>loop vectorization to happen.
> 
> Index: gcc/testsuite/g++.dg/vect/slp-pr56812.cc
> ===
> --- gcc/testsuite/g++.dg/vect/slp-pr56812.cc(revision 257477)
> +++ gcc/testsuite/g++.dg/vect/slp-pr56812.cc(working copy)
> @@ -1,7 +1,7 @@
> /* { dg-do compile } */
> /* { dg-require-effective-target vect_float } */
> /* { dg-require-effective-target vect_hw_misalign } */
> -/* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model=dynamic" } 
> */
> +/* { dg-additional-options "-O3 -funroll-loops
> -fvect-cost-model=dynamic -fopt-info-vec" } */
> 
> class mydata {
> public:
> @@ -13,10 +13,7 @@ public:
> 
> void mydata::Set (float x)
> {
> -  for (int i=0; i +  /* We want to vectorize this either as loop or basic-block.  */
> +  for (int i=0; i vectorized" } */
> data[i] = x;
> }
> -
> -/* For targets without vector loop peeling the loop becomes cheap
> -   enough to be vectorized.  */
> -/* { dg-final { scan-tree-dump-times "basic block vectorized" 1
> "slp1" { xfail { ! vect_peeling_profitable } } } } */
> 
>> Thanks,
>> Bill
>> 
>> 
>> 2018-02-02  Bill Schmidt  
>> 
>>* g++.dg/vect/slp-pr56812.cc: Convert from DOS newline characters
>>to utf-8-unix.  Change to scan "optimized" dump for indications
>>that the code was vectorized.
>> 
>> 
>> Index: gcc/testsuite/g++.dg/vect/slp-pr56812.cc
>> ===
>> --- gcc/testsuite/g++.dg/vect/slp-pr56812.cc(revision 257352)
>> +++ gcc/testsuite/g++.dg/vect/slp-pr56812.cc(working copy)
>> @@ -1,22 +1,31 @@
>> -/* { dg-do compile } */
>> -/* { dg-require-effective-target vect_float } */
>> -/* { dg-require-effective-target vect_hw_misalign } */
>> -/* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model=dynamic" } 
>> */
>> -
>> -class mydata {
>> -public:
>> -mydata() {Set(-1.0);}
>> -void Set (float);
>> -static int upper() {return 8;}
>> -float data[8];
>> -};
>> -
>> -void mydata::Set (float x)
>> -{
>> -  for (int i=0; i> -data[i] = x;
>> -}
>> -
>> -/* For targets without vector loop peeling the loop becomes cheap
>> -   enough to be vectorized.  */
>> -/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" { 
>> xfail { ! vect_peeling_profitable } } } } */
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_float } */
>> +/* { dg-require-effective-target vect_hw_misalign } */
>> +/* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model=dynamic 
>> -fdump-tree-optimized" } */
>> +
>> +class mydata {
>> +public:
>> +mydata() {Set(-1.0);}
>> +void Set (float);
>> +static int upper() {return 8;}
>> +float data[8];
>> +};
>> +
>> +void mydata::Set (float x)
>> +{
>> +  for (int i=0; i> +data[i] = x;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "vect_cst__\[0-9\]* = " "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "= vect_cst__\[0-9\]*;" 2 

Re: [PATCH] add -Wstringop-overflow to LTO options (PR 84212)

2018-02-08 Thread Richard Biener
On Thu, Feb 8, 2018 at 4:08 AM, Martin Sebor  wrote:
> I went ahead and changed all the options on the list below
> to include LTO and tested the attached patch by configuring
> with --with-build-config=bootstrap-lto --disable-werror and
> making profiledbootstrap.  Attached, besides the patch, is
> also the breakdown of warnings.  The interesting column is
> the one labeled Unique.  It gives the number of distinct
> instances of each warning.
>
> This was with all languages but Go and Brig.  Those two fail
> what seem like unrelated reasons.  The Brig error is some
> unsatisfied reference (I lost the log).  Go fails with
> a bunch of errors looking like this, one for each errno
> constant:
>
> /opt/notnfs/msebor/src/gcc/84212/libgo/go/syscall/env_unix.go:96:10: error:
> reference to undefined name ‘EINVAL’
>
> On 02/07/2018 12:30 PM, Martin Sebor wrote:
>>
>> In PR 84212 the reporter asks why -Wno-stringop-overflow has
>> no effect during LTO linking.  It turns out that the reason
>> is the same as in bug 78768: the specification in the c.opt
>> file is missing LTO among the languages.
>>
>> The attached patch adds LTO to it and to -Wstringop-truncation.
>>
>> Bootstrapped and regtested on x86_64-linux.
>>
>> There are other middle-end options in the c.opt file that do
>> not mention LTO that arguably should (*).  I didn't change
>> those in this patch, in part because I don't have test cases
>> showing where it matters, and in part because I don't think
>> that having to remember to include LTO in these options (and,
>> ideally, also include a test in the test suite for each) is
>> a good approach.
>>
>> It seems that including LTO implicitly for all options would
>> obviate this manual step and eliminate the risk of missing
>> them.  Is there a reason not to do that?
>>
>> If implicitly including LTO for every option is not feasible,
>> then it might be worthwhile to write a small script to verify
>> that every middle-end option does mention LTO, and have make
>> run the script as part of the self-test step.

Well, in principle middle-end options belong in common.opt
and should be marked 'Common'.

But obviously the ones listed do not apply to, say, Fortran;
with LTO added, mixed-language TUs can get late
warnings from LTO.  But they also get late warnings for
the Fortran parts then.

Related is some other PR which mentions that we do not
save warning options in the LTO IL and thus do not keep their
setting properly attached to function state.  I guess we should
revisit this at least for the LTO-supported ones.

Patch is ok.

>> Is there support for either of these changes?  If not, are
>> there any other ideas for how to avoid these kind of bugs?
>>
>> Martin
>>
>> [*] Here are a few examples.  I'm fine with adding LTO to
>> any or all of these as well as any others that I may have
>> missed for GCC 8 if that sounds like a good idea.
>>
>>   -Walloc-size-larger-than
>>   -Warray-bounds
>>   -Wformat-truncation=
>>   -Wmaybe-uninitialized
>>   -Wnonnull
>>   -Wrestrict
>>   -Wstrict-overflow
>>   -Wsuggest-attribute
>>   -Wuninitialized
>
>


[AArch64] Add SVE mul_highpart patterns

2018-02-08 Thread Richard Sandiford
One advantage of the new permute handling compared to the old way is
that we can now easily take advantage of the vectoriser's divmod patterns
for SVE.
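
As a scalar illustration of the divmod idiom this enables, here is the
usual magic-number scheme for the division by 17 used in the test below
(the exact constants GCC picks may differ):

  #include <stdint.h>

  /* q = x / 17 for any uint32_t x, using only the high part of a
     32x32->64 multiply plus a shift; 17 * 0xF0F0F0F1 == 2^36 + 1,
     which is what makes the rounding come out right.  */
  static inline uint32_t
  udiv17 (uint32_t x)
  {
    return (uint32_t) (((uint64_t) x * 0xF0F0F0F1u) >> 36);
  }

With the new smulh/umulh patterns the vectoriser can do that highpart
step directly on SVE vectors instead of widening each element.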

I realise we're in stage 4, but this is entirely SVE-specific.

Tested on aarch64-linux-gnu and aarch64_be-elf.  OK to install?

Richard


2018-02-08  Richard Sandiford  

gcc/
* config/aarch64/iterators.md (UNSPEC_SMUL_HIGHPART)
(UNSPEC_UMUL_HIGHPART): New constants.
(MUL_HIGHPART): New int iterator.
(su): Handle UNSPEC_SMUL_HIGHPART and UNSPEC_UMUL_HIGHPART.
* config/aarch64/aarch64-sve.md (mul3_highpart): New
define_expand.
(*mul3_highpart): New define_insn.

gcc/testsuite/
* gcc.target/aarch64/sve/mul_highpart_1.c: New test.
* gcc.target/aarch64/sve/mul_highpart_1_run.c: Likewise.

Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2018-01-26 15:14:35.386171048 +
+++ gcc/config/aarch64/iterators.md 2018-02-08 13:51:56.252511923 +
@@ -438,6 +438,8 @@ (define_c_enum "unspec"
 UNSPEC_ANDF; Used in aarch64-sve.md.
 UNSPEC_IORF; Used in aarch64-sve.md.
 UNSPEC_XORF; Used in aarch64-sve.md.
+UNSPEC_SMUL_HIGHPART ; Used in aarch64-sve.md.
+UNSPEC_UMUL_HIGHPART ; Used in aarch64-sve.md.
 UNSPEC_COND_ADD; Used in aarch64-sve.md.
 UNSPEC_COND_SUB; Used in aarch64-sve.md.
 UNSPEC_COND_SMAX   ; Used in aarch64-sve.md.
@@ -1467,6 +1469,8 @@ (define_int_iterator UNPACK [UNSPEC_UNPA
 
 (define_int_iterator UNPACK_UNSIGNED [UNSPEC_UNPACKULO UNSPEC_UNPACKUHI])
 
+(define_int_iterator MUL_HIGHPART [UNSPEC_SMUL_HIGHPART UNSPEC_UMUL_HIGHPART])
+
 (define_int_iterator SVE_COND_INT_OP [UNSPEC_COND_ADD UNSPEC_COND_SUB
  UNSPEC_COND_SMAX UNSPEC_COND_UMAX
  UNSPEC_COND_SMIN UNSPEC_COND_UMIN
@@ -1558,7 +1562,9 @@ (define_int_attr logicalf_op [(UNSPEC_AN
 (define_int_attr su [(UNSPEC_UNPACKSHI "s")
 (UNSPEC_UNPACKUHI "u")
 (UNSPEC_UNPACKSLO "s")
-(UNSPEC_UNPACKULO "u")])
+(UNSPEC_UNPACKULO "u")
+(UNSPEC_SMUL_HIGHPART "s")
+(UNSPEC_UMUL_HIGHPART "u")])
 
 (define_int_attr sur [(UNSPEC_SHADD "s") (UNSPEC_UHADD "u")
  (UNSPEC_SRHADD "sr") (UNSPEC_URHADD "ur")
Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2018-02-01 11:04:16.723192040 +
+++ gcc/config/aarch64/aarch64-sve.md   2018-02-08 13:51:56.252511923 +
@@ -980,6 +980,34 @@ (define_insn "*msub3"
mls\t%0., %1/m, %2., %3."
 )
 
+;; Unpredicated highpart multiplication.
+(define_expand "mul3_highpart"
+  [(set (match_operand:SVE_I 0 "register_operand")
+   (unspec:SVE_I
+ [(match_dup 3)
+  (unspec:SVE_I [(match_operand:SVE_I 1 "register_operand")
+ (match_operand:SVE_I 2 "register_operand")]
+MUL_HIGHPART)]
+ UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+operands[3] = force_reg (mode, CONSTM1_RTX (mode));
+  }
+)
+
+;; Predicated highpart multiplication.
+(define_insn "*mul3_highpart"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w")
+   (unspec:SVE_I
+ [(match_operand: 1 "register_operand" "Upl")
+  (unspec:SVE_I [(match_operand:SVE_I 2 "register_operand" "%0")
+ (match_operand:SVE_I 3 "register_operand" "w")]
+MUL_HIGHPART)]
+ UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "mulh\t%0., %1/m, %0., %3."
+)
+
 ;; Unpredicated NEG, NOT and POPCOUNT.
 (define_expand "2"
   [(set (match_operand:SVE_I 0 "register_operand")
Index: gcc/testsuite/gcc.target/aarch64/sve/mul_highpart_1.c
===
--- /dev/null   2018-02-08 11:17:10.862716283 +
+++ gcc/testsuite/gcc.target/aarch64/sve/mul_highpart_1.c   2018-02-08 
13:51:56.252511923 +
@@ -0,0 +1,25 @@
+/* { dg-do assemble { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model --save-temps" } */
+
+#include 
+
+#define DEF_LOOP(TYPE) \
+void __attribute__ ((noipa))   \
+mod_##TYPE (TYPE *dst, TYPE *src, int count)   \
+{  \
+  for (int i = 0; i < count; ++i)  \
+dst[i] = src[i] % 17;  \
+}
+
+#define TEST_ALL(T) \
+  T (int32_t) \
+  T (uint32_t) \
+  T (int64_t) \
+  T (uint64_t)
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tsmulh\tz[0-9]+\.s, p[0-7]/m, 
z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tumulh\tz[0-9]+\.s, p[0-7]/m, 
z[0-9]+\.s, 

Another fix for single-element permutes (PR 84265)

2018-02-08 Thread Richard Sandiford
PR83753 was about a case in which we ended up trying to "vectorise"
a group of loads or stores using single-element vectors.  The problem
was that we were classifying the load or store as VMAT_CONTIGUOUS_PERMUTE
rather than VMAT_CONTIGUOUS, even though it doesn't make sense to permute
a single-element vector.

In that PR it was enough to change get_group_load_store_type,
because vectorisation ended up being unprofitable and so we didn't
take things further.  But when vectorisation is profitable, the same
fix is needed in vectorizable_load and vectorizable_store.

Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
OK to install?

Richard


2018-02-08  Richard Sandiford  

gcc/
PR tree-optimization/84265
* tree-vect-stmts.c (vectorizable_store): Don't treat
VMAT_CONTIGUOUS accesses as grouped.
(vectorizable_load): Likewise.

gcc/testsuite/
PR tree-optimization/84265
* gcc.dg/vect/pr84265.c: New test.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2018-01-30 09:45:27.710764075 +
+++ gcc/tree-vect-stmts.c   2018-02-08 13:26:39.242566948 +
@@ -6214,7 +6214,8 @@ vectorizable_store (gimple *stmt, gimple
 }
 
   grouped_store = (STMT_VINFO_GROUPED_ACCESS (stmt_info)
-  && memory_access_type != VMAT_GATHER_SCATTER);
+  && memory_access_type != VMAT_GATHER_SCATTER
+  && (slp || memory_access_type != VMAT_CONTIGUOUS));
   if (grouped_store)
 {
   first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
@@ -7696,7 +7697,8 @@ vectorizable_load (gimple *stmt, gimple_
   return true;
 }
 
-  if (memory_access_type == VMAT_GATHER_SCATTER)
+  if (memory_access_type == VMAT_GATHER_SCATTER
+  || (!slp && memory_access_type == VMAT_CONTIGUOUS))
 grouped_load = false;
 
   if (grouped_load)
Index: gcc/testsuite/gcc.dg/vect/pr84265.c
===
--- /dev/null   2018-02-08 11:17:10.862716283 +
+++ gcc/testsuite/gcc.dg/vect/pr84265.c 2018-02-08 13:26:39.240567025 +
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+
+struct a
+{
+  unsigned long b;
+  unsigned long c;
+  int d;
+  int *e;
+  char f;
+};
+
+struct
+{
+  int g;
+  struct a h[];
+} i;
+
+int j, k;
+void l ()
+{
+  for (; k; k++)
+j += (int) (i.h[k].c - i.h[k].b);
+}


Re: Do not lose track of resolution info due to tree merging

2018-02-08 Thread Jan Hubicka
Hi,
this is the patch I committed yesterday (but failed to send email for) which
removes a forgotten sanity check from the original fix.  The check tries to
catch cases where we do merge definitions and declarations, to see if the
resolution merging logic is live.  It is.

Honza
* lto.c (register_resolution): Remove forgotten sanity check.

Index: lto.c
===
--- lto.c   (revision 257412)
+++ lto.c   (working copy)
@@ -839,7 +839,6 @@ register_resolution (struct lto_file_dec
   = new hash_map;
   ld_plugin_symbol_resolution_t &res
     = file_data->resolution_map->get_or_insert (decl, &existed);
-  gcc_assert (!existed || res == resolution);
   if (!existed
   || resolution == LDPR_PREVAILING_DEF_IRONLY
   || resolution == LDPR_PREVAILING_DEF


Re: Do not lose track of resolution info due to tree merging

2018-02-08 Thread Jan Hubicka
Hi,
it turns out that I have hit a bit of a can of worms here, so I spent most of
yesterday looking into a minimal fix.  There are multiple issues remaining - we
forget to output symbols that have only the DECL_EXTERNAL flag set; we behave
funny with builtins (in some cases it makes sense, in others it does not); we
do not handle references from extern inlines and constructors well.

While preparing the previous patch I misread unify_scc: in some cases it
registers the prevailed decl instead of the prevailing one, resulting in memory
corruption.  This eventually triggered the assert in get_resolution.

There is another bug where the sanity check in read_cgraph_and_symbols triggers
because we do not output some symbols into the LTO symbol table.  This is
partly intended, even though these are hacks from the early LTO days.  To
silence false positives I moved all the (sloppy) logic deciding whether to
output a symbol into symtab_node::output_to_lto_symbol_table_p and use a
consistent check at stream-out and stream-in time.

We may still get errors on duplicated decls, but I hope we do not have those in
practice anymore (those are bugs too).  So we may end up needing to disable
this sanity check for the release, but I hope that won't be necessary; I would
like to see whether the mechanism now works reliably, so I have decided to keep
it for now.

The patch survives bootstrap/regtest and lto-bootstrap.  I am giving it a bit
of extra checking and plan to commit later today unless there are complaints.
I will see if I can reproduce the other issues and open separate bugs for them.

Honza

PR ipa/81360
* cgraph.h (symtab_node::output_to_lto_symbol_table_p): Declare
* symtab.c: Include builtins.h
(symtab_node::output_to_lto_symbol_table_p): Move here
from lto-streamer-out.c:output_symbol_p.
* lto-streamer-out.c (write_symbol): Turn early exit to assert.
(output_symbol_p): Move all logic to symtab.c
(produce_symtab): Update.

* lto.c (unify_scc): Register prevailing trees, not trees to be freed.
(read_cgraph_and_symbols): Use
symtab_node::output_to_lto_symbol_table_p.

Index: cgraph.h
===
--- cgraph.h(revision 257412)
+++ cgraph.h(working copy)
@@ -328,6 +328,9 @@ public:
  or abstract function kept for debug info purposes only.  */
   bool real_symbol_p (void);
 
+  /* Return true when the symbol needs to be output to the LTO symbol table.  
*/
+  bool output_to_lto_symbol_table_p (void);
+
   /* Determine if symbol declaration is needed.  That is, visible to something
  either outside this translation unit, something magic in the system
  configury. This function is used just during symbol creation.  */
Index: lto/lto.c
===
--- lto/lto.c   (revision 257442)
+++ lto/lto.c   (working copy)
@@ -1648,13 +1648,16 @@ unify_scc (struct data_in *data_in, unsi
{
  map2[i*2] = (tree)(uintptr_t)(from + i);
  map2[i*2+1] = scc->entries[i];
- lto_maybe_register_decl (data_in, scc->entries[i], from + i);
}
  qsort (map2, len, 2 * sizeof (tree), cmp_tree);
  qsort (map, len, 2 * sizeof (tree), cmp_tree);
  for (unsigned i = 0; i < len; ++i)
-   streamer_tree_cache_replace_tree (cache, map[2*i],
- (uintptr_t)map2[2*i]);
+   {
+ lto_maybe_register_decl (data_in, map[2*i],
+  (uintptr_t)map2[2*i]);
+ streamer_tree_cache_replace_tree (cache, map[2*i],
+   (uintptr_t)map2[2*i]);
+   }
}
 
  /* Free the tree nodes from the read SCC.  */
@@ -2901,8 +2904,12 @@ read_cgraph_and_symbols (unsigned nfiles
 
res = snode->lto_file_data->resolution_map->get (snode->decl);
if (!res || *res == LDPR_UNKNOWN)
- fatal_error (input_location, "missing resolution data for %s",
-  IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (snode->decl)));
+ {
+   if (snode->output_to_lto_symbol_table_p ())
+ fatal_error (input_location, "missing resolution data for %s",
+  IDENTIFIER_POINTER
+(DECL_ASSEMBLER_NAME (snode->decl)));
+ }
else
   snode->resolution = *res;
   }
Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 257412)
+++ lto-streamer-out.c  (working copy)
@@ -2598,13 +2598,10 @@ write_symbol (struct streamer_tree_cache
   const char *comdat;
   unsigned char c;
 
-  /* None of the following kinds of symbols are needed in the
- symbol table.  */
-  if (!TREE_PUBLIC (t)
-  || is_builtin_fn (t)
-  || 

[hsa] Set program allocation for static local variables

2018-02-08 Thread Martin Jambor
Hi,

it has been brought to my attention that the libgomp.c/target-28.c
testcase fails to finalize because the static variable s has an illegal
HSA allocation.  Fixed by the patch below, which I am about to commit
to trunk (and will commit to the gcc-7-branch after testing there).

Thanks,

Martin

2018-02-08  Martin Jambor  

* hsa-gen.c (get_symbol_for_decl): Set program allocation for
static local variables.

libgomp/
* testsuite/libgomp.hsa.c/staticvar.c: New test.

Added testcase
---
 gcc/hsa-gen.c   | 10 +++---
 libgomp/testsuite/libgomp.hsa.c/staticvar.c | 23 +++
 2 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/staticvar.c

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index af0b33d658f..55a46b5a16a 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -932,9 +932,13 @@ get_symbol_for_decl (tree decl)
  else if (lookup_attribute ("hsa_group_segment",
 DECL_ATTRIBUTES (decl)))
segment = BRIG_SEGMENT_GROUP;
- else if (TREE_STATIC (decl)
-  || lookup_attribute ("hsa_global_segment",
-   DECL_ATTRIBUTES (decl)))
+ else if (TREE_STATIC (decl))
+   {
+ segment = BRIG_SEGMENT_GLOBAL;
+ allocation = BRIG_ALLOCATION_PROGRAM;
+   }
+ else if (lookup_attribute ("hsa_global_segment",
+DECL_ATTRIBUTES (decl)))
segment = BRIG_SEGMENT_GLOBAL;
  else
segment = BRIG_SEGMENT_PRIVATE;
diff --git a/libgomp/testsuite/libgomp.hsa.c/staticvar.c 
b/libgomp/testsuite/libgomp.hsa.c/staticvar.c
new file mode 100644
index 000..6d20c9aa328
--- /dev/null
+++ b/libgomp/testsuite/libgomp.hsa.c/staticvar.c
@@ -0,0 +1,23 @@
+extern void abort (void);
+
+#pragma omp declare target
+int
+foo (void)
+{
+  static int s;
+  return ++s;
+}
+#pragma omp end declare target
+
+int
+main ()
+{
+  int r;
+  #pragma omp target map(from:r)
+  {
+r = foo ();
+  }
+  if (r != 1)
+abort ();
+  return 0;
+}
-- 
2.15.1



[hsa] Fix PR82416 testcase

2018-02-08 Thread Martin Jambor
Hi,

the PR82416 testcase was not actually being compiled into HSAIL because the
function with the target region had the noclone attribute, which the
outline function for the region inherited, and so the ipa-hsa pass
that clones hitherto shared functions (for both the host and an
HSA accelerator) refused to clone it.

Fixed by rearranging the functions somewhat.  The attribute is
probably actually not necessary now but let's be future-proof.

Martin

2018-02-08  Martin Jambor  

* testsuite/libgomp.hsa.c/pr82416.c: Make the function with target
clonable.
---
 libgomp/testsuite/libgomp.hsa.c/pr82416.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.hsa.c/pr82416.c 
b/libgomp/testsuite/libgomp.hsa.c/pr82416.c
index b89d421e8f3..40378ab12a5 100644
--- a/libgomp/testsuite/libgomp.hsa.c/pr82416.c
+++ b/libgomp/testsuite/libgomp.hsa.c/pr82416.c
@@ -7,8 +7,8 @@ toup (char X)
 return X;
 }
 
-char __attribute__ ((noipa))
-target_toup (char X)
+char
+target_toup_1 (char X)
 {
   char r;
 #pragma omp target map(to:X) map(from:r)
@@ -21,6 +21,12 @@ target_toup (char X)
   return r;
 }
 
+char __attribute__ ((noipa))
+target_toup (char X)
+{
+  return target_toup_1 (X);
+}
+
 int main (int argc, char **argv)
 {
   char a = 'a';
-- 
2.15.1



Re: [SFN+LVU+IEPM v4 7/9] [LVU] Introduce location views

2018-02-08 Thread Alexandre Oliva
On Feb  7, 2018, Jason Merrill  wrote:

> OK, that makes sense.  But I'm still uncomfortable with choosing an
> existing opcode for that purpose, which previously would have been
> chosen just for reasons of encoding complexity and size.

Well, there's a good reason we didn't used to output this opcode: it's
nearly always the case that you're better off using a special opcode or
DW_LNS_advance_pc, that encodes the offset as uleb128 instead of a fixed
size.  The only exceptions I can think of are offsets that have the most
significant bits set in the representable range for
DW_LNS_fixed_advance_pc (the uleb128 representation for
DW_LNS_advance_pc would end up taking an extra byte if insns don't get
more than byte alignment), and VLIW machines, in which the
DW_LNS_advance_pc operand needs to be multiplied by the ops-per-insns
(but also divided by the min-insn-length).  So, unless you're creating a
gap of 16KiB to 64KiB in the middle of a function on an ISA such as
x86*, that has insns as small as 1 byte, you'll only use
DW_LNS_fixed_advance_pc when the assembler can't encode uleb128 offsets,
as stated in the DWARF specs.  Well, now there's one more case for using
it, and it's a rare one as well.  I didn't think it made sense to add
yet another opcode with a fixed-size operand for that.

But then again, even if it were an opcode used more often, it wouldn't be a
significant problem to assign it the special behavior of not resetting
the view counter.  Views don't have to be reset for the whole thing to
work, we just need some means for the compiler (who doesn't know all
offsets) and debug info consumers (who do) to keep view numbers in sync.
A single opcode for the compiler to signal to the consumer that it
wasn't sure a smallish offset would be nonzero is enough, and
DW_LNS_fixed_advance_pc provides us with just what we need, without any
complications of having to compute a special opcode, or compute a
compiler-unknown offset times min-insn-length and have that encoded in
uleb128.


> Thanks, it would be good to have this overview in a comment somewhere.

You meant just these two paragraphs (like below), or the whole thing?


diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 3cf79270b72c..56d3e14b81bf 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -3183,7 +3183,33 @@ static GTY(()) bitmap zero_view_p;
: (N) == 0)
 
 /* Return true iff we're to emit .loc directives for the assembler to
-   generate line number sections.  */
+   generate line number sections.
+
+   When we're not emitting views, all we need from the assembler is
+   support for .loc directives.
+
+   If we are emitting views, we can only use the assembler's .loc
+   support if it also supports views.
+
+   When the compiler is emitting the line number programs and
+   computing view numbers itself, it resets view numbers at known PC
+   changes and counts from that, and then it emits view numbers as
+   literal constants in locviewlists.  There are cases in which the
+   compiler is not sure about PC changes, e.g. when extra alignment is
+   requested for a label.  In these cases, the compiler may not reset
+   the view counter, and the potential PC advance in the line number
+   program will use an opcode that does not reset the view counter
+   even if the PC actually changes, so that compiler and debug info
+   consumer can keep view numbers in sync.
+
+   When the compiler defers view computation to the assembler, it
+   emits symbolic view numbers in locviewlists, with the exception of
+   views known to be zero (forced resets, or reset after
+   compiler-visible PC changes): instead of emitting symbols for
+   these, we emit literal zero and assert the assembler agrees with
+   the compiler's assessment.  We could use symbolic views everywhere,
+   instead of special-casing zero views, but then we'd be unable to
+   optimize out locviewlists that contain only zeros.  */
 
 static bool
 output_asm_line_debug_info (void)


-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PATCH, v2] Recognize a missed usage of a sbfiz instruction

2018-02-08 Thread Luis Machado

Hi Kyrill,

On 02/08/2018 09:48 AM, Kyrill Tkachov wrote:

Hi Luis,

On 06/02/18 15:04, Luis Machado wrote:

Thanks for the feedback Kyrill. I've adjusted the v2 patch based on your
suggestions and re-tested the changes. Everything is still sane.


Thanks! This looks pretty good to me.


Since this is ARM-specific and fairly specific, I wonder if it would be
reasonable to consider it for inclusion at the current stage.


It is true that the target maintainers can choose to take
such patches at any stage. However, any patch at this stage increases
the risk of regressions being introduced and these regressions
can come bite us in ways that are very hard to anticipate.

Have a look at some of the bugs in bugzilla (or a quick scan of the 
gcc-bugs list)
for examples of the ways that things can go wrong with any of the myriad 
of GCC components

and the unexpected ways in which they can interact.

For example, I am now working on what I initially thought was a 
one-liner fix for
PR 84164 but it has expanded into a 3-patch series with a midend 
component and

target-specific changes for 2 ports.

These issues are very hard to catch during review and normal testing, 
and can sometimes take months of deep testing by
fuzzing and massive codebase rebuilds to expose, so the closer the 
commit is to a release
the higher the risk is that an obscure edge case will be unnoticed and 
unfixed in the release.


So the priority at this stage is to minimise the risk of destabilising 
the codebase,
as opposed to taking in new features and desirable performance 
improvements (like your patch!)


That is the rationale for delaying committing such changes until the start
of GCC 9 development. But again, this is up to the aarch64 maintainers.
I'm sure the patch will be a perfectly fine and desirable commit for GCC 9.
This is just my perspective as maintainer of the arm port.


Thanks. Your explanation makes the situation pretty clear and it sounds 
very reasonable. I'll put the patch on hold until development is open again.


Regards,
Luis


Re: C++ PATCH to fix ICE with vector expr folding (PR c++/83659)

2018-02-08 Thread Marek Polacek
On Thu, Feb 08, 2018 at 12:55:10PM +0100, Jakub Jelinek wrote:
> On Wed, Feb 07, 2018 at 04:21:43PM -0500, Jason Merrill wrote:
> > On Wed, Feb 7, 2018 at 4:14 PM, Jakub Jelinek  wrote:
> > > On Wed, Feb 07, 2018 at 03:52:39PM -0500, Jason Merrill wrote:
> > >> > E.g. the constexpr function uses 
> > >> > same_type_ignoring_top_level_qualifiers_p
> > >> > instead of == type comparisons, the COMPONENT_REF stuff, ...
> > >>
> > >> > For poly_* stuff, I think Richard S. wants to introduce it into the 
> > >> > FEs at
> > >> > some point, but I could be wrong; certainly it hasn't been done yet and
> > >> > generally, poly*int seems to be a nightmare to deal with.
> > >>
> > >> Yes, I understand how we got to this point, but having the functions
> > >> diverge because of this guideline seems like a mistake.  And there
> > >> seem to be two ways to avoid the divergence: make an exception to the
> > >> guideline, or move the function.
> > >
> > > Functionally, I think the following patch should turn fold_indirect_ref_1
> > > to be equivalent to the patched constexpr.c version (with the known
> > > documented differences), so if this is the obstacle for the acceptance
> > > of the patch, I can test this.
> > >
> > > Otherwise, I must say I have no idea how to share the code,
> > > same_type_ignoring_qualifiers is only a C++ FE function, so the middle-end
> > > can't use it even conditionally, and similarly with the TBAA issues.
> > 
> > Again, can we make an exception and use poly_int in this function
> > because it's mirroring a middle-end function?
> 
> So like this if it passes bootstrap/regtest?  It is kind of bidirectional
> merge of changes between the 2 functions, except for intentional differences
> (e.g. the same_type_ignoring_top_level_qualifiers_p vs. ==, in_gimple_form
> stuff in fold-const.c, the C++ specific empty class etc. handling in
> constexpr.c etc.).

Better than my patch, and also has comments.  Thanks and sorry for duplicated
effort!

Marek


Re: [PATCH] S/390: Disable prediction of indirect branches

2018-02-08 Thread Andreas Krebbel
On 02/08/2018 12:33 PM, Richard Biener wrote:
> On Wed, Feb 7, 2018 at 1:01 PM, Andreas Krebbel
>  wrote:
>> This patch implements GCC support for mitigating vulnerability
>> CVE-2017-5715 known as Spectre #2 on IBM Z.
>>
>> In order to disable prediction of indirect branches the implementation
>> makes use of an IBM Z specific feature - the execute instruction.
>> Performing an indirect branch via execute prevents the branch from
>> being subject to dynamic branch prediction.
>>
>> The implementation tries to stay close to the x86 solution regarding
>> user interface.
>>
>> x86 style options supported (without thunk-inline):
>>
>> -mindirect-branch=(keep|thunk|thunk-extern)
>> -mfunction-return=(keep|thunk|thunk-extern)
>>
>> IBM Z specific options:
>>
>> -mindirect-branch-jump=(keep|thunk|thunk-extern|thunk-inline)
>> -mindirect-branch-call=(keep|thunk|thunk-extern)
>> -mfunction-return-reg=(keep|thunk|thunk-extern)
>> -mfunction-return-mem=(keep|thunk|thunk-extern)
>>
>> These options allow us to enable/disable the branch conversion at a
>> finer granularity.
>>
>> -mindirect-branch sets the value of -mindirect-branch-jump and
>>  -mindirect-branch-call.
>>
>> -mfunction-return sets the value of -mfunction-return-reg and
>>  -mfunction-return-mem.
>>
>> All these options are supported on GCC command line as well as
>> function attributes.
>>
>> 'thunk' triggers the generation of out of line thunks (expolines) and
>> replaces the formerly indirect branch with a direct branch to the
>> thunk.  Depending on the -march= setting two different types of thunks
>> are generated.  With -march=z10 or higher exrl (execute relative long)
>> is being used while targeting older machines makes use of larl/ex
>> instead.  From a security perspective the exrl variant is preferable.
>>
>> 'thunk-extern' does the branch replacement like 'thunk' but does not
>> emit the thunks.
>>
>> 'thunk-inline' is only available for indirect jumps.  It should be used
>> in environments where correct CFI is important - known as user space.
>>
>> Additionally the patch introduces the -mindirect-branch-table option
>> which generates tables pointing to the locations which have been
>> modified.  This is supposed to allow reverting the changes without
>> re-compilation in situations where it isn't required. The sections are
>> split up into one section per option.
>>
>> I plan to commit the patch tomorrow.
> 
> Do you also plan to backport this to the GCC 7 branch?

Yes, I'm working on it.

-Andreas-



Re: Use nonzero bits to refine range in split_constant_offset (PR 81635)

2018-02-08 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Feb 2, 2018 at 3:12 PM, Richard Sandiford
>  wrote:
>> Index: gcc/tree-data-ref.c
>> ===
>> --- gcc/tree-data-ref.c 2018-02-02 14:03:53.964530009 +
>> +++ gcc/tree-data-ref.c 2018-02-02 14:03:54.184521826 +
>> @@ -721,7 +721,13 @@ split_constant_offset_1 (tree type, tree
>> if (TREE_CODE (tmp_var) != SSA_NAME)
>>   return false;
>> wide_int var_min, var_max;
>> -   if (get_range_info (tmp_var, _min, _max) != VR_RANGE)
>> +   value_range_type vr_type = get_range_info (tmp_var, _min,
>> +  _max);
>> +   wide_int var_nonzero = get_nonzero_bits (tmp_var);
>> +   signop sgn = TYPE_SIGN (itype);
>> +   if (intersect_range_with_nonzero_bits (vr_type, _min,
>> +  _max, var_nonzero,
>> +  sgn) != VR_RANGE)
>
> Above it looks like we could go from VR_RANGE to VR_UNDEFINED.
> I'm not sure if the original range-info might be useful in this case -
> if it may be
> can we simply use only the range info if it was VR_RANGE?

I think we only drop to VR_UNDEFINED if we have contradictory
information: nonzero bits says some bits must be clear, but the range
only contains values for which the bits are set.  In that case I think
we should either be conservative and not use the information, or be
aggressive and say that we have undefined behaviour, so overflow is OK.

It seems a bit of a fudge to go back to the old range when we know it's
false, and use it to allow the split some times and not others.

Thanks,
Richard

>
> Ok otherwise.
> Thanks,
> Richard.
>
>>   return false;
>>
>> /* See whether the range of OP0 (i.e. TMP_VAR + TMP_OFF)
>> @@ -729,7 +735,6 @@ split_constant_offset_1 (tree type, tree
>>operations done in ITYPE.  The addition must overflow
>>at both ends of the range or at neither.  */
>> bool overflow[2];
>> -   signop sgn = TYPE_SIGN (itype);
>> unsigned int prec = TYPE_PRECISION (itype);
>> wide_int woff = wi::to_wide (tmp_off, prec);
>> wide_int op0_min = wi::add (var_min, woff, sgn, 
>> [0]);
>> Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-3.c
>> ===
>> --- /dev/null   2018-02-02 09:03:36.168354735 +
>> +++ gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-3.c2018-02-02 
>> 14:03:54.183521863 +
>> @@ -0,0 +1,62 @@
>> +/* { dg-do compile } */
>> +/* { dg-additional-options "-fno-tree-loop-vectorize" } */
>> +/* { dg-require-effective-target vect_double } */
>> +/* { dg-require-effective-target lp64 } */
>> +
>> +void
>> +f1 (double *p, double *q, unsigned int n)
>> +{
>> +  p = (double *) __builtin_assume_aligned (p, sizeof (double) * 2);
>> +  q = (double *) __builtin_assume_aligned (q, sizeof (double) * 2);
>> +  for (unsigned int i = 0; i < n; i += 4)
>> +{
>> +  double a = q[i] + p[i];
>> +  double b = q[i + 1] + p[i + 1];
>> +  q[i] = a;
>> +  q[i + 1] = b;
>> +}
>> +}
>> +
>> +void
>> +f2 (double *p, double *q, unsigned int n)
>> +{
>> +  p = (double *) __builtin_assume_aligned (p, sizeof (double) * 2);
>> +  q = (double *) __builtin_assume_aligned (q, sizeof (double) * 2);
>> +  for (unsigned int i = 0; i < n; i += 2)
>> +{
>> +  double a = q[i] + p[i];
>> +  double b = q[i + 1] + p[i + 1];
>> +  q[i] = a;
>> +  q[i + 1] = b;
>> +}
>> +}
>> +
>> +void
>> +f3 (double *p, double *q, unsigned int n)
>> +{
>> +  p = (double *) __builtin_assume_aligned (p, sizeof (double) * 2);
>> +  q = (double *) __builtin_assume_aligned (q, sizeof (double) * 2);
>> +  for (unsigned int i = 0; i < n; i += 6)
>> +{
>> +  double a = q[i] + p[i];
>> +  double b = q[i + 1] + p[i + 1];
>> +  q[i] = a;
>> +  q[i + 1] = b;
>> +}
>> +}
>> +
>> +void
>> +f4 (double *p, double *q, unsigned int start, unsigned int n)
>> +{
>> +  p = (double *) __builtin_assume_aligned (p, sizeof (double) * 2);
>> +  q = (double *) __builtin_assume_aligned (q, sizeof (double) * 2);
>> +  for (unsigned int i = start & -2; i < n; i += 2)
>> +{
>> +  double a = q[i] + p[i];
>> +  double b = q[i + 1] + p[i + 1];
>> +  q[i] = a;
>> +  q[i + 1] = b;
>> +}
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "basic block vectorized" 4 "slp1" } } 
>> */
>> Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-4.c
>> ===
>> --- /dev/null   2018-02-02 09:03:36.168354735 +
>> +++ gcc/testsuite/gcc.dg/vect/bb-slp-pr81635-4.c2018-02-02 
>> 

RE: [PATCH, i386] Fix ix86_multiplication_cost for SKX

2018-02-08 Thread Shalnov, Sergey
Uros,
I provided a patch for cost model tuning here 
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00405.html
The current patch fixes a regression in a test that was caused by my cost
model tuning patch.
Sergey

-Original Message-
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Wednesday, February 7, 2018 2:15 PM
To: Shalnov, Sergey 
Cc: gcc-patches@gcc.gnu.org; Peryt, Sebastian ; 
Ivchenko, Alexander ; Kirill Yukhin 

Subject: Re: [PATCH, i386] Fix ix86_multiplication_cost for SKX

On Wed, Feb 7, 2018 at 2:02 PM, Shalnov, Sergey  
wrote:
> Hi,
> This patch is one of the set of patches to fix SKX costs.

Please post the whole series for review.

Thanks,
Uros.



[AArch64] Add a tlsdesc call pattern for SVE

2018-02-08 Thread Richard Sandiford
tlsdesc calls are guaranteed to preserve all Advanced SIMD registers,
but are not guaranteed to preserve the SVE extension of them.
The calls also don't preserve the SVE predicate registers.

The long-term plan for handling the SVE vector registers is CLOBBER_HIGH,
which adds a clobber equivalent of TARGET_HARD_REGNO_CALL_PART_CLOBBERED.
The pattern can then directly model the fact that the low 128 bits are
preserved and the upper bits are clobbered.

However, it's too late now for that to be included in GCC 8, so this
patch conservatively treats the whole vector register as being clobbered.
This has the obvious disadvantage that compiling for SVE can make NEON
code worse, but I don't think there's much we can do about that until
CLOBBER_HIGH is in.
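
For context, a minimal sketch of the kind of source that ends up going
through this sequence (whether a given access really uses tlsdesc depends
on the TLS model and -mtls-dialect in effect):

  /* With -O2 -fPIC -march=armv8-a+sve, the access to COUNTER is the
     sort of thing that expands through the tlsdesc_small_* patterns
     below, so any live SVE vector or predicate values around the call
     now have to be treated as clobbered.  */
  __thread int counter;

  int
  bump (void)
  {
    return ++counter;
  }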

Tested on aarch64-linux-gnu and aarch64_be-elf.  OK to install?

Richard


2018-02-08  Richard Sandiford  

gcc/
* config/aarch64/aarch64.md (V4_REGNUM, V8_REGNUM, V12_REGNUM)
(V20_REGNUM, V24_REGNUM, V28_REGNUM, P1_REGNUM, P2_REGNUM, P3_REGNUM)
(P4_REGNUM, P5_REGNUM, P6_REGNUM, P8_REGNUM, P9_REGNUM, P10_REGNUM)
(P11_REGNUM, P12_REGNUM, P13_REGNUM, P14_REGNUM): New define_constants.
(tlsdesc_small_): Turn into a define_expand and use
tlsdesc_small_sve_ for SVE.  Rename original define_insn to...
(tlsdesc_small_advsimd_): ...this.
(tlsdesc_small_sve_): New pattern.

gcc/testsuite/
* gcc.target/aarch64/sve/tls_1.c: New test.
* gcc.target/aarch64/sve/tls_2.C: Likewise.

Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   2018-02-01 11:04:16.726191903 +
+++ gcc/config/aarch64/aarch64.md   2018-02-08 11:51:37.226675644 +
@@ -57,7 +57,14 @@ (define_constants
 (LR_REGNUM 30)
 (SP_REGNUM 31)
 (V0_REGNUM 32)
+(V4_REGNUM 36)
+(V8_REGNUM 40)
+(V12_REGNUM44)
 (V15_REGNUM47)
+(V16_REGNUM48)
+(V20_REGNUM52)
+(V24_REGNUM56)
+(V28_REGNUM60)
 (V31_REGNUM63)
 (LAST_SAVED_REGNUM 63)
 (SFP_REGNUM64)
@@ -66,7 +73,20 @@ (define_constants
 ;; Defined only to make the DWARF description simpler.
 (VG_REGNUM 67)
 (P0_REGNUM 68)
+(P1_REGNUM 69)
+(P2_REGNUM 70)
+(P3_REGNUM 71)
+(P4_REGNUM 72)
+(P5_REGNUM 73)
+(P6_REGNUM 74)
 (P7_REGNUM 75)
+(P8_REGNUM 76)
+(P9_REGNUM 77)
+(P10_REGNUM78)
+(P11_REGNUM79)
+(P12_REGNUM80)
+(P13_REGNUM81)
+(P14_REGNUM82)
 (P15_REGNUM83)
   ]
 )
@@ -5786,14 +5806,68 @@ (define_insn "tlsle48_"
(set_attr "length" "12")]
 )
 
-(define_insn "tlsdesc_small_"
+(define_expand "tlsdesc_small_"
+  [(unspec:PTR [(match_operand 0 "aarch64_valid_symref")] UNSPEC_TLSDESC)]
+  "TARGET_TLS_DESC"
+  {
+if (TARGET_SVE)
+  emit_insn (gen_tlsdesc_small_sve_ (operands[0]));
+else
+  emit_insn (gen_tlsdesc_small_advsimd_ (operands[0]));
+DONE;
+  }
+)
+
+;; tlsdesc calls preserve all core and Advanced SIMD registers except
+;; R0 and LR.
+(define_insn "tlsdesc_small_advsimd_"
   [(set (reg:PTR R0_REGNUM)
 (unspec:PTR [(match_operand 0 "aarch64_valid_symref" "S")]
-  UNSPEC_TLSDESC))
+   UNSPEC_TLSDESC))
(clobber (reg:DI LR_REGNUM))
(clobber (reg:CC CC_REGNUM))
(clobber (match_scratch:DI 1 "=r"))]
-  "TARGET_TLS_DESC"
+  "TARGET_TLS_DESC && !TARGET_SVE"
+  "adrp\\tx0, %A0\;ldr\\t%1, [x0, #%L0]\;add\\t0, 0, 
%L0\;.tlsdesccall\\t%0\;blr\\t%1"
+  [(set_attr "type" "call")
+   (set_attr "length" "16")])
+
+;; For SVE, model tlsdesc calls as clobbering all vector and predicate
+;; registers, on top of the usual R0 and LR.  In reality the calls
+;; preserve the low 128 bits of the vector registers, but we don't
+;; yet have a way of representing that in the instruction pattern.
+(define_insn "tlsdesc_small_sve_"
+  [(set (reg:PTR R0_REGNUM)
+(unspec:PTR [(match_operand 0 "aarch64_valid_symref" "S")]
+   UNSPEC_TLSDESC))
+   (clobber (reg:DI LR_REGNUM))
+   (clobber (reg:CC CC_REGNUM))
+   (clobber (reg:XI V0_REGNUM))
+   (clobber (reg:XI V4_REGNUM))
+   (clobber (reg:XI V8_REGNUM))
+   (clobber (reg:XI V12_REGNUM))
+   (clobber (reg:XI V16_REGNUM))
+   (clobber (reg:XI V20_REGNUM))
+   (clobber (reg:XI V24_REGNUM))
+   (clobber (reg:XI V28_REGNUM))
+   (clobber (reg:VNx2BI P0_REGNUM))
+   (clobber (reg:VNx2BI P1_REGNUM))
+   (clobber (reg:VNx2BI P2_REGNUM))
+   (clobber (reg:VNx2BI P3_REGNUM))
+   (clobber (reg:VNx2BI P4_REGNUM))
+   (clobber (reg:VNx2BI P5_REGNUM))
+   (clobber (reg:VNx2BI P6_REGNUM))
+   

Re: C++ PATCH to fix ICE with vector expr folding (PR c++/83659)

2018-02-08 Thread Marek Polacek
On Thu, Feb 08, 2018 at 10:15:45AM +, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > On Wed, Feb 07, 2018 at 03:23:25PM -0500, Jason Merrill wrote:
> >> On Wed, Feb 7, 2018 at 2:48 PM, Jakub Jelinek  wrote:
> >> > On Wed, Feb 07, 2018 at 08:36:31PM +0100, Marek Polacek wrote:
> >> >> > > That was my first patch, but it was rejected:
> >> >> > > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00271.html
> >> >> >
> >> >> > Then should we update fold_indirect_ref_1 to use the new code?  Is
> >> >> > there a reason for them to stay out of sync?
> >> >>
> >> >> One of the reasons is that middle end uses poly_uint64 type but the
> >> >> front ends
> >> >> shouldn't use them.  So some of these functions will unfortunately 
> >> >> differ.
> >> >
> >> > Yeah.  Part of the patch makes the two implementations slightly more
> >> > similar, but I have e.g. no idea how to test for poly_uint64 that fits
> >> > also in poly_int64 and the poly_int* stuff makes the two substantially
> >> > different in any case.
> >> 
> >> Hmm.  Well, that seems rather unfortunate.  Why shouldn't the front
> >> ends use them?  Can we make an exception for this function because
> >> it's supposed to mirror a middle-end function?
> >> Should we try to push this function back into the middle end?
> >
> > The function comment seems to explain the reasons:
> > /* A less strict version of fold_indirect_ref_1, which requires cv-quals to
> >match.  We want to be less strict for simple *& folding; if we have a
> >non-const temporary that we access through a const pointer, that should
> >work.  We handle this here rather than change fold_indirect_ref_1
> >because we're dealing with things like ADDR_EXPR of INTEGER_CST which
> >don't really make sense outside of constant expression evaluation.  Also
> >we want to allow folding to COMPONENT_REF, which could cause trouble
> >with TBAA in fold_indirect_ref_1.
> >
> >Try to keep this function synced with fold_indirect_ref_1.  */
> >
> > E.g. the constexpr function uses same_type_ignoring_top_level_qualifiers_p
> > instead of == type comparisons, the COMPONENT_REF stuff, ...
> >
> > For poly_* stuff, I think Richard S. wants to introduce it into the FEs at
> > some point, but I could be wrong; certainly it hasn't been done yet and
> > generally, poly*int seems to be a nightmare to deal with.
> 
> There's no problem with FEs using poly_int now if they want to,
> or if it happens to be more convenient in a particular context
> (which seems to be the case here to avoid divergence).  It's more
> that FEs don't need to go out of their way to handle poly_int,
> since the FE can't yet introduce any cases in which the poly_ints
> would be nonconstant.
> 
> In practice FEs do already use poly_int directly (and indirectly
> via support routines).

In that case, this patch attemps to synchronize cxx_fold_indirect_ref and
fold_indirect_ref_1 somewhat, especially regarding the poly* stuff.

It also changes if-else-if to if, if, but I can drop that change.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-02-08  Marek Polacek  

PR c++/83659
* constexpr.c (cxx_fold_indirect_ref): Synchronize with
fold_indirect_ref_1.

* g++.dg/torture/pr83659.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index 93dd8ae049c..6286c7828c6 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -3025,9 +3025,10 @@ cxx_eval_vec_init (const constexpr_ctx *ctx, tree t,
 static tree
 cxx_fold_indirect_ref (location_t loc, tree type, tree op0, bool *empty_base)
 {
-  tree sub, subtype;
+  tree sub = op0;
+  tree subtype;
+  poly_uint64 const_op01;
 
-  sub = op0;
   STRIP_NOPS (sub);
   subtype = TREE_TYPE (sub);
   if (!POINTER_TYPE_P (subtype))
@@ -3106,8 +3107,9 @@ cxx_fold_indirect_ref (location_t loc, tree type, tree 
op0, bool *empty_base)
  return fold_build3 (COMPONENT_REF, type, op, field, NULL_TREE);
}
 }
-  else if (TREE_CODE (sub) == POINTER_PLUS_EXPR
-  && TREE_CODE (TREE_OPERAND (sub, 1)) == INTEGER_CST)
+
+  if (TREE_CODE (sub) == POINTER_PLUS_EXPR
+  && poly_int_tree_p (TREE_OPERAND (sub, 1), &const_op01))
 {
   tree op00 = TREE_OPERAND (sub, 0);
   tree op01 = TREE_OPERAND (sub, 1);
@@ -3124,26 +3126,25 @@ cxx_fold_indirect_ref (location_t loc, tree type, tree 
op0, bool *empty_base)
  && (same_type_ignoring_top_level_qualifiers_p
  (type, TREE_TYPE (op00type
{
- HOST_WIDE_INT offset = tree_to_shwi (op01);
  tree part_width = TYPE_SIZE (type);
- unsigned HOST_WIDE_INT part_widthi = tree_to_shwi (part_width)/BITS_PER_UNIT;
- unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
- tree index = bitsize_int (indexi);
-
- if (known_lt (offset / part_widthi,
-   TYPE_VECTOR_SUBPARTS 

Re: [PATCH, v2] Recognize a missed usage of a sbfiz instruction

2018-02-08 Thread Kyrill Tkachov

Hi Luis,

On 06/02/18 15:04, Luis Machado wrote:

Thanks for the feedback Kyrill. I've adjusted the v2 patch based on your
suggestions and re-tested the changes. Everything is still sane.


Thanks! This looks pretty good to me.


Since this is ARM-specific and fairly contained, I wonder if it would be
reasonable to consider it for inclusion at the current stage.


It is true that the target maintainers can choose to take
such patches at any stage. However, any patch at this stage increases
the risk of regressions being introduced and these regressions
can come bite us in ways that are very hard to anticipate.

Have a look at some of the bugs in bugzilla (or a quick scan of the gcc-bugs 
list)
for examples of the ways that things can go wrong with any of the myriad of GCC 
components
and the unexpected ways in which they can interact.

For example, I am now working on what I initially thought was a one-liner fix 
for
PR 84164 but it has expanded into a 3-patch series with a midend component and
target-specific changes for 2 ports.

These issues are very hard to catch during review and normal testing, and can 
sometimes take months of deep testing by
fuzzing and massive codebase rebuilds to expose, so the closer the commit is to 
a release
the higher the risk is that an obscure edge case will be unnoticed and unfixed 
in the release.

So the priority at this stage is to minimise the risk of destabilising the 
codebase,
as opposed to taking in new features and desirable performance improvements 
(like your patch!)

That is the rationale for delaying committing such changes until the start
of GCC 9 development. But again, this is up to the aarch64 maintainers.
I'm sure the patch will be a perfectly fine and desirable commit for GCC 9.
This is just my perspective as maintainer of the arm port.

Thanks,
Kyrill


Regards,
Luis

Changes in v2:

- Added more restrictive predicates to operands 2, 3 and 4.
- Removed pattern conditional.
- Switched to long long for 64-bit signed integer for the testcase.

---

A customer reported the following missed opportunities to combine a couple
instructions into a sbfiz.

int sbfiz32 (int x)
{
   return x << 29 >> 10;
}

long long sbfiz64 (long long x)
{
   return x << 58 >> 20;
}

This gets converted to the following pattern:

(set (reg:SI 98)
 (ashift:SI (sign_extend:SI (reg:HI 0 x0 [ xD.3334 ]))
 (const_int 6 [0x6])))

Currently, gcc generates the following:

sbfiz32:
lsl x0, x0, 29
asr x0, x0, 10
ret

sbfiz64:
lsl x0, x0, 58
asr x0, x0, 20
ret

It could generate this instead:

sbfiz32:
sbfiz   w0, w0, 19, 3
ret

sbfiz64:
sbfiz   x0, x0, 38, 6
ret

The unsigned versions already generate ubfiz for the same code, so the lack of
a sbfiz pattern may have been an oversight.
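
For comparison, here is a minimal unsigned counterpart (illustrative
only, not part of the patch) of the kind of code that already combines
into a single ubfiz today:

/* Illustrative only: the unsigned analogue of the examples above,
   which GCC already emits as a single ubfiz instruction.  */
unsigned int ubfiz32 (unsigned int x)
{
  return x << 29 >> 10;
}

unsigned long long ubfiz64 (unsigned long long x)
{
  return x << 58 >> 20;
}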

This particular sbfiz pattern shows up in both CPU2006 (~ 80 hits) and
CPU2017 (~ 280 hits). It's not a lot, but seems beneficial in any case. No
significant performance differences, probably due to the small number of
occurrences.

2018-02-06  Luis Machado  

gcc/
* config/aarch64/aarch64.md (*ashift_extv_bfiz): New pattern.

2018-02-06  Luis Machado  

gcc/testsuite/
* gcc.target/aarch64/lsl_asr_sbfiz.c: New test.
---
  gcc/config/aarch64/aarch64.md| 13 +
  gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c | 24 
  2 files changed, 37 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5a2a930..e8284ae 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4828,6 +4828,19 @@
[(set_attr "type" "bfx")]
  )
  
+;; Match sbfiz pattern in a shift left + shift right operation.
+
+(define_insn "*ashift<mode>_extv_bfiz"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ashift:GPI (sign_extract:GPI (match_operand:GPI 1 "register_operand" "r")
+				      (match_operand 2 "aarch64_simd_shift_imm_offset_<mode>" "n")
+				      (match_operand 3 "aarch64_simd_shift_imm_<mode>" "n"))
+		    (match_operand 4 "aarch64_simd_shift_imm_<mode>" "n")))]
+  ""
+  "sbfiz\\t%<w>0, %<w>1, %4, %2"
+  [(set_attr "type" "bfx")]
+)
+
  ;; When the bit position and width of the equivalent extraction add up to 32
  ;; we can use a W-reg LSL instruction taking advantage of the implicit
  ;; zero-extension of the X-reg.
diff --git a/gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c b/gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c
new file mode 100644
index 000..106433d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+/* Check that a LSL followed by an ASR can be combined into a single SBFIZ
+   instruction.  */
+
+/* 

[PATCH, i386] PR target/83008: Fix for SKX cost model

2018-02-08 Thread Shalnov, Sergey
Hi,
This patch contains a cost model change for SKX and closes PR target/83008 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008)

It provides the following performance scores in geomean:
SPEC CPU2017 intrate +0.6%
SPEC CPU2017 fprate +1.5%
SPEC CPU2006 [int|fp]: no changes outside the noise

I found a regression and solved it with 
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00320.html

Could you please merge the patch (and the patch in the link 
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00320.html)
to the main trunk?

Thank you
Sergey

2018-02-06  Sergey Shalnov  

gcc/
PR target/83008
* config/i386/x86-tune-costs.h (struct processor_costs): Fixed
256-bit and 512-bit aligned store costs and integer store costs according to PR83008.

gcc/testsuite/
* gcc.target/i386/pr83008.c: New test.



0013-SKX_cost_model_test.patch


Re: Use nonzero bits to refine range in split_constant_offset (PR 81635)

2018-02-08 Thread Richard Biener
On Fri, Feb 2, 2018 at 3:12 PM, Richard Sandiford
 wrote:
> This patch is part 2 of the fix for PR 81635.  It means that
> split_constant_offset can handle loops like:
>
>   for (unsigned int i = 0; i < n; i += 4)
> {
>   a[i] = ...;
>   a[i + 1] = ...;
> }
>
> CCP records that "i" must have its low 2 bits clear, but we don't
> include this information in the range of "i", which remains [0, +INF].
> I tried making set_nonzero_bits update the range info in the same
> way that set_range_info updates the nonzero bits, but it regressed
> cases like vrp117.c and made some other tests worse.
>
> vrp117.c has a multiplication by 10, so CCP can infer that the low bit
> of the result is clear.  If we included that in the range, the range
> would go from [-INF, +INF] to [-INF, not-quite-+INF].  However,
> the multiplication is also known to overflow in all cases, so VRP
> saturates the result to [INT_MAX, INT_MAX].  This obviously creates a
> contradiction with the nonzero bits, and intersecting the new saturated
> range with an existing not-quite-+INF range would make us drop to
> VR_UNDEFINED.  We're prepared to fold a comparison with an [INT_MAX,
> INT_MAX] value but not with a VR_UNDEFINED value.
>
> The other problems were created when intersecting [-INF, not-quite-+INF]
> with a useful VR_ANTI_RANGE like ~[-1, 1].  The intersection would
> keep the former range rather than the latter.
>
> The patch therefore keeps the adjustment local to split_constant_offset
> for now, but adds a helper routine so that it's easy to move this later.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?
>
> Richard
>
>
> 2018-02-02  Richard Sandiford  
>
> gcc/
> PR tree-optimization/81635
> * wide-int.h (wi::round_down_for_mask, wi::round_up_for_mask): 
> Declare.
> * wide-int.cc (wi::round_down_for_mask, wi::round_up_for_mask)
> (test_round_for_mask): New functions.
> (wide_int_cc_tests): Call test_round_for_mask.
> * tree-vrp.h (intersect_range_with_nonzero_bits): Declare.
> * tree-vrp.c (intersect_range_with_nonzero_bits): New function.
> * tree-data-ref.c (split_constant_offset_1): Use it to refine the
> range returned by get_range_info.
>
> gcc/testsuite/
> PR tree-optimization/81635
> * gcc.dg/vect/bb-slp-pr81635-3.c: New test.
> * gcc.dg/vect/bb-slp-pr81635-4.c: Likewise.
>
> Index: gcc/wide-int.h
> ===
> --- gcc/wide-int.h  2018-02-02 14:03:53.964530009 +
> +++ gcc/wide-int.h  2018-02-02 14:03:54.185521788 +
> @@ -3308,6 +3308,8 @@ gt_pch_nx (trailing_wide_ints  *, voi
>wide_int set_bit_in_zero (unsigned int, unsigned int);
>wide_int insert (const wide_int &x, const wide_int &y, unsigned int,
>unsigned int);
> +  wide_int round_down_for_mask (const wide_int &, const wide_int &);
> +  wide_int round_up_for_mask (const wide_int &, const wide_int &);
>
>template 
>T mask (unsigned int, bool);
> Index: gcc/wide-int.cc
> ===
> --- gcc/wide-int.cc 2018-02-02 14:03:53.964530009 +
> +++ gcc/wide-int.cc 2018-02-02 14:03:54.185521788 +
> @@ -2132,6 +2132,70 @@ wi::only_sign_bit_p (const wide_int_ref &x)
>return only_sign_bit_p (x, x.precision);
>  }
>
> +/* Return VAL if VAL has no bits set outside MASK.  Otherwise round VAL
> +   down to the previous value that has no bits set outside MASK.
> +   This rounding wraps for signed values if VAL is negative and
> +   the top bit of MASK is clear.
> +
> +   For example, round_down_for_mask (6, 0xf1) would give 1 and
> +   round_down_for_mask (24, 0xf1) would give 17.  */
> +
> +wide_int
> +wi::round_down_for_mask (const wide_int &val, const wide_int &mask)
> +{
> +  /* Get the bits in VAL that are outside the mask.  */
> +  wide_int extra_bits = wi::bit_and_not (val, mask);
> +  if (extra_bits == 0)
> +return val;
> +
> +  /* Get a mask that includes the top bit in EXTRA_BITS and is all 1s
> + below that bit.  */
> +  unsigned int precision = val.get_precision ();
> +  wide_int lower_mask = wi::mask (precision - wi::clz (extra_bits),
> + false, precision);
> +
> +  /* Clear the bits that aren't in MASK, but ensure that all bits
> + in MASK below the top cleared bit are set.  */
> +  return (val & mask) | (mask & lower_mask);
> +}
> +
> +/* Return VAL if VAL has no bits set outside MASK.  Otherwise round VAL
> +   up to the next value that has no bits set outside MASK.  The rounding
> +   wraps if there are no suitable values greater than VAL.
> +
> +   For example, round_up_for_mask (6, 0xf1) would give 16 and
> +   round_up_for_mask (24, 0xf1) would give 32.  */
> +
> +wide_int
> +wi::round_up_for_mask (const wide_int &val, const wide_int &mask)
> +{
> +  /* 

[PR bootstrap/56750] implement --disable-stage1-static-libs

2018-02-08 Thread Aldy Hernandez
In this PR, the reporter is complaining that forcing -static-libstdc++ 
and -static-libgcc during stage1 will also force it down to all 
subdirectories (gdb for instance).


There is some back and forth in the PR whether this is good or not.  I'm 
indifferent, but an alternative is to provide a flag 
--disable-stage1-static-libs to disable this behavior.


Tested on an x86-64 Linux system with static libraries available, 
verifying that with --disable-stage1-static-libs we get an xgcc linked 
against the shared libstdc++ and libgcc libraries:


$ ldd xgcc
linux-vdso.so.1 (0x7ffe92084000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x7fec11a06000)
libm.so.6 => /lib64/libm.so.6 (0x7fec116fd000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7fec114e6000)
libc.so.6 => /lib64/libc.so.6 (0x7fec1112)
/lib64/ld-linux-x86-64.so.2 (0x557117206000)

I also verified that without the flag or with 
--enable-stage1-static-libs we get no such shared libraries.


Again, I'm agnostic here.  We can just as easily close the PR and tell 
users to specify --with-stage1-libs to override the static linking, as 
I've mentioned in the PR.


OK for trunk?


	PR bootstrap/56750
	* configure.ac (stage1-static-libs): New option.
	* configure: Regenerate.

gcc/

	PR bootstrap/56750
	* doc/install.texi (--enable-stage1-static-libs): New.

diff --git a/configure.ac b/configure.ac
index aae94501e48..94b540226cc 100644
--- a/configure.ac
+++ b/configure.ac
@@ -465,6 +465,12 @@ ENABLE_LIBSTDCXX=default)
   noconfigdirs="$noconfigdirs target-libstdc++-v3"
 fi]
 
+AC_ARG_ENABLE(stage1-static-libs,
+AS_HELP_STRING([--disable-stage1-static-libs],
+  [do not statically link libstdc++ and libgcc into stage1 binaries]),
+ENABLE_STAGE1_STATIC_LIBS=$enableval,
+ENABLE_STAGE1_STATIC_LIBS=yes)
+
 # If this is accelerator compiler and its target is intelmic we enable
 # target liboffloadmic by default.  If this is compiler with offloading
 # for intelmic we enable host liboffloadmic by default.  Otherwise
@@ -1619,7 +1625,7 @@ AC_ARG_WITH(stage1-ldflags,
  # In stage 1, default to linking libstdc++ and libgcc statically with GCC
  # if supported.  But if the user explicitly specified the libraries to use,
  # trust that they are doing what they want.
- if test "$stage1_libs" = "" -a "$have_static_libs" = yes; then
+ if test "$ENABLE_STAGE1_STATIC_LIBS" = "yes" -a "$stage1_libs" = "" -a "$have_static_libs" = yes; then
stage1_ldflags="-static-libstdc++ -static-libgcc"
  fi])
 AC_SUBST(stage1_ldflags)
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 64ad2445a33..5567daad957 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1958,10 +1958,23 @@ include and lib options directly.
 These flags are applicable to the host platform only.  When building
 a cross compiler, they will not be used to configure target libraries.
 
+@item --enable-stage1-static-libs
+@itemx --disable-stage1-static-libs
+
+Enable linking stage1 binaries with @samp{-static-libstdc++
+-static-libgcc} if available on your system.  This also enables
+linking stage1 binaries statically when not bootstrapping.  The
+default is enabled.
+
+Even with this option enabled, static libraries will only be used when
+@option{--with-stage1-libs} is not set to a value.  See option
+@option{--with-stage1-ldflags} below.
+
 @item --with-stage1-ldflags=@var{flags}
 This option may be used to set linker flags to be used when linking
 stage 1 of GCC.  These are also used when linking GCC if configured with
-@option{--disable-bootstrap}.  If @option{--with-stage1-libs} is not set to a
+@option{--disable-bootstrap}.  If @option{--enable-stage1-static-libs}
+is enabled and @option{--with-stage1-libs} is not set to a
 value, then the default is @samp{-static-libstdc++ -static-libgcc}, if
 supported.
 


Re: [PATCH] S/390: Disable prediction of indirect branches

2018-02-08 Thread Richard Biener
On Wed, Feb 7, 2018 at 1:01 PM, Andreas Krebbel
 wrote:
> This patch implements GCC support for mitigating vulnerability
> CVE-2017-5715 known as Spectre #2 on IBM Z.
>
> In order to disable prediction of indirect branches the implementation
> makes use of an IBM Z specific feature - the execute instruction.
> Performing an indirect branch via execute prevents the branch from
> being subject to dynamic branch prediction.
>
> The implementation tries to stay close to the x86 solution regarding
> user interface.
>
> x86 style options supported (without thunk-inline):
>
> -mindirect-branch=(keep|thunk|thunk-extern)
> -mfunction-return=(keep|thunk|thunk-extern)
>
> IBM Z specific options:
>
> -mindirect-branch-jump=(keep|thunk|thunk-extern|thunk-inline)
> -mindirect-branch-call=(keep|thunk|thunk-extern)
> -mfunction-return-reg=(keep|thunk|thunk-extern)
> -mfunction-return-mem=(keep|thunk|thunk-extern)
>
> These options allow us to enable/disable the branch conversion at a
> finer granularity.
>
> -mindirect-branch sets the value of -mindirect-branch-jump and
>  -mindirect-branch-call.
>
> -mfunction-return sets the value of -mfunction-return-reg and
>  -mfunction-return-mem.
>
> All these options are supported on GCC command line as well as
> function attributes.
>
> 'thunk' triggers the generation of out of line thunks (expolines) and
> replaces the formerly indirect branch with a direct branch to the
> thunk.  Depending on the -march= setting two different types of thunks
> are generated.  With -march=z10 or higher exrl (execute relative long)
> is being used while targeting older machines makes use of larl/ex
> instead.  From a security perspective the exrl variant is preferable.
>
> 'thunk-extern' does the branch replacement like 'thunk' but does not
> emit the thunks.
>
> 'thunk-inline' is only available for indirect jumps.  It should be used
> in environments where correct CFI is important - known as user space.
>
> Additionally the patch introduces the -mindirect-branch-table option
> which generates tables pointing to the locations which have been
> modified.  This is supposed to allow reverting the changes without
> re-compilation in situations where it isn't required. The sections are
> split up into one section per option.
>
> I plan to commit the patch tomorrow.

Do you also plan to backport this to the GCC 7 branch?

> gcc/ChangeLog:
>
> 2018-02-07  Andreas Krebbel  
>
> * config/s390/s390-opts.h (enum indirect_branch): Define.
> * config/s390/s390-protos.h (s390_return_addr_from_memory)
> (s390_indirect_branch_via_thunk)
> (s390_indirect_branch_via_inline_thunk): Add function prototypes.
> (enum s390_indirect_branch_type): Define.
> * config/s390/s390.c (struct s390_frame_layout, struct
> machine_function): Remove.
> (indirect_branch_prez10thunk_mask, indirect_branch_z10thunk_mask)
> (indirect_branch_table_label_no, indirect_branch_table_name):
> Define variables.
> (INDIRECT_BRANCH_NUM_OPTIONS): Define macro.
> (enum s390_indirect_branch_option): Define.
> (s390_return_addr_from_memory): New function.
> (s390_handle_string_attribute): New function.
> (s390_attribute_table): Add new attribute handler.
> (s390_execute_label): Handle UNSPEC_EXECUTE_JUMP patterns.
> (s390_indirect_branch_via_thunk): New function.
> (s390_indirect_branch_via_inline_thunk): New function.
> (s390_function_ok_for_sibcall): When jumping via thunk disallow
> sibling call optimization for non z10 compiles.
> (s390_emit_call): Force indirect branch target to be a single
> register.  Add r1 clobber for non-z10 compiles.
> (s390_emit_epilogue): Emit return jump via return_use expander.
> (s390_reorg): Handle JUMP_INSNs as execute targets.
> (s390_option_override_internal): Perform validity checks for the
> new command line options.
> (s390_indirect_branch_attrvalue): New function.
> (s390_indirect_branch_settings): New function.
> (s390_set_current_function): Invoke s390_indirect_branch_settings.
> (s390_output_indirect_thunk_function):  New function.
> (s390_code_end): Implement target hook.
> (s390_case_values_threshold): Implement target hook.
> (TARGET_ASM_CODE_END, TARGET_CASE_VALUES_THRESHOLD): Define target
> macros.
> * config/s390/s390.h (struct s390_frame_layout)
> (struct machine_function): Move here from s390.c.
> (TARGET_INDIRECT_BRANCH_NOBP_RET)
> (TARGET_INDIRECT_BRANCH_NOBP_JUMP)
> (TARGET_INDIRECT_BRANCH_NOBP_JUMP_THUNK)
> (TARGET_INDIRECT_BRANCH_NOBP_JUMP_INLINE_THUNK)
> (TARGET_INDIRECT_BRANCH_NOBP_CALL)
> (TARGET_DEFAULT_INDIRECT_BRANCH_TABLE)
> (TARGET_INDIRECT_BRANCH_THUNK_NAME_EXRL)
> 

Re: [patch, libfortran] Use flexible array members for array descriptor

2018-02-08 Thread Richard Biener
On Mon, Feb 5, 2018 at 1:09 PM, Janne Blomqvist
 wrote:
> On Sun, Feb 4, 2018 at 9:49 PM, Thomas Koenig  wrote:
>> Hello world,
>>
>> in the attached patch, I have used flexible array members for
>> using the different descriptor types (following Richi's advice).
>> This does not change the binary ABI, but the library code
>> maches what we are actually doing in the front end.  I have
>> not yet given up hope of enabling LTO for the library :-)
>> and this, I think, will be a prerequisite.
>>
>> OK for trunk?
>
> Given that Jakub and Richi apparently weren't yet unanimous in their
> recommendations on the best path forward, maybe wait a bit for the
> smoke to clear?

If the effect of the patch (it doesn't include generated files) is that the
function arguments now have pointers to array descriptor types with
the flexible array, then yes, that's what will be needed anyway; no need
for any dust to settle here.

The other part would then be to change the FE declarations of the
intrinsic functions as they will now mismatch as well (and it still
generates multiple ones).  So making the IL emitted by the
FE also call a function with the flexible array descriptor pointer type
is the second step to fix the LTO ODR warning.

Then the warning is gone but the alias issue still exists which means
the FE has to do all actual accesses to any array descriptor via
the flex array variant type (or using its alias set).
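
A minimal sketch of the flexible-array-member idiom under discussion
(names are made up for illustration and are not taken from libgfortran):

#include <stdlib.h>

/* Hypothetical descriptor; the per-dimension data lives in a flexible
   array member so a heap allocation can reserve exactly the rank that
   is needed.  */
struct dim { long stride, lbound, ubound; };

struct descriptor
{
  void *base_addr;
  long offset;
  long dtype;
  struct dim dims[];   /* flexible array member */
};

struct descriptor *
alloc_descriptor (int rank)
{
  /* Mirrors the sizeof (descriptor) + rank * sizeof (dimension)
     style of allocation discussed below.  */
  return malloc (sizeof (struct descriptor)
		 + rank * sizeof (struct dim));
}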

Richard.

> In the meantime, a few comments:
>
> 1) Is there some particular benefit to all those macroized
> descriptors, given that the only thing different is the type of the
> base_addr pointer? Wouldn't it be simpler to just have a single
> descriptor type with base_addr a void pointer, then typecast that
> pointer to whatever type is needed?
>
> 2)
>
> Index: intrinsics/date_and_time.c
> ===
> --- intrinsics/date_and_time.c (Revision 257347)
> +++ intrinsics/date_and_time.c (Arbeitskopie)
> @@ -268,7 +268,7 @@ secnds (GFC_REAL_4 *x)
>GFC_REAL_4 temp1, temp2;
>
>/* Make the INTEGER*4 array for passing to date_and_time.  */
> -  gfc_array_i4 *avalues = xmalloc (sizeof (gfc_array_i4));
> +  gfc_array_i4 *avalues = xmalloc (sizeof (gfc_full_array_i4));
>
>
> Since date_and_time requires the values array to always be rank 1,
> can't this be "xmalloc (sizeof (gfc_array_i4) +
> sizeof(dimension_data))" ?
>
> (The GFC_FULL_DESCRIPTOR stuff is useful for stack allocated
> descriptors to avoid VLA's / alloca(), but for heap allocated ones we
> can allocate only the needed size, I think)
>
>
> 3)
>
> Index: io/format.c
> ===
> --- io/format.c (Revision 257347)
> +++ io/format.c (Arbeitskopie)
> @@ -1025,7 +1025,7 @@ parse_format_list (st_parameter_dt *dtp, bool *see
>t = format_lex (fmt);
>
>/* Initialize the vlist to a zero size array.  */
> -  tail->u.udf.vlist= xmalloc (sizeof(gfc_array_i4));
> +  tail->u.udf.vlist= xmalloc (sizeof(gfc_full_array_i4));
>GFC_DESCRIPTOR_DATA(tail->u.udf.vlist) = NULL;
>GFC_DIMENSION_SET(tail->u.udf.vlist->dim[0],1, 0, 0);
>
>
> And same here?
>
>
>
> --
> Janne Blomqvist


Fwd: [C++ Patch] PR 83806 ("[6/7/8 Regression] Spurious -Wunused-but-set-parameter with nullptr")

2018-02-08 Thread Paolo Carlini

Hi,

this one should be rather straightforward. As noticed by Jakub, we 
started emitting the spurious warning with the fix for c++/69257, which, 
among other things, fixed decay_conversion wrt mark_rvalue_use and 
mark_lvalue_use calls. In particular it removed the mark_rvalue_use call 
at the very beginning of the function, thus now a PARM_DECL with 
NULLPTR_TYPE as type, being handled specially at the beginning of the 
function, doesn't get the mark_rvalue_use treatment - which, for 
example, POINTER_TYPE now gets later. I'm finishing testing the below 
on x86_64-linux. OK if it passes?


Thanks, Paolo.

PS: sorry Jason, I have to re-send separately to the mailing list 
because some HTML crept in again. Grrr.


/

/cp
2018-02-08  Paolo Carlini  

PR c++/83806
* typeck.c (decay_conversion): Use mark_rvalue_use for the special
case of nullptr too.

/testsuite
2018-02-08  Paolo Carlini  

PR c++/83806
* g++.dg/warn/Wunused-parm-11.C: New.
Index: testsuite/g++.dg/warn/Wunused-parm-11.C
===
--- testsuite/g++.dg/warn/Wunused-parm-11.C (nonexistent)
+++ testsuite/g++.dg/warn/Wunused-parm-11.C (working copy)
@@ -0,0 +1,13 @@
+// PR c++/83806
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wunused-but-set-parameter" }
+
+template 
+bool equals(X x, Y y) {
+return (x == y); 
+}
+
+int main() {
+const char* p = nullptr;
+equals(p, nullptr);
+}
Index: cp/typeck.c
===
--- cp/typeck.c (revision 257477)
+++ cp/typeck.c (working copy)
@@ -2009,7 +2009,10 @@ decay_conversion (tree exp,
 return error_mark_node;
 
   if (NULLPTR_TYPE_P (type) && !TREE_SIDE_EFFECTS (exp))
-return nullptr_node;
+{
+  exp = mark_rvalue_use (exp, loc, reject_builtin);
+  return nullptr_node;
+}
 
   /* build_c_cast puts on a NOP_EXPR to make the result not an lvalue.
  Leave such NOP_EXPRs, since RHS is being used in non-lvalue context.  */


Re: [PR tree-optimization/84224] do not ICE on malformed allocas

2018-02-08 Thread Aldy Hernandez



On 02/07/2018 11:38 PM, Jeff Law wrote:

On 02/06/2018 02:38 AM, Aldy Hernandez wrote:

The -Walloca pass can receive a malformed alloca, courtesy of someone
providing a faulty prototype.  This was causing an ICE because we
assumed alloca calls had at least one argument, which the testcase does
not:

+void *alloca ();
+__typeof__(alloca ()) a () { return alloca (); }

I don't believe it should be the responsibility of the
-Walloca-larger-than=* pass to warn against such things, so I propose we
just ignore this.

I also think we should handle this testcase, regardless of the target
having an alloca builtin, since the testcase includes its own
prototype.  Thus, the missing "{dg-require-effective-target alloca}".

Wouldn't it make more sense to put the argument check into
gimple_alloca_call_p that way all callers are insulated?

After all if the alloca doesn't have an argument it probably doesn't
have the semantics of alloca that we expect -- like returning nonzero.

And if we look at vr-values.c:gimple_stmt_nonzero_p we see that it calls
gimple_alloca_call_p and if it returns true, then we assume the call
always returns a nonzero value.   Fixing gimple_alloca_call_p would take
care of both problems at once.
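
To make that concrete, here is a minimal example (not taken from the PR)
of why the nonzero assumption is unjustified for such a call:

/* Illustrative only: with this bogus prototype the call is still
   recognized as the alloca built-in by name, but it has no size
   argument, so assuming a nonzero return value (as for real alloca)
   would be wrong.  */
void *alloca ();

int f (void)
{
  void *p = alloca ();
  return p != 0;   /* must not be folded to 1 on alloca semantics alone */
}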

OK with that change.


Attached patch is final version.  I retested on x86-64 Linux.

Committing to trunk and closing PR.

Thanks.
gcc/

	PR tree-optimization/84224
	* gimple-ssa-warn-alloca.c (pass_walloca::execute): Remove assert.
	* calls.c (gimple_alloca_call_p): Only return TRUE when we have
	non-zero arguments.

diff --git a/gcc/calls.c b/gcc/calls.c
index 54fea158631..19c95b8455b 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -730,7 +730,7 @@ gimple_alloca_call_p (const gimple *stmt)
 switch (DECL_FUNCTION_CODE (fndecl))
   {
   CASE_BUILT_IN_ALLOCA:
-return true;
+	return gimple_call_num_args (stmt) > 0;
   default:
 	break;
   }
diff --git a/gcc/gimple-ssa-warn-alloca.c b/gcc/gimple-ssa-warn-alloca.c
index 941810a997e..327c806ae11 100644
--- a/gcc/gimple-ssa-warn-alloca.c
+++ b/gcc/gimple-ssa-warn-alloca.c
@@ -445,7 +445,6 @@ pass_walloca::execute (function *fun)
 
 	  if (!gimple_alloca_call_p (stmt))
 	continue;
-	  gcc_assert (gimple_call_num_args (stmt) >= 1);
 
 	  const bool is_vla
 	= gimple_call_alloca_for_var_p (as_a <gcall *> (stmt));
diff --git a/gcc/testsuite/gcc.dg/Walloca-16.c b/gcc/testsuite/gcc.dg/Walloca-16.c
new file mode 100644
index 000..3ee96a9570a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Walloca-16.c
@@ -0,0 +1,6 @@
+/* PR tree-optimization/84224 */
+/* { dg-do compile } */
+/* { dg-options "-O0 -Walloca" } */
+
+void *alloca ();
+__typeof__(alloca ()) a () { return alloca (); }


Re: RFA: Sanitize deprecation messages (PR 84195)

2018-02-08 Thread Nick Clifton
Hi David,

>> +  /* PR 84195: Replace control characters in the message with their
>> + escaped equivalents.  Allow newlines if -fmessage-length has
>> + been set to a non-zero value.  
> 
> I'm not quite sure why we allow newlines in this case, sorry.

Because the documentation for -fmessage-length says:

  Try to format error messages so that they fit on lines 
  of about N characters.  If N is zero, then no 
  line-wrapping is done; each error message appears on a 
  single line.  This is the default for all front ends.

So with a non-zero message length, multi-line messages are allowed.

At least that was my understanding of the option.
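
As a concrete (purely illustrative) example of the kind of message the
patch has to sanitize, consider a deprecation attribute that embeds a
control character:

/* Illustrative only: the warning text for the call below contains an
   embedded newline, which the patch escapes unless a non-zero
   -fmessage-length permits multi-line messages.  */
__attribute__((deprecated ("line one\nline two")))
void old_fn (void);

void use (void) { old_fn (); }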

Thanks for the patch review.  I will get onto fixing the points you
raised today.

Cheers
  Nick


Re: [PATCH] Fix PR81038

2018-02-08 Thread Richard Biener
On Sat, Feb 3, 2018 at 12:30 AM, Bill Schmidt
 wrote:
> Hi,
>
> The test g++.dg/vect/slp-pr56812.cc is somewhat fragile and is currently 
> failing
> on several targets.  PR81038 notes that this began with r248678, which stopped
> some inferior peeling solutions from preventing vectorization that could be 
> done
> without peeling.  I observed that for powerpc64le, r248677 vectorizes the code
> during SLP, but r248678 vectorizes it during the loop vectorization pass.  
> Which
> pass does the vectorization is quite dependent on cost model, which for us is 
> a
> quite close decision.  In any case, the important thing is that the code is
> vectorized, not which pass does it.
>
> This patch prevents the test from flipping in and out of failure status 
> depending
> on which pass does the vectorization, by testing the final "optimized" dump 
> for
> the expected vectorized output instead of relying on a specific vectorization
> pass dump.
>
> By the way, the test case somehow had gotten DOS/Windows newlines into it, so
> I removed those.  The ^M characters disappeared when I pasted into this 
> mailer,
> unfortunately.  Anyway, that's the reason for the full replacement of the 
> file.
> The only real changes are the dg-final directives and the documentation of the
> expected output.
>
> Verified on powerpc64le-unknown-linux-gnu.  Is this okay for trunk?

Hmm.  That removes the existing XFAIL.  Also wouldn't it be more elegant to do
the following?  This makes the testcase pass on x86_64, thus committed ;)

Richard.

2018-02-08  Richard Biener  

* g++.dg/vect/slp-pr56812.cc: Allow either basic-block or
loop vectorization to happen.

Index: gcc/testsuite/g++.dg/vect/slp-pr56812.cc
===
--- gcc/testsuite/g++.dg/vect/slp-pr56812.cc(revision 257477)
+++ gcc/testsuite/g++.dg/vect/slp-pr56812.cc(working copy)
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_float } */
 /* { dg-require-effective-target vect_hw_misalign } */
-/* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model=dynamic" } */
+/* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model=dynamic -fopt-info-vec" } */

 class mydata {
 public:
@@ -13,10 +13,7 @@ public:

 void mydata::Set (float x)
 {
-  for (int i=0; i<upper(); i++)

> Thanks,
> Bill
>
>
> 2018-02-02  Bill Schmidt  
>
> * g++.dg/vect/slp-pr56812.cc: Convert from DOS newline characters
> to utf-8-unix.  Change to scan "optimized" dump for indications
> that the code was vectorized.
>
>
> Index: gcc/testsuite/g++.dg/vect/slp-pr56812.cc
> ===
> --- gcc/testsuite/g++.dg/vect/slp-pr56812.cc(revision 257352)
> +++ gcc/testsuite/g++.dg/vect/slp-pr56812.cc(working copy)
> @@ -1,22 +1,31 @@
> -/* { dg-do compile } */
> -/* { dg-require-effective-target vect_float } */
> -/* { dg-require-effective-target vect_hw_misalign } */
> -/* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model=dynamic" } 
> */
> -
> -class mydata {
> -public:
> -mydata() {Set(-1.0);}
> -void Set (float);
> -static int upper() {return 8;}
> -float data[8];
> -};
> -
> -void mydata::Set (float x)
> -{
> -  for (int i=0; i<upper(); i++)
> -    data[i] = x;
> -}
> -
> -/* For targets without vector loop peeling the loop becomes cheap
> -   enough to be vectorized.  */
> -/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" { 
> xfail { ! vect_peeling_profitable } } } } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect_hw_misalign } */
> +/* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model=dynamic 
> -fdump-tree-optimized" } */
> +
> +class mydata {
> +public:
> +mydata() {Set(-1.0);}
> +void Set (float);
> +static int upper() {return 8;}
> +float data[8];
> +};
> +
> +void mydata::Set (float x)
> +{
> +  for (int i=0; i<upper(); i++)
> +    data[i] = x;
> +}
> +
> +/* { dg-final { scan-tree-dump "vect_cst__\[0-9\]* = " "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "= vect_cst__\[0-9\]*;" 2 "optimized" } 
> } */
> +
> +/* Expected vectorized output is something like:
> +
> +   <bb 2> [11.11%]:
> +  vect_cst__10 = {x_5(D), x_5(D), x_5(D), x_5(D)};
> +  MEM[(float *)this_4(D)] = vect_cst__10;
> +  MEM[(float *)this_4(D) + 16B] = vect_cst__10;
> +  return;
> +
> +  Could be vectorized either by 

Re: [PR tree-optimization/84047] missing -Warray-bounds on an out-of-bounds index

2018-02-08 Thread Richard Biener
On Thu, Feb 1, 2018 at 6:42 PM, Aldy Hernandez  wrote:
> Since my patch isn't the easy one liner I wanted it to be, perhaps we
> should concentrate on Martin's patch, which is more robust, and has
> testcases to boot!  His patch from last week also fixes a couple other
> PRs.
>
> Richard, would this be acceptable?  That is, could you or Jakub review
> Martin's all-encompassing patch?  If so, I'll drop mine.

Sorry, no - this one looks way too complicated.

> Also, could someone pontificate on whether we want to fix
> -Warray-bounds regressions for this release cycle?

Remove bogus ones?  Yes.  Add "missing ones"?  No.

Richard.

> Thanks.
>
> On Wed, Jan 31, 2018 at 6:05 AM, Richard Biener
>  wrote:
>> On Tue, Jan 30, 2018 at 11:11 PM, Aldy Hernandez  wrote:
>>> Hi!
>>>
>>> [Note: Jakub has mentioned that missing -Warray-bounds regressions should be
>>> punted to GCC 9.  I think this particular one is easy pickings, but if this
>>> and/or the rest of the -Warray-bounds regressions should be marked as GCC 9
>>> material, please let me know so we can adjust all relevant PRs.]
>>>
>>> This is a -Warray-bounds regression that happens because the IL now has an
>>> MEM_REF instead on ARRAY_REF.
>>>
>>> Previously we had an ARRAY_REF we could diagnose:
>>>
>>>   D.2720_5 = "12345678"[1073741824];
>>>
>>> But now this is represented as:
>>>
>>>   _1 = MEM[(const char *)"12345678" + 1073741824B];
>>>
>>> I think we can just allow check_array_bounds() to handle MEM_REF's and
>>> everything should just work.
>>>
>>> The attached patch fixes both regressions mentioned in the PR.
>>>
>>> Tested on x86-64 Linux.
>>>
>>> OK?
>>
>> This doesn't look correct.  You lump MEM_REF handling together with
>> ADDR_EXPR handling but for the above case you want to diagnose
>> _dereferences_ not address-taking.
>>
>> For the dereference case you need to amend the ARRAY_REF case, for example
>> via
>>
>> Index: gcc/tree-vrp.c
>> ===
>> --- gcc/tree-vrp.c  (revision 257181)
>> +++ gcc/tree-vrp.c  (working copy)
>> @@ -5012,6 +5012,13 @@ check_array_bounds (tree *tp, int *walk_
>>if (TREE_CODE (t) == ARRAY_REF)
>>  vrp_prop->check_array_ref (location, t, false /*ignore_off_by_one*/);
>>
>> +  else if (TREE_CODE (t) == MEM_REF
>> +  && TREE_CODE (TREE_OPERAND (t, 0)) == ADDR_EXPR
>> +  && TREE_CODE (TREE_OPERAND (TREE_OPERAND (t, 0), 0)) == 
>> STRING_CST)
>> +{
>> +  call factored part of check_array_ref passing in STRING_CST and offset
>> +}
>> +
>>else if (TREE_CODE (t) == ADDR_EXPR)
>>  {
>>vrp_prop->search_for_addr_array (t, location);
>>
>> note your patch will fail to warn for "1"[1] because taking that
>> address is valid but not
>> dereferencing it.
>>
>> Richard.


Re: C++ PATCH to fix ICE with vector expr folding (PR c++/83659)

2018-02-08 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Wed, Feb 07, 2018 at 03:23:25PM -0500, Jason Merrill wrote:
>> On Wed, Feb 7, 2018 at 2:48 PM, Jakub Jelinek  wrote:
>> > On Wed, Feb 07, 2018 at 08:36:31PM +0100, Marek Polacek wrote:
>> >> > > That was my first patch, but it was rejected:
>> >> > > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00271.html
>> >> >
>> >> > Then should we update fold_indirect_ref_1 to use the new code?  Is
>> >> > there a reason for them to stay out of sync?
>> >>
>> >> One of the reasons is that middle end uses poly_uint64 type but the
>> >> front ends
>> >> shouldn't use them.  So some of these functions will unfortunately differ.
>> >
>> > Yeah.  Part of the patch makes the two implementations slightly more
>> > similar, but I have e.g. no idea how to test for poly_uint64 that fits
>> > also in poly_int64 and the poly_int* stuff makes the two substantially
>> > different in any case.
>> 
>> Hmm.  Well, that seems rather unfortunate.  Why shouldn't the front
>> ends use them?  Can we make an exception for this function because
>> it's supposed to mirror a middle-end function?
>> Should we try to push this function back into the middle end?
>
> The function comment seems to explain the reasons:
> /* A less strict version of fold_indirect_ref_1, which requires cv-quals to
>match.  We want to be less strict for simple *& folding; if we have a
>non-const temporary that we access through a const pointer, that should
>work.  We handle this here rather than change fold_indirect_ref_1
>because we're dealing with things like ADDR_EXPR of INTEGER_CST which
>don't really make sense outside of constant expression evaluation.  Also
>we want to allow folding to COMPONENT_REF, which could cause trouble
>with TBAA in fold_indirect_ref_1.
>
>Try to keep this function synced with fold_indirect_ref_1.  */
>
> E.g. the constexpr function uses same_type_ignoring_top_level_qualifiers_p
> instead of == type comparisons, the COMPONENT_REF stuff, ...
>
> For poly_* stuff, I think Richard S. wants to introduce it into the FEs at
> some point, but I could be wrong; certainly it hasn't been done yet and
> generally, poly*int seems to be a nightmare to deal with.

There's no problem with FEs using poly_int now if they want to,
or if it happens to be more convenient in a particular context
(which seems to be the case here to avoid divergence).  It's more
that FEs don't need to go out of their way to handle poly_int,
since the FE can't yet introduce any cases in which the poly_ints
would be nonconstant.

In practice FEs do already use poly_int directly (and indirectly
via support routines).

Thanks,
Richard


[PATCH] Fix PR84278

2018-02-08 Thread Richard Biener

Noticed while (still...) working on PR84038.  The vectorizer happily
tries to construct a V4SFmode from two V2SFmode vectors because
there's an optab handler for it.  But it failed to check whether
that mode is supported and RTL expansion later uses TYPE_MODE
to get at the element mode which ends up as BLKmode and thus
we go through the stack...

So this makes the vectorizer test targetm.vector_mode_supported_p
as well before making use of such types.  In the above case the
vectorizer then resorts to using two DImode scalars instead.
I've verified that's still faster than doing four SFmode scalar
loads despite whatever reformatting penalty that might occur.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

For PR84038 this makes a difference when compiling with
-mprefer-avx128 -fno-vect-cost-model.

Richard.

2018-02-08  Richard Biener  

PR tree-optimization/84278
* tree-vect-stmts.c (vectorizable_store): When looking for
smaller vector types to perform grouped strided loads/stores
make sure the mode is supported by the target.
(vectorizable_load): Likewise.

* gcc.target/i386/pr84278.c: New testcase.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 257477)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -6510,6 +6558,7 @@ vectorizable_store (gimple *stmt, gimple
  machine_mode vmode;
  if (!mode_for_vector (elmode, group_size).exists (&vmode)
  || !VECTOR_MODE_P (vmode)
+ || !targetm.vector_mode_supported_p (vmode)
  || (convert_optab_handler (vec_extract_optab,
 TYPE_MODE (vectype), vmode)
  == CODE_FOR_nothing))
@@ -6528,6 +6577,7 @@ vectorizable_store (gimple *stmt, gimple
 element size stores.  */
  if (mode_for_vector (elmode, lnunits).exists (&vmode)
  && VECTOR_MODE_P (vmode)
+ && targetm.vector_mode_supported_p (vmode)
  && (convert_optab_handler (vec_extract_optab,
 vmode, elmode)
  != CODE_FOR_nothing))
@@ -7573,6 +7633,7 @@ vectorizable_load (gimple *stmt, gimple_
  machine_mode vmode;
  if (mode_for_vector (elmode, group_size).exists (&vmode)
  && VECTOR_MODE_P (vmode)
+ && targetm.vector_mode_supported_p (vmode)
  && (convert_optab_handler (vec_init_optab,
 TYPE_MODE (vectype), vmode)
  != CODE_FOR_nothing))
@@ -7598,6 +7659,7 @@ vectorizable_load (gimple *stmt, gimple_
 element loads of the original vector type.  */
  if (mode_for_vector (elmode, lnunits).exists (&vmode)
  && VECTOR_MODE_P (vmode)
+ && targetm.vector_mode_supported_p (vmode)
  && (convert_optab_handler (vec_init_optab, vmode, elmode)
  != CODE_FOR_nothing))
{
Index: gcc/testsuite/gcc.target/i386/pr84278.c
===
--- gcc/testsuite/gcc.target/i386/pr84278.c (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr84278.c (working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -msse2" } */
+
+float A[1024];
+float B[1024];
+int s;
+
+void foo(void)
+{
+  int i;
+  for (i = 0; i < 128; i++)
+{
+  B[i*2+0] = A[i*s+0];
+  B[i*2+1] = A[i*s+1];
+}
+}
+
+/* { dg-final { scan-assembler-not "\(%.sp\)" } } */

