[PATCH] sccvn: Handle bitfields in vn_reference_lookup_3 [PR93582]

2020-02-12 Thread Jakub Jelinek
Hi!

The following patch is first step towards fixing PR93582.
vn_reference_lookup_3 right now punts on anything that isn't byte aligned,
so to be able to lookup a constant bitfield store, one needs to use
the exact same COMPONENT_REF, otherwise it isn't found.

This patch lifts up that that restriction if the bits to be loaded are
covered by a single store of a constant (keeps the restriction so far
for the multiple store case, can tweak that incrementally, but I think
for bisection etc. it is worth to do it one step at a time).

Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux
and powerpc64-linux (the last one regtested with \{-m32,-m64\}).
Additionally, I've bootstrapped/regtested it on those with statistics
gathering on when val was non-NULL (i.e. we managed to see something
through) when any of offseti, offset2i, maxsizei or size2i wasn't multiple
of BITS_PER_UNIT (i.e. when this optimization triggered).  Below is
a list of the unique cases where it triggered, the first column says on
which of the 5 ABIs it triggered, qmath stands for those that enable
libquadmath, i.e. ia32,x86_64,ppc64le.

Ok for trunk?

all ../../gcc/gimple-ssa-backprop.c {anonymous}::backprop::process_var
all gcc/gcc/testsuite/gcc.c-torture/compile/20191015-1.c f
all gcc/gcc/testsuite/gcc.c-torture/compile/20191015-2.c f
all gcc/gcc/testsuite/gcc.c-torture/compile/20200105-1.c g
all gcc/gcc/testsuite/gcc.c-torture/compile/20200105-2.c g
all gcc/gcc/testsuite/gcc.c-torture/compile/20200105-3.c g
all gcc/gcc/testsuite/gcc.c-torture/execute/20181120-1.c main
all gcc/gcc/testsuite/gcc.c-torture/execute/921204-1.c main
all gcc/gcc/testsuite/gcc.c-torture/execute/pr58726.c main
all gcc/gcc/testsuite/gcc.c-torture/execute/pr86492.c foo
all gcc/gcc/testsuite/gcc.dg/store_merging_24.c foo
all gcc/gcc/testsuite/gcc.dg/store_merging_25.c foo
all gcc/gcc/testsuite/gcc.dg/tree-ssa/pr93582-1.c foo
all gcc/gcc/testsuite/gcc.dg/tree-ssa/pr93582-2.c foo
all gcc/gcc/testsuite/gcc.dg/tree-ssa/pr93582-3.c foo
all gcc/gcc/testsuite/g++.dg/other/pr89692.C foo
all ../../../libgcc/unwind-dw2-fde-dip.c init_object
all ../../../libgcc/unwind-dw2-fde-dip.c __register_frame_info
all ../../../libgcc/unwind-dw2-fde-dip.c __register_frame_info_bases
all ../../../libgcc/unwind-dw2-fde-dip.c __register_frame_info_table
all ../../../libgcc/unwind-dw2-fde-dip.c __register_frame_info_table_bases
all ../../../libgcc/unwind-dw2-fde-dip.c __register_frame_table
all ../../../libgcc/unwind-dw2-fde-dip.c search_object
all ../../../libgcc/unwind-dw2-fde-dip.c _Unwind_IteratePhdrCallback
ia32 /tmp/921204-1.exe.NdZMiZ.ltrans0.o main
ia32 /tmp/ccK7HiiN.o main
ia32 /tmp/ccWtcuID.o foo
ia32 /tmp/pr86492.exe.IKs2SA.ltrans0.o foo
lp64 gcc/gcc/testsuite/gnat.dg/pack22.adb pack22
ppc32 /tmp/921204-1.exe.1GoDpE.ltrans0.o main
ppc32 /tmp/ccivx4sg.o foo
ppc32 /tmp/ccLFNqjQ.o main
ppc32 /tmp/pr86492.exe.VkJDwY.ltrans0.o foo
ppc32 /tmp/pr88739.exe.y33n0X.ltrans0.o main
ppc64le cd2a32a.adb cd2a32a
ppc64le /tmp/921204-1.exe.la923O.ltrans0.o main
ppc64le /tmp/ccH3NcwE.o main
ppc64le /tmp/ccmHysWx.o foo
ppc64le /tmp/pr86492.exe.EYd24l.ltrans0.o foo
ppc64le /tmp/pr88739.exe.uAT6pA.ltrans0.o main
ppc64 /tmp/921204-1.exe.vtoTSo.ltrans0.o main
ppc64 /tmp/ccm4RGtK.o main
ppc64 /tmp/cczZpRCD.o foo
ppc64 /tmp/pr86492.exe.UdbN8I.ltrans0.o foo
ppc64 /tmp/pr88739.exe.DtQe1H.ltrans0.o main
qmath ../../../libquadmath/math/expq.c expq
qmath ../../../libquadmath/math/fmaq.c fmaq
qmath ../../../libquadmath/math/nanq.c nanq
qmath ../../../libquadmath/strtod/strtoflt128.c strtoflt128_internal
x86_64 cd2a32a.adb cd2a32a
x86_64 /tmp/921204-1.exe.0059fB.ltrans0.o main
x86_64 /tmp/cci9lHhD.o main
x86_64 /tmp/ccTlWSbV.o foo
x86_64 /tmp/pr86492.exe.NE0yez.ltrans0.o foo
x86_64 /tmp/pr88739.exe.PKCG2M.ltrans0.o main
x86 gcc/gcc/testsuite/gcc.dg/torture/pr45017.c main
x86 gcc/gcc/testsuite/gcc.target/i386/pr37870.c main

2020-02-13  Jakub Jelinek  

PR tree-optimization/93582
* fold-const.h (shift_bytes_in_array_left,
shift_bytes_in_array_right): Declare.
* fold-const.c (shift_bytes_in_array_left,
shift_bytes_in_array_right): New function, moved from
gimple-ssa-store-merging.c, no longer static.
* gimple-ssa-store-merging.c (shift_bytes_in_array): Move
to gimple-ssa-store-merging.c and rename to shift_bytes_in_array_left.
(shift_bytes_in_array_right): Move to gimple-ssa-store-merging.c.
(encode_tree_to_bitpos): Use shift_bytes_in_array_left instead of
shift_bytes_in_array.
(verify_shift_bytes_in_array): Rename to ...
(verify_shift_bytes_in_array_left): ... this.  Use
shift_bytes_in_array_left instead of shift_bytes_in_array.
(store_merging_c_tests): Call verify_shift_bytes_in_array_left
instead of verify_shift_bytes_in_array.
* tree-ssa-sccvn.c (vn_reference_lookup_3): For native_encode_expr
/ native_interpret_expr where the store covers all needed 

Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

2020-02-12 Thread Richard Biener
On Wed, 12 Feb 2020, Segher Boessenkool wrote:

> On Wed, Feb 12, 2020 at 11:53:22AM +0100, Richard Biener wrote:
> > On Wed, 12 Feb 2020, Segher Boessenkool wrote:
> > > On Wed, Feb 12, 2020 at 09:12:58AM +0100, Richard Biener wrote:
> > > > On Tue, 11 Feb 2020, Segher Boessenkool wrote:
> > > > > Basic block partitioning has wildly disproportionate fallout in all
> > > > > later passes, both in terms of what those *do* (or don't, if 
> > > > > partitioning
> > > > > is enabled), and of impact on the code (not to mention developer 
> > > > > time).
> > > > > 
> > > > > Maybe the implementation can be improved, but probably we should do 
> > > > > this
> > > > > in a different way altogether.  The current situation is not good.
> > > > 
> > > > I think the expectation that you can go back to CFG layout mode
> > > > and then work with CFG layout tools after we've lowered to CFG RTL
> > > > is simply bogus.
> > > 
> > > Partitioning is also quite problematic if you do not use cfglayout
> > > mode.  For example, in shrink-wrapping.  It prevents a lot there.
> > > 
> > > > Yeah, you can probably do analysis things but
> > > > I wouldn't be surprised if a CFG RTL -> CFG layout -> CFG RTL cycle 
> > > > can wreck things.  Undoubtedly doing CFG manipulations is not going
> > > > to work since CFG layout does not respect CFG RTL restrictions.
> > > 
> > > Doing CFG manipulations on CFG RTL mode directly is almost impossible
> > > to do correctly.
> > > 
> > > For example, bb-reorder.  Which is a much more important optimisation
> > > than partitioning, btw.
> > 
> > BB reorder switches back and forth as well ... :/
> 
> Yes.  It is extremely hard to change any jumps in cfgrtl mode.
> 
> The goal is to use cfglayout mode more, ideally *always*, not to use
> it less!

Sure!  It must be that split requires CFG RTL (it doesn't say so, but
we don't have a PROP_rtllayout or a "cannot work with" set of
properties).  Otherwise I'm not sure why we go out of CFG layout
mode so early.  Passes not messing with the CFG at all should be
ignorant of what mode we are in.

> > I think both are closely enough related that we probably should do
> > partitioning from within the same framework?  OTOH BB reorder
> > happens _much_ later.
> 
> This may be doable.  What we need to do first though is to find a better
> thing to use than EDGE_CROSSING.
> 
> Maybe we can determine which blocks should be hot and cold early, but
> actually make that happen only very late (maybe adjusting the decision
> if we have to to make things work)?  That way, intervening passes do not
> have to care (much).

The question is whether we can even preserve things like BB counts
and edge frequencies once we've gone out of CFG layout mode for once.

> > > > Partitioning simply uncovered latent bugs, there's nothing wrong
> > > > with it IMHO.
> > > 
> > > I don't agree.  The whole way EDGE_CROSSING works hinders all other
> > > optimisations a lot.
> > 
> > I'm not sure if it's there for correctness reasons or just to
> > help checking that nothing "undoes" the partitioning decision.
> 
> The latter.  (It may also make it easier for the former, of course, but
> that can be solved anyway).  And that makes live very hard for all later
> passes, while it is doubtful that it even give the best decisions: for
> example it prevents a lot of shrink-wrapping, which you dearly *want* to
> do on cold code!

Sure.  So do that earlier ;)

Richard.


testsuite: Fix g++.dg/analyzer/pr93212.C with check-c++-all

2020-02-12 Thread Jakub Jelinek
On Thu, Feb 06, 2020 at 03:27:29PM -0500, David Malcolm wrote:
> gcc/testsuite/ChangeLog:
>   PR analyzer/93212
>   * g++.dg/analyzer/analyzer.exp: New subdirectory and .exp suite.
>   * g++.dg/analyzer/malloc.C: New test.
>   * g++.dg/analyzer/pr93212.C: New test.

The test FAILs with c++11:
.../gcc/testsuite/g++.dg/analyzer/pr93212.C:4:1: error: 'lol' function uses 
'auto' type specifier without trailing return type
.../gcc/testsuite/g++.dg/analyzer/pr93212.C:4:1: note: deduced return type only 
available with '-std=c++14' or '-std=gnu++14'

Fixed thusly, regtested on x86_64-linux, committed to trunk as obvious.

2020-02-13  Jakub Jelinek  

* g++.dg/analyzer/pr93212.C: Require c++14 rather than c++11.

--- gcc/testsuite/g++.dg/analyzer/pr93212.C
+++ gcc/testsuite/g++.dg/analyzer/pr93212.C
@@ -1,4 +1,4 @@
-// { dg-do compile { target c++11 } }
+// { dg-do compile { target c++14 } }
 
 #include 
 auto lol()

Jakub



[PATCH] Fix bug in recursiveness check for function to be cloned (ipa/pr93707)

2020-02-12 Thread Feng Xue OS
I've submitted a bug tracker, 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93707.

The root cause is that for a self-recursive function, a for-all-contexts clone 
could generate
an edge whose callee is not the function. Therefore, to check whether an edge 
stands for a
recursive call during cloning, we should not use ordinary way of comparing 
caller and callee
of the edge.

Bootstrapped/regtested on x86_64-linux and aarch64-linux, and also tested Spec 
2017 with
option "--param ipa-cp-eval-threshold=1". 

Feng
---
2020-02-13  Feng Xue  

PR ipa/93707
* ipa-cp.c (self_recursive_pass_through_p): Add a new parameter "node",
and check recursiveness by comparing "node" and cs->caller.
(self_recursive_agg_pass_through_p): Likewise.
(find_more_scalar_values_for_callers_subset): Add "node" argument to
self_recursive_pass_through_p call.
(intersect_aggregates_with_edge): Add a new parameter "node", and add
"node" argument to self_cursive_agg_pass_through_p call.
(find_aggregate_values_for_callers_subset): Add "node" argument to
self_recursive_pass_through_p and intersect_aggregates_with_edge calls.
(cgraph_edge_brings_all_agg_vals_for_node): Add "node" argument to
intersect_aggregates_with_edge call.

> 
> From: gcc-patches-ow...@gcc.gnu.org 
> on behalf of Tamar Christina 
> Sent: Tuesday, February 11, 2020 6:05 PM
> To: Feng Xue OS; Martin Jambor; Jan Hubicka; gcc-patches@gcc.gnu.org
> Cc: nd
> Subject: RE: [PATCH V2] Generalized value pass-through for self-recursive
> function (ipa/pr93203)
>
> Hi Feng,
>
> This patch (commit a0f6a8cb414b687f22c9011a894d5e8e398c4be0) is causing
> ICEs in the GCC and perlbench benchmark in Spec2017.
>
> during IPA pass: cp
> lto1: internal compiler error: in find_more_scalar_values_for_callers_subset,
> at ipa-cp.c:4709
> 0x1698187 find_more_scalar_values_for_callers_subset
> ../.././gcc/ipa-cp.c:4709
> 0x169f7d3 decide_about_value
> ../.././gcc/ipa-cp.c:5490
> 0x169fdc3 decide_whether_version_node
> ../.././gcc/ipa-cp.c:5537
> 0x169fdc3 ipcp_decision_stage
> ../.././gcc/ipa-cp.c:5718
> 0x169fdc3 ipcp_driver
> ../.././gcc/ipa-cp.c:5901
> Please submit a full bug report,
>
> Thanks,
> Tamar
>

0001-Fix-bug-in-recursiveness-check-for-function-in-clone.patch
Description:  0001-Fix-bug-in-recursiveness-check-for-function-in-clone.patch


[PATCH] config: import pkg.m4 from pkg-config

2020-02-12 Thread Mike Frysinger
We use this in the sim tree currently.  Rather than require people to
have pkg-config installed, include it in the config/ dir.

2012-12-23  Mike Frysinger  

* pkg.m4: New file from pkg-config-0.29.2.
---
 config/pkg.m4 | 275 ++
 1 file changed, 275 insertions(+)
 create mode 100644 config/pkg.m4

diff --git a/config/pkg.m4 b/config/pkg.m4
new file mode 100644
index ..13a889017866
--- /dev/null
+++ b/config/pkg.m4
@@ -0,0 +1,275 @@
+# pkg.m4 - Macros to locate and utilise pkg-config.   -*- Autoconf -*-
+# serial 12 (pkg-config-0.29.2)
+
+dnl Copyright ?? 2004 Scott James Remnant .
+dnl Copyright ?? 2012-2015 Dan Nicholson 
+dnl
+dnl This program is free software; you can redistribute it and/or modify
+dnl it under the terms of the GNU General Public License as published by
+dnl the Free Software Foundation; either version 2 of the License, or
+dnl (at your option) any later version.
+dnl
+dnl This program is distributed in the hope that it will be useful, but
+dnl WITHOUT ANY WARRANTY; without even the implied warranty of
+dnl MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+dnl General Public License for more details.
+dnl
+dnl You should have received a copy of the GNU General Public License
+dnl along with this program; if not, write to the Free Software
+dnl Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+dnl 02111-1307, USA.
+dnl
+dnl As a special exception to the GNU General Public License, if you
+dnl distribute this file as part of a program that contains a
+dnl configuration script generated by Autoconf, you may include it under
+dnl the same distribution terms that you use for the rest of that
+dnl program.
+
+dnl PKG_PREREQ(MIN-VERSION)
+dnl ---
+dnl Since: 0.29
+dnl
+dnl Verify that the version of the pkg-config macros are at least
+dnl MIN-VERSION. Unlike PKG_PROG_PKG_CONFIG, which checks the user's
+dnl installed version of pkg-config, this checks the developer's version
+dnl of pkg.m4 when generating configure.
+dnl
+dnl To ensure that this macro is defined, also add:
+dnl m4_ifndef([PKG_PREREQ],
+dnl [m4_fatal([must install pkg-config 0.29 or later before running 
autoconf/autogen])])
+dnl
+dnl See the "Since" comment for each macro you use to see what version
+dnl of the macros you require.
+m4_defun([PKG_PREREQ],
+[m4_define([PKG_MACROS_VERSION], [0.29.2])
+m4_if(m4_version_compare(PKG_MACROS_VERSION, [$1]), -1,
+[m4_fatal([pkg.m4 version $1 or higher is required but 
]PKG_MACROS_VERSION[ found])])
+])dnl PKG_PREREQ
+
+dnl PKG_PROG_PKG_CONFIG([MIN-VERSION])
+dnl --
+dnl Since: 0.16
+dnl
+dnl Search for the pkg-config tool and set the PKG_CONFIG variable to
+dnl first found in the path. Checks that the version of pkg-config found
+dnl is at least MIN-VERSION. If MIN-VERSION is not specified, 0.9.0 is
+dnl used since that's the first version where most current features of
+dnl pkg-config existed.
+AC_DEFUN([PKG_PROG_PKG_CONFIG],
+[m4_pattern_forbid([^_?PKG_[A-Z_]+$])
+m4_pattern_allow([^PKG_CONFIG(_(PATH|LIBDIR|SYSROOT_DIR|ALLOW_SYSTEM_(CFLAGS|LIBS)))?$])
+m4_pattern_allow([^PKG_CONFIG_(DISABLE_UNINSTALLED|TOP_BUILD_DIR|DEBUG_SPEW)$])
+AC_ARG_VAR([PKG_CONFIG], [path to pkg-config utility])
+AC_ARG_VAR([PKG_CONFIG_PATH], [directories to add to pkg-config's search path])
+AC_ARG_VAR([PKG_CONFIG_LIBDIR], [path overriding pkg-config's built-in search 
path])
+
+if test "x$ac_cv_env_PKG_CONFIG_set" != "xset"; then
+   AC_PATH_TOOL([PKG_CONFIG], [pkg-config])
+fi
+if test -n "$PKG_CONFIG"; then
+   _pkg_min_version=m4_default([$1], [0.9.0])
+   AC_MSG_CHECKING([pkg-config is at least version $_pkg_min_version])
+   if $PKG_CONFIG --atleast-pkgconfig-version $_pkg_min_version; then
+   AC_MSG_RESULT([yes])
+   else
+   AC_MSG_RESULT([no])
+   PKG_CONFIG=""
+   fi
+fi[]dnl
+])dnl PKG_PROG_PKG_CONFIG
+
+dnl PKG_CHECK_EXISTS(MODULES, [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND])
+dnl ---
+dnl Since: 0.18
+dnl
+dnl Check to see whether a particular set of modules exists. Similar to
+dnl PKG_CHECK_MODULES(), but does not set variables or print errors.
+dnl
+dnl Please remember that m4 expands AC_REQUIRE([PKG_PROG_PKG_CONFIG])
+dnl only at the first occurence in configure.ac, so if the first place
+dnl it's called might be skipped (such as if it is within an "if", you
+dnl have to call PKG_CHECK_EXISTS manually
+AC_DEFUN([PKG_CHECK_EXISTS],
+[AC_REQUIRE([PKG_PROG_PKG_CONFIG])dnl
+if test -n "$PKG_CONFIG" && \
+AC_RUN_LOG([$PKG_CONFIG --exists --print-errors "$1"]); then
+  m4_default([$2], [:])
+m4_ifvaln([$3], [else
+  $3])dnl
+fi])
+
+dnl _PKG_CONFIG([VARIABLE], [COMMAND], [MODULES])
+dnl -
+dnl Internal wrapper calling pkg-config via PKG_CONFIG and setting
+dnl pkg_failed based on 

[PATCH] libiberty.h: punt duplicate strverscmp prototype

2020-02-12 Thread Mike Frysinger
SVN r216772 accidentally copied & pasted this prototype when adding
other ones nearby.

2020-02-13  Mike Frysinger  

* libiberty.h (strverscmp): Delete duplicate prototype.
---
 include/ChangeLog   | 4 
 include/libiberty.h | 5 -
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/include/ChangeLog b/include/ChangeLog
index 3f9382d9ad4c..5094f6ce742f 100644
--- a/include/ChangeLog
+++ b/include/ChangeLog
@@ -1,3 +1,7 @@
+2020-02-13  Mike Frysinger  
+
+   * libiberty.h (strverscmp): Delete duplicate prototype.
+
 2020-02-05  Andrew Burgess  
 
* hashtab.h (htab_remove_elt): Make a parameter const.
diff --git a/include/libiberty.h b/include/libiberty.h
index 141cb886a850..08ede9889213 100644
--- a/include/libiberty.h
+++ b/include/libiberty.h
@@ -706,11 +706,6 @@ extern unsigned long long int strtoull (const char *nptr,
 char **endptr, int base);
 #endif
 
-#if defined(HAVE_DECL_STRVERSCMP) && !HAVE_DECL_STRVERSCMP
-/* Compare version strings.  */
-extern int strverscmp (const char *, const char *);
-#endif
-
 /* Set the title of a process */
 extern void setproctitle (const char *name, ...);
 
-- 
2.23.0



[fixincludes] skip fixinc on vxworks7*, amend mkheaders

2020-02-12 Thread Alexandre Oliva


vxworks7 headers haven't required fixes, and we've long avoided
running fixinc on them.

The problem with that is that, with a dummy fixinc, mkheaders wipes
out include-fixed but then multi_dir subdirs are not created again, so
we end up with a limits.h named after each multi_dir, when there are
non-default multilibs.  Oops.

This patch arranges for a dummy fixinc to be created for *-*-vxworks7*
targets, and fixes mkheaders so as to create multi_dir subdirs in
include-fixed after wiping them out, and to copy limits.h so that it
won't take the name that should be of a subdir (unless the multi_dir
is limits.h, but that's hopefully never the case ;-)

This was tested on x86_64-linux-gnu (no changes to include-fixed there,
as expected), and with various of AdaCore's vx6 and vx7 targets,
including ones with and without multilibs.  Ok to install?


for  fixincludes/ChangeLog

* mkheaders.in: Re-create subdirs, copy limits.h into subdir.
* mkfixinc.sh: Create dummy fixinc for *-*-vxworks7*.
---
 fixincludes/mkfixinc.sh  |1 +
 fixincludes/mkheaders.in |3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fixincludes/mkfixinc.sh b/fixincludes/mkfixinc.sh
index 0f96486..df90720 100755
--- a/fixincludes/mkfixinc.sh
+++ b/fixincludes/mkfixinc.sh
@@ -19,6 +19,7 @@ case $machine in
 powerpc-*-rtems*   | \
 powerpcle-*-eabisim* | \
 powerpcle-*-eabi* | \
+*-*-vxworks7* | \
 *-musl* )
#  IF there is no include fixing,
#  THEN create a no-op fixer and exit
diff --git a/fixincludes/mkheaders.in b/fixincludes/mkheaders.in
index 9109b05..a293a57 100644
--- a/fixincludes/mkheaders.in
+++ b/fixincludes/mkheaders.in
@@ -86,6 +86,7 @@ for ml in `cat ${itoolsdatadir}/fixinc_list`; do
   sysroot_headers_suffix=`echo ${ml} | sed -e 's/;.*$//'`
   multi_dir=`echo ${ml} | sed -e 's/^[^;]*;//'`
   subincdir=${incdir}${multi_dir}
+  ${mkinstalldirs} ${subincdir}
   . ${itoolsdatadir}/mkheaders.conf
   if [ x${STMP_FIXINC} != x ] ; then
TARGET_MACHINE="${target}" target_canonical="${target}" \
@@ -100,5 +101,5 @@ for ml in `cat ${itoolsdatadir}/fixinc_list`; do
fi
   fi
 
-  cp ${itoolsdatadir}/include${multi_dir}/limits.h ${subincdir}
+  cp ${itoolsdatadir}/include${multi_dir}/limits.h ${subincdir}/limits.h
 done


-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist   Stallman was right, but he's left :(
GNU Toolchain EngineerFSMatrix: It was he who freed the first of us
FSF & FSFLA board memberThe Savior shall return (true);


[PATCH]Several intrinsic macros lack a closing parenthesis[PR93274]

2020-02-12 Thread Hongtao Liu
Hi
  As mentioned in PR93724, several intrinsic macros lack a closing
parenthesis. These macros are only used with -O0 option, and currently
unit tests use -O2, so not covered.
  Bootstrap ok, regression tests on i386/x86_64 is ok.
  Ok for trunk?

Changelog
gcc/
* config/i386/avx512vbmi2intrin.h
(_mm512_[,mask_,maskz_]shrdi_epi16,
_mm512_[,mask_,maskz_]shrdi_epi32,
_m512_[,mask_,maskz_]shrdi_epi64,
_mm512_[,mask_,maskz_]shldi_epi16,
_mm512_[,mask_,maskz_]shldi_epi32,
_m512_[,mask_,maskz_]shldi_epi64): Fix typo of lacking a
closing parenthesis.
* config/i386/avx512vbmi2vlintrin.h
(_mm256_[,mask_,maskz_]shrdi_epi16,
_mm256_[,mask_,maskz_]shrdi_epi32,
_m256_[,mask_,maskz_]shrdi_epi64,
_mm_[,mask_,maskz_]shrdi_epi16,
_mm_[,mask_,maskz_]shrdi_epi32,
_mm_[,mask_,maskz_]shrdi_epi64,
_mm256_[,mask_,maskz_]shldi_epi16,
_mm256_[,mask_,maskz_]shldi_epi32,
_m256_[,mask_,maskz_]shldi_epi64,
_mm_[,mask_,maskz_]shldi_epi16,
_mm_[,mask_,maskz_]shldi_epi32,
_mm_[,mask_,maskz_]shldi_epi64): Ditto.

gcc/testsuite/
* gcc.target/i386/avx512vbmi2-vpshld-1.c: New test.
* gcc.target/i386/avx512vbmi2-vpshld-O0-1.c: Ditto.
* gcc.target/i386/avx512vbmi2-vpshrd-1.c: Ditto.
* gcc.target/i386/avx512vbmi2-vpshrd-O0-1.c: Ditto.
* gcc.target/i386/avx512vl-vpshld-O0-1.c: Ditto.
* gcc.target/i386/avx512vl-vpshrd-O0-1.c: Ditto.

-- 
BR,
Hongtao


0001-Intrinsic-macro-of-vpshr-and-vpshl-lack-a-closing-pa.patch
Description: Binary data


[committed] c++: Fix constexpr if and braced functional cast.

2020-02-12 Thread Jason Merrill
While partially instantiating a generic lambda, we can encounter pack
expansions or constexpr if where we can't actually do the substitution
immediately, and instead remember a partial instantiation context
in *_EXTRA_ARGS.  This includes any local_specializations used in the
pattern or condition.  In this testcase our tree walk wasn't finding the use
of i because we weren't walking into the type of a CONSTRUCTOR.  Fixed by
moving the code for doing that from find_parameter_packs_r into
cp_walk_subtrees.

Tested x86_64-pc-linux-gnu, applying to trunk.

2020-02-11  Jason Merrill  

PR c++/92583
PR c++/92654
* tree.c (cp_walk_subtrees): Walk CONSTRUCTOR types here.
* pt.c (find_parameter_packs_r): Not here.
---
 gcc/cp/pt.c   |  3 ---
 gcc/cp/tree.c |  5 
 gcc/testsuite/g++.dg/cpp0x/nondeduced7.C  |  2 +-
 .../g++.dg/cpp1z/constexpr-if-lambda3.C   | 24 +++
 4 files changed, 30 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-if-lambda3.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index c2d3a98b1c5..6e7f4555da8 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -3924,9 +3924,6 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
void* data)
 case TEMPLATE_DECL:
   if (!DECL_TEMPLATE_TEMPLATE_PARM_P (t))
return NULL_TREE;
-  gcc_fallthrough();
-
-case CONSTRUCTOR:
   cp_walk_tree (_TYPE (t),
_parameter_packs_r, ppd, ppd->visited);
   return NULL_TREE;
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index eb540f851ee..736ef6fe667 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -5024,6 +5024,11 @@ cp_walk_subtrees (tree *tp, int *walk_subtrees_p, 
walk_tree_fn func,
   *walk_subtrees_p = 0;
   break;
 
+case CONSTRUCTOR:
+  if (COMPOUND_LITERAL_P (*tp))
+   WALK_SUBTREE (TREE_TYPE (*tp));
+  break;
+
 case TRAIT_EXPR:
   WALK_SUBTREE (TRAIT_EXPR_TYPE1 (*tp));
   WALK_SUBTREE (TRAIT_EXPR_TYPE2 (*tp));
diff --git a/gcc/testsuite/g++.dg/cpp0x/nondeduced7.C 
b/gcc/testsuite/g++.dg/cpp0x/nondeduced7.C
index a8aa073c57e..eb5d5faf493 100644
--- a/gcc/testsuite/g++.dg/cpp0x/nondeduced7.C
+++ b/gcc/testsuite/g++.dg/cpp0x/nondeduced7.C
@@ -2,5 +2,5 @@
 // { dg-do compile { target c++11 } }
 
 template struct A;
-template struct A {}; // { dg-error "partial 
specialization" }
+template struct A {}; // { dg-error "partial 
specialization|involves template parameter" }
 A a;
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-if-lambda3.C 
b/gcc/testsuite/g++.dg/cpp1z/constexpr-if-lambda3.C
new file mode 100644
index 000..34615f71ee2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-if-lambda3.C
@@ -0,0 +1,24 @@
+// PR c++/92583
+// { dg-do compile { target c++17 } }
+
+template  struct a {
+  constexpr operator int() { return 42; }
+};
+template  using b = int;
+template  struct e {};
+template  using h = e;
+template  void apply(j f, e) {
+  (f(a{}), ...);
+}
+template  void m(j f) {
+  using k = b;
+  using n = h;
+  apply(f, n{});
+}
+template  void o() {
+  auto p = [](auto i) {
+if constexpr (a{}) ;
+  };
+  m(p);
+}
+auto q() { o<0, 1>; }

base-commit: 1cd9bef89ef25b3fd506cedbc82e8bc00ff56fa8
-- 
2.18.1



Re: [committed] testsuite: Fix up gcc.target/powerpc/pr93122.c test

2020-02-12 Thread Michael Meissner
On Wed, Feb 12, 2020 at 11:27:01PM +0100, Jakub Jelinek wrote:
> On Mon, Feb 10, 2020 at 01:45:42PM -0500, Michael Meissner wrote:
> > This patch renames the PowerPC internal switch -mprefixed-addr to be
> > -mprefixed.
> 
> --- gcc/config/rs6000/rs6000.opt
> +++ gcc/config/rs6000/rs6000.opt
> @@ -570,8 +570,8 @@ mfuture
>  Target Report Mask(FUTURE) Var(rs6000_isa_flags)
>  Use instructions for a future architecture.
>  
> -mprefixed-addr
> -Target Undocumented Mask(PREFIXED_ADDR) Var(rs6000_isa_flags)
> +mprefixed
> +Target Report Mask(PREFIXED) Var(rs6000_isa_flags)
>  Generate (do not generate) prefixed memory instructions.
>  
>  mpcrel
> 
> This change broke the gcc.target/powerpc/pr93122.c test, so it now
> FAIL: gcc.target/powerpc/pr93122.c (test for excess errors)
> Excess errors:
> xgcc: error: unrecognized command-line option '-mprefixed-addr'; did you mean 
> '-mprefixed'?
> 
> Fixed thusly, bootstrapped/regtested on powerpc64le-linux, committed to
> trunk as obvious.

Thanks.  I don't think that test was in the trunk when I did the the bootstrap
for the -mprefixed-addr to -mprefixed option.  I was about to send a similar
patch.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [committed] testsuite: Fix up gcc.target/powerpc/pr93122.c test

2020-02-12 Thread Segher Boessenkool
On Wed, Feb 12, 2020 at 11:27:01PM +0100, Jakub Jelinek wrote:
> On Mon, Feb 10, 2020 at 01:45:42PM -0500, Michael Meissner wrote:
> > This patch renames the PowerPC internal switch -mprefixed-addr to be
> > -mprefixed.

> This change broke the gcc.target/powerpc/pr93122.c test, so it now
> FAIL: gcc.target/powerpc/pr93122.c (test for excess errors)
> Excess errors:
> xgcc: error: unrecognized command-line option '-mprefixed-addr'; did you mean 
> '-mprefixed'?
> 
> Fixed thusly, bootstrapped/regtested on powerpc64le-linux, committed to
> trunk as obvious.

Thanks!


Segher


[committed] testsuite: Fix up gcc.target/powerpc/pr93122.c test

2020-02-12 Thread Jakub Jelinek
On Mon, Feb 10, 2020 at 01:45:42PM -0500, Michael Meissner wrote:
> This patch renames the PowerPC internal switch -mprefixed-addr to be
> -mprefixed.

--- gcc/config/rs6000/rs6000.opt
+++ gcc/config/rs6000/rs6000.opt
@@ -570,8 +570,8 @@ mfuture
 Target Report Mask(FUTURE) Var(rs6000_isa_flags)
 Use instructions for a future architecture.
 
-mprefixed-addr
-Target Undocumented Mask(PREFIXED_ADDR) Var(rs6000_isa_flags)
+mprefixed
+Target Report Mask(PREFIXED) Var(rs6000_isa_flags)
 Generate (do not generate) prefixed memory instructions.
 
 mpcrel

This change broke the gcc.target/powerpc/pr93122.c test, so it now
FAIL: gcc.target/powerpc/pr93122.c (test for excess errors)
Excess errors:
xgcc: error: unrecognized command-line option '-mprefixed-addr'; did you mean 
'-mprefixed'?

Fixed thusly, bootstrapped/regtested on powerpc64le-linux, committed to
trunk as obvious.

2020-02-12  Jakub Jelinek  

* gcc.target/powerpc/pr93122.c: Use -mprefixed instead of
-mprefixed-addr in dg-options.

--- gcc/testsuite/gcc.target/powerpc/pr93122.c
+++ gcc/testsuite/gcc.target/powerpc/pr93122.c
@@ -1,6 +1,6 @@
 /* PR target/93122 */
 /* { dg-do compile { target lp64 } } */
-/* { dg-options "-fstack-clash-protection -mprefixed-addr -mfuture" } */
+/* { dg-options "-fstack-clash-protection -mprefixed -mfuture" } */
 
 void bar (char *);
 

Jakub



Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

2020-02-12 Thread Segher Boessenkool
On Wed, Feb 12, 2020 at 11:53:22AM +0100, Richard Biener wrote:
> On Wed, 12 Feb 2020, Segher Boessenkool wrote:
> > On Wed, Feb 12, 2020 at 09:12:58AM +0100, Richard Biener wrote:
> > > On Tue, 11 Feb 2020, Segher Boessenkool wrote:
> > > > Basic block partitioning has wildly disproportionate fallout in all
> > > > later passes, both in terms of what those *do* (or don't, if 
> > > > partitioning
> > > > is enabled), and of impact on the code (not to mention developer time).
> > > > 
> > > > Maybe the implementation can be improved, but probably we should do this
> > > > in a different way altogether.  The current situation is not good.
> > > 
> > > I think the expectation that you can go back to CFG layout mode
> > > and then work with CFG layout tools after we've lowered to CFG RTL
> > > is simply bogus.
> > 
> > Partitioning is also quite problematic if you do not use cfglayout
> > mode.  For example, in shrink-wrapping.  It prevents a lot there.
> > 
> > > Yeah, you can probably do analysis things but
> > > I wouldn't be surprised if a CFG RTL -> CFG layout -> CFG RTL cycle 
> > > can wreck things.  Undoubtedly doing CFG manipulations is not going
> > > to work since CFG layout does not respect CFG RTL restrictions.
> > 
> > Doing CFG manipulations on CFG RTL mode directly is almost impossible
> > to do correctly.
> > 
> > For example, bb-reorder.  Which is a much more important optimisation
> > than partitioning, btw.
> 
> BB reorder switches back and forth as well ... :/

Yes.  It is extremely hard to change any jumps in cfgrtl mode.

The goal is to use cfglayout mode more, ideally *always*, not to use
it less!

> I think both are closely enough related that we probably should do
> partitioning from within the same framework?  OTOH BB reorder
> happens _much_ later.

This may be doable.  What we need to do first though is to find a better
thing to use than EDGE_CROSSING.

Maybe we can determine which blocks should be hot and cold early, but
actually make that happen only very late (maybe adjusting the decision
if we have to to make things work)?  That way, intervening passes do not
have to care (much).

> > > Partitioning simply uncovered latent bugs, there's nothing wrong
> > > with it IMHO.
> > 
> > I don't agree.  The whole way EDGE_CROSSING works hinders all other
> > optimisations a lot.
> 
> I'm not sure if it's there for correctness reasons or just to
> help checking that nothing "undoes" the partitioning decision.

The latter.  (It may also make it easier for the former, of course, but
that can be solved anyway).  And that makes live very hard for all later
passes, while it is doubtful that it even give the best decisions: for
example it prevents a lot of shrink-wrapping, which you dearly *want* to
do on cold code!


Segher


[committed] Consolidate two H8 peepholes into one peephole using a mode iterator

2020-02-12 Thread Jeff Law
Another minor cleanup for the H8 port.

There was another pattern that handles shortening of comparison
operators.  It handles shortening from HI->QI (the other handled SI->HI 
and SI->QI).

This patch uses a mode iterator to handle them all with one pattern. 
It also installs the correct version of the previous patch (which would
generate RTL with incorrect modes).

Regression tested on h8300-elf with a slew of multilibs :-)

Installing on the trunk.

jeff

Combine the two H8 mode shortening peepholes into a single peephole

* config/h8300/h8300.md (comparison shortening peepholes): Use
a mode iterator to merge the HImode and SImode peepholes.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b566ca4b591..fe42cee9e22 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2020-02-12  Jeff Law  
+
+   * config/h8300/h8300.md (comparison shortening peepholes): Use
+   a mode iterator to merge the HImode and SImode peepholes.
+
 2020-02-12  Jakub Jelinek  
 
PR middle-end/93663
diff --git a/gcc/config/h8300/h8300.md b/gcc/config/h8300/h8300.md
index dcc9c3682d6..f12eb46306f 100644
--- a/gcc/config/h8300/h8300.md
+++ b/gcc/config/h8300/h8300.md
@@ -5366,38 +5366,13 @@
   {
 operands[5] = GEN_INT (- INTVAL (operands[1]));
   })
-;; Narrow the mode of testing if possible.
 
-(define_peephole2
-  [(set (match_operand:HI 0 "register_operand" "")
-   (and:HI (match_dup 0)
-   (match_operand:HI 1 "const_int_qi_operand" "")))
-   (set (cc0) (compare (match_dup 0)
-  (const_int 0)))
-   (set (pc)
-   (if_then_else (match_operator 3 "eqne_operator"
-  [(cc0) (const_int 0)])
- (label_ref (match_operand 2 "" ""))
- (pc)))]
-  "peep2_reg_dead_p (2, operands[0])"
-  [(set (match_dup 4)
-   (and:QI (match_dup 4)
-   (match_dup 5)))
-   (set (cc0) (compare (match_dup 4)
-  (const_int 0)))
-   (set (pc)
-   (if_then_else (match_op_dup 3 [(cc0) (const_int 0)])
- (label_ref (match_dup 2))
- (pc)))]
-  {
-operands[4] = gen_rtx_REG (QImode, REGNO (operands[0]));
-operands[5] = gen_int_mode (INTVAL (operands[1]), QImode);
-  })
+;; Narrow the mode of testing if possible.
 
 (define_peephole2
-  [(set (match_operand:SI 0 "register_operand" "")
-   (and:SI (match_dup 0)
-   (match_operand:SI 1 "const_int_operand" "")))
+  [(set (match_operand:HSI 0 "register_operand" "")
+   (and:HSI (match_dup 0)
+(match_operand:HSI 1 "const_int_operand" "")))
(set (cc0) (compare (match_dup 0)
   (const_int 0)))
(set (pc)
@@ -5406,11 +5381,10 @@
  (label_ref (match_operand 2 "" ""))
  (pc)))]
   "((const_int_qi_operand (operands[1], QImode)
- || const_int_hi_operand (operands[1], HImode))
+ || (GET_MODE (operands[0]) == SImode
+&& const_int_hi_operand (operands[1], HImode)))
 && peep2_reg_dead_p (2, operands[0]))"
-  [(set (match_dup 4)
-   (and:QI (match_dup 4)
-   (match_dup 5)))
+  [(set (match_dup 4) (match_dup 6))
(set (cc0) (compare (match_dup 4)
   (const_int 0)))
(set (pc)
@@ -5423,6 +5397,7 @@
 mode = const_int_qi_operand (operands[1], QImode) ? QImode : HImode;
 operands[4] = gen_rtx_REG (mode, REGNO (operands[0]));
 operands[5] = gen_int_mode (INTVAL (operands[1]), mode);
+operands[6] = gen_rtx_AND (mode, operands[4], operands[5]);
   })
 
 ;; These triggers right at the end of allocation of locals in the


Re: [PATCH] avoid user-constructible types in reshape_init_array (PR 90938)

2020-02-12 Thread Marek Polacek
On Wed, Feb 12, 2020 at 01:21:58PM -0700, Martin Sebor wrote:
> On 2/11/20 5:28 PM, Jason Merrill wrote:
> > On 2/11/20 9:00 PM, Martin Sebor wrote:
> > > r270155, committed in GCC 9, introduced a transformation that strips
> > > redundant trailing zero initializers from array initializer lists in
> > > order to support string literals as template arguments.
> > > 
> > > The transformation neglected to consider the case of array elements
> > > of trivial class types with user-defined conversion ctors and either
> > > defaulted or deleted default ctors.  (It didn't occur to me that
> > > those qualify as trivial types despite the user-defined ctors.)  As
> > > a result, some valid initialization expressions are rejected when
> > > the explicit zero-initializers are dropped in favor of the (deleted)
> > > default ctor,
> > 
> > Hmm, a type with only a deleted default constructor is not trivial, that
> > should have been OK already.
> 
> For Marek's test case:
>   struct A { A () == delete; A (int) = delete; };
> 
> trivial_type_p() returns true (as does __is_trivial (A) in both GCC
> and Clang).
> 
> [class.prop] says that
> 
>   A trivial class is a class that is trivially copyable and has one
>   or more default constructors (10.3.4.1), all of which are either
>   trivial or deleted and at least one of which is not deleted.
> 
> That sounds like A above is not trivial because it doesn't have
> at least one default ctor that's not deleted, but both GCC and
> Clang say it is.  What am I missing?  Is there some other default
> constructor hiding in there that I don't know about?

Note that [class.prop]/2 now says something other than that:

"A trivial class is a class that is trivially copyable and has one or
more eligible default constructors ([class.default.ctor]), all of which
are trivial."
I think this changed in P0848R3 (Conditionally Trivial Special Member
Functions).

Here A has a default constructor, but it's not eligible: [special]/6 says
"An eligible special member function is a special member function for which:
 -- the function is not deleted,
[...]"

So it seems that A should not be trivial.  But the wording you quoted
also means that: it was changed in
CWG1496 Triviality with deleted and missing default constructors
but we don't implement that (https://gcc.gnu.org/PR85723).

So going further down memory lane, [class.prop] used to say
"A trivial class is a class that has a default constructor (12.1), has no
non-trivial default constructors, and is trivially copyable."
which A is (because it has an implicit copy ctor).

So I think it all comes down to the fact that neither g++ not clang++
implement CWG 1496.  And that's probably GCC 11 work.

Marek



Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

2020-02-12 Thread Segher Boessenkool
On Wed, Feb 12, 2020 at 09:07:27AM +0100, Richard Biener wrote:
> On Tue, 11 Feb 2020, Segher Boessenkool wrote:
> 
> > On Tue, Feb 11, 2020 at 02:58:47PM +0100, Richard Biener wrote:
> > > On Tue, 11 Feb 2020, Roman Zhuykov wrote:
> > > > 11.02.2020 11:01, Richard Biener wrote:
> > > > Sound good, but IMHO modulo scheduler is not the best choice to be the
> > > > first step implementing such a concept.
> > > 
> > > True ;)   But since the context of this thread is unrolling ...
> > > Not sure how you'd figure the unroll factor to apply if you want
> > > to do unrolling within a classical scheduling framework?  Maybe
> > > unroll as much as you can fill slots until the last instruction
> > > of the first iteration retires?
> > 
> > That will be terrible on register-rich architectures: it *already* is
> > problematic how often some things are unrolled, blindly unrolling more
> > would make things worse.  We need to unroll more where it helps, but
> > less where it does not.  For that we need a good cost/benefit estimate.
> 
> True.  For x86 we tried but did not come up with a sensible estimate
> (probably the x86 uarchs are way too complicated to understand).

There are three main factors at work:

1) The cost for iterating the loop.  This is related to what ivopts
does, and also on most uarchs every loop iteration has some minimum cost
(on many uarchs it needs a fetch redirect, or you can only do one branch
per cycle, etc.; this is mainly important for "small" loops, where what
is "small" differs per uarch).  Unrolling brings this cost down.
2) Cost reduction by better scheduling.
3) The cost for using more registers, when unrolling.  This is worse if
you also use SMS or variable expansion, or simply because of the better
scheduling.  This can increase the cost a *lot*.

1) isn't all that hard to estimate.  2) and esp. 3) are though :-/


Segher


Re: [PATCH] [MIPS] Remove unnecessary moves around DSP multiply-accumulate instructions

2020-02-12 Thread Jeff Law
On Thu, 2020-02-06 at 14:05 +0100, Mihailo Stojanovic wrote:
> Unnecessary moves around dpadd and dpsub are caused by different pseudos
> being assigned to the input-output operands which correspond to the same
> register.
> 
> Just like for the MSA multiply-accumulate instructions, this forces the
> same pseudo to the input-output operands,
> which removes unnecesary moves.
> 
> Tested on mips-mti-linux-gnu.
> 
> gcc/ChangeLog:
> 
> * gcc/config/mips/mips.c (mips_expand_builtin_insn): Operands of
> DSP multiply-accumulate instructions which correspond to the
> same input-output register now have the same pseudo asigned to
> them.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc/testsuite/gcc.target/mips/mac-zero-reload.c: New test.
Affects just MIPS, so OK even though we're in stage4.

jeff
> 



Re: Patch ping

2020-02-12 Thread Jeff Law
On Mon, 2020-02-10 at 10:24 +0100, Jakub Jelinek wrote:
> Hi!
> 
> I'd like to ping a couple of patches:
> 
> PR target/91913 - arm movsi + cmpsi -> movsi_compare0 peephole2 ICE fix
>https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00010.html
Letting the ARM guys deal with this.


> 
> PR preprocessor/92319 - partially implement P1042R1: __VA_OPT__ wording 
> clarifications
>https://gcc.gnu.org/ml/gcc-patches/2020-01/msg02104.html
Jason for this one.

> 
> PR target/93069 - avx512* rejects-valid fix (rejected by assembler)
>http://gcc.gnu.org/ml/gcc-patches/2019-12/msg01606.html
This is in my queue :-)

> 
> PR tree-optimization/92868 - compute_objsize/gimple_call_alloc_size
>  /maybe_warn_overflow fixes
>http://gcc.gnu.org/ml/gcc-patches/2019-12/msg01164.html
Martin's patch should have addressed all the issues and should include
your tests (tweaked, but supposed to be equivalent).

jeff



Re: [PATCH] Don't override various Makefile variables for gnulib et al

2020-02-12 Thread Christian Biesinger via gcc-patches
Ping

On Wed, Jan 29, 2020 at 8:07 AM Christian Biesinger
 wrote:
>
> Ping
>
> On Sat, Nov 23, 2019 at 12:30 AM Christian Biesinger
>  wrote:
> >
> > Normally the toplevel Makefile will pass various CC=foo and other
> > flags down to subdir Makefiles. However, for Gnulib this is a problem
> > because Gnulib's configure specifically sets CC to something that
> > includes a -std=gnu11 flag on some systems, and this override would
> > set it back to CC=gcc, leading to compile errors in a GDB build
> > with an updated Gnulib.
> >
> > I don't believe this is needed outside of GCC, so this patch changes
> > Gnulib and other non-GCC modules to just not override any flags --
> > the values set during configure time should be fine. If a user
> > overrides them manually when invoking make, those will still work.
> >
> > Under the same condition, I also removed the host_exports. I don't
> > understand why this is ever necessary (this is only after configure
> > has run).
> >
> > The other option is to clear MAKEOVERRIDES in gnulib/Makefile.am, but
> > that means the user can't override any variables for this subdirectory.
> >
> > ChangeLog:
> >
> > 2019-11-22  Christian Biesinger  
> >
> > * Makefile.def: Pass no_exports_and_flags to various non-GCC
> > modules.
> > * Makefile.in: Allow passing a no_exports_and_flags argument to
> > "all" to suppress emitting exports and make flags. Useful when
> > invoked via host_modules from Makefile.def.
> > * Makefile.tpl: Regenerate.
> >
> > Change-Id: I7d80328cf81c133ba6157eec7d10c422b6790723
> > ---
> >  Makefile.def | 12 ++--
> >  Makefile.in  | 30 --
> >  Makefile.tpl |  9 ++---
> >  3 files changed, 24 insertions(+), 27 deletions(-)
> >
> > diff --git a/Makefile.def b/Makefile.def
> > index 311feb9de3..e1ff065202 100644
> > --- a/Makefile.def
> > +++ b/Makefile.def
> > @@ -33,7 +33,7 @@ build_modules= { module= fixincludes; };
> >  build_modules= { module= libcpp;
> >   extra_configure_flags='--disable-nls 
> > am_cv_func_iconv=no';};
> >
> > -host_modules= { module= bfd; bootstrap=true; };
> > +host_modules= { module= bfd; bootstrap=true; no_exports_and_flags=true; };
> >  host_modules= { module= opcodes; bootstrap=true; };
> >  host_modules= { module= binutils; bootstrap=true; };
> >  host_modules= { module= bison; no_check_cross= true; };
> > @@ -105,15 +105,15 @@ host_modules= { module= libiconv;
> > missing= install-html;
> > missing= install-info; };
> >  host_modules= { module= m4; };
> > -host_modules= { module= readline; };
> > +host_modules= { module= readline; no_exports_and_flags=true; };
> >  host_modules= { module= sid; };
> > -host_modules= { module= sim; };
> > +host_modules= { module= sim; no_exports_and_flags=true; };
> >  host_modules= { module= texinfo; no_install= true; };
> >  host_modules= { module= zlib; no_install=true; no_check=true;
> > bootstrap=true;
> > extra_configure_flags='@extra_host_zlib_configure_flags@';};
> > -host_modules= { module= gnulib; };
> > -host_modules= { module= gdb; };
> > +host_modules= { module= gnulib; no_exports_and_flags=true; };
> > +host_modules= { module= gdb; no_exports_and_flags=true; };
> >  host_modules= { module= expect; };
> >  host_modules= { module= guile; };
> >  host_modules= { module= tk; };
> > @@ -129,7 +129,7 @@ host_modules= { module= lto-plugin; bootstrap=true;
> > extra_make_flags='@extra_linker_plugin_flags@'; };
> >  host_modules= { module= libcc1; extra_configure_flags=--enable-shared; };
> >  host_modules= { module= gotools; };
> > -host_modules= { module= libctf; no_check=true;
> > +host_modules= { module= libctf; no_check=true; no_exports_and_flags=true;
> > bootstrap=true; };
> >
> >  target_modules = { module= libstdc++-v3;
> > diff --git a/Makefile.in b/Makefile.in
> > index 1aabf6ede4..bd41753543 100644
> > --- a/Makefile.in
> > +++ b/Makefile.in
> > @@ -3414,10 +3414,9 @@ maybe-all-bfd: all-bfd
> >  all-bfd: configure-bfd
> > @r=`${PWD_COMMAND}`; export r; \
> > s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
> > -   $(HOST_EXPORTS)  \
> > +\
> > (cd $(HOST_SUBDIR)/bfd && \
> > - $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_HOST_FLAGS) 
> > $(STAGE1_FLAGS_TO_PASS)  \
> > -   $(TARGET-bfd))
> > + $(MAKE) $(TARGET-bfd))
> >  @endif bfd
> >
> >
> > @@ -25530,10 +25529,9 @@ all-readline: configure-readline
> > @: $(MAKE); $(unstage)
> > @r=`${PWD_COMMAND}`; export r; \
> > s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
> > -   $(HOST_EXPORTS)  \
> > +\
> > (cd $(HOST_SUBDIR)/readline && \
> > - $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_HOST_FLAGS) 
> > $(STAGE1_FLAGS_TO_PASS)  \
> > -   $(TARGET-readline))
> > + $(MAKE) $(TARGET-readline))
> >  @endif readline
> >
> >
> > @@ 

Re: [PATCH] Use a non-empty test program to test ability to link

2020-02-12 Thread Joseph Myers
On Wed, 12 Feb 2020, Sandra Loosemore wrote:

> Hmmm, I tried again and saw that autoconf didn't even touch the timestamp on
> the existing configure file, but I was able to force it to regenerate the
> files by removing the old ones first.  Is this version of the patch OK to
> check in?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PING PATCH] document that alias and target must have the same type

2020-02-12 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00290.html

On 2/5/20 1:13 PM, Martin Sebor wrote:

On 2/4/20 6:05 PM, Martin Sebor wrote:

GCC diagnoses declarations of function aliases whose type doesn't
match that of the target (ditto for attribute weakref).  It doesn't
yet diagnose such incompatbilities for variable aliases but that's
just an oversight that I will try to remember to correct in GCC 11.
The attached patch updates the manual to make it clear that
aliases must have the same type as their targets, or the behavior
is undefined (and may be diagnosed).


On further review I noticed a few problems with the documentation
of attribute weakref.  The manual says that

   Without a target, given as an argument to weakref or to alias,
   weakref is equivalent to weak.

and

   At present, a declaration to which weakref is attached can only
   be static.

However, GCC accepts the following declaration:

   extern int x (void) __attribute__ ((weakref));

so the second sentence isn't correct without qualification (unlike
weakref, a weak declaration must be external).

Another documentation problem is with the example in the manual that
says that this declaration:

   static int x() __attribute__ ((weakref ("y")));

   /* is equivalent to... */

   static int x() __attribute__ ((weak, weakref, alias ("y")));

but GCC rejects the latter with

   error: weak declaration of ‘x’ must be public

Changing the latter from static to extern changes the error to

   error: ‘weakref’ symbol ‘x’ must have static linkage

So clearly two declarations with the two sets of attributes are not
equivalent, either extern or static.

I've fixed up these documentation problems in the attached revision
of the original patch.  I also mention that besides their types
having to match, the alias must have the same size and alignment
as the target.

Martin

PS I also noted that when the function is then also used GCC issues:

   warning: ‘weakref’ attribute should be accompanied with an ‘alias’ 
attribute


This matches what the manual says in "Without arguments, it should
be accompanied by an alias attribute naming the target symbol."

But when the weakref function is not used there is no warning.
That seems like an unfortunate side-effect of the choice of issuing
the warning a little too late (waning from the front-end it would
make it consistent regardless of whether the function was used, and
it would highlight the omission even in translation units that
define the weakref without using it).




Re: [PATCH] real: Fix roundeven on inf/nan [PR93663]

2020-02-12 Thread Joseph Myers
On Wed, 12 Feb 2020, Jakub Jelinek wrote:

> As can be seen in the testcase, roundeven with inf or nan arguments
> ICE because of those asserts where nothing prevents from is_halfway_below
> being called with those arguments.
> 
> The following patch fixes that by just returning false for rvc_inf/rvc_nan
> like it returns for rvc_zero, so that we handle roundeven with all those
> values as round.  Inf/NaN are not halfway in between two integers...
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Use a non-empty test program to test ability to link

2020-02-12 Thread Sandra Loosemore

On 2/10/20 3:58 PM, Joseph Myers wrote:

On Sat, 8 Feb 2020, Sandra Loosemore wrote:


BTW, I did run autoconf in every subdirectory that contains a
configure.ac,
but it appears only libstc++-v3 actually uses this test; all the other
regenerated configure scripts were unchanged.


There's some problem with that regeneration, then; every directory where
configure.ac mentions GCC_NO_EXECUTABLES should have a resulting change (I
tested regenerating in libgcc/ with this patch applied and got such a
change, for example).


Hmmm.  I just tried that again and got nothing.  Do I have to do something
other than just run "autoconf" in that directory?  I'm always confused by
anything involving configuration changes.  :-(


Just running autoconf in that directory should suffice to get configure
regenerated with the change.


Hmmm, I tried again and saw that autoconf didn't even touch the 
timestamp on the existing configure file, but I was able to force it to 
regenerate the files by removing the old ones first.  Is this version of 
the patch OK to check in?


-Sandra

>From c258184d7945071450856e8563128417993758a9 Mon Sep 17 00:00:00 2001
From: Sandra Loosemore 
Date: Wed, 12 Feb 2020 12:20:15 -0800
Subject: [PATCH v2] Use a non-empty test program to test ability to link.

On bare-metal targets, I/O support is typically provided by a BSP and
requires a linker script and/or hosting library to be specified on the
linker command line.  Linking an empty program with the default linker
script may succeed, however, which confuses libstdc++ configuration
when programs that probe for the presence of various I/O features fail
with link errors.

2020-02-12  Sandra Loosemore  

	PR libstdc++/79193
	PR libstdc++/88999

	config/
	* no-executables.m4: Use a non-empty program to test for linker
	support.

	libgcc/
	* configure: Regenerated.

	libgfortran/
	* configure: Regenerated.

	libiberty/
	* configure: Regenerated.

	libitm/
	* configure: Regenerated.

	libobjc/
	* configure: Regenerated.

	libquadmath/
	* configure: Regenerated.

	libssp/
	* configure: Regenerated.

	libstdc++v-3/
	* configure: Regenerated.
---
 config/ChangeLog | 8 
 config/no-executables.m4 | 4 +++-
 libgcc/ChangeLog | 7 +++
 libgcc/configure | 4 ++--
 libgfortran/ChangeLog| 7 +++
 libgfortran/configure| 4 ++--
 libiberty/ChangeLog  | 7 +++
 libiberty/configure  | 4 ++--
 libitm/ChangeLog | 7 +++
 libitm/configure | 0
 libobjc/ChangeLog| 7 +++
 libobjc/configure| 4 ++--
 libquadmath/ChangeLog| 7 +++
 libquadmath/configure| 4 ++--
 libssp/ChangeLog | 7 +++
 libssp/configure | 4 ++--
 libstdc++-v3/ChangeLog   | 7 +++
 libstdc++-v3/configure   | 4 ++--
 18 files changed, 81 insertions(+), 15 deletions(-)
 mode change 100644 => 100755 libitm/configure

diff --git a/config/ChangeLog b/config/ChangeLog
index f1fec81..01428dd 100644
--- a/config/ChangeLog
+++ b/config/ChangeLog
@@ -1,3 +1,11 @@
+2020-02-12  Sandra Loosemore  
+
+	PR libstdc++/79193
+	PR libstdc++/88999
+
+	* no-executables.m4: Use a non-empty program to test for linker
+	support.
+
 2020-02-01  Andrew Burgess  
 
 	* lib-link.m4 (AC_LIB_LINKFLAGS_BODY): Update shell syntax.
diff --git a/config/no-executables.m4 b/config/no-executables.m4
index 9061624..6842f84 100644
--- a/config/no-executables.m4
+++ b/config/no-executables.m4
@@ -25,7 +25,9 @@ AC_BEFORE([$0], [_AC_COMPILER_EXEEXT])
 AC_BEFORE([$0], [AC_LINK_IFELSE])
 
 m4_define([_AC_COMPILER_EXEEXT],
-[AC_LANG_CONFTEST([AC_LANG_PROGRAM()])
+[AC_LANG_CONFTEST([AC_LANG_PROGRAM(
+		 [#include ],
+		 [printf ("hello world\n");])])
 # FIXME: Cleanup?
 AS_IF([AC_TRY_EVAL(ac_link)], [gcc_no_link=no], [gcc_no_link=yes])
 if test x$gcc_no_link = xyes; then
diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index 917d0e7..7b46ccb 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,10 @@
+2020-02-12  Sandra Loosemore  
+
+	PR libstdc++/79193
+	PR libstdc++/88999
+
+	* configure: Regenerated.
+
 2020-02-10  Jeff Law  
 
 	* config/frv/frvbegin.c: Use right flags for .ctors and .dtors
diff --git a/libgcc/configure b/libgcc/configure
index ab8d471..093036a 100755
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -3553,11 +3553,11 @@ done
 
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
-
+#include 
 int
 main ()
 {
-
+printf ("hello world\n");
   ;
   return 0;
 }
diff --git a/libgfortran/ChangeLog b/libgfortran/ChangeLog
index fe7e480..5632fbc 100644
--- a/libgfortran/ChangeLog
+++ b/libgfortran/ChangeLog
@@ -1,3 +1,10 @@
+2020-02-12  Sandra Loosemore  
+
+	PR libstdc++/79193
+	PR libstdc++/88999
+
+	* configure: Regenerated.
+
 2020-01-24  Maciej W. Rozycki  
 
 	* configure.ac: Handle `--with-toolexeclibdir='.
diff --git a/libgfortran/configure b/libgfortran/configure
index 8ba6831..d01654e 100755
--- a/libgfortran/configure
+++ b/libgfortran/configure
@@ -4042,11 +4042,11 @@ done

[PATCH 2/2] libstdc++: Implement ranges [specialized.algorithms]

2020-02-12 Thread Patrick Palka
This implements all the ranges members defined in [specialized.algorithms]:

  ranges::uninitialized_default_construct
  ranges::uninitialized_value_construct
  ranges::uninitialized_copy
  ranges::uninitialized_copy_n
  ranges::uninitialized_move
  ranges::uninitialized_move_n
  ranges::uninitialized_fill
  ranges::uninitialized_fill_n
  ranges::construct_at
  ranges::destroy_at
  ranges::destroy

It also implements (hopefully correctly) the "obvious" optimizations for these
algos, namely that if the output range has a trivial value type and if the
appropriate operation won't throw then we can dispatch to the standard ranges
version of the algorithm which will then potentially enable further
optimizations.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add .
* include/Makefile.in: Regenerate.
* include/bits/ranges_uninitialized.h: New header.
* include/std/memory: Include it.
* testsuite/20_util/specialized_algorithms/destroy/constrained.cc: New
test.
* .../uninitialized_copy/constrained.cc: New test.
* .../uninitialized_default_construct/constrained.cc: New test.
* .../uninitialized_fill/constrained.cc: New test.
* .../uninitialized_move/constrained.cc: New test.
* .../uninitialized_value_construct/constrained.cc: New test.
---
 libstdc++-v3/include/Makefile.am  |   1 +
 libstdc++-v3/include/Makefile.in  |   1 +
 .../include/bits/ranges_uninitialized.h   | 491 ++
 libstdc++-v3/include/std/memory   |   1 +
 .../destroy/constrained.cc|  76 +++
 .../uninitialized_copy/constrained.cc | 166 ++
 .../constrained.cc| 147 ++
 .../uninitialized_fill/constrained.cc | 137 +
 .../uninitialized_move/constrained.cc | 176 +++
 .../constrained.cc| 140 +
 10 files changed, 1336 insertions(+)
 create mode 100644 libstdc++-v3/include/bits/ranges_uninitialized.h
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/destroy/constrained.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_copy/constrained.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_default_construct/constrained.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_fill/constrained.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_move/constrained.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/specialized_algorithms/uninitialized_value_construct/constrained.cc

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 614222db400..e131ce04f8c 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -159,6 +159,7 @@ bits_headers = \
${bits_srcdir}/range_cmp.h \
${bits_srcdir}/ranges_algobase.h \
${bits_srcdir}/ranges_algo.h \
+   ${bits_srcdir}/ranges_uninitialized.h \
${bits_srcdir}/refwrap.h \
${bits_srcdir}/regex.h \
${bits_srcdir}/regex.tcc \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 7ee6a1e3f61..ae20f6b1d21 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -504,6 +504,7 @@ bits_headers = \
${bits_srcdir}/range_cmp.h \
${bits_srcdir}/ranges_algobase.h \
${bits_srcdir}/ranges_algo.h \
+   ${bits_srcdir}/ranges_uninitialized.h \
${bits_srcdir}/refwrap.h \
${bits_srcdir}/regex.h \
${bits_srcdir}/regex.tcc \
diff --git a/libstdc++-v3/include/bits/ranges_uninitialized.h 
b/libstdc++-v3/include/bits/ranges_uninitialized.h
new file mode 100644
index 000..252295faed7
--- /dev/null
+++ b/libstdc++-v3/include/bits/ranges_uninitialized.h
@@ -0,0 +1,491 @@
+// Raw memory manipulators -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and 

[PATCH 1/2] libstdc++: Move some ranges algos to a new header

2020-02-12 Thread Patrick Palka
This roughly mirrors the existing split between  and
.  The ranges [specialized.algorithms] will use this new
header to avoid including all of of .

libstdc++-v3/ChangeLog:

* include/Makefile.am: Add bits/ranges_algobase.h
* include/Makefile.in: Regenerate.
* bits/ranges_algo.h: Include  and refactor
existing #includes.
(__detail::__is_normal_iterator, __detail::is_reverse_iterator,
__detail::__is_move_iterator, copy_result, move_result,
__equal, equal, copy_result, move_result, move_backward_result,
copy_backward_result, __copy_or_move_backward, __copy_or_move, copy,
move, copy_backward, move_backward, copy_n_result, copy_n, fill_n,
fill): Split out into ...
* bits/range_algobase.h: ... this new header.
---
 libstdc++-v3/include/Makefile.am|   1 +
 libstdc++-v3/include/Makefile.in|   1 +
 libstdc++-v3/include/bits/ranges_algo.h | 508 +-
 libstdc++-v3/include/bits/ranges_algobase.h | 556 
 4 files changed, 559 insertions(+), 507 deletions(-)
 create mode 100644 libstdc++-v3/include/bits/ranges_algobase.h

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 1d342cecbcc..614222db400 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -157,6 +157,7 @@ bits_headers = \
${bits_srcdir}/random.tcc \
${bits_srcdir}/range_access.h \
${bits_srcdir}/range_cmp.h \
+   ${bits_srcdir}/ranges_algobase.h \
${bits_srcdir}/ranges_algo.h \
${bits_srcdir}/refwrap.h \
${bits_srcdir}/regex.h \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index c735d67a5d3..7ee6a1e3f61 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -502,6 +502,7 @@ bits_headers = \
${bits_srcdir}/random.tcc \
${bits_srcdir}/range_access.h \
${bits_srcdir}/range_cmp.h \
+   ${bits_srcdir}/ranges_algobase.h \
${bits_srcdir}/ranges_algo.h \
${bits_srcdir}/refwrap.h \
${bits_srcdir}/regex.h \
diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index e065ff2a974..84a02cabb80 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -32,13 +32,7 @@
 
 #if __cplusplus > 201703L
 
-#include 
-#include 
-#include 
-// #include 
-#include 
-#include 
-#include  // __is_byte
+#include 
 #include  // concept uniform_random_bit_generator
 
 #if __cpp_lib_concepts
@@ -49,28 +43,6 @@ namespace ranges
 {
   namespace __detail
   {
-template
-  constexpr inline bool __is_normal_iterator = false;
-
-template
-  constexpr inline bool
-   __is_normal_iterator<__gnu_cxx::__normal_iterator<_Iterator,
- _Container>> = true;
-
-template
-  constexpr inline bool __is_reverse_iterator = false;
-
-template
-  constexpr inline bool
-   __is_reverse_iterator> = true;
-
-template
-  constexpr inline bool __is_move_iterator = false;
-
-template
-  constexpr inline bool
-   __is_move_iterator> = true;
-
 template
   constexpr auto
   __make_comp_proj(_Comp& __comp, _Proj& __proj)
@@ -741,420 +713,6 @@ namespace ranges
std::move(__proj1), std::move(__proj2));
 }
 
-  template _Sent1,
-  input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
-  typename _Pred, typename _Proj1, typename _Proj2>
-requires indirectly_comparable<_Iter1, _Iter2, _Pred, _Proj1, _Proj2>
-constexpr bool
-__equal(_Iter1 __first1, _Sent1 __last1, _Iter2 __first2, _Sent2 __last2,
-   _Pred __pred, _Proj1 __proj1, _Proj2 __proj2)
-{
-  // TODO: implement more specializations to at least have parity with
-  // std::equal.
-  constexpr bool __sized_iters
-   = (sized_sentinel_for<_Sent1, _Iter1>
-  && sized_sentinel_for<_Sent2, _Iter2>);
-  if constexpr (__sized_iters)
-   {
- auto __d1 = ranges::distance(__first1, __last1);
- auto __d2 = ranges::distance(__first2, __last2);
- if (__d1 != __d2)
-   return false;
-
- using _ValueType1 = iter_value_t<_Iter1>;
- using _ValueType2 = iter_value_t<_Iter2>;
- constexpr bool __use_memcmp
-   = ((is_integral_v<_ValueType1> || is_pointer_v<_ValueType1>)
-  && is_same_v<_ValueType1, _ValueType2>
-  && is_pointer_v<_Iter1>
-  && is_pointer_v<_Iter2>
-  && is_same_v<_Pred, ranges::equal_to>
-  && is_same_v<_Proj1, identity>
-  && is_same_v<_Proj2, identity>);
- if constexpr (__use_memcmp)
-   {
- if (const size_t __len = (__last1 - __first1))
-   return !std::__memcmp(__first1, __first2, __len);
-   

Re: [PATCH] avoid user-constructible types in reshape_init_array (PR 90938)

2020-02-12 Thread Martin Sebor

On 2/11/20 5:28 PM, Jason Merrill wrote:

On 2/11/20 9:00 PM, Martin Sebor wrote:

r270155, committed in GCC 9, introduced a transformation that strips
redundant trailing zero initializers from array initializer lists in
order to support string literals as template arguments.

The transformation neglected to consider the case of array elements
of trivial class types with user-defined conversion ctors and either
defaulted or deleted default ctors.  (It didn't occur to me that
those qualify as trivial types despite the user-defined ctors.)  As
a result, some valid initialization expressions are rejected when
the explicit zero-initializers are dropped in favor of the (deleted)
default ctor,


Hmm, a type with only a deleted default constructor is not trivial, that 
should have been OK already.


For Marek's test case:
  struct A { A () == delete; A (int) = delete; };

trivial_type_p() returns true (as does __is_trivial (A) in both GCC
and Clang).

[class.prop] says that

  A trivial class is a class that is trivially copyable and has one
  or more default constructors (10.3.4.1), all of which are either
  trivial or deleted and at least one of which is not deleted.

That sounds like A above is not trivial because it doesn't have
at least one default ctor that's not deleted, but both GCC and
Clang say it is.  What am I missing?  Is there some other default
constructor hiding in there that I don't know about?


and others are eliminated in favor of the defaulted
ctor instead of invoking a user-defined conversion ctor, leading to
wrong code.


This seems like a bug in type_initializer_zero_p; it shouldn't treat 0 
as a zero initializer for any class.


That does fix it, and it seems like the right solution to me as well.
Thanks for the suggestion.  I'm a little unsure about the condition
I put in place though.

Attached is an updated patch rested on x86_64-linux.

Martin
PR c++/90938 - Initializing array with {1} works but not {0}

gcc/cp/ChangeLog:

	PR c++/90938
	* decl.c (reshape_init_array_1): Avoid types with non-trivial
	user-defined ctors.
	* tree.c (type_initializer_zero_p): Fail for structs initialized
	with non-structs.

gcc/testsuite/ChangeLog:

	PR c++/90938
	* g++.dg/init/array55.C: New test.
	* g++.dg/init/array56.C: New test.
	* g++.dg/cpp2a/nontype-class33.C: New test.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 31a556a0a1f..60731cb3f9d 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6051,11 +6051,14 @@ reshape_init_array_1 (tree elt_type, tree max_index, reshape_iter *d,
 	break;
 }
 
-  if (sized_array_p && trivial_type_p (elt_type))
+  if (sized_array_p
+  && trivial_type_p (elt_type)
+  && !TYPE_NEEDS_CONSTRUCTING (elt_type))
 {
   /* Strip trailing zero-initializers from an array of a trivial
-	 type of known size.  They are redundant and get in the way
-	 of telling them apart from those with implicit zero value.  */
+	 type with no user-defined ctor of known size.  They are
+	 redundant and get in the way of telling them apart from those
+	 with implicit zero value.  */
   unsigned HOST_WIDE_INT nelts = CONSTRUCTOR_NELTS (new_init);
   if (last_nonzero > nelts)
 	nelts = 0;
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index eb540f851ee..d79d7b1ddd1 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -5708,7 +5708,15 @@ type_initializer_zero_p (tree type, tree init)
 return TREE_CODE (init) != STRING_CST && initializer_zerop (init);
 
   if (TREE_CODE (init) != CONSTRUCTOR)
-return initializer_zerop (init);
+{
+  /* A class can only be initialized by a non-class type if it has
+	 a ctor that converts from that type.  Such classes are excluded
+	 since their semantics are unknown.  */
+  if (RECORD_OR_UNION_TYPE_P (type)
+	  && !RECORD_OR_UNION_TYPE_P (TREE_TYPE (init)))
+	return false;
+  return initializer_zerop (init);
+}
 
   if (TREE_CODE (type) == ARRAY_TYPE)
 {
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class33.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class33.C
new file mode 100644
index 000..8407f4b4c8a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class33.C
@@ -0,0 +1,35 @@
+// PR c++/90938
+// { dg-do compile }
+
+struct A { int i; };
+struct B { A a[2]; };
+
+static const constexpr A a0 = { 0 };
+static const constexpr A a_ = { };
+
+template  struct X { };
+
+typedef X XB;
+typedef XXB;
+typedef X  XB;
+typedef X  XB;
+typedef X  XB;
+typedef XXB;
+typedef X  XB;
+typedef X  XB;
+typedef X  XB;
+
+
+struct C { constexpr C () = default; };
+struct D { C c[2]; };
+
+static const constexpr C c_ = { };
+
+template  struct Y { };
+
+typedef Y YD;
+typedef Y YD;
+typedef Y  YD;
+typedef YYD;
+typedef Y  YD;
+typedef Y  YD;
diff --git a/gcc/testsuite/g++.dg/init/array55.C b/gcc/testsuite/g++.dg/init/array55.C
new file mode 100644
index 000..70fb183b897
--- /dev/null
+++ 

[PATCH] c++: Fix poor diagnostic for array initializer [PR93710]

2020-02-12 Thread Marek Polacek
A small improvement for an error in build_user_type_conversion_1:
instead of

array-init1.C:11:1: error: conversion from ‘long int’ to ‘A’ is ambiguous
   11 | };
  | ^

we will print

array-init1.C:8:3: error: conversion from ‘long int’ to ‘A’ is ambiguous
8 |   0L,
  |   ^~

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2020-02-12  Marek Polacek  

PR c++/93710 - poor diagnostic for array initializer.
* call.c (build_user_type_conversion_1): Use cp_expr_loc_or_input_loc
for an error call.

* g++.dg/diagnostic/array-init1.C: New test.
---
 gcc/cp/call.c |  5 +++--
 gcc/testsuite/g++.dg/diagnostic/array-init1.C | 11 +++
 2 files changed, 14 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/array-init1.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 51621b7dd87..f47f96bf1c2 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4172,8 +4172,9 @@ build_user_type_conversion_1 (tree totype, tree expr, int 
flags,
   if (complain & tf_error)
{
  auto_diagnostic_group d;
- error ("conversion from %qH to %qI is ambiguous",
-fromtype, totype);
+ error_at (cp_expr_loc_or_input_loc (expr),
+   "conversion from %qH to %qI is ambiguous",
+   fromtype, totype);
  print_z_candidates (location_of (expr), candidates);
}
 
diff --git a/gcc/testsuite/g++.dg/diagnostic/array-init1.C 
b/gcc/testsuite/g++.dg/diagnostic/array-init1.C
new file mode 100644
index 000..78580ad6b83
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/array-init1.C
@@ -0,0 +1,11 @@
+// PR c++/93710 - poor diagnostic for array initializer.
+
+struct A { A (int); A (char*); int i; };
+
+int x;
+
+A a1[] = {
+  0L, // { dg-error "3:conversion from .long int. to .A. is ambiguous" }
+  , // { dg-error "3:invalid conversion from .int\\*. to .int." }
+  __builtin_offsetof (A, i) // { dg-error "23:conversion from .long unsigned 
int. to .A. is ambiguous" }
+};

base-commit: 5bfc8303ffe2d86e938d45f13cd99a39469dac4f
-- 
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



[committed] More H8 cleanups

2020-02-12 Thread Jeff Law

This is another small H8 cleanup.  This time we're killing a peephole2
that doesn't seem terribly useful.  The peephole in question narrows a
SImode comparison to QImode when it's fed by an (and (xor)) and the
result is the same in either mode.  I couldn't get this to trigger
within libgcc, newlib or in the testsuite.  Note that we do some of
this kind of narrowing in match.pd these days, so it may be the case
that the pattern was of marginal value before and none after the
match.pd improvements.

The more straightforward SI->QI and HI->QI operand narrowing when fed
by a simple (and) operation were consolidated into a single peephole2. 
These are still triggering which may point to a failing of the match.pd
patterns, but addressing in match.pd seems terribly out of scope at
this point.

Installing on the trunk.

Jeff


commit 37462a131c528d0980915d98567361aa9396b030
Author: Jeff Law 
Date:   Wed Feb 12 12:12:22 2020 -0700

Drop unused comparison shortening pattern and consolidate remaining 
comparison shortening patterns.

* config/h8300/h8300.md (comparison shortening peepholes): Drop
(and (xor)) variant.  Combine other two into single peephole.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 735cc47a3dd..1927290f568 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2020-02-12  Jeff Law  
+
+   * config/h8300/h8300.md (comparison shortening peepholes): Drop
+   (and (xor)) variant.  Combine other two into single peephole.
+
 2020-02-12  Wilco Dijkstra  
 
PR rtl-optimization/93565
diff --git a/gcc/config/h8300/h8300.md b/gcc/config/h8300/h8300.md
index 0548368215b..dcc9c3682d6 100644
--- a/gcc/config/h8300/h8300.md
+++ b/gcc/config/h8300/h8300.md
@@ -5405,7 +5405,9 @@
   [(cc0) (const_int 0)])
  (label_ref (match_operand 2 "" ""))
  (pc)))]
-  "peep2_reg_dead_p (2, operands[0])"
+  "((const_int_qi_operand (operands[1], QImode)
+ || const_int_hi_operand (operands[1], HImode))
+&& peep2_reg_dead_p (2, operands[0]))"
   [(set (match_dup 4)
(and:QI (match_dup 4)
(match_dup 5)))
@@ -5416,68 +5418,11 @@
  (label_ref (match_dup 2))
  (pc)))]
   {
-operands[4] = gen_rtx_REG (QImode, REGNO (operands[0]));
-operands[5] = gen_int_mode (INTVAL (operands[1]), QImode);
-  })
+enum machine_mode mode;
 
-(define_peephole2
-  [(set (match_operand:SI 0 "register_operand" "")
-   (and:SI (match_dup 0)
-   (match_operand:SI 1 "const_int_hi_operand" "")))
-   (set (cc0) (compare (match_dup 0)
-  (const_int 0)))
-   (set (pc)
-   (if_then_else (match_operator 3 "eqne_operator"
-  [(cc0) (const_int 0)])
- (label_ref (match_operand 2 "" ""))
- (pc)))]
-  "peep2_reg_dead_p (2, operands[0])"
-  [(set (match_dup 4)
-   (and:HI (match_dup 4)
-   (match_dup 5)))
-   (set (cc0) (compare (match_dup 4)
-  (const_int 0)))
-   (set (pc)
-   (if_then_else (match_op_dup 3 [(cc0) (const_int 0)])
- (label_ref (match_dup 2))
- (pc)))]
-  {
-operands[4] = gen_rtx_REG (HImode, REGNO (operands[0]));
-operands[5] = gen_int_mode (INTVAL (operands[1]), HImode);
-  })
-
-(define_peephole2
-  [(set (match_operand:SI 0 "register_operand" "")
-   (and:SI (match_dup 0)
-   (match_operand:SI 1 "const_int_qi_operand" "")))
-   (set (match_dup 0)
-   (xor:SI (match_dup 0)
-   (match_operand:SI 2 "const_int_qi_operand" "")))
-   (set (cc0) (compare (match_dup 0)
-  (const_int 0)))
-   (set (pc)
-   (if_then_else (match_operator 4 "eqne_operator"
-  [(cc0) (const_int 0)])
- (label_ref (match_operand 3 "" ""))
- (pc)))]
-  "peep2_reg_dead_p (3, operands[0])
-   && (~INTVAL (operands[1]) & INTVAL (operands[2])) == 0"
-  [(set (match_dup 5)
-   (and:QI (match_dup 5)
-   (match_dup 6)))
-   (set (match_dup 5)
-   (xor:QI (match_dup 5)
-   (match_dup 7)))
-   (set (cc0) (compare (match_dup 5)
-  (const_int 0)))
-   (set (pc)
-   (if_then_else (match_op_dup 4 [(cc0) (const_int 0)])
- (label_ref (match_dup 3))
- (pc)))]
-  {
-operands[5] = gen_rtx_REG (QImode, REGNO (operands[0]));
-operands[6] = gen_int_mode (INTVAL (operands[1]), QImode);
-operands[7] = gen_int_mode (INTVAL (operands[2]), QImode);
+mode = const_int_qi_operand (operands[1], QImode) ? QImode : HImode;
+operands[4] = gen_rtx_REG (mode, REGNO (operands[0]));
+operands[5] = gen_int_mode (INTVAL (operands[1]), mode);
   })
 
 ;; These triggers right at the end of allocation of locals in the


Re: [PATCH][AArch64] Improve clz patterns

2020-02-12 Thread Wilco Dijkstra
Hi Andrew,

> Yes I agree a better cost model for CTZ/CLZ is the right solution but
> I disagree with 2 ALU instruction as the cost.  It should either be
> the same cost as a multiply or have its own cost entry.
> For an example on OcteonTX (and ThunderX1), the cost of CLS/CLZ is 4
> cycles, the same as the cost as a multiple; on OcteonTX2 it is 5
> cycles (again the same cost as a multiple).

+  if (speed)
+  *cost += extra_cost->alu.clz + extra_cost->alu.rev;
+  return false;

So if the cost of clz and ctz are similar enough, this will use the defined
per-cpu costs.

Cheers,
Wilco

Re: [PATCH][AArch64] Improve clz patterns

2020-02-12 Thread Wilco Dijkstra
Hi Richard,

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565#c8 - the problem is
more generic like I suspected and it's easy to create similar examples. So while
this turned out to be an easy worksaround for ctz, there general case is harder
to avoid since you still want to allow beneficial multi-use cases (such as 
merging
a shift into 2 ALU instructions).

Cheers,
Wilco



Re: [PATCH][AArch64] Improve clz patterns

2020-02-12 Thread Andrew Pinski
On Wed, Feb 12, 2020 at 9:56 AM Richard Sandiford
 wrote:
>
> Wilco Dijkstra  writes:
> > Hi Richard,
> >
> > Right, so this is an alternative approach using costs - Combine won't try to
> > duplicate instructions if it increases costs, so increasing the ctz cost to 
> > 2
> > instructions (which is the correct cost for ctz anyway)
>
> ...agreed...

Yes I agree a better cost model for CTZ/CLZ is the right solution but
I disagree with 2 ALU instruction as the cost.  It should either be
the same cost as a multiply or have its own cost entry.
For an example on OcteonTX (and ThunderX1), the cost of CLS/CLZ is 4
cycles, the same as the cost as a multiple; on OcteonTX2 it is 5
cycles (again the same cost as a multiple).

Thanks,
Andrew Pinski

>
> > ensures we still get efficient code for this example:
> >
> > [AArch64] Set ctz rtx_cost (PR93565)
> >
> > Although GCC should understand the limited range of clz/ctz/cls results,
> > Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary
> > sign extension.  Avoid this by setting the cost for ctz to be higher than
> > that of a simple ALU instruction.  Deepsjeng performance improves by ~0.6%.
> >
> > Bootstrap OK.
> >
> > ChangeLog:
> > 2020-02-12  Wilco Dijkstra  
> >
> > PR rtl-optimization/93565
> > * config/aarch64/aarch64.c (aarch64_rtx_costs): Add CTZ costs.
> >
> > * gcc.target/aarch64/pr93565.c: New test.
>
> OK, thanks.  Could you remove the bit about combine behaving oddly when
> you commit though?  I think this was simply a target bug and combine
> was being given duff information.
>
> Richard
>
> >
> > --
> >
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 
> > e40750380cce202473da3cf572ebdbc28a4ecc06..7426629d6c973c06640f75d3de53a2815ff40f1b
> >  100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -11459,6 +11459,13 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int 
> > outer ATTRIBUTE_UNUSED,
> >
> >return false;
> >
> > +case CTZ:
> > +  *cost = COSTS_N_INSNS (2);
> > +
> > +  if (speed)
> > +*cost += extra_cost->alu.clz + extra_cost->alu.rev;
> > +  return false;
> > +
> >  case COMPARE:
> >op0 = XEXP (x, 0);
> >op1 = XEXP (x, 1);
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr93565.c 
> > b/gcc/testsuite/gcc.target/aarch64/pr93565.c
> > new file mode 100644
> > index 
> > ..7200f80d1bb161f6a058cc6591f61b6b75cf1749
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr93565.c
> > @@ -0,0 +1,34 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +static const unsigned long long magic = 0x03f08c5392f756cdULL;
> > +
> > +static const char table[64] = {
> > + 0,  1, 12,  2, 13, 22, 17,  3,
> > +14, 33, 23, 36, 18, 58, 28,  4,
> > +62, 15, 34, 26, 24, 48, 50, 37,
> > +19, 55, 59, 52, 29, 44, 39,  5,
> > +63, 11, 21, 16, 32, 35, 57, 27,
> > +61, 25, 47, 49, 54, 51, 43, 38,
> > +10, 20, 31, 56, 60, 46, 53, 42,
> > + 9, 30, 45, 41,  8, 40,  7,  6,
> > +};
> > +
> > +static inline int ctz1 (unsigned long  b)
> > +{
> > +  unsigned long lsb = b & -b;
> > +  return table[(lsb * magic) >> 58];
> > +}
> > +
> > +void f (unsigned long x, int *p)
> > +{
> > +  if (x != 0)
> > +{
> > +  int a = ctz1 (x);
> > +  *p = a | p[a];
> > +}
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "rbit\t" 1 } } */
> > +/* { dg-final { scan-assembler-times "clz\t" 1 } } */
> > +


Re: [PATCH] issue -Wstringop-overflow for potential overflow, not -truncation (PR 93646)

2020-02-12 Thread Jeff Law
On Mon, 2020-02-10 at 15:47 -0700, Martin Sebor wrote:
> The reporter of RHBZ #1798636 was mislead and confused by GCC
> issuing -Wstringop-truncation for a possible overflow in strncat.
> It took a few iterations to appreciate this subtlety and realize
> the warning was of the wrong kind.
> 
> The attached patch adjusts the logic of the function responsible
> for the warning not to issue -Wstringop-truncation only for strncpy
> and -Wstringop-overflow for strncat.
> 
> I'm not sure if the bug is strictly speaking a regression: GCC 8
> didn't issue any warning so it could be viewed as one, but then
> again, issuing a warning in this instance is intended, but not
> a misleading one.
> 
> Tested on x86_64-linux.
> 
> Martin
> PR middle-end/93646 - confusing -Wstringop-truncation on strncat where 
> -Wstringop-overflow is expected
> 
> gcc/ChangeLog:
> 
>   PR middle-end/93646
>   * tree-ssa-strlen.c (handle_builtin_stxncpy): Rename...
>   (handle_builtin_stxncpy_strncat): ...to this.  Change first argument.
>   Issue only -Wstringop-overflow strncat, never -Wstringop-truncation.
>   (strlen_check_and_optimize_call): Adjust callee name.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR middle-end/93646
>   * gcc.dg/Wstringop-overflow-31.c: New test.
Not strictly a regression bugfix, but I think this is OK for the trunk,
even in stage4.

jeff



Re: [PATCH] regalloc/debug: fix buggy print_hard_reg_set

2020-02-12 Thread Jeff Law
On Tue, 2020-02-11 at 15:54 +0100, Hans-Peter Nilsson wrote:
> I was using ira-conflicts.c:print_hard_reg_set with a local
> patch to gdbinit.in in a debug-session, and noticed the
> erroneous output.  I see there's an almost identical function in
> ira-color.c and on top of that, there's another function by the
> same name and with similar semantics in sel-sched-dump.c, but
> the last one doesn't try to print ranges.
> 
> I'm guessing the dump functions have been used with targets that
> have "impossible" registers at FIRST_PSEUDO_REGISTER - 1.  CRIS
> happens to have the condition-code register at
> FIRST_PSEUDO_REGISTER - 1; not allocatable but appears a lot in
> (clobbered) register sets after decc0ration.
> 
> Before, for a target with FIRST_PSEUDO_REGISTER 20, you'd get
> "19-18" for (1<<19).  For (1<<18)|(1<<19), you'd get "18".
> 
> The diff is unfortunately hairy, but I hope you agree the code
> is a bit clearer.  I'm deliberately not consolidating the dump
> functions as IMHO that's too much a matter of taste.
> 
>   * ira-conflicts.c (print_hard_reg_set): Correct output for sets
>   including FIRST_PSEUDO_REGISTER - 1.
>   * ira-color.c (print_hard_reg_set): Ditto.
> 
> Ok to commit?
Not strictly a regression fix, but it's just dumping routines.  So OK
for the trunk.
jeff
> 



Re: [PATCH][AArch64] Improve popcount expansion

2020-02-12 Thread Richard Sandiford
Wilco Dijkstra  writes:
> The popcount expansion uses umov to extend the result and move it back
> to the integer register file.  If we model ADDV as a zero-extending
> operation, fmov can be used to move back to the integer side. This
> results in a ~0.5% speedup on deepsjeng on Cortex-A57.
>
> A typical __builtin_popcount expansion is now:
>
> fmovs0, w0
> cntv0.8b, v0.8b
> addvb0, v0.8b
> fmovw0, s0
>
> Bootstrap OK, passes regress.
>
> ChangeLog
> 2020-02-02  Wilco Dijkstra  
>
> gcc/
> * config/aarch64/aarch64.md (popcount2): Improve expansion.
> * config/aarch64/aarch64-simd.md
> (aarch64_zero_extend_reduc_plus_): New pattern.

I think reordering these and using:

* config/aarch64/aarch64.md (popcount2): Use it instead of
generating separate ADDV and zero_extend patterns.

would be clearer.  OK with that change, thanks.

Richard


> * config/aarch64/iterators.md (VDQV_E): New iterator.
> testsuite/
> * gcc.target/aarch64/popcnt2.c: New test.
>
> --
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 97f46f96968a6bc2f93bbc812931537b819b3b19..34765ff43c1a090a31e2aed64ce95510317ab8c3
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2460,6 +2460,17 @@ (define_insn "aarch64_reduc_plus_internal"
>[(set_attr "type" "neon_reduc_add")]
>  )
>
> +;; ADDV with result zero-extended to SI/DImode (for popcount).
> +(define_insn "aarch64_zero_extend_reduc_plus_"
> + [(set (match_operand:GPI 0 "register_operand" "=w")
> +   (zero_extend:GPI
> +(unspec: [(match_operand:VDQV_E 1 "register_operand" "w")]
> + UNSPEC_ADDV)))]
> + "TARGET_SIMD"
> + "add\\t%0, %1."
> +  [(set_attr "type" "neon_reduc_add")]
> +)
> +
>  (define_insn "aarch64_reduc_plus_internalv2si"
>   [(set (match_operand:V2SI 0 "register_operand" "=w")
> (unspec:V2SI [(match_operand:V2SI 1 "register_operand" "w")]
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 86c2cdfc7973f4b964ba233cfbbe369b24e0ac10..5edc76ee14b55b2b4323530e10bd22b3ffca483e
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4829,7 +4829,6 @@ (define_expand "popcount2"
>  {
>rtx v = gen_reg_rtx (V8QImode);
>rtx v1 = gen_reg_rtx (V8QImode);
> -  rtx r = gen_reg_rtx (QImode);
>rtx in = operands[1];
>rtx out = operands[0];
>if(mode == SImode)
> @@ -4843,8 +4842,7 @@ (define_expand "popcount2"
>  }
>emit_move_insn (v, gen_lowpart (V8QImode, in));
>emit_insn (gen_popcountv8qi2 (v1, v));
> -  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
> -  emit_insn (gen_zero_extendqi2 (out, r));
> +  emit_insn (gen_aarch64_zero_extend_reduc_plus_v8qi (out, v1));
>DONE;
>  })
>
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 
> fc973086cb91ae0dc54eeeb0b832d522539d7982..926779bf2442fa60d184ef17308f91996d6e8d1b
>  100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -208,6 +208,9 @@ (define_mode_iterator VDQV [V8QI V16QI V4HI V8HI V4SI 
> V2DI])
>  ;; Advanced SIMD modes (except V2DI) for Integer reduction across lanes.
>  (define_mode_iterator VDQV_S [V8QI V16QI V4HI V8HI V4SI])
>
> +;; Advanced SIMD modes for Integer reduction across lanes (zero/sign 
> extended).
> +(define_mode_iterator VDQV_E [V8QI V16QI V4HI V8HI])
> +
>  ;; All double integer narrow-able modes.
>  (define_mode_iterator VDN [V4HI V2SI DI])
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt2.c 
> b/gcc/testsuite/gcc.target/aarch64/popcnt2.c
> new file mode 100644
> index 
> ..e321858afa4d6ecb6fc7348f39f6e5c6c0c46147
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt2.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +unsigned
> +foo (int x)
> +{
> +  return __builtin_popcount (x);
> +}
> +
> +unsigned long
> +foo1 (int x)
> +{
> +  return __builtin_popcount (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {popcount} } } */
> +/* { dg-final { scan-assembler-times {cnt\t} 2 } } */
> +/* { dg-final { scan-assembler-times {fmov} 4 } } */
> +/* { dg-final { scan-assembler-not {umov} } } */
> +/* { dg-final { scan-assembler-not {uxtw} } } */
> +/* { dg-final { scan-assembler-not {sxtw} } } */


Re: [PATCH][AArch64] Improve clz patterns

2020-02-12 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Hi Richard,
>
> Right, so this is an alternative approach using costs - Combine won't try to
> duplicate instructions if it increases costs, so increasing the ctz cost to 2
> instructions (which is the correct cost for ctz anyway)

...agreed...

> ensures we still get efficient code for this example:
>
> [AArch64] Set ctz rtx_cost (PR93565)
>
> Although GCC should understand the limited range of clz/ctz/cls results,
> Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary
> sign extension.  Avoid this by setting the cost for ctz to be higher than
> that of a simple ALU instruction.  Deepsjeng performance improves by ~0.6%.
>
> Bootstrap OK.
>
> ChangeLog:
> 2020-02-12  Wilco Dijkstra  
>
> PR rtl-optimization/93565
> * config/aarch64/aarch64.c (aarch64_rtx_costs): Add CTZ costs.
>
> * gcc.target/aarch64/pr93565.c: New test.

OK, thanks.  Could you remove the bit about combine behaving oddly when
you commit though?  I think this was simply a target bug and combine
was being given duff information.

Richard

>
> --
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> e40750380cce202473da3cf572ebdbc28a4ecc06..7426629d6c973c06640f75d3de53a2815ff40f1b
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11459,6 +11459,13 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int 
> outer ATTRIBUTE_UNUSED,
>
>return false;
>
> +case CTZ:
> +  *cost = COSTS_N_INSNS (2);
> +
> +  if (speed)
> +*cost += extra_cost->alu.clz + extra_cost->alu.rev;
> +  return false;
> +
>  case COMPARE:
>op0 = XEXP (x, 0);
>op1 = XEXP (x, 1);
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr93565.c 
> b/gcc/testsuite/gcc.target/aarch64/pr93565.c
> new file mode 100644
> index 
> ..7200f80d1bb161f6a058cc6591f61b6b75cf1749
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr93565.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +static const unsigned long long magic = 0x03f08c5392f756cdULL;
> +
> +static const char table[64] = {
> + 0,  1, 12,  2, 13, 22, 17,  3,
> +14, 33, 23, 36, 18, 58, 28,  4,
> +62, 15, 34, 26, 24, 48, 50, 37,
> +19, 55, 59, 52, 29, 44, 39,  5,
> +63, 11, 21, 16, 32, 35, 57, 27,
> +61, 25, 47, 49, 54, 51, 43, 38,
> +10, 20, 31, 56, 60, 46, 53, 42,
> + 9, 30, 45, 41,  8, 40,  7,  6,
> +};
> +
> +static inline int ctz1 (unsigned long  b)
> +{
> +  unsigned long lsb = b & -b;
> +  return table[(lsb * magic) >> 58];
> +}
> +
> +void f (unsigned long x, int *p)
> +{
> +  if (x != 0)
> +{
> +  int a = ctz1 (x);
> +  *p = a | p[a];
> +}
> +}
> +
> +/* { dg-final { scan-assembler-times "rbit\t" 1 } } */
> +/* { dg-final { scan-assembler-times "clz\t" 1 } } */
> +


[committed] Clean up various dead patterns, expanders, splitters and peepholes on the H8.

2020-02-12 Thread Jeff Law
I needed some mindless work yesterday, so I took the opportunity to do
some light cleanups on the H8 port in the hopes that removing
unnecessary cruft will make cc0 conversion easier.

In this round I'm removing all the expanders, patterns and splitters
that have been disabled (in some cases for ~15 years) as well as
removing the peepholes that widen QI/HI mode pushes to SImode.

I verified none of those peepholes trigger during the course of
building libgcc, newlib or across the testsuite (I won't list all the
multilibs, there's many :-)  I didn't feel such rigor was needed for
the previously disabled patterns, expanders and splitters.

While we're in stage4, this is isolated to the H8 target which is
currently scheduled for deprecation.  So it's seems safe enough :-)

There'll be more of these cleanups coming.  Removing the unused
combiner patterns in particular is useful to simplify the cc0
conversion.

Installing on the trunk momentarily.

Jeff
commit e5cc04a73a3e212114ca9725911eaaa66d32303c
Author: Jeff Law 
Date:   Wed Feb 12 10:35:12 2020 -0700

Clean up dead patterns, splitters, expanders and peepholes on the H8 port.

* config/h8300/h8300.md (cpymemsi, movmd): Remove dead patterns,
expanders, splits, etc.
(movmd_internal_, movmd splitter, movstr, movsd): Likewise.
(stpcpy_internal_, stpcpy splitter): Likewise.
(peepholes to convert QI/HI mode pushes to SI mode pushes): 
Likewise.
* config/h8300/h8300.c (h8300_swap_into_er6): Remove unused 
function.
(h8300_swap_out_of_er6, h8sx_emit_movmd): Likewise
* config/h8300/h8300-protos.h (h8300_swap_into_er6): Remove unused
function prototype.
(h8300_swap_out_of_er6, h8sx_emit_movmd): Likewise.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6fa4768d816..9ff0a943e4f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,16 @@
+2020-02-12  Jeff Law  
+
+   * config/h8300/h8300.md (cpymemsi, movmd): Remove dead patterns,
+   expanders, splits, etc.
+   (movmd_internal_, movmd splitter, movstr, movsd): Likewise.
+   (stpcpy_internal_, stpcpy splitter): Likewise.
+   (peepholes to convert QI/HI mode pushes to SI mode pushes): Likewise.
+   * config/h8300/h8300.c (h8300_swap_into_er6): Remove unused function.
+   (h8300_swap_out_of_er6, h8sx_emit_movmd): Likewise
+   * config/h8300/h8300-protos.h (h8300_swap_into_er6): Remove unused
+   function prototype.
+   (h8300_swap_out_of_er6, h8sx_emit_movmd): Likewise.
+
 2020-02-12  Jakub Jelinek  
 
PR target/93670
diff --git a/gcc/config/h8300/h8300-protos.h b/gcc/config/h8300/h8300-protos.h
index 0cbb091390d..2416741e76a 100644
--- a/gcc/config/h8300/h8300-protos.h
+++ b/gcc/config/h8300/h8300-protos.h
@@ -109,9 +109,6 @@ extern unsigned inth8300_insn_length_from_table 
(rtx_insn *, rtx *);
 extern const char *output_h8sx_shift (rtx *, int, int);
 extern boolh8300_operands_match_p (rtx *);
 extern boolh8sx_mergeable_memrefs_p (rtx, rtx);
-extern boolh8sx_emit_movmd (rtx, rtx, rtx, HOST_WIDE_INT);
-extern voidh8300_swap_into_er6 (rtx);
-extern voidh8300_swap_out_of_er6 (rtx);
 extern poly_int64  h8300_push_rounding (poly_int64);
 
 #endif /* ! GCC_H8300_PROTOS_H */
diff --git a/gcc/config/h8300/h8300.c b/gcc/config/h8300/h8300.c
index def8be344af..d2fea04d8e4 100644
--- a/gcc/config/h8300/h8300.c
+++ b/gcc/config/h8300/h8300.c
@@ -2647,145 +2647,6 @@ h8300_operands_match_p (rtx *operands)
   return false;
 }
 
-/* Try using movmd to move LENGTH bytes from memory region SRC to memory
-   region DEST.  The two regions do not overlap and have the common
-   alignment given by ALIGNMENT.  Return true on success.
-
-   Using movmd for variable-length moves seems to involve some
-   complex trade-offs.  For instance:
-
-  - Preparing for a movmd instruction is similar to preparing
-   for a memcpy.  The main difference is that the arguments
-   are moved into er4, er5 and er6 rather than er0, er1 and er2.
-
-  - Since movmd clobbers the frame pointer, we need to save
-   and restore it somehow when frame_pointer_needed.  This can
-   sometimes make movmd sequences longer than calls to memcpy().
-
-  - The counter register is 16 bits, so the instruction is only
-   suitable for variable-length moves when sizeof (size_t) == 2.
-   That's only true in normal mode.
-
-  - We will often lack static alignment information.  Falling back
-   on movmd.b would likely be slower than calling memcpy(), at least
-   for big moves.
-
-   This function therefore only uses movmd when the length is a
-   known constant, and only then if -fomit-frame-pointer is in
-   effect or if we're not optimizing for size.
-
-   At the moment the function uses movmd for all in-range constants,
-   but it might be better to fall back on memcpy() for large 

New French PO file for 'cpplib' (version 10.1-b20200209)

2020-02-12 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the French team of translators.  The file is available at:

https://translationproject.org/latest/cpplib/fr.po

(This file, 'cpplib-10.1-b20200209.fr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Contents of PO file 'cpplib-10.1-b20200209.fr.po'

2020-02-12 Thread Translation Project Robot


cpplib-10.1-b20200209.fr.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.



[committed] c++: Add new test [PR88819]

2020-02-12 Thread Marek Polacek
Fixed by r10-1975-g59febe0ece37bedab7f42ae51b9f2b7a372d2950.

2020-02-12  Marek Polacek  

PR c++/88819
* g++.dg/cpp2a/nontype-class32.C: New test.
---
 gcc/testsuite/ChangeLog  |  5 +
 gcc/testsuite/g++.dg/cpp2a/nontype-class32.C | 10 ++
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class32.C

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 968b903411d..e2f590b48f4 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2020-02-12  Marek Polacek  
+
+   PR c++/88819
+   * g++.dg/cpp2a/nontype-class32.C: New test.
+
 2020-02-12  Marek Polacek  
 
PR c++/93684 - ICE-on-invalid with broken attribute.
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class32.C 
b/gcc/testsuite/g++.dg/cpp2a/nontype-class32.C
new file mode 100644
index 000..6a4dac5c90e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class32.C
@@ -0,0 +1,10 @@
+// PR c++/88819
+// { dg-do compile { target c++2a } }
+
+template class TT, class R = TT <0>> struct A 
+{
+  template struct B {};
+};
+template struct C {};
+
+A> a;

base-commit: e428a9cf85a8bdde9d031a215e10bd96eb3b789a
-- 
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



[PATCH] c++: Fix hashing and testing for equality of ATOMIC_CONST_EXPRs

2020-02-12 Thread Patrick Palka
Two equal atomic constraint expressions do not necessarily share the same tree,
so we can't assume that two ATOMIC_CONST_EXPRs are equal if and only if they
point to the same tree.  The main consequence of this invalid assumption is that
the constraint subsumption checker may reject a valid partial specialization for
a constrained template because the checker decides that the constraints on the
partial specialization do not subsume the constraints on the primary template,
as in the test case below.  Another consequence is that it probably makes the
constraint caches less effective.

This patch changes atomic_constraints_identical_p to test for equality of
ATOMIC_CONST_EXPRs using cp_tree_equal, and changes hash_atomic_constraint to
hash the ATOMIC_CONST_EXPR with iterative_hash_template_arg.  Originally I used
template_args_equal but cp_tree_equal seems to suffice.

Does this look like an OK solution to this issue?  Are the testcase adjustments
valid?

gcc/cp/ChangeLog:

* constraint.cc (atomic_constraints_identical_p): Use cp_tree_equal to
test for equality between ATOMIC_CONST_EXPRs.
(hash_atomic_constraint): Use iterative_hash_template to hash the
ATOMIC_CONST_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/variadic4.C: Adjust test to no longere expect an
error.
* g++.dg/cpp2a/concepts-pr84551.C: Likewise.
* g++.dg/cpp2a/concepts-partial-spec7.C: New test.
---
 gcc/cp/constraint.cc  |  4 +-
 gcc/testsuite/g++.dg/concepts/variadic4.C |  4 +-
 .../g++.dg/cpp2a/concepts-partial-spec7.C | 40 +++
 gcc/testsuite/g++.dg/cpp2a/concepts-pr84551.C |  2 +-
 4 files changed, 44 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec7.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 58044cd0f9d..935c3556272 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -933,7 +933,7 @@ atomic_constraints_identical_p (tree t1, tree t2)
   gcc_assert (TREE_CODE (t1) == ATOMIC_CONSTR);
   gcc_assert (TREE_CODE (t2) == ATOMIC_CONSTR);
 
-  if (ATOMIC_CONSTR_EXPR (t1) != ATOMIC_CONSTR_EXPR (t2))
+  if (!cp_tree_equal (ATOMIC_CONSTR_EXPR (t1), ATOMIC_CONSTR_EXPR (t2)))
 return false;
 
   if (!parameter_mapping_equivalent_p (t1, t2))
@@ -981,7 +981,7 @@ hash_atomic_constraint (tree t)
   gcc_assert (TREE_CODE (t) == ATOMIC_CONSTR);
 
   /* Hash the identity of the expression.  */
-  hashval_t val = htab_hash_pointer (ATOMIC_CONSTR_EXPR (t));
+  hashval_t val = iterative_hash_template_arg (ATOMIC_CONSTR_EXPR (t), 0);
 
   /* Hash the targets of the parameter map.  */
   tree p = ATOMIC_CONSTR_MAP (t);
diff --git a/gcc/testsuite/g++.dg/concepts/variadic4.C 
b/gcc/testsuite/g++.dg/concepts/variadic4.C
index d6eea49b958..b073bcce5e9 100644
--- a/gcc/testsuite/g++.dg/concepts/variadic4.C
+++ b/gcc/testsuite/g++.dg/concepts/variadic4.C
@@ -12,9 +12,7 @@ struct zip;
 
 template
 requires requires { typename list; } // && (Sequence && ...)
-struct zip {}; // { dg-error "does not specialize" }
-// The constraints of the specialization and the sequence are not
-// comparable; the specializations are unordered.
+struct zip {};
 
 int main()
 {
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec7.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec7.C
new file mode 100644
index 000..ed77ed4d362
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec7.C
@@ -0,0 +1,40 @@
+// { dg-do compile { target c++2a } }
+
+template
+  inline constexpr bool foo = true;
+
+template
+  inline constexpr bool bar = true;
+
+// The following two partial specializations are both valid, but the constraint
+// subsumption checker used to compare ATOMIC_CONSTR_EXPRs by comparing their
+// pointers, leading to it accepting the partial specialization of test1 and
+// rejecting the partial specialization of test2: in the test1 case, the
+// TEMPLATE_ID_EXPR of foo in the constraints was shared between the primary
+// template and partial specialization, and in the test2 case the
+// TEMPLATE_ID_EXPRs of foo in the constraints were different (but equal)
+// trees.
+
+template
+  concept baz = foo;
+
+template
+  requires baz
+  struct test1
+  { };
+
+template
+  requires baz && bar
+  struct test1
+  { };
+
+template
+  requires foo
+  struct test2
+  { };
+
+template
+  requires foo && bar
+  struct test2
+  { };
+
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-pr84551.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-pr84551.C
index e40796f4975..e7498437178 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-pr84551.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-pr84551.C
@@ -8,4 +8,4 @@ template requires C class TT> struct A 
{};
 
 template requires true struct B {};
 
-A a;// { dg-error "" }
+A a;
-- 
2.25.0.191.gde93cc14ab



Re: [PATCH][AArch64] Improve clz patterns

2020-02-12 Thread Wilco Dijkstra
Hi Richard,

Right, so this is an alternative approach using costs - Combine won't try to
duplicate instructions if it increases costs, so increasing the ctz cost to 2
instructions (which is the correct cost for ctz anyway) ensures we still get
efficient code for this example:

[AArch64] Set ctz rtx_cost (PR93565)

Although GCC should understand the limited range of clz/ctz/cls results,
Combine sometimes behaves oddly and duplicates ctz to remove an unnecessary
sign extension.  Avoid this by setting the cost for ctz to be higher than
that of a simple ALU instruction.  Deepsjeng performance improves by ~0.6%.

Bootstrap OK.

ChangeLog:
2020-02-12  Wilco Dijkstra  

PR rtl-optimization/93565
* config/aarch64/aarch64.c (aarch64_rtx_costs): Add CTZ costs.

* gcc.target/aarch64/pr93565.c: New test.

--

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
e40750380cce202473da3cf572ebdbc28a4ecc06..7426629d6c973c06640f75d3de53a2815ff40f1b
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11459,6 +11459,13 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer 
ATTRIBUTE_UNUSED,
 
   return false;
 
+case CTZ:
+  *cost = COSTS_N_INSNS (2);
+
+  if (speed)
+   *cost += extra_cost->alu.clz + extra_cost->alu.rev;
+  return false;
+
 case COMPARE:
   op0 = XEXP (x, 0);
   op1 = XEXP (x, 1);
diff --git a/gcc/testsuite/gcc.target/aarch64/pr93565.c 
b/gcc/testsuite/gcc.target/aarch64/pr93565.c
new file mode 100644
index 
..7200f80d1bb161f6a058cc6591f61b6b75cf1749
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr93565.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+static const unsigned long long magic = 0x03f08c5392f756cdULL;
+
+static const char table[64] = {
+ 0,  1, 12,  2, 13, 22, 17,  3,
+14, 33, 23, 36, 18, 58, 28,  4,
+62, 15, 34, 26, 24, 48, 50, 37,
+19, 55, 59, 52, 29, 44, 39,  5,
+63, 11, 21, 16, 32, 35, 57, 27,
+61, 25, 47, 49, 54, 51, 43, 38,
+10, 20, 31, 56, 60, 46, 53, 42,
+ 9, 30, 45, 41,  8, 40,  7,  6,
+};
+
+static inline int ctz1 (unsigned long  b)
+{
+  unsigned long lsb = b & -b;
+  return table[(lsb * magic) >> 58];
+}
+
+void f (unsigned long x, int *p)
+{
+  if (x != 0)
+{
+  int a = ctz1 (x);
+  *p = a | p[a];
+}
+}
+
+/* { dg-final { scan-assembler-times "rbit\t" 1 } } */
+/* { dg-final { scan-assembler-times "clz\t" 1 } } */
+




Re: [PATCH] c++: Emit DFP typeinfos even when DFP is disabled [PR92906]

2020-02-12 Thread Jonathan Wakely

On 12/02/20 12:31 +0100, Jakub Jelinek wrote:

Hi!

Before Joseph's changes when compiling
libstdc++-v3/libsupc++/fundamental_type_info.cc
we were emitting
_ZTIPDd, _ZTIPDe, _ZTIPDf, _ZTIPKDd, _ZTIPKDe, _ZTIPKDf, _ZTIDd, _ZTIDe, _ZTIDf
symbols even when DFP wasn't usable, but now we don't and thus those 9
symbols @@CXXABI_1.3.4 are gone from libstdc++.  While nothing could
probably use it (except perhaps dlsym etc.), various tools don't really like
symbols disappearing from symbol versioned shared libraries with stable ABI.
Adding those in assembly would be possible, but would be a portability
nightmare (the PR has something Red Hat uses in libstdc++_nonshared.a, but that
can handle only a handful of linux ELF targets we care about).
So, instead this patch hacks up the FE, so that it emits those, but in a way
that won't make the DFP types available again on targets that don't support
them.

Bootstrapped/regtested on x86_64-linux and i686-linux (where it doesn't
change anything, because DFP is supported), aarch64-linux (where it fixed
FAIL: libstdc++-abi/abi_check
) and armv7hl-linux (where abi_check didn't fail because we don't have
baseline file, but comparing abilist of unpatched and patched libstdc++.so.6
shows:
+_ZTIDd@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 8
+_ZTIDe@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 8
+_ZTIDf@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 8
+_ZTIPDd@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16
+_ZTIPDe@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16
+_ZTIPDf@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16
+_ZTIPKDd@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16
+_ZTIPKDe@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16
+_ZTIPKDf@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16
).  Ok for trunk?


It seems less hacky than the aliases that would be needed in
libstdc++.so to solve it purely in the library.



Re: [PATCH] configure: Re-disable building cross-gdbserver

2020-02-12 Thread Pedro Alves
On 2/11/20 9:01 PM, Maciej W. Rozycki wrote:
> On Tue, 11 Feb 2020, Tom Tromey wrote:
> 
>> Maciej> Correct fallout from commit 919adfe84092 ("Move gdbserver to top 
>> level") 
>> Maciej> and revert to not building `gdbserver' in a cross-configuration, 
>> that is 
>> Maciej> where host != target, matching the documented behaviour.  We have no 
>> way 
>> Maciej> to support non-native `gdbserver', and native `gdbserver' is usually 
>> of 
>> Maciej> no use with cross-GDB of the chosen host.
>>
>> Pedro had a different way to do this, that keeps the decision under
>> gdbserver's control:
>>
>> https://sourceware.org/ml/gdb-patches/2020-02/msg00383.html
> 
>  That's actually quite similar to what I considered first, before I 
> changed my mind.  Whatever.

Doing it in gdbserver/ has the advantage that it stays under gdbserver's
control, so it doesn't need syncing code with the gcc tree.  I know of at
least one off-tree port that uses gdbserver in a host != target scenario,
so I imagine that this condition will evolve over time.

Also, this way, changes in this area don't require running autoconf to
regenerate configure.

I'm not seeing any downside.

> 
>  However I would expect `exit' not to be what we want in a sourced script 
> (I did this differently; see below).

Good point, somehow did not think of that.
It worked in my patch because we source the script in a sub-shell.
But it's clearer/better to not rely on that.

>  
>  case "${host}" in
> +  ${target})
> + gdbserver_host=${host}
> + ;;
> +  *)
> + gdbserver_host=NONE
> + ;;
if/else reads more to-the-point to me, so I tweaked it that
way, and merged it in (to binutils-gdb), like below.

I'm sorry for not noticing your earlier patch.

>From f20e3e823d56e54ffe56792ea6a2fe947c2dec0d Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" 
Date: Wed, 12 Feb 2020 13:50:30 +
Subject: [PATCH] Disable gdbserver on host != target configurations

Correct fallout from commit 919adfe84092 ("Move gdbserver to top level")
and revert to not building `gdbserver' in a cross-configuration, that is
where host != target, matching the documented behaviour.  We have no way
to support non-native `gdbserver', and native `gdbserver' is usually of
no use with cross-GDB of the chosen host.

gdbserver/ChangeLog:
2020-02-12  Maciej W. Rozycki 
Pedro Alves  

Skip building gdbserver in a cross-configuration.
* configure.srv: Set $gdbserver_host depending on whether $target
is $host.  Use $gdbserver_host instead of $host.
---
 gdbserver/ChangeLog |  7 +++
 gdbserver/configure.srv | 11 +--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/gdbserver/ChangeLog b/gdbserver/ChangeLog
index 09707067730..709ef23674c 100644
--- a/gdbserver/ChangeLog
+++ b/gdbserver/ChangeLog
@@ -1,3 +1,10 @@
+2020-02-12  Maciej W. Rozycki 
+   Pedro Alves  
+
+   Skip building gdbserver in a cross-configuration.
+   * configure.srv: Set $gdbserver_host depending on whether $target
+   is $host.  Use $gdbserver_host instead of $host.
+
 2020-02-11  Simon Marchi  
 
* configure: Re-generate.
diff --git a/gdbserver/configure.srv b/gdbserver/configure.srv
index 2e83cbdc07f..375ac0aeb2a 100644
--- a/gdbserver/configure.srv
+++ b/gdbserver/configure.srv
@@ -33,9 +33,16 @@ ipa_ppc_linux_regobj="powerpc-32l-ipa.o 
powerpc-altivec32l-ipa.o powerpc-vsx32l-
 # these files over and over again.
 srv_linux_obj="linux-low.o nat/linux-osdata.o nat/linux-procfs.o 
nat/linux-ptrace.o nat/linux-waitpid.o nat/linux-personality.o 
nat/linux-namespaces.o fork-child.o nat/fork-inferior.o"
 
-# Input is taken from the "${host}" variable.
+# Input is taken from the "${host}" and "${target}" variables.
 
-case "${host}" in
+# GDBserver can only debug native programs.
+if test "${target}" = "${host}"; then
+gdbserver_host=${host}
+else
+gdbserver_host=
+fi
+
+case "${gdbserver_host}" in
   aarch64*-*-linux*)   srv_tgtobj="linux-aarch64-low.o"
srv_tgtobj="$srv_tgtobj nat/aarch64-linux-hw-point.o"
srv_tgtobj="$srv_tgtobj linux-aarch32-low.o"

base-commit: 38de8abe21fe17c31888094bd860a84f88cb5749
-- 
2.14.5



Re: [PATCH] i386: Skip ENDBR32 at nested function entry

2020-02-12 Thread H.J. Lu
On Mon, Feb 10, 2020 at 12:01 PM Uros Bizjak  wrote:
>
> On Mon, Feb 10, 2020 at 8:53 PM H.J. Lu  wrote:
> >
> > On Mon, Feb 10, 2020 at 11:40 AM Uros Bizjak  wrote:
> > >
> > > On Mon, Feb 10, 2020 at 8:22 PM H.J. Lu  wrote:
> > > >
> > > > Since nested function isn't only called directly, there is ENDBR32 at
> > > > function entry and we need to skip it for direct jump in trampoline.
> > >
> > > Hm, I'm afraid I don't understand this comment. Can you perhaps rephrase 
> > > it?
> > >
> >
> > ix86_trampoline_init has
> >
> >  /* Compute offset from the end of the jmp to the target function.
> >  In the case in which the trampoline stores the static chain on
> >  the stack, we need to skip the first insn which pushes the
> >  (call-saved) register static chain; this push is 1 byte.  */
> >   offset += 5;
> >   disp = expand_binop (SImode, sub_optab, fnaddr,
> >plus_constant (Pmode, XEXP (m_tramp, 0),
> >   offset - (MEM_P (chain) ? 1 : 0)),
> >NULL_RTX, 1, OPTAB_DIRECT);
> >   emit_move_insn (mem, disp);
> >
> > Without CET, we got
> >
> > 011 :
> >   11: 56push   %esi
> >   12: 55push   %ebp   << trampoline jumps here.
> >   13: 89 e5mov%esp,%ebp
> >   15: 83 ec 08  sub$0x8,%esp
> >
> > With CET, if bar isn't only called directly, we got
> >
> > 0015 :
> >   15: f3 0f 1e fb  endbr32
> >   19: 56push   %esi
> >   1a: 55push   %ebp    trampoline jumps here.
> >   1b: 89 e5mov%esp,%ebp
> >   1d: 83 ec 08  sub$0x8,%esp
> >
> > We need to add 4 bytes for trampoline to skip endbr32.
> >
> > Here is the updated patch to check if nested function isn't only
> > called directly,
>
> Please figure out the final patch. I don't want to waste my time
> reviewing different version every half hour. Ping me in a couple of
> days.

This is the final version:

https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00586.html

You can try the testcase in the patch on any machine with CET binutils
since ENDBR32 is nop on none-CET machines.  Without this patch,
the test will fail.

Thanks.

-- 
H.J.


[PATCH] c++: Emit DFP typeinfos even when DFP is disabled [PR92906]

2020-02-12 Thread Jakub Jelinek
Hi!

Before Joseph's changes when compiling
libstdc++-v3/libsupc++/fundamental_type_info.cc
we were emitting
_ZTIPDd, _ZTIPDe, _ZTIPDf, _ZTIPKDd, _ZTIPKDe, _ZTIPKDf, _ZTIDd, _ZTIDe, _ZTIDf
symbols even when DFP wasn't usable, but now we don't and thus those 9
symbols @@CXXABI_1.3.4 are gone from libstdc++.  While nothing could
probably use it (except perhaps dlsym etc.), various tools don't really like
symbols disappearing from symbol versioned shared libraries with stable ABI.
Adding those in assembly would be possible, but would be a portability
nightmare (the PR has something Red Hat uses in libstdc++_nonshared.a, but that
can handle only a handful of linux ELF targets we care about).
So, instead this patch hacks up the FE, so that it emits those, but in a way
that won't make the DFP types available again on targets that don't support
them.

Bootstrapped/regtested on x86_64-linux and i686-linux (where it doesn't
change anything, because DFP is supported), aarch64-linux (where it fixed
FAIL: libstdc++-abi/abi_check
) and armv7hl-linux (where abi_check didn't fail because we don't have
baseline file, but comparing abilist of unpatched and patched libstdc++.so.6
shows:
+_ZTIDd@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 8
+_ZTIDe@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 8
+_ZTIDf@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 8
+_ZTIPDd@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16
+_ZTIPDe@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16
+_ZTIPDf@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16
+_ZTIPKDd@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16
+_ZTIPKDe@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16
+_ZTIPKDf@@CXXABI_1.3.4 OBJECT WEAK DEFAULT 16
).  Ok for trunk?

2020-02-12  Jakub Jelinek  

PR libstdc++/92906
* cp-tree.h (enum cp_tree_index): Add CPTI_FALLBACK_DFLOAT32_TYPE,
CPTI_FALLBACK_DFLOAT64_TYPE and CPTI_FALLBACK_DFLOAT128_TYPE.
(fallback_dfloat32_type, fallback_dfloat64_type,
fallback_dfloat128_type): Define.
* mangle.c (write_builtin_type): Handle fallback_dfloat*_type like
dfloat*_type_node.
* rtti.c (emit_support_tinfos): Emit DFP typeinfos even when dfp
is disabled for compatibility.

--- gcc/cp/cp-tree.h.jj 2020-02-10 15:02:05.031846484 +0100
+++ gcc/cp/cp-tree.h2020-02-11 23:00:25.561410042 +0100
@@ -206,6 +206,10 @@ enum cp_tree_index
 
 CPTI_SOURCE_LOCATION_IMPL,
 
+CPTI_FALLBACK_DFLOAT32_TYPE,
+CPTI_FALLBACK_DFLOAT64_TYPE,
+CPTI_FALLBACK_DFLOAT128_TYPE,
+
 CPTI_MAX
 };
 
@@ -366,6 +370,12 @@ extern GTY(()) tree cp_global_trees[CPTI
 
 #define access_default_nodenull_node
 
+/* Variant of dfloat{32,64,128}_type_node only used for fundamental
+   rtti purposes if DFP is disabled.  */
+#define fallback_dfloat32_type 
cp_global_trees[CPTI_FALLBACK_DFLOAT32_TYPE]
+#define fallback_dfloat64_type 
cp_global_trees[CPTI_FALLBACK_DFLOAT64_TYPE]
+#define fallback_dfloat128_type
cp_global_trees[CPTI_FALLBACK_DFLOAT128_TYPE]
+
 
 #include "name-lookup.h"
 
--- gcc/cp/mangle.c.jj  2020-01-21 09:13:43.339634944 +0100
+++ gcc/cp/mangle.c 2020-02-11 22:59:28.466265009 +0100
@@ -2569,11 +2569,11 @@ write_builtin_type (tree type)
write_char ('d');
   else if (type == long_double_type_node)
write_char ('e');
-  else if (type == dfloat32_type_node)
+  else if (type == dfloat32_type_node || type == fallback_dfloat32_type)
write_string ("Df");
-  else if (type == dfloat64_type_node)
+  else if (type == dfloat64_type_node || type == fallback_dfloat64_type)
write_string ("Dd");
-  else if (type == dfloat128_type_node)
+  else if (type == dfloat128_type_node || type == fallback_dfloat128_type)
write_string ("De");
   else
gcc_unreachable ();
--- gcc/cp/rtti.c.jj2020-01-21 09:13:43.359634642 +0100
+++ gcc/cp/rtti.c   2020-02-11 22:59:28.467264994 +0100
@@ -1588,6 +1588,20 @@ emit_support_tinfos (void)
   }
   for (tree t = registered_builtin_types; t; t = TREE_CHAIN (t))
 emit_support_tinfo_1 (TREE_VALUE (t));
+  /* For compatibility, emit DFP typeinfos even when DFP isn't enabled,
+ because we've emitted that in the past.  */
+  if (!targetm.decimal_float_supported_p ())
+{
+  gcc_assert (dfloat32_type_node == NULL_TREE
+ && dfloat64_type_node == NULL_TREE
+ && dfloat128_type_node == NULL_TREE);
+  fallback_dfloat32_type = make_node (REAL_TYPE);
+  fallback_dfloat64_type = make_node (REAL_TYPE);
+  fallback_dfloat128_type = make_node (REAL_TYPE);
+  emit_support_tinfo_1 (fallback_dfloat32_type);
+  emit_support_tinfo_1 (fallback_dfloat64_type);
+  emit_support_tinfo_1 (fallback_dfloat128_type);
+}
   input_location = saved_loc;
 }
 

Jakub



Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

2020-02-12 Thread Richard Biener
On Wed, 12 Feb 2020, Segher Boessenkool wrote:

> On Wed, Feb 12, 2020 at 09:12:58AM +0100, Richard Biener wrote:
> > On Tue, 11 Feb 2020, Segher Boessenkool wrote:
> > > Basic block partitioning has wildly disproportionate fallout in all
> > > later passes, both in terms of what those *do* (or don't, if partitioning
> > > is enabled), and of impact on the code (not to mention developer time).
> > > 
> > > Maybe the implementation can be improved, but probably we should do this
> > > in a different way altogether.  The current situation is not good.
> > 
> > I think the expectation that you can go back to CFG layout mode
> > and then work with CFG layout tools after we've lowered to CFG RTL
> > is simply bogus.
> 
> Partitioning is also quite problematic if you do not use cfglayout
> mode.  For example, in shrink-wrapping.  It prevents a lot there.
> 
> > Yeah, you can probably do analysis things but
> > I wouldn't be surprised if a CFG RTL -> CFG layout -> CFG RTL cycle 
> > can wreck things.  Undoubtedly doing CFG manipulations is not going
> > to work since CFG layout does not respect CFG RTL restrictions.
> 
> Doing CFG manipulations on CFG RTL mode directly is almost impossible
> to do correctly.
> 
> For example, bb-reorder.  Which is a much more important optimisation
> than partitioning, btw.

BB reorder switches back and forth as well ... :/

I think both are closely enough related that we probably should do
partitioning from within the same framework?  OTOH BB reorder
happens _much_ later.

> > Partitioning simply uncovered latent bugs, there's nothing wrong
> > with it IMHO.
> 
> I don't agree.  The whole way EDGE_CROSSING works hinders all other
> optimisations a lot.

I'm not sure if it's there for correctness reasons or just to
help checking that nothing "undoes" the partitioning decision.

Richard.

> 
> Segher


Re: [PATCH] i386: Fix up vec_extract_lo* patterns [PR93670]

2020-02-12 Thread Uros Bizjak
On Wed, Feb 12, 2020 at 10:27 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The VEXTRACT* insns have way too many different CPUID feature flags (ATT
> syntax)
> vextractf128 $imm, %ymm, %xmm/mem   AVX
> vextracti128 $imm, %ymm, %xmm/mem   AVX2
> vextract{f,i}32x4 $imm, %ymm, %xmm/mem {k}{z}   AVX512VL+AVX512F
> vextract{f,i}32x4 $imm, %zmm, %xmm/mem {k}{z}   AVX512F
> vextract{f,i}64x2 $imm, %ymm, %xmm/mem {k}{z}   AVX512VL+AVX512DQ
> vextract{f,i}64x2 $imm, %zmm, %xmm/mem {k}{z}   AVX512DQ
> vextract{f,i}32x8 $imm, %zmm, %ymm/mem {k}{z}   AVX512DQ
> vextract{f,i}64x4 $imm, %zmm, %ymm/mem {k}{z}   AVX512F
>
> As the testcase shows and the patch too, we didn't get it right in all
> cases.
>
> The first hunk is about avx512vl_vextractf128v8s[if] incorrectly
> requiring TARGET_AVX512DQ.  The corresponding insn is the first
> vextract{f,i}32x4 above, so it requires VL+F, and the builtins have it
> correct (TARGET_AVX512VL implies TARGET_AVX512F):
> BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8sf, 
> "__builtin_ia32_extractf32x4_256_mask", IX86_BUILTIN_EXTRACTF32X4_256, 
> UNKNOWN, (int) V4SF_FTYPE_V8SF_INT_V4SF_UQI)
> BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8si, 
> "__builtin_ia32_extracti32x4_256_mask", IX86_BUILTIN_EXTRACTI32X4_256, 
> UNKNOWN, (int) V4SI_FTYPE_V8SI_INT_V4SI_UQI)
> We only need TARGET_AVX512DQ for avx512vl_vextractf128v4d[if].
>
> The second hunk is about vec_extract_lo_v16s[if]{,_mask}.  These are using
> the vextract{f,i}32x8 insns (AVX512DQ above), but we weren't requiring that,
> but instead incorrectly && 1 for non-masked and && (64 == 64 && 
> TARGET_AVX512VL)
> for masked insns.  This is extraction from ZMM, so it doesn't need VL for
> anything.  The hunk actually only requires TARGET_AVX512DQ when the insn
> is masked, if it is not masked, when TARGET_AVX512DQ isn't available we can
> use vextract{f,i}64x4 instead which is available already in TARGET_AVX512F
> and does the same thing, extracts the low 256 bits from 512 bits vector
> (often we split it into just nothing, but there are some special cases like
> when using xmm16+ when we can't without AVX512VL).
>
> The last hunk is about vec_extract_lo_v8s[if]{,_mask}.  The non-_mask
> suffixed ones are ok already and just split into nothing (lowpart subreg).
> The masked ones were incorrectly requiring TARGET_AVX512VL and
> TARGET_AVX512DQ, when we only need TARGET_AVX512VL.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2020-02-12  Jakub Jelinek  
>
> PR target/93670
> * config/i386/sse.md (VI48F_256_DQ): New mode iterator.
> (avx512vl_vextractf128): Use it instead of VI48F_256.  Remove
> TARGET_AVX512DQ from condition.
> (vec_extract_lo_): Use 
> instead of  in condition.  If
> TARGET_AVX512DQ is false, emit vextract*64x4 instead of
> vextract*32x8.
> (vec_extract_lo_): Drop 
> from condition.
>
> * gcc.target/i386/avx512vl-pr93670.c: New test.

OK.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2020-02-11 14:54:38.017593464 +0100
> +++ gcc/config/i386/sse.md  2020-02-11 15:50:59.629130828 +0100
> @@ -8719,13 +8719,16 @@ (define_insn "vec_extract_hi_ (set_attr "prefix" "evex")
> (set_attr "mode" "")])
>
> +(define_mode_iterator VI48F_256_DQ
> +  [V8SI V8SF (V4DI "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ")])
> +
>  (define_expand "avx512vl_vextractf128"
>[(match_operand: 0 "nonimmediate_operand")
> -   (match_operand:VI48F_256 1 "register_operand")
> +   (match_operand:VI48F_256_DQ 1 "register_operand")
> (match_operand:SI 2 "const_0_to_1_operand")
> (match_operand: 3 "nonimm_or_0_operand")
> (match_operand:QI 4 "register_operand")]
> -  "TARGET_AVX512DQ && TARGET_AVX512VL"
> +  "TARGET_AVX512VL"
>  {
>rtx (*insn)(rtx, rtx, rtx, rtx);
>rtx dest = operands[0];
> @@ -8793,14 +8796,19 @@ (define_insn "vec_extract_lo_   (const_int 4) (const_int 5)
>   (const_int 6) (const_int 7)])))]
>"TARGET_AVX512F
> -   && 
> +   && 
> && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
>  {
>if (
>|| (!TARGET_AVX512VL
>   && !REG_P (operands[0])
>   && EXT_REX_SSE_REG_P (operands[1])))
> -return "vextract32x8\t{$0x0, %1, 
> %0|%0, %1, 0x0}";
> +{
> +  if (TARGET_AVX512DQ)
> +   return "vextract32x8\t{$0x0, %1, 
> %0|%0, %1, 0x0}";
> +  else
> +   return "vextract64x4\t{$0x0, %1, %0|%0, %1, 0x0}";
> +}
>else
>  return "#";
>  }
> @@ -8910,7 +8918,7 @@ (define_insn "vec_extract_lo_   (parallel [(const_int 0) (const_int 1)
>  (const_int 2) (const_int 3)])))]
>"TARGET_AVX
> -   &&  && 
> +   && 
> && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
>  {
>if ()
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr93670.c.jj 2020-02-11 
> 16:00:14.874930873 +0100
> +++ 

Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

2020-02-12 Thread Segher Boessenkool
On Wed, Feb 12, 2020 at 09:12:58AM +0100, Richard Biener wrote:
> On Tue, 11 Feb 2020, Segher Boessenkool wrote:
> > Basic block partitioning has wildly disproportionate fallout in all
> > later passes, both in terms of what those *do* (or don't, if partitioning
> > is enabled), and of impact on the code (not to mention developer time).
> > 
> > Maybe the implementation can be improved, but probably we should do this
> > in a different way altogether.  The current situation is not good.
> 
> I think the expectation that you can go back to CFG layout mode
> and then work with CFG layout tools after we've lowered to CFG RTL
> is simply bogus.

Partitioning is also quite problematic if you do not use cfglayout
mode.  For example, in shrink-wrapping.  It prevents a lot there.

> Yeah, you can probably do analysis things but
> I wouldn't be surprised if a CFG RTL -> CFG layout -> CFG RTL cycle 
> can wreck things.  Undoubtedly doing CFG manipulations is not going
> to work since CFG layout does not respect CFG RTL restrictions.

Doing CFG manipulations on CFG RTL mode directly is almost impossible
to do correctly.

For example, bb-reorder.  Which is a much more important optimisation
than partitioning, btw.

> Partitioning simply uncovered latent bugs, there's nothing wrong
> with it IMHO.

I don't agree.  The whole way EDGE_CROSSING works hinders all other
optimisations a lot.


Segher


Re: [cris-decc0 0/14] A set of compare-elimination-fixes.

2020-02-12 Thread Eric Botcazou
> I just rebased and updated the vendors/axis branch
> axis/cris-decc0 with the following commits, which should bring
> back compare-elimination results to that of cc0 on master.

Nice work!  An example of transition done properly...

> With the exception of the bit-test patterns (btst / btstq which
> is more of a "combine" matter), everything is centered around
> working together with the "cmpelim" pass with the help of
> define_subst attributes.

Feel free to further tweak the cmpelim pass if need be; I did it for the 
Visium so there is a precedent. :-)

> No performance tests yet though, but I expect axis/cris-decc0 to
> be a win over master, since as I've mentioned before, I see
> improvements in register-allocation already in libgcc, which
> should get back what's lost in all the special patterns I
> deleted.  I haven't looked into the cause, but it shouldn't
> surprise anyone that there's some noticeable goodies inside
> something to the effect of #ifndef HAVE_cc0, even with IRA.
> (Conversion to LRA is way down on the TODO list.)

For the Visium, the transition was a win overall, with sporadic regressions in 
delay slot filling.  The transition to LRA looks more problematic.

-- 
Eric Botcazou


[PATCH] i386: Fix up vec_extract_lo* patterns [PR93670]

2020-02-12 Thread Jakub Jelinek
Hi!

The VEXTRACT* insns have way too many different CPUID feature flags (ATT
syntax)
vextractf128 $imm, %ymm, %xmm/mem   AVX
vextracti128 $imm, %ymm, %xmm/mem   AVX2
vextract{f,i}32x4 $imm, %ymm, %xmm/mem {k}{z}   AVX512VL+AVX512F
vextract{f,i}32x4 $imm, %zmm, %xmm/mem {k}{z}   AVX512F
vextract{f,i}64x2 $imm, %ymm, %xmm/mem {k}{z}   AVX512VL+AVX512DQ
vextract{f,i}64x2 $imm, %zmm, %xmm/mem {k}{z}   AVX512DQ
vextract{f,i}32x8 $imm, %zmm, %ymm/mem {k}{z}   AVX512DQ
vextract{f,i}64x4 $imm, %zmm, %ymm/mem {k}{z}   AVX512F

As the testcase shows and the patch too, we didn't get it right in all
cases.

The first hunk is about avx512vl_vextractf128v8s[if] incorrectly
requiring TARGET_AVX512DQ.  The corresponding insn is the first
vextract{f,i}32x4 above, so it requires VL+F, and the builtins have it
correct (TARGET_AVX512VL implies TARGET_AVX512F):
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8sf, 
"__builtin_ia32_extractf32x4_256_mask", IX86_BUILTIN_EXTRACTF32X4_256, UNKNOWN, 
(int) V4SF_FTYPE_V8SF_INT_V4SF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8si, 
"__builtin_ia32_extracti32x4_256_mask", IX86_BUILTIN_EXTRACTI32X4_256, UNKNOWN, 
(int) V4SI_FTYPE_V8SI_INT_V4SI_UQI)
We only need TARGET_AVX512DQ for avx512vl_vextractf128v4d[if].

The second hunk is about vec_extract_lo_v16s[if]{,_mask}.  These are using
the vextract{f,i}32x8 insns (AVX512DQ above), but we weren't requiring that,
but instead incorrectly && 1 for non-masked and && (64 == 64 && TARGET_AVX512VL)
for masked insns.  This is extraction from ZMM, so it doesn't need VL for
anything.  The hunk actually only requires TARGET_AVX512DQ when the insn
is masked, if it is not masked, when TARGET_AVX512DQ isn't available we can
use vextract{f,i}64x4 instead which is available already in TARGET_AVX512F
and does the same thing, extracts the low 256 bits from 512 bits vector
(often we split it into just nothing, but there are some special cases like
when using xmm16+ when we can't without AVX512VL).

The last hunk is about vec_extract_lo_v8s[if]{,_mask}.  The non-_mask
suffixed ones are ok already and just split into nothing (lowpart subreg).
The masked ones were incorrectly requiring TARGET_AVX512VL and
TARGET_AVX512DQ, when we only need TARGET_AVX512VL.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-02-12  Jakub Jelinek  

PR target/93670
* config/i386/sse.md (VI48F_256_DQ): New mode iterator.
(avx512vl_vextractf128): Use it instead of VI48F_256.  Remove
TARGET_AVX512DQ from condition.
(vec_extract_lo_): Use 
instead of  in condition.  If
TARGET_AVX512DQ is false, emit vextract*64x4 instead of
vextract*32x8.
(vec_extract_lo_): Drop 
from condition.

* gcc.target/i386/avx512vl-pr93670.c: New test.

--- gcc/config/i386/sse.md.jj   2020-02-11 14:54:38.017593464 +0100
+++ gcc/config/i386/sse.md  2020-02-11 15:50:59.629130828 +0100
@@ -8719,13 +8719,16 @@ (define_insn "vec_extract_hi_")])
 
+(define_mode_iterator VI48F_256_DQ
+  [V8SI V8SF (V4DI "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ")])
+
 (define_expand "avx512vl_vextractf128"
   [(match_operand: 0 "nonimmediate_operand")
-   (match_operand:VI48F_256 1 "register_operand")
+   (match_operand:VI48F_256_DQ 1 "register_operand")
(match_operand:SI 2 "const_0_to_1_operand")
(match_operand: 3 "nonimm_or_0_operand")
(match_operand:QI 4 "register_operand")]
-  "TARGET_AVX512DQ && TARGET_AVX512VL"
+  "TARGET_AVX512VL"
 {
   rtx (*insn)(rtx, rtx, rtx, rtx);
   rtx dest = operands[0];
@@ -8793,14 +8796,19 @@ (define_insn "vec_extract_lo_
+   && 
&& ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if (
   || (!TARGET_AVX512VL
  && !REG_P (operands[0])
  && EXT_REX_SSE_REG_P (operands[1])))
-return "vextract32x8\t{$0x0, %1, 
%0|%0, %1, 0x0}";
+{
+  if (TARGET_AVX512DQ)
+   return "vextract32x8\t{$0x0, %1, 
%0|%0, %1, 0x0}";
+  else
+   return "vextract64x4\t{$0x0, %1, %0|%0, %1, 0x0}";
+}
   else
 return "#";
 }
@@ -8910,7 +8918,7 @@ (define_insn "vec_extract_lo_ && 
+   && 
&& ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
 {
   if ()
--- gcc/testsuite/gcc.target/i386/avx512vl-pr93670.c.jj 2020-02-11 
16:00:14.874930873 +0100
+++ gcc/testsuite/gcc.target/i386/avx512vl-pr93670.c2020-02-11 
15:59:01.252019025 +0100
@@ -0,0 +1,77 @@
+/* PR target/93670 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512vl -mno-avx512dq" } */
+
+#include 
+
+__m128i
+f1 (__m256i x)
+{
+  return _mm256_extracti32x4_epi32 (x, 0);
+}
+
+__m128i
+f2 (__m256i x, __m128i w, __mmask8 m)
+{
+  return _mm256_mask_extracti32x4_epi32 (w, m, x, 0);
+}
+
+__m128i
+f3 (__m256i x, __mmask8 m)
+{
+  return _mm256_maskz_extracti32x4_epi32 (m, x, 0);
+}
+
+__m128
+f4 (__m256 x)
+{
+  return _mm256_extractf32x4_ps (x, 0);
+}
+
+__m128
+f5 (__m256 x, __m128 w, __mmask8 m)

Re: [PATCH] [arm] Implement Armv8.1-M low overhead loops

2020-02-12 Thread Roman Zhuykov
Hello!

11.02.2020 16:40, Andrea Corallo wrote:
> Hi Richard,
>
> "Richard Earnshaw (lists)"  writes:
>
>>> gcc/ChangeLog:
>>> 2020-??-??  Andrea Corallo  
>>> 2020-??-??  Mihail-Calin Ionescu  
>>> 2020-??-??  Iain Apreotesei  
>>>  * config/arm/arm.c (TARGET_INVALID_WITHIN_DOLOOP):
>>>  (arm_invalid_within_doloop): Implement invalid_within_doloop hook.
>>>  * config/arm/arm.h (TARGET_HAVE_LOB): Add new macro.
>>>  * config/arm/thumb2.md (*doloop_end, doloop_begin, dls_insn):
>>>  Add new patterns.
>>>  * config/arm/unspecs.md: Add new unspec.
>>>
>> A date should only appear before the first author in a multi-author
>> patch, other authors should then be indented to align with the name of
>> that first author.
> Ack
This patch is stage1 material, right?
>
>> +(define_insn "*doloop_end"
>> +  [(parallel [(set (pc)
>> +   (if_then_else
>> +   (ne (reg:SI LR_REGNUM) (const_int 1))
>> + (label_ref (match_operand 0 "" ""))
>> + (pc)))
>> +  (set (reg:SI LR_REGNUM)
>> +   (plus:SI (reg:SI LR_REGNUM) (const_int -1)))])]
>> +  "TARGET_32BIT && TARGET_HAVE_LOB && !flag_modulo_sched"
>> +  "le\tlr, %l0")
I'm not an expert in .md files, but having that "!flag_modulo_sched"
condition seems wrong to me.  What was the issue on SMS side to add that?
Currently, there are fake doloop_end pattern on ARM.  It is generated
only when flag_modulo_sched is set and actually expands to more than one
instruction.  This old approach have its pros and cons.  When we
HAVE_LOB, target allows us to use a real doloop_end instruction, fake
one is not needed at all.  In this case compiler should use real
instruction regardless whether SMS in on or off.

I hope in stage1 after upgrading modulo scheduler, we will restart old
discussion about removing fake doloop_end pattern for ARM:
https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01812.html
https://gcc.gnu.org/ml/gcc-patches/2012-01/msg00195.html
Aarch64 also have such a fake pattern since 2014, probably its removal
also will be considered.

Roman
>> Is it deliberate that this pattern name has a '*' prefix?  doloop_end
>> is a named expansion pattern according to md.texi.
> Yes, this should be expanded already by the define_expand we have in
> thumb2.md.  Perhaps I'll call it 'doloop_end_internal' and add a
> comment.
>
>> Also, hard-coded register names should be prefixed with '%|' (so
>> "%|lr", not just "lr"), just in case the assembler dialect requires
>> something (ELF doesn't but others have).  Also for dls_insn.
> Ack
>
>> For the tests, your 'require-effective-taret' tests look insufficient
>> to prevent problems when testing a multilib environment, you'll need
>> (at least) checks that a) passing -marm has not happened and b) that
>> the architecture, or a specific CPU isn't being passed on the command
>> line.
> Ack
>
> Thanks for reviewing I'll update the patch.
>
> Andrea



[PATCH] testsuite/93697 fix inconsistent warning in testcase

2020-02-12 Thread Richard Biener
The warning was emitted inconsistently on targets, so disable it since
the testcase was for an ICE.

2020-02-12  Richard Biener  

PR testsuite/93697
* gcc.dg/pr93661.c: Pass -w, remove dg-warning.
---
 gcc/testsuite/gcc.dg/pr93661.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr93661.c b/gcc/testsuite/gcc.dg/pr93661.c
index e311ba545c4..bc77fcab69e 100644
--- a/gcc/testsuite/gcc.dg/pr93661.c
+++ b/gcc/testsuite/gcc.dg/pr93661.c
@@ -1,9 +1,9 @@
 /* { dg-do compile } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -w" } */
 
 int f ()
 {
   unsigned x = 0x;
-  __builtin_memset (1+(char *) , 0, -1); /* { dg-warning "maximum object 
size" } */
+  __builtin_memset (1+(char *) , 0, -1);
   return (x != 0xf000);
 }
-- 
2.16.4


Re: [RFC] [c-family] PR92867 - Add returns_arg attribute

2020-02-12 Thread Richard Biener
On Wed, Feb 12, 2020 at 7:52 AM Prathamesh Kulkarni
 wrote:
>
> On Tue, 4 Feb 2020 at 14:54, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 3 Feb 2020 at 14:56, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 3 Feb 2020 at 14:41, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Thu, 30 Jan 2020 at 19:17, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Thu, Jan 30, 2020 at 11:49 AM Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, 29 Jan 2020 at 14:38, Richard Biener 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Jan 28, 2020 at 1:02 PM Jakub Jelinek  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Tue, Jan 28, 2020 at 05:09:36PM +0530, Prathamesh Kulkarni 
> > > > > > > > wrote:
> > > > > > > > > On Tue, 28 Jan 2020 at 17:00, Jakub Jelinek 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Jan 28, 2020 at 04:56:59PM +0530, Prathamesh 
> > > > > > > > > > Kulkarni wrote:
> > > > > > > > > > > Thanks for the suggestions. In the attached patch I 
> > > > > > > > > > > bumped up value of
> > > > > > > > > > > ERF_RETURNS_ARG_MASK
> > > > > > > > > > > to UINT_MAX >> 2, and use highest two bits for 
> > > > > > > > > > > ERF_NOALIAS and ERF_RETURNS_ARG.
> > > > > > > > > > > And use fn spec "Z" to store the argument number 
> > > > > > > > > > > in fn-spec format.
> > > > > > > > > > > Does that look OK ?
> > > > > > > > > >
> > > > > > > > > > No.
> > > > > > > > > >
> > > > > > > > > > +#define ERF_RETURN_ARG_MASK(UINT_MAX >> 2)
> > > > > > > > > >
> > > > > > > > > >  /* Nonzero if the return value is equal to the argument 
> > > > > > > > > > number
> > > > > > > > > > flags & ERF_RETURN_ARG_MASK.  */
> > > > > > > > > > -#define ERF_RETURNS_ARG(1 << 2)
> > > > > > > > > > +#define ERF_RETURNS_ARG(1 << 
> > > > > > > > > > (BITS_PER_WORD - 2))
> > > > > > > > > >
> > > > > > > > > > How is size of host int related to BITS_PER_WORD?  Not to 
> > > > > > > > > > mention that
> > > > > > > > > > if BITS_PER_WORD is 64 and host int is 32-bit, 1 << (64 - 
> > > > > > > > > > 2) is UB.
> > > > > > > > > Oops sorry. I should have used HOST_BITS_PER_INT.
> > > > > > > > > I assume that'd be correct ?
> > > > > > > >
> > > > > > > > It still wouldn't, 1 << (HOST_BITS_PER_INT - 1) is negative 
> > > > > > > > number, you'd
> > > > > > > > need either 1U and verify all ERF_* flags uses, or avoid using 
> > > > > > > > the sign bit.
> > > > > > > > The patch has other issues, you don't verify that the argnum 
> > > > > > > > fits into
> > > > > > > > the bits (doesn't overflow into the other ERF_* bits), in
> > > > > > > > +  char *s = (char *) xmalloc (sizeof (char) * BITS_PER_WORD);
> > > > > > > > +  s[0] = 'Z';
> > > > > > > > +  sprintf (s + 1, "%lu", argnum);
> > > > > > > > 1) sizeof (char) is 1 by definition
> > > > > > > > 2) it is pointless to allocate it and then deallocate (just use 
> > > > > > > > automatic
> > > > > > > > array)
> > > > > > > > 3) it is unclear how is BITS_PER_WORD related to the length of 
> > > > > > > > decimal
> > > > > > > > encoded string + Z char + terminating '\0'.  The usual way is 
> > > > > > > > for unsigned
> > > > > > > > numbers to estimate number of digits by counting 3 digits per 
> > > > > > > > each 8 bits,
> > > > > > > > in your case of course + 2.
> > > > > > > >
> > > > > > > > More importantly, the "fn spec" attribute isn't used just in
> > > > > > > > gimple_call_return_flags, but also in e.g. 
> > > > > > > > gimple_call_arg_flags which
> > > > > > > > assumes that the return stuff in there is a single char and the 
> > > > > > > > reaming
> > > > > > > > chars are for argument descriptions, or in decl_return_flags 
> > > > > > > > which you
> > > > > > > > haven't modified.
> > > > > > > >
> > > > > > > > I must say I fail to see the point in trying to glue this 
> > > > > > > > together into the
> > > > > > > > "fn spec" argument so incompatibly, why can't we handle the fn 
> > > > > > > > spec with its
> > > > > > > > 1-4 returns_arg and returns_arg attribute with arbitrary 
> > > > > > > > position
> > > > > > > > side-by-side?
> > > > > > >
> > > > > > > Yeah, I wouldn't have added "fn spec" for "returns_arg" but kept 
> > > > > > > the
> > > > > > > query interface thus access it via gimple_call_return_flags and 
> > > > > > > use
> > > > > > > ERF_*.  For the flags adjustment just up the maximum argument
> > > > > > > to 1<<15 then the argument number is also nicely aligned, no need
> > > > > > > to do fancy limiting that depends on the host.  For too large
> > > > > > > argument numbers just warn the attribute is ignored.  I'd say even
> > > > > > > a max of 255 is sane just the existing limit is a bit too low.
> > > > > > Hi,
> > > > > > Thanks for the suggestions. In the attached patch, I use TREE_VALUE
> > > > > > (attr) to store/retrieve
> > > > > > arbitrary argument position, and have bumped ERF_RETURNS_ARG_MASK 
> > > > > > to 

Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

2020-02-12 Thread Richard Biener
On Tue, 11 Feb 2020, Segher Boessenkool wrote:

> Hi!
> 
> On Tue, Feb 11, 2020 at 03:46:05PM +0300, Roman Zhuykov wrote:
> > Hmm, even when trying to move it just few passes earlier many years ago,
> > got another opinion:
> > https://gcc.gnu.org/ml/gcc-patches/2011-10/msg01526.html
> > Although without such a move we still have annoying issues which RTL
> > folks can't solve, see e.q.
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93264#c2
> 
> Basic block partitioning has wildly disproportionate fallout in all
> later passes, both in terms of what those *do* (or don't, if partitioning
> is enabled), and of impact on the code (not to mention developer time).
> 
> Maybe the implementation can be improved, but probably we should do this
> in a different way altogether.  The current situation is not good.

I think the expectation that you can go back to CFG layout mode
and then work with CFG layout tools after we've lowered to CFG RTL
is simply bogus.  Yeah, you can probably do analysis things but
I wouldn't be surprised if a CFG RTL -> CFG layout -> CFG RTL cycle 
can wreck things.  Undoubtedly doing CFG manipulations is not going
to work since CFG layout does not respect CFG RTL restrictions.

Partitioning simply uncovered latent bugs, there's nothing wrong
with it IMHO.

Richard.


Re: [PATCH 0/4 GCC11] IVOPTs consider step cost for different forms when unrolling

2020-02-12 Thread Richard Biener
On Tue, 11 Feb 2020, Segher Boessenkool wrote:

> On Tue, Feb 11, 2020 at 02:58:47PM +0100, Richard Biener wrote:
> > On Tue, 11 Feb 2020, Roman Zhuykov wrote:
> > > 11.02.2020 11:01, Richard Biener wrote:
> > > Sound good, but IMHO modulo scheduler is not the best choice to be the
> > > first step implementing such a concept.
> > 
> > True ;)   But since the context of this thread is unrolling ...
> > Not sure how you'd figure the unroll factor to apply if you want
> > to do unrolling within a classical scheduling framework?  Maybe
> > unroll as much as you can fill slots until the last instruction
> > of the first iteration retires?
> 
> That will be terrible on register-rich architectures: it *already* is
> problematic how often some things are unrolled, blindly unrolling more
> would make things worse.  We need to unroll more where it helps, but
> less where it does not.  For that we need a good cost/benefit estimate.

True.  For x86 we tried but did not come up with a sensible estimate
(probably the x86 uarchs are way too complicated to understand).

Richard.