[PATCH v2] C++: add type checking for static local vector variable in template

2021-09-17 Thread wangpc via Gcc-patches
This patch moves verify_type_context from start_decl_1 to cp_finish_decl so
that more declarations are type-checked, for example a static local vector
variable in a C++ template.

2021-08-06  wangpc  

gcc/cp/ChangeLog

* decl.c (start_decl_1): Move verify_type_context to ...
(cp_finish_decl): ... to here.

gcc/testsuite/ChangeLog

* g++.target/aarch64/sve/static-var-in-template.C: New test.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 90111e4c786..deaa6c56a8f 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -5491,13 +5491,6 @@ start_decl_1 (tree decl, bool initialized)
   cp_apply_type_quals_to_decl (cp_type_quals (type), decl);
 }
 
-  if (is_global_var (decl))
-{
-  type_context_kind context = (DECL_THREAD_LOCAL_P (decl)
-  ? TCTX_THREAD_STORAGE
-  : TCTX_STATIC_STORAGE);
-  verify_type_context (input_location, context, TREE_TYPE (decl));
-}
   if (initialized)
 /* Is it valid for this decl to have an initializer at all?  */
 {
@@ -7520,6 +7513,14 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
   && DECL_INITIALIZED_IN_CLASS_P (decl))
 check_static_variable_definition (decl, type);
 
+  if (!processing_template_decl && VAR_P (decl) && is_global_var (decl))
+{
+  type_context_kind context = (DECL_THREAD_LOCAL_P (decl)
+  ? TCTX_THREAD_STORAGE
+  : TCTX_STATIC_STORAGE);
+  verify_type_context (input_location, context, TREE_TYPE (decl));
+}
+
   if (init && TREE_CODE (decl) == FUNCTION_DECL)
 {
   tree clone;
diff --git a/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
new file mode 100644
index 000..74237ff7c57
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+#include <arm_sve.h>
+
+template <int N>
+void f()
+{
+static svbool_t pg = svwhilelt_b64(0, N);
+}
+
+int main(int argc, char **argv)
+{
+f<2>();
+return 0;
+}
+
+/* { dg-error "SVE type 'svbool_t' does not have a fixed size" "" { target *-*-* } 0 } */
-- 
2.33.0.windows.1



Re: [PATCH/RFC 1/2] WPD: Enable whole program devirtualization

2021-09-17 Thread Jason Merrill via Gcc-patches

On 9/16/21 22:29, Feng Xue OS wrote:

On 9/16/21 05:25, Feng Xue OS via Gcc-patches wrote:

This and following patches are composed to enable full devirtualization
under the whole-program assumption (hence whole-program devirtualization,
WPD for short), which is an enhancement to the current speculative
devirtualization. The basis of the optimization is identifying class types
that are local in terms of whole-program scope; at the least, the class
types in libstdc++ must be excluded in some way. Our approach is to use the
typeinfo symbol as the identity marker of a class, since it is unique and
always generated once the class or a type derived from it is instantiated
somewhere, and to rely on symbol resolution by lto-linker-plugin to detect
whether a typeinfo is referenced by a regular object/library, which
indirectly tells whether the class type has escaped.
The RFC at https://gcc.gnu.org/pipermail/gcc/2021-August/237132.html
gives more details on that.
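
To illustrate the kind of call this optimization targets, here is a minimal
sketch. The class and function names are hypothetical and are not taken from
the patch. If Square is the only type derived from Shape anywhere in the
program, and the typeinfo of neither class is referenced from a regular
object or library, the virtual call in count_sides has a single possible
target and could in principle be resolved at link time.

```cpp
// Hypothetical class hierarchy, used only for illustration.
struct Shape {
  virtual ~Shape() = default;
  virtual int sides() const = 0;
};

struct Square : Shape {
  int sides() const override { return 4; }
};

// Without whole-program knowledge this is an indirect call through the
// vtable.  If no other type derived from Shape can exist in the program,
// Square::sides is the only possible target.
int count_sides(const Shape &s) { return s.sides(); }
```

Calling count_sides on a Square object returns 4 whether or not the call is
devirtualized; the optimization changes only how the call is dispatched.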

Bootstrapped/regtested on x86_64-linux and aarch64-linux.

Thanks,
Feng


2021-09-07  Feng Xue  

gcc/
   * common.opt (-fdevirtualize-fully): New option.
   * class.c (build_rtti_vtbl_entries): Force generation of typeinfo
   even if -fno-rtti is specified under full devirtualization.


This makes -fno-rtti useless; rather than this, you should warn about
the combination of flags and force flag_rtti on.  It also sounds like
you depend on the library not being built with -fno-rtti.


Although RTTI is generated by the front end, we will remove it after the LTO
symtab merge, which is meant to keep the same behavior as -fno-rtti.


Ah, the cp/ change is OK, then, with a comment about that.


Yes, any regular library that is linked in should contain RTTI data;
otherwise WPD cannot deduce class type usage safely. By default, we can
assume that this works for libstdc++, but it may become a problem for user
libraries, which can be avoided if we properly document this requirement
and suggest that users follow it when using WPD.


Yes, I would expect that external libraries would be built with RTTI on 
to allow users to use RTTI features even if they aren't used within the 
library.  But it's good to document it as a requirement.



+ /* If a class with virtual base is only instantiated as
+subobjects of derived classes, and has no complete object in
+compilation unit, merely construction vtables will be involved,
+its primary vtable is really not needed, and subject to being
+removed.  So once a vtable node is encountered, for all
+polymorphic base classes of the vtable's context class, always
+force generation of primary vtable nodes when full
+devirtualization is enabled.  */


Why do you need the primary vtable if you're relying on RTTI info? 
Construction vtables will point to the same RTTI node.
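
A minimal sketch of the observable behaviour behind this remark, with
hypothetical class names. While the Base subobject of a Derived object is
being constructed, typeid(*this) yields the type_info of Base; under the
Itanium C++ ABI this is implemented through a construction vtable whose
RTTI slot refers to the typeinfo of Base.

```cpp
#include <typeinfo>

// Hypothetical hierarchy with a virtual base, used only for illustration.
struct V {
  virtual ~V() = default;
};

struct Base : virtual V {
  const std::type_info *seen = nullptr;
  // During construction of the Base subobject the dynamic type is Base,
  // even when the complete object is a Derived.
  Base() { seen = &typeid(*this); }
};

struct Derived : Base {};
```

After constructing a Derived object d, *d.seen compares equal to
typeid(Base), while typeid(d) is typeid(Derived).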



+ /* Public class w/o key member function (or local class in a public
+inline function) requires COMDAT-like vtable so as to be shared
+among units.  But C++ privatizing via -fno-weak would introduce
+multiple static vtable copies for one class in merged lto symbol
+table.  This breaks one-to-one correspondence between class and
+vtable, and makes class liveness check become not that easy.  To
+be simple, we exclude such kind of class from our choice list.


Same question.  Also, why would you use -fno-weak?  Forcing multiple 
copies of things we're perfectly capable of combining seems like a 
strange choice.  You can privatize things with the symbol visibility 
controls or RTLD_LOCAL.


Jason



Re: [committed] Fortran: Prefer GCC internal macros to float.h in ISO_Fortran_binding.h (was: [PATCH, Fortran] Revert to non-multilib-specific ISO_Fortran_binding.h)

2021-09-17 Thread Gerald Pfeifer
On Fri, 17 Sep 2021, Tobias Burnus wrote:
> I have now committed the attached patch as r12-3621. It includes the
> patch by Sandra
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579372.html
> (approved 3 days ago) plus adding the "== 53" similar to above.

Thank you, Tobias; thank you, everyone!

> Hopefully, we can now close this issue.

My nightly tester passed the build and is currently running the
testsuite, so looks good.

Gerald


[PATCH] libstdc++: Implement C++20 atomic<shared_ptr> and atomic<weak_ptr>

2021-09-17 Thread Thomas Rodgers
From: Thomas Rodgers 

Let's try this one instead.

Signed-off-by: Thomas Rodgers 

libstdc++-v3/ChangeLog:
* acinclude.m4: Update ABI version.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.21): Do not match new _Sp_locker
constructor.
(GLIBCXX_3.4.30): Export _Sp_locker::_M_wait/_M_notify and new
constructor.
* include/bits/shared_ptr_atomic.h: Define __cpp_lib_atomic_shared_ptr
feature test macro.
(_Sp_locker::_Sp_locker(const void*, bool)): New constructor.
(_Sp_locker::_M_wait(), _Sp_locker::_M_notify()): New methods.
(_Sp_impl): New type.
(atomic<shared_ptr<_Tp>>): New partial template specialization.
(atomic<weak_ptr<_Tp>>): New partial template specialization.
* include/std/version: define __cpp_lib_atomic_shared_ptr feature
test macro.
* doc/xml/manual/abi.xml: New ABI version.
* src/c++11/Makefile.am: Compile src/c++11/shared_ptr.cc
-std=gnu++20.
* src/c++11/Makefile.in: Regenerate.
* src/c++11/shared_ptr.cc (_Sp_locker::_Sp_locker(const void*, bool),
_Sp_locker::_M_wait(), _Sp_locker::_M_notify()): Implement.
* testsuite/20_util/shared_ptr/atomic/4.cc: New test.
* testsuite/20_util/shared_ptr/atomic/5.cc: Likewise.
* testsuite/20_util/shared_ptr/atomic/atomic_shared_ptr.cc: Likewise.
* testsuite/util/testsuite_abi.cc: New ABI version.
---
 libstdc++-v3/acinclude.m4 |   2 +-
 libstdc++-v3/config/abi/pre/gnu.ver   |  12 +-
 libstdc++-v3/configure|   2 +-
 libstdc++-v3/doc/xml/manual/abi.xml   |   1 +
 libstdc++-v3/include/bits/shared_ptr_atomic.h | 309 ++
 libstdc++-v3/include/std/version  |   1 +
 libstdc++-v3/src/c++11/Makefile.am|   6 +
 libstdc++-v3/src/c++11/Makefile.in|   6 +
 libstdc++-v3/src/c++11/shared_ptr.cc  |  86 -
 .../testsuite/20_util/shared_ptr/atomic/4.cc  |  28 ++
 .../testsuite/20_util/shared_ptr/atomic/5.cc  |  28 ++
 .../shared_ptr/atomic/atomic_shared_ptr.cc| 159 +
 libstdc++-v3/testsuite/util/testsuite_abi.cc  |   3 +-
 13 files changed, 637 insertions(+), 6 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/shared_ptr/atomic/4.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/shared_ptr/atomic/5.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/shared_ptr/atomic/atomic_shared_ptr.cc

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 90ecc4a87a2..30a4abb98b3 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -3798,7 +3798,7 @@ changequote([,])dnl
 fi
 
 # For libtool versioning info, format is CURRENT:REVISION:AGE
-libtool_VERSION=6:29:0
+libtool_VERSION=6:30:0
 
 # Everything parsed; figure out what files and settings to use.
 case $enable_symvers in
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index 5323c7f0604..727afd2d488 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1705,8 +1705,9 @@ GLIBCXX_3.4.21 {
 # std::ctype_base::blank
 _ZNSt10ctype_base5blankE;
 
-# std::_Sp_locker::*
-_ZNSt10_Sp_locker[CD]*;
+# std::_Sp_locker:: constructors and destructors
+_ZNSt10_Sp_lockerC*[^b];
+_ZNSt10_Sp_lockerD*;
 
 # std::notify_all_at_thread_exit
 
_ZSt25notify_all_at_thread_exitRSt18condition_variableSt11unique_lockISt5mutexE;
@@ -2397,6 +2398,13 @@ GLIBCXX_3.4.29 {
 
 } GLIBCXX_3.4.28;
 
+GLIBCXX_3.4.30 {
+  # std::_Sp_locker:: wait/notify support
+  _ZNSt10_Sp_lockerC*[b];
+  _ZNSt10_Sp_locker7_M_waitEv;
+  _ZNSt10_Sp_locker9_M_notifyEv;
+} GLIBCXX_3.4.29;
+
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {
 
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 13d52eb0c0e..67ee6db1bea 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -74684,7 +74684,7 @@ $as_echo "$as_me: WARNING: === Symbol versioning will be disabled." >&2;}
 fi
 
 # For libtool versioning info, format is CURRENT:REVISION:AGE
-libtool_VERSION=6:29:0
+libtool_VERSION=6:30:0
 
 # Everything parsed; figure out what files and settings to use.
 case $enable_symvers in
diff --git a/libstdc++-v3/doc/xml/manual/abi.xml 
b/libstdc++-v3/doc/xml/manual/abi.xml
index c2c0c028a8b..10bef12c768 100644
--- a/libstdc++-v3/doc/xml/manual/abi.xml
+++ b/libstdc++-v3/doc/xml/manual/abi.xml
@@ -348,6 +348,7 @@ compatible.
 GCC 9.3.0: GLIBCXX_3.4.28, CXXABI_1.3.12
 GCC 10.1.0: GLIBCXX_3.4.28, CXXABI_1.3.12
 GCC 11.1.0: GLIBCXX_3.4.29, CXXABI_1.3.13
+GCC 12.1.0: GLIBCXX_3.4.30, CXXABI_1.3.13
 
 
 
diff --git a/libstdc++-v3/include/bits/shared_ptr_atomic.h 
b/libstdc++-v3/include/bits/shared_ptr_atomic.h
index 6e94d83c46d..2aec3adac7c 100644
--- a/libstdc++-v3/include/bits/shared_ptr_atomic.h
+++ b/libstdc++-v3/include/bits/shared_ptr_atomic.h
@@ -32,6 +32,10 @@
 
 #include 
 

Re: [PATCH] introduce predicate analysis class

2021-09-17 Thread Jeff Law via Gcc-patches



On 9/17/2021 4:05 PM, Martin Sebor wrote:

On 9/2/21 10:28 AM, Jeff Law via Gcc-patches wrote:



On 8/30/2021 2:03 PM, Martin Sebor via Gcc-patches wrote:

The predicate analysis subset of the tree-ssa-uninit pass isn't
necessarily specific to the detection of uninitialized reads.
Suitably parameterized, the same core logic could be used in
other warning passes to improve their S/N ratio, or issue more
nuanced diagnostics (e.g., when an invalid access cannot be
ruled out but also need not in reality be unavoidable, issue
a "may be invalid" type of warning rather than "is invalid").
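
As a minimal sketch of what predicate analysis reasons about (the function
and variable names are hypothetical), consider a variable that is assigned
and later read under the same condition. The read can never observe an
uninitialized value, but proving that requires relating the predicate that
guards the definition to the predicate that guards the use.

```cpp
// Hypothetical example: `value` is assigned only when `have_value` is
// true, and it is read only under the same predicate.
int read_guarded(bool have_value, int input) {
  int value;
  if (have_value)
    value = input * 2;

  int result = -1;
  if (have_value)  // The same predicate guards the use.
    result = value;
  return result;
}
```

A warning pass without predicate analysis sees a path on which `value` has
no definition; an analysis that compares the two guards can rule that path
out and stay silent.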

Separating the predicate analysis logic from the uninitialized
pass and exposing a narrow API should also make it easier to
understand and evolve each part independently of the other,
or replace one with a better implementation without modifying
the other.(*)

As the first step in this direction, the attached patch extracts
the predicate analysis logic out of the pass, turns the interface
into public class members, and hides the internals in either
private members or static functions defined in a new source file.
(**)

The changes should have no externally observable effect (i.e.,
should cause no changes in warnings), except on the contents of
the uninitialized dump.  While making the changes I enhanced
the dumps to help me follow the logic.  Turning some previously
free-standing functions into members involved changing their
signatures and adjusting their callers.  While making these
changes I also renamed some of them as well as some variables for
improved clarity.  Finally, I moved declarations of locals
closer to their point of initialization.

Tested on x86_64-linux.  Besides the usual bootstrap/regtest
I also tentatively verified the generality of the new class
interfaces by making use of it in -Warray-bounds.  Besides there,
I'd like to make use of it in the new gimple-ssa-warn-access pass
and, longer term, any other flow-sensitive warnings that might
benefit from it.

Martin

[*] A review of open -Wuninitialized bugs I did while working
on this project made me aware of a number of opportunities to
improve the analyzer to reduce the number of false positives
-Wmaybe-uninitialized suffers from.

[**] The class isn't fully general and, like the uninit pass,
only works with PHI nodes.  I plan to generalize it to compute
the set of predicates between any two basic blocks.

gcc-predanal.diff

Factor predicate analysis out of tree-ssa-uninit.c into its own module.


gcc/ChangeLog:

* Makefile.in (OBJS): Add gimple-predicate-analysis.o.
* tree-ssa-uninit.c (max_phi_args): Move to gimple-predicate-analysis.
(MASK_SET_BIT, MASK_TEST_BIT, MASK_EMPTY): Same.
(check_defs):
(can_skip_redundant_opnd):
(compute_uninit_opnds_pos): Adjust to namespace change.
(find_pdom): Move to gimple-predicate-analysis.cc.
(find_dom): Same.
(struct uninit_undef_val_t): New.
(is_non_loop_exit_postdominating): Move to gimple-predicate-analysis.cc.
(find_control_equiv_block): Same.
(MAX_NUM_CHAINS, MAX_CHAIN_LEN, MAX_POSTDOM_CHECK): Same.
(MAX_SWITCH_CASES): Same.
(compute_control_dep_chain): Same.
(find_uninit_use): Use predicate analyzer.
(struct pred_info): Move to gimple-predicate-analysis.
(convert_control_dep_chain_into_preds): Same.
(find_predicates): Same.
(collect_phi_def_edges): Same.
(warn_uninitialized_phi): Use predicate analyzer.
(find_def_preds): Move to gimple-predicate-analysis.
(dump_pred_info): Same.
(dump_pred_chain): Same.
(dump_predicates): Same.
(destroy_predicate_vecs): Remove.
(execute_late_warn_uninitialized): New.
(get_cmp_code): Move to gimple-predicate-analysis.
(is_value_included_in): Same.
(value_sat_pred_p): Same.
(find_matching_predicate_in_rest_chains): Same.
(is_use_properly_guarded): Same.
(prune_uninit_phi_opnds): Same.
(find_var_cmp_const): Same.
(use_pred_not_overlap_with_undef_path_pred): Same.
(pred_equal_p): Same.
(is_neq_relop_p): Same.
(is_neq_zero_form_p): Same.
(pred_expr_equal_p): Same.
(is_pred_expr_subset_of): Same.
(is_pred_chain_subset_of): Same.
(is_included_in): Same.
(is_superset_of): Same.
(pred_neg_p): Same.
(simplify_pred): Same.
(simplify_preds_2): Same.
(simplify_preds_3): Same.
(simplify_preds_4): Same.
(simplify_preds): Same.
(push_pred): Same.
(push_to_worklist): Same.
(get_pred_info_from_cmp): Same.
(is_degenerated_phi): Same.
(normalize_one_pred_1): Same.
(normalize_one_pred): Same.
(normalize_one_pred_chain): Same.
(normalize_preds): Same.
(can_one_predicate_be_invalidated_p): Same.
(can_chain_union_be_invalidated_p): Same.
(uninit_uses_cannot_happen): Same.
(pass_late_warn_uninitialized::execute): Define.
* gimple-predicate-analysis.cc: New file.
* gimple-predicate-analysis.h: New file.
Thanks for tackling this.  It's something I think 

[PATCH] [i386] Fix ICE in pass_rpad.

2021-09-17 Thread liuhongt via Gcc-patches
Besides conversion instructions, pass_rpad also handles scalar
sqrt/rsqrt/rcp/round instructions, whereas r12-3614 is meant to handle
only conversion instructions, so fix it.
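
The intent of the fix can be sketched outside of GCC as follows. This is an
illustration only: Code, Mode and Insn are hypothetical stand-ins for GCC's
rtx codes, machine modes and insn patterns, and are not the real types. The
point is that the operand mode is consulted only when the source is one of
the conversion codes; every other instruction falls through to the default
case and is processed as before.

```cpp
// Hypothetical stand-ins for GCC's rtx codes and machine modes.
enum class Code { Float, FloatExtend, FloatTruncate, UnsignedFloat, Sqrt, Other };
enum class Mode { Void, SF, DF, SI, DI };

struct Insn {
  Code code;          // Code of the SET source.
  Mode operand_mode;  // Mode of the first operand of the source.
};

// True only for the conversion codes that the tuning flags apply to.
bool is_conversion(Code c) {
  switch (c) {
    case Code::Float:
    case Code::FloatExtend:
    case Code::FloatTruncate:
    case Code::UnsignedFloat:
      return true;
    default:
      return false;
  }
}

// Returns true if the insn should be left alone because packed vector
// conversions are preferred; non-conversions are never skipped.
bool skip_for_vector_converts(const Insn &insn, bool use_vector_fp_converts,
                              bool use_vector_converts) {
  Mode src_mode = is_conversion(insn.code) ? insn.operand_mode : Mode::Void;
  switch (src_mode) {
    case Mode::SF:
    case Mode::DF:
      return use_vector_fp_converts;
    case Mode::SI:
    case Mode::DI:
      return use_vector_converts;
    default:
      return false;
  }
}
```

With this shape a sqrt-like instruction is not skipped even when both
tuning flags are set, because its operand mode is never inspected.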

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,} with configure
--enable-checking=yes,rtl,extra; the failing tests are fixed.
  Ok for trunk?

New tests that PASS (8 tests):

gcc.target/i386/avx512f-vscalefpd-2.c execution test
gcc.target/i386/avx512f-vscalefps-2.c execution test
gcc.target/i386/avx512f-vscalefss-2.c execution test
gcc.target/i386/avx512fp16-vscalefph-1b.c execution test
gcc.target/i386/avx512fp16-vscalefsh-1b.c execution test
gcc.target/i386/avx512fp16vl-vscalefph-1b.c execution test
gcc.target/i386/avx512vl-vscalefpd-2.c execution test
gcc.target/i386/avx512vl-vscalefps-2.c execution test

Old tests that failed, that have disappeared (8 tests): (Eeek!)

gcc.target/i386/avx512f-vscalefpd-2.c (internal compiler error)
gcc.target/i386/avx512f-vscalefps-2.c (internal compiler error)
gcc.target/i386/avx512f-vscalefss-2.c (internal compiler error)
gcc.target/i386/avx512fp16-vscalefph-1b.c (internal compiler error)
gcc.target/i386/avx512fp16-vscalefsh-1b.c (internal compiler error)
gcc.target/i386/avx512fp16vl-vscalefph-1b.c (internal compiler error)
gcc.target/i386/avx512vl-vscalefpd-2.c (internal compiler error)
gcc.target/i386/avx512vl-vscalefps-2.c (internal compiler error)

gcc/ChangeLog:

* config/i386/i386-features.c (remove_partial_avx_dependency):
Restrict TARGET_USE_VECTOR_FP_CONVERTS and
TARGET_USE_VECTOR_CONVERTS to conversion instructions only.
---
 gcc/config/i386/i386-features.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
index a525a83afd3..001dc9c1053 100644
--- a/gcc/config/i386/i386-features.c
+++ b/gcc/config/i386/i386-features.c
@@ -2165,7 +2165,7 @@ make_pass_insert_endbr_and_patchable_area (gcc::context *ctxt)
 }
 
 /* At entry of the nearest common dominator for basic blocks with
-   conversions, generate a single
+   conversions/rcp/sqrt/rsqrt/round, generate a single
vxorps %xmmN, %xmmN, %xmmN
for all
vcvtss2sd  op, %xmmN, %xmmX
@@ -2211,13 +2211,27 @@ remove_partial_avx_dependency (void)
continue;
 
  /* Convert PARTIAL_XMM_UPDATE_TRUE insns, DF -> SF, SF -> DF,
-SI -> SF, SI -> DF, DI -> SF, DI -> DF, to vec_dup and
-vec_merge with subreg.  */
+SI -> SF, SI -> DF, DI -> SF, DI -> DF, sqrt, rsqrt, rcp,
+round, to vec_dup and vec_merge with subreg.  */
  rtx src = SET_SRC (set);
  rtx dest = SET_DEST (set);
  machine_mode dest_mode = GET_MODE (dest);
- machine_mode src_mode = GET_MODE (XEXP (src, 0));
+ bool convert_p = false;
+ switch (GET_CODE (src))
+   {
+   case FLOAT:
+   case FLOAT_EXTEND:
+   case FLOAT_TRUNCATE:
+   case UNSIGNED_FLOAT:
+ convert_p = true;
+ break;
+   default:
+ break;
+   }
 
+ /* Only handle conversion here.  */
+ machine_mode src_mode
+   = convert_p ? GET_MODE (XEXP (src, 0)) : VOIDmode;
  switch (src_mode)
{
case E_SFmode:
@@ -2233,6 +2247,7 @@ remove_partial_avx_dependency (void)
continue;
  break;
default:
+ gcc_assert (src_mode == E_VOIDmode);
  break;
}
 
-- 
2.27.0



[pushed] c++: improve lookup of member-qualified names

2021-09-17 Thread Jason Merrill via Gcc-patches
I've been working on the resolution of CWG1835 by P1787, which among many
other things clarified that a name after -> or . is looked up first in the
class of the object expression even if it's dependent.  This patch does not
make that change; this is a smaller change extracted from that work in
progress to make the lookup in the object type work better in cases where
unqualified lookup doesn't find anything.

Basically, if we see "t.foo::" we know that looking up foo in t needs to
find a type, so we build an implicit TYPENAME_TYPE for it.

This also implements the change from P1787 to assume that a name followed by
 < in a type-only context names a template, since the less-than operator
can't appear in a type context.  This makes some of the lines in dtor11.C
work.

I introduce the predicate 'dependentish_scope_p' for the case where the
current instantiation has dependent bases, so even though we can perform
name lookup, we can't treat a lookup failure as conclusive.
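
A minimal sketch of a member-qualified name in a dependent object
expression, with hypothetical names. Here `foo` is declared before the
template, so the example is expected to compile with or without this
patch; the patch concerns the harder case where unqualified lookup at the
point of definition finds nothing.

```cpp
struct foo {
  int bar() const { return 7; }
};

struct baz : foo {};

// `t` is type-dependent, so the class in which `foo` must be looked up is
// known only once T is known.  Because `foo` is followed by `::`, it has
// to name a type (or namespace) for the expression to be valid.
template <class T>
int call_bar(T t) {
  return t.foo::bar();
}
```

Instantiating call_bar with baz finds foo as a base class of baz and calls
foo::bar.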

gcc/cp/ChangeLog:

* cp-tree.h (dependentish_scope_p): Declare.
* pt.c (dependentish_scope_p): New.
* parser.c (cp_parser_lookup_name): Return a TYPENAME_TYPE
for lookup of a type in a dependent object.
(cp_parser_template_id): Handle TYPENAME_TYPE.
(cp_parser_template_name): If we're looking for a type,
a name followed by < names a template.

gcc/testsuite/ChangeLog:

* g++.dg/template/dtor5.C: Adjust expected error.
* g++.dg/cpp23/lookup2.C: New test.
* g++.dg/template/dtor11.C: New test.
---
 gcc/cp/cp-tree.h   |  1 +
 gcc/cp/parser.c| 69 --
 gcc/cp/pt.c|  9 
 gcc/testsuite/g++.dg/cpp23/lookup2.C   |  6 +++
 gcc/testsuite/g++.dg/template/dtor11.C | 22 
 gcc/testsuite/g++.dg/template/dtor5.C  |  2 +-
 6 files changed, 92 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/lookup2.C
 create mode 100644 gcc/testsuite/g++.dg/template/dtor11.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 8df18c38d43..1fcd50c64fd 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7263,6 +7263,7 @@ extern tree maybe_get_template_decl_from_type_decl (tree);
 extern int processing_template_parmlist;
 extern bool dependent_type_p   (tree);
 extern bool dependent_scope_p  (tree);
+extern bool dependentish_scope_p   (tree);
 extern bool any_dependent_template_arguments_p  (const_tree);
 extern bool any_erroneous_template_args_p   (const_tree);
 extern bool dependent_template_p   (tree);
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 20f949edfe0..31bae6d8983 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -18187,6 +18187,16 @@ cp_parser_template_id (cp_parser *parser,
   if (TREE_CODE (template_id) == TEMPLATE_ID_EXPR)
SET_EXPR_LOCATION (template_id, combined_loc);
 }
+  else if (TREE_CODE (templ) == TYPE_DECL
+  && TREE_CODE (TREE_TYPE (templ)) == TYPENAME_TYPE)
+{
+  /* Some type template in dependent scope.  */
+  tree &name = TYPENAME_TYPE_FULLNAME (TREE_TYPE (templ));
+  name = build_min_nt_loc (combined_loc,
+  TEMPLATE_ID_EXPR,
+  name, arguments);
+  template_id = templ;
+}
   else
 {
   /* If it's not a class-template or a template-template, it should be
@@ -18413,8 +18423,8 @@ cp_parser_template_name (cp_parser* parser,
 }
 
   /* cp_parser_lookup_name clears OBJECT_TYPE.  */
-  const bool scoped_p = ((parser->scope ? parser->scope
- : parser->context->object_type) != NULL_TREE);
+  tree scope = (parser->scope ? parser->scope
+   : parser->context->object_type);
 
   /* Look up the name.  */
   decl = cp_parser_lookup_name (parser, identifier,
@@ -18427,6 +18437,19 @@ cp_parser_template_name (cp_parser* parser,
 
   decl = strip_using_decl (decl);
 
+  /* 13.3 [temp.names] A < is interpreted as the delimiter of a
+template-argument-list if it follows a name that is not a
+conversion-function-id and
+- that follows the keyword template or a ~ after a nested-name-specifier or
+in a class member access expression, or
+- for which name lookup finds the injected-class-name of a class template
+or finds any declaration of a template, or
+- that is an unqualified name for which name lookup either finds one or
+more functions or finds nothing, or
+- that is a terminal name in a using-declarator (9.9), in a declarator-id
+(9.3.4), or in a type-only context other than a nested-name-specifier
+(13.8).  */
+
   /* If DECL is a template, then the name was a template-name.  */
   if (TREE_CODE (decl) == TEMPLATE_DECL)
 {
@@ -18454,11 +18477,7 @@ cp_parser_template_name (cp_parser* parser,
 }
   else
 {
-  /* The standard does not explicitly indicate whether a name 

Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS

2021-09-17 Thread Hongtao Liu via Gcc-patches
On Sat, Sep 18, 2021 at 7:50 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Fri, Sep 17, 2021 at 08:35:57AM +0200, Uros Bizjak via Gcc-patches wrote:
> > > > On Wed, Sep 15, 2021 at 10:10 AM  wrote:
> > > > >
> > > > > From: "H.J. Lu" 
> > > > >
> > > > > Check TARGET_USE_VECTOR_FP_CONVERTS or
> > > > TARGET_USE_VECTOR_CONVERTS when
> > > > > handling avx_partial_xmm_update attribute.  Don't convert AVX partial
> > > > > XMM register update if vector packed SSE conversion should be used.
> > > > >
> > > > > gcc/
> > > > >
> > > > > PR target/101900
> > > > > * config/i386/i386-features.c (remove_partial_avx_dependency):
> > > > > Check TARGET_USE_VECTOR_FP_CONVERTS and
> > > > TARGET_USE_VECTOR_CONVERTS
> > > > > before generating vxorps.
> > > > >
> > > > > gcc/
> > > > >
> > > > > PR target/101900
> > > > > * testsuite/gcc.target/i386/pr101900-1.c: New test.
> > > > > * testsuite/gcc.target/i386/pr101900-2.c: Likewise.
> > > > > * testsuite/gcc.target/i386/pr101900-3.c: Likewise.
> > > > > ---
> > > > >  gcc/config/i386/i386-features.c| 21 ++---
> > > > >  gcc/testsuite/gcc.target/i386/pr101900-1.c | 18 ++
> > > > > gcc/testsuite/gcc.target/i386/pr101900-2.c | 18 ++
> > > > > gcc/testsuite/gcc.target/i386/pr101900-3.c | 19 +++
> > > > >  4 files changed, 73 insertions(+), 3 deletions(-)  create mode 100644
> > > > > gcc/testsuite/gcc.target/i386/pr101900-1.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-2.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-3.c
> > > > >
> > > > > diff --git a/gcc/config/i386/i386-features.c
> > > > > b/gcc/config/i386/i386-features.c index 5a99ea7c046..ae5ea02a002
> > > > > 100644
> > > > > --- a/gcc/config/i386/i386-features.c
> > > > > +++ b/gcc/config/i386/i386-features.c
> > > > > @@ -2210,15 +2210,30 @@ remove_partial_avx_dependency (void)
> > > > >   != AVX_PARTIAL_XMM_UPDATE_TRUE)
> > > > > continue;
> > > > >
> > > > > - if (!v4sf_const0)
> > > > > -   v4sf_const0 = gen_reg_rtx (V4SFmode);
> > > > > -
> > > > >   /* Convert PARTIAL_XMM_UPDATE_TRUE insns, DF -> SF, SF -> 
> > > > > DF,
> > > > >  SI -> SF, SI -> DF, DI -> SF, DI -> DF, to vec_dup and
> > > > >  vec_merge with subreg.  */
> > > > >   rtx src = SET_SRC (set);
> > > > >   rtx dest = SET_DEST (set);
> > > > >   machine_mode dest_mode = GET_MODE (dest);
> > > > > + machine_mode src_mode;
> > > > > +
> > > > > + if (TARGET_USE_VECTOR_FP_CONVERTS)
> > > > > +   {
> > > > > + src_mode = GET_MODE (XEXP (src, 0));
> > > > > + if (src_mode == E_SFmode || src_mode == E_DFmode)
> > > > > +   continue;
> > > > > +   }
> > > > > +
> > > > > + if (TARGET_USE_VECTOR_CONVERTS)
> > > > > +   {
> > > > > + src_mode = GET_MODE (XEXP (src, 0));
> > > > > + if (src_mode == E_SImode || src_mode == E_DImode)
> > > > > +   continue;
> > > > > +   }
> > > > > +
> > > > > + if (!v4sf_const0)
> > > > > +   v4sf_const0 = gen_reg_rtx (V4SFmode);
> > > >
> > > > Please better move initialization of src_mode to the top of the new 
> > > > hunk, like:
> > > >
> > > > machine_mode src_mode = GET_MODE (XEXP (src, 0)); switch (src_mode) {
> > > >   case E_SFmode:
> > > >   case E_DFmode:
> > > > if (TARGET_USE_VECTOR_FP_CONVERTS)
> > > >   continue;
> > > > break;
> > > >   case E_SImode:
> > > >   case E_DImode:
> > > > if (TARGET_USE_VECTOR_CONVERTS)
> > > >   continue;
> > > > break;
> > > >   default:
> > > > break;
> > > > }
> > > >
> > > > or something like the above.
> > >
> > > Done, thanks for your good advice, I also rebased patch 4/4, since it is 
> > > based on patch 3/4.
>
> The above change broke
> +FAIL: gcc.target/i386/avx512f-vscalefpd-2.c (internal compiler error)
> +FAIL: gcc.target/i386/avx512f-vscalefpd-2.c (test for excess errors)
> +UNRESOLVED: gcc.target/i386/avx512f-vscalefpd-2.c compilation failed to 
> produce executable
> +FAIL: gcc.target/i386/avx512f-vscalefps-2.c (internal compiler error)
> +FAIL: gcc.target/i386/avx512f-vscalefps-2.c (test for excess errors)
> +UNRESOLVED: gcc.target/i386/avx512f-vscalefps-2.c compilation failed to 
> produce executable
> +FAIL: gcc.target/i386/avx512f-vscalefss-2.c (internal compiler error)
> +FAIL: gcc.target/i386/avx512f-vscalefss-2.c (test for excess errors)
> +UNRESOLVED: gcc.target/i386/avx512f-vscalefss-2.c compilation failed to 
> produce executable
> +FAIL: gcc.target/i386/avx512vl-vscalefpd-2.c (internal compiler error)
> +FAIL: gcc.target/i386/avx512vl-vscalefpd-2.c (test for excess errors)
> +UNRESOLVED: gcc.target/i386/avx512vl-vscalefpd-2.c compilation failed to 
> produce executable
> +FAIL: 

[PATCH 2/2] Update the section on binutils version

2021-09-17 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

LTO usage requires binutils 2.35 or newer due to
https://sourceware.org/PR25355.
This adds a note in the prerequisites page about it.

Ok?

gcc/ChangeLog:

* doc/install.texi: Add note that
binutils 2.35 is required for LTO usage.
---
 gcc/doc/install.texi | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 88e453c3f6b..a141507c7b0 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -325,6 +325,9 @@ Necessary in some circumstances, optional in others.  See the
 host/target specific instructions for your platform for the exact
 requirements.
 
+Note binutils 2.35 or newer is required for LTO to work correctly
+with GNU libtool that includes doing a bootstrap with LTO enabled.
+
 @item gzip version 1.2.4 (or later) or
 @itemx bzip2 version 1.0.2 (or later)
 
-- 
2.17.1



[PATCH 1/2] Fix PR bootstrap/102389: --with-build-config=bootstrap-lto is broken

2021-09-17 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The problem here is that the lto-plugin now requires an NM that works
with LTO, so we need to pass NM down just as we do for ranlib and ar.

OK? Bootstrapped and tested with --with-build-config=bootstrap-lto on
aarch64-linux-gnu.
Note you need to use binutils 2.35 or later too due to
https://sourceware.org/PR25355
(I will submit another patch to improve the installation instructions too).

config/ChangeLog:

PR bootstrap/102389
* bootstrap-lto-lean.mk: Handle NM like RANLIB and AR.
* bootstrap-lto.mk: Likewise.
---
 config/bootstrap-lto-lean.mk | 6 --
 config/bootstrap-lto.mk  | 6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/config/bootstrap-lto-lean.mk b/config/bootstrap-lto-lean.mk
index 79cea50a4c6..42cb3394c70 100644
--- a/config/bootstrap-lto-lean.mk
+++ b/config/bootstrap-lto-lean.mk
@@ -9,9 +9,11 @@ STAGEfeedback_CFLAGS += -flto=jobserver
 # assumes the host supports the linker plugin
 LTO_AR = $$r/$(HOST_SUBDIR)/prev-gcc/gcc-ar$(exeext) -B$$r/$(HOST_SUBDIR)/prev-gcc/
 LTO_RANLIB = $$r/$(HOST_SUBDIR)/prev-gcc/gcc-ranlib$(exeext) -B$$r/$(HOST_SUBDIR)/prev-gcc/
+LTO_NM = $$r/$(HOST_SUBDIR)/prev-gcc/gcc-nm$(exeext) -B$$r/$(HOST_SUBDIR)/prev-gcc/
 
 LTO_EXPORTS = AR="$(LTO_AR)"; export AR; \
- RANLIB="$(LTO_RANLIB)"; export RANLIB;
-LTO_FLAGS_TO_PASS = AR="$(LTO_AR)" RANLIB="$(LTO_RANLIB)"
+ RANLIB="$(LTO_RANLIB)"; export RANLIB; \
+ NM="$(LTO_NM)"; export NM;
+LTO_FLAGS_TO_PASS = AR="$(LTO_AR)" RANLIB="$(LTO_RANLIB)" NM="$(LTO_NM)"
 
 do-compare = /bin/true
diff --git a/config/bootstrap-lto.mk b/config/bootstrap-lto.mk
index 4de07e5b226..1ddb1d870ba 100644
--- a/config/bootstrap-lto.mk
+++ b/config/bootstrap-lto.mk
@@ -9,10 +9,12 @@ STAGEfeedback_CFLAGS += -flto=jobserver -frandom-seed=1
 # assumes the host supports the linker plugin
 LTO_AR = $$r/$(HOST_SUBDIR)/prev-gcc/gcc-ar$(exeext) -B$$r/$(HOST_SUBDIR)/prev-gcc/
 LTO_RANLIB = $$r/$(HOST_SUBDIR)/prev-gcc/gcc-ranlib$(exeext) -B$$r/$(HOST_SUBDIR)/prev-gcc/
+LTO_NM = $$r/$(HOST_SUBDIR)/prev-gcc/gcc-nm$(exeext) -B$$r/$(HOST_SUBDIR)/prev-gcc/
 
 LTO_EXPORTS = AR="$(LTO_AR)"; export AR; \
- RANLIB="$(LTO_RANLIB)"; export RANLIB;
-LTO_FLAGS_TO_PASS = AR="$(LTO_AR)" RANLIB="$(LTO_RANLIB)"
+ RANLIB="$(LTO_RANLIB)"; export RANLIB; \
+ NM="$(LTO_NM)"; export NM;
+LTO_FLAGS_TO_PASS = AR="$(LTO_AR)" RANLIB="$(LTO_RANLIB)" NM="$(LTO_NM)"
 
 do-compare = $(SHELL) $(srcdir)/contrib/compare-lto $$f1 $$f2
 extra-compare = gcc/lto1$(exeext)
-- 
2.17.1



Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS

2021-09-17 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 17, 2021 at 08:35:57AM +0200, Uros Bizjak via Gcc-patches wrote:
> > > On Wed, Sep 15, 2021 at 10:10 AM  wrote:
> > > >
> > > > From: "H.J. Lu" 
> > > >
> > > > Check TARGET_USE_VECTOR_FP_CONVERTS or
> > > TARGET_USE_VECTOR_CONVERTS when
> > > > handling avx_partial_xmm_update attribute.  Don't convert AVX partial
> > > > XMM register update if vector packed SSE conversion should be used.
> > > >
> > > > gcc/
> > > >
> > > > PR target/101900
> > > > * config/i386/i386-features.c (remove_partial_avx_dependency):
> > > > Check TARGET_USE_VECTOR_FP_CONVERTS and
> > > TARGET_USE_VECTOR_CONVERTS
> > > > before generating vxorps.
> > > >
> > > > gcc/
> > > >
> > > > PR target/101900
> > > > * testsuite/gcc.target/i386/pr101900-1.c: New test.
> > > > * testsuite/gcc.target/i386/pr101900-2.c: Likewise.
> > > > * testsuite/gcc.target/i386/pr101900-3.c: Likewise.
> > > > ---
> > > >  gcc/config/i386/i386-features.c| 21 ++---
> > > >  gcc/testsuite/gcc.target/i386/pr101900-1.c | 18 ++
> > > > gcc/testsuite/gcc.target/i386/pr101900-2.c | 18 ++
> > > > gcc/testsuite/gcc.target/i386/pr101900-3.c | 19 +++
> > > >  4 files changed, 73 insertions(+), 3 deletions(-)  create mode 100644
> > > > gcc/testsuite/gcc.target/i386/pr101900-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-2.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-3.c
> > > >
> > > > diff --git a/gcc/config/i386/i386-features.c
> > > > b/gcc/config/i386/i386-features.c index 5a99ea7c046..ae5ea02a002
> > > > 100644
> > > > --- a/gcc/config/i386/i386-features.c
> > > > +++ b/gcc/config/i386/i386-features.c
> > > > @@ -2210,15 +2210,30 @@ remove_partial_avx_dependency (void)
> > > >   != AVX_PARTIAL_XMM_UPDATE_TRUE)
> > > > continue;
> > > >
> > > > - if (!v4sf_const0)
> > > > -   v4sf_const0 = gen_reg_rtx (V4SFmode);
> > > > -
> > > >   /* Convert PARTIAL_XMM_UPDATE_TRUE insns, DF -> SF, SF -> DF,
> > > >  SI -> SF, SI -> DF, DI -> SF, DI -> DF, to vec_dup and
> > > >  vec_merge with subreg.  */
> > > >   rtx src = SET_SRC (set);
> > > >   rtx dest = SET_DEST (set);
> > > >   machine_mode dest_mode = GET_MODE (dest);
> > > > + machine_mode src_mode;
> > > > +
> > > > + if (TARGET_USE_VECTOR_FP_CONVERTS)
> > > > +   {
> > > > + src_mode = GET_MODE (XEXP (src, 0));
> > > > + if (src_mode == E_SFmode || src_mode == E_DFmode)
> > > > +   continue;
> > > > +   }
> > > > +
> > > > + if (TARGET_USE_VECTOR_CONVERTS)
> > > > +   {
> > > > + src_mode = GET_MODE (XEXP (src, 0));
> > > > + if (src_mode == E_SImode || src_mode == E_DImode)
> > > > +   continue;
> > > > +   }
> > > > +
> > > > + if (!v4sf_const0)
> > > > +   v4sf_const0 = gen_reg_rtx (V4SFmode);
> > >
> > > Please better move initialization of src_mode to the top of the new hunk, 
> > > like:
> > >
> > > > machine_mode src_mode = GET_MODE (XEXP (src, 0));
> > > > switch (src_mode) {
> > >   case E_SFmode:
> > >   case E_DFmode:
> > > if (TARGET_USE_VECTOR_FP_CONVERTS)
> > >   continue;
> > > break;
> > >   case E_SImode:
> > >   case E_DImode:
> > > if (TARGET_USE_VECTOR_CONVERTS)
> > >   continue;
> > > break;
> > >   default:
> > > break;
> > > }
> > >
> > > or something like the above.
> >
> > Done, thanks for your good advice, I also rebased patch 4/4, since it is 
> > based on patch 3/4.

The above change broke
+FAIL: gcc.target/i386/avx512f-vscalefpd-2.c (internal compiler error)
+FAIL: gcc.target/i386/avx512f-vscalefpd-2.c (test for excess errors)
+UNRESOLVED: gcc.target/i386/avx512f-vscalefpd-2.c compilation failed to 
produce executable
+FAIL: gcc.target/i386/avx512f-vscalefps-2.c (internal compiler error)
+FAIL: gcc.target/i386/avx512f-vscalefps-2.c (test for excess errors)
+UNRESOLVED: gcc.target/i386/avx512f-vscalefps-2.c compilation failed to 
produce executable
+FAIL: gcc.target/i386/avx512f-vscalefss-2.c (internal compiler error)
+FAIL: gcc.target/i386/avx512f-vscalefss-2.c (test for excess errors)
+UNRESOLVED: gcc.target/i386/avx512f-vscalefss-2.c compilation failed to 
produce executable
+FAIL: gcc.target/i386/avx512vl-vscalefpd-2.c (internal compiler error)
+FAIL: gcc.target/i386/avx512vl-vscalefpd-2.c (test for excess errors)
+UNRESOLVED: gcc.target/i386/avx512vl-vscalefpd-2.c compilation failed to 
produce executable
+FAIL: gcc.target/i386/avx512vl-vscalefps-2.c (internal compiler error)
+FAIL: gcc.target/i386/avx512vl-vscalefps-2.c (test for excess errors)
+UNRESOLVED: gcc.target/i386/avx512vl-vscalefps-2.c compilation failed to 
produce executable
when configured with --enable-checking=yes,rtl,extra, the error is:
during RTL pass: rpad

Re: [PATCH] rs6000: Parameterize some const values for density test

2021-09-17 Thread Segher Boessenkool
Hi!

On Wed, Sep 15, 2021 at 04:52:49PM +0800, Kewen.Lin wrote:
> This patch follows the discussion here[1], where Segher suggested
> parameterizing those exact magic constants for density heuristics,
> to make them easier to tweak if needed.
> 
> Since these heuristics are quite internal, I made these parameters
> undocumented; they are mainly intended for developers.

Okido.

> +  if (data->nloads > (unsigned int) rs6000_density_load_num_threshold
> +   && load_pct > (unsigned int) rs6000_density_load_pct_threshold)

Those variables are unsigned int already.  Don't cast please.

> +-param=rs6000-density-pct-threshold=
> +Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) 
> Init(85) IntegerRange(0, 99) Param

So make this and all other percentages (0, 100) please.
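
To illustrate both points (no casts, and a percentage range that includes
100), here is a minimal sketch of such a check.  It assumes the strict ">"
comparison from the quoted hunk; the function and parameter names are
hypothetical and this is not the actual rs6000 code.

```cpp
#include <cassert>

// Hypothetical stand-in for the density test: penalize only when both the
// absolute load count and the load percentage exceed their thresholds.
// All operands are unsigned already, so the comparison needs no casts.
bool should_penalize (unsigned nloads, unsigned nstmts,
                      unsigned load_num_threshold,
                      unsigned load_pct_threshold)
{
  // A percentage threshold outside 0..100 would make no sense.
  assert (load_pct_threshold <= 100);
  if (nstmts == 0)
    return false;
  unsigned load_pct = nloads * 100 / nstmts;
  return nloads > load_num_threshold && load_pct > load_pct_threshold;
}
```

Under these assumptions a threshold of 100 can never be exceeded as long as
nloads <= nstmts, so allowing 100 would give a way to switch the percentage
condition off entirely.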

> +When costing for loop vectorization, we probably need to penalize the loop 
> body cost if the existing cost model may not adequately reflect delays from 
> unavailable vector resources.  We collect the cost for vectorized statements 
> and non-vectorized statements separately, check the proportion of vec_cost to 
> total cost of vec_cost and non vec_cost, and penalize only if the proportion 
> exceeds the threshold specified by this parameter.  The default value is 85.

It would be good if we can use line breaks in the source code for things
like this, but I don't think we can.  This message is mainly used for
"--help=param", and it is good there to have as short messages as you
can.  But given the nature of params you need quite a few words often,
and you do not want to say so little that things are not clear, either.

So, dunno :-)

Okay for trunk with these fixes and what Bill mentioned in the other
thread.  Thanks!


Segher


Re: [PATCH] c++: error message for dependent template members [PR70417]

2021-09-17 Thread Anthony Sharp via Gcc-patches
And also re-attaching the patch!

On Fri, 17 Sept 2021 at 23:17, Anthony Sharp  wrote:
>
> Re-adding gcc-patches@gcc.gnu.org.
>
> -- Forwarded message -
> From: Anthony Sharp 
> Date: Fri, 17 Sept 2021 at 23:11
> Subject: Re: [PATCH] c++: error message for dependent template members 
> [PR70417]
> To: Jason Merrill 
>
>
> Hi Jason! Apologies for the delay.
>
> > This is basically core issue 1835, http://wg21.link/cwg1835
>
> > This was changed for C++23 by the paper "Declarations and where to find
> > them", http://wg21.link/p1787
>
> Interesting, I was not aware of that. I was very vaguely aware that a
> template-id in a class member access expression could be found by
> ordinary lookup (very bottom of here
> https://en.cppreference.com/w/cpp/language/dependent_name), but it's
> interesting to see it is deeper than I realised.
>
> > But in either case, whether create is in a dependent scope depends on
> > how we resolve impl::, we don't need to remember further back in the
> > expression.  So your dependent_expression_p parameter seems like the
> > wrong approach.  Note that when we're looking up the name after ->, the
> > type of the object expression is in parser->context->object_type.
>
> That's true. I think my thinking was that since it already got figured
> out in cp_parser_postfix_dot_deref_expression, which is where . and ->
> access expressions come from, I thought I might as well pass it
> through, since it seemed to work. But looking again, you're right,
> it's not really worth the hassle; might as well just call
> dependent_scope_p again.
>
> > The cases you fixed in symbol-summary.h are indeed mistakes, but not
> > ill-formed, so giving an error on them is wrong.  For example, here is a
> > well-formed program that is rejected with your patch:
>
> > template  void f(T t) { t.m(0); }
> > struct A { int m; } a;
> > int main() { f(a); }
>
> I suppose there was always going to be edge-cases when doing it the
> way I've done. But yes, it can be worked-around by making it a warning
> instead. Interestingly Clang doesn't trip up on that example, so I
> guess they must be examining it some other way (e.g. at instantiation
> time) - but that approach perhaps misses out on the slight performance
> improvement this seems to bring.
>
> > Now that we're writing C++, I'd prefer to avoid this kind of pattern in
> > favor of RAII, such as saved_token_sentinel.  If this is still relevant
> > after addressing the above comments.
>
> Sorry, it's the junior developer in me showing! So this confused me at
> first. After having mucked around a bit I tried using
> saved_token_sentinel but didn't see any benefit since it doesn't
> rollback on going out of scope, and I'll always want to rollback. I
> can call rollback directly, but then I might as well save and restore
> myself. So what I did was use it but also modify it slightly to
> rollback by default on going out of scope (in my mind that makes more
> sense, since if something goes wrong you wouldn't normally want to
> commit anything that happened [edit 1: unless committing was part of
> the whole sanity checking thing] [edit 2: well I guess you could also
> argue that since this is a parser after all, we like to KEEP things
> sometimes]). But anyways, I made this configurable; it now has three
> modes - roll-back, commit or do nothing. Let me know if you think
> that's not the way to go.
>
> > This code doesn't handle skipping matched ()/{}/[] in the
> > template-argument-list.  You probably want to involve
> > cp_parser_skip_to_end_of_template_parameter_list somehow.
>
> Good point. It required some refactoring, but I have used it. Also,
> just putting it out there, this line from
> cp_parser_skip_to_end_of_template_parameter_list makes zero sense to
> me (why throw an error OR immediately return?), but I have worked
> around it, since it seems to break without it:
>
> > /* Are we ready, yet?  If not, issue error message.  */
> > if (cp_parser_require (parser, CPP_GREATER, RT_GREATER))
> >   return false;
>
> Last thing - I initially made a mistake. I put something like:
>
> (next_token->type == CPP_NAME
>  && MAYBE_CLASS_TYPE_P (parser->scope)
>  && !constructor_name_p (cp_expr (next_token->u.value,
>   
> next_token->location),
>parser->scope))
>
> Instead of:
>
> !(next_token->type == CPP_NAME
>   && MAYBE_CLASS_TYPE_P (parser->scope)
>   && constructor_name_p (cp_expr (next_token->u.value,
>   
> next_token->location),
>parser->scope))
>
> This meant a lot of things were being excluded that weren't supposed
> to be. Oops! Changing this opened up a whole new can of worms, so I
> had to make some changes to the main logic, but it was just a little bit
> in the end.
>
> Regtested everything again and all seems fine. Bootstraps fine. Patch
> attached. 

Re: [PATCH] c++: error message for dependent template members [PR70417]

2021-09-17 Thread Anthony Sharp via Gcc-patches
Re-adding gcc-patches@gcc.gnu.org.

-- Forwarded message -
From: Anthony Sharp 
Date: Fri, 17 Sept 2021 at 23:11
Subject: Re: [PATCH] c++: error message for dependent template members [PR70417]
To: Jason Merrill 


Hi Jason! Apologies for the delay.

> This is basically core issue 1835, http://wg21.link/cwg1835

> This was changed for C++23 by the paper "Declarations and where to find
> them", http://wg21.link/p1787

Interesting, I was not aware of that. I was very vaguely aware that a
template-id in a class member access expression could be found by
ordinary lookup (very bottom of here
https://en.cppreference.com/w/cpp/language/dependent_name), but it's
interesting to see it is deeper than I realised.

> But in either case, whether create is in a dependent scope depends on
> how we resolve impl::, we don't need to remember further back in the
> expression.  So your dependent_expression_p parameter seems like the
> wrong approach.  Note that when we're looking up the name after ->, the
> type of the object expression is in parser->context->object_type.

That's true. I think my thinking was that since it already got figured
out in cp_parser_postfix_dot_deref_expression, which is where . and ->
access expressions come from, I thought I might as well pass it
through, since it seemed to work. But looking again, you're right,
it's not really worth the hassle; might as well just call
dependent_scope_p again.

> The cases you fixed in symbol-summary.h are indeed mistakes, but not
> ill-formed, so giving an error on them is wrong.  For example, here is a
> well-formed program that is rejected with your patch:

> template  void f(T t) { t.m(0); }
> struct A { int m; } a;
> int main() { f(a); }

I suppose there was always going to be edge-cases when doing it the
way I've done. But yes, it can be worked-around by making it a warning
instead. Interestingly Clang doesn't trip up on that example, so I
guess they must be examining it some other way (e.g. at instantiation
time) - but that approach perhaps misses out on the slight performance
improvement this seems to bring.
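
For reference, a sketch of why such a program is well-formed.  It assumes
the quoted example originally read "template <typename T>" and "t.m<0>(0)"
(the angle brackets appear to have been lost in the archive), and it
returns the value only so the result can be observed.

```cpp
// Assumed reconstruction of the quoted example.  While parsing f, `m` is
// not known to name a template, so `t.m<0>(0)` is parsed as the comparison
// `(t.m < 0) > (0)`, which is valid when m is a plain data member.
template <typename T>
bool f (T t)
{
  return t.m<0>(0);
}

struct A { int m; };
```

With m negative, (t.m < 0) is true and true > 0 holds; with m positive the
result is false.  Compilers that warn about a possibly missing "template"
keyword may still diagnose this line, but as a warning, not an error.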

> Now that we're writing C++, I'd prefer to avoid this kind of pattern in
> favor of RAII, such as saved_token_sentinel.  If this is still relevant
> after addressing the above comments.

Sorry, it's the junior developer in me showing! So this confused me at
first. After having mucked around a bit I tried using
saved_token_sentinel but didn't see any benefit since it doesn't
rollback on going out of scope, and I'll always want to rollback. I
can call rollback directly, but then I might as well save and restore
myself. So what I did was use it but also modify it slightly to
rollback by default on going out of scope (in my mind that makes more
sense, since if something goes wrong you wouldn't normally want to
commit anything that happened [edit 1: unless committing was part of
the whole sanity checking thing] [edit 2: well I guess you could also
argue that since this is a parser after all, we like to KEEP things
sometimes]). But anyways, I made this configurable; it now has three
modes - roll-back, commit or do nothing. Let me know if you think
that's not the way to go.
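
Roughly, the three-mode idea can be sketched as below.  This is only an
illustration: token_stream, token_sentinel and sentinel_mode are
hypothetical names, and this is not GCC's saved_token_sentinel.

```cpp
#include <cstddef>

// Hypothetical token stream: only a cursor position matters here.
struct token_stream
{
  std::size_t pos = 0;
};

// What the guard does when it goes out of scope.  In this toy model commit
// and nothing behave alike, since there are no saved tokens to discard.
enum class sentinel_mode { rollback, commit, nothing };

// RAII guard: remembers the cursor on construction and restores it on
// destruction when the mode is rollback.
class token_sentinel
{
public:
  token_sentinel (token_stream &s, sentinel_mode m = sentinel_mode::rollback)
    : stream (s), saved (s.pos), mode (m) {}

  ~token_sentinel ()
  {
    if (mode == sentinel_mode::rollback)
      stream.pos = saved;
  }

  // Explicit rollback; afterwards the destructor does nothing further.
  void rollback ()
  {
    stream.pos = saved;
    mode = sentinel_mode::nothing;
  }

  token_sentinel (const token_sentinel &) = delete;
  token_sentinel &operator= (const token_sentinel &) = delete;

private:
  token_stream &stream;
  std::size_t saved;
  sentinel_mode mode;
};

// Look ahead N tokens and report the position reached.  The stream is left
// untouched because the guard rolls back on scope exit.
std::size_t peek_ahead (token_stream &s, std::size_t n)
{
  token_sentinel guard (s);
  s.pos += n;
  return s.pos;
}
```

The point of defaulting to rollback is that every early return in a
tentative scan restores the stream without an explicit call.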

> This code doesn't handle skipping matched ()/{}/[] in the
> template-argument-list.  You probably want to involve
> cp_parser_skip_to_end_of_template_parameter_list somehow.

Good point. It required some refactoring, but I have used it. Also,
just putting it out there, this line from
cp_parser_skip_to_end_of_template_parameter_list makes zero sense to
me (why throw an error OR immediately return?), but I have worked
around it, since it seems to break without it:

> /* Are we ready, yet?  If not, issue error message.  */
> if (cp_parser_require (parser, CPP_GREATER, RT_GREATER))
>   return false;

Last thing - I initially made a mistake. I put something like:

(next_token->type == CPP_NAME
 && MAYBE_CLASS_TYPE_P (parser->scope)
 && !constructor_name_p (cp_expr (next_token->u.value,
  next_token->location),
   parser->scope))

Instead of:

!(next_token->type == CPP_NAME
  && MAYBE_CLASS_TYPE_P (parser->scope)
  && constructor_name_p (cp_expr (next_token->u.value,
  next_token->location),
   parser->scope))

This meant a lot of things were being excluded that weren't supposed
to be. Oops! Changing this opened up a whole new can of worms, so I
had to make some changes to the main logic, but it was just a little bit
in the end.
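
To show how far apart the two conditions are, here is a small sketch that
reduces each sub-expression to a boolean.  The parameter names are
hypothetical stand-ins for the three tests in the snippets above.

```cpp
// Hypothetical reduction of the condition to three booleans:
//   is_name     ~ next_token->type == CPP_NAME
//   class_scope ~ MAYBE_CLASS_TYPE_P (parser->scope)
//   is_ctor     ~ constructor_name_p (...)

// First attempt: true only for a name in class scope that is not a
// constructor name.
bool mistaken (bool is_name, bool class_scope, bool is_ctor)
{
  return is_name && class_scope && !is_ctor;
}

// Intended: true for everything except a constructor name in class scope.
bool intended (bool is_name, bool class_scope, bool is_ctor)
{
  return !(is_name && class_scope && is_ctor);
}

// Count how many of the 8 input combinations a predicate accepts.
int count_accepted (bool (*pred) (bool, bool, bool))
{
  int n = 0;
  for (int bits = 0; bits < 8; ++bits)
    if (pred (bits & 1, bits & 2, bits & 4))
      ++n;
  return n;
}
```

The mistaken form accepts one combination out of eight, the intended form
seven, which matches the "lot of things were being excluded" observation.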

Regtested everything again and all seems fine. Bootstraps fine. Patch
attached. Let me know if it needs anything else.

Thanks for the help,
Anthony



On Fri, 27 Aug 2021 at 22:33, Jason Merrill  wrote:
>
> On 8/20/21 12:56 PM, Anthony Sharp via Gcc-patches wrote:
> > Hi, hope everyone is well. I have a patch here for issue 70417
> > 

Re: [PATCH] rs6000: Parameterize some const values for density test

2021-09-17 Thread Segher Boessenkool
Hi!

On Fri, Sep 17, 2021 at 11:27:09AM -0500, Bill Schmidt wrote:
> On 9/15/21 3:52 AM, Kewen.Lin wrote:
> >+-param=rs6000-density-pct-threshold=
> >+Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) 
> >Init(85) IntegerRange(0, 99) Param
> >+When costing for loop vectorization, we probably need to penalize the 
> >loop body cost if the existing cost model may not adequately reflect 
> >delays from unavailable vector resources.  We collect the cost for 
> >vectorized statements and non-vectorized statements separately, check the 
> >proportion of vec_cost to total cost of vec_cost and non vec_cost, and 
> >penalize only if the proportion exceeds the threshold specified by this 
> >parameter.  The default value is 85.
> >+
> >+-param=rs6000-density-size-threshold=
> >+Target Undocumented Joined UInteger Var(rs6000_density_size_threshold) 
> >Init(70) IntegerRange(0, 99) Param
> 
> I think 99 is not a sufficient upper bound.  This is a counting value 
> that could in theory get much higher.  Can you set it to something 
> ridiculous like IntegerRange(0, 1000)?

It is a percentage.  (0,100) is the maximum that makes any sense :-)

It may be useful to make it a bit more sensitive than hundreds, but it
is a heuristic anyway, this will work fine.

But allowing 100 will be good.


Segher


Re: [PATCH] introduce predicate analysis class

2021-09-17 Thread Martin Sebor via Gcc-patches

On 9/2/21 10:28 AM, Jeff Law via Gcc-patches wrote:



On 8/30/2021 2:03 PM, Martin Sebor via Gcc-patches wrote:

The predicate analysis subset of the tree-ssa-uninit pass isn't
necessarily specific to the detection of uninitialized reads.
Suitably parameterized, the same core logic could be used in
other warning passes to improve their S/N ratio, or issue more
nuanced diagnostics (e.g., when an invalid access cannot be
ruled out but also need not in reality be unavoidable, issue
a "may be invalid" type of warning rather than "is invalid").

Separating the predicate analysis logic from the uninitialized
pass and exposing a narrow API should also make it easier to
understand and evolve each part independently of the other,
or replace one with a better implementation without modifying
the other.(*)

As the first step in this direction, the attached patch extracts
the predicate analysis logic out of the pass, turns the interface
into public class members, and hides the internals in either
private members or static functions defined in a new source file.
(**)

The changes should have no externally observable effect (i.e.,
should cause no changes in warnings), except on the contents of
the uninitialized dump.  While making the changes I enhanced
the dumps to help me follow the logic.  Turning some previously
free-standing functions into members involved changing their
signatures and adjusting their callers.  While making these
changes I also renamed some of them as well as some variables for
improved clarity.  Finally, I moved declarations of locals
closer to their point of initialization.

Tested on x86_64-linux.  Besides the usual bootstrap/regtest
I also tentatively verified the generality of the new class
interfaces by making use of it in -Warray-bounds.  Besides there,
I'd like to make use of it in the new gimple-ssa-warn-access pass
and, longer term, any other flow-sensitive warnings that might
benefit from it.

Martin

[*] A review of open -Wuninitialized bugs I did while working
on this project made me aware of a number of opportunities to
improve the analyzer to reduce the number of false positives
-Wmaybe-uninitialized suffers from.

[**] The class isn't fully general and, like the uninit pass,
only works with PHI nodes.  I plan to generalize it to compute
the set of predicates between any two basic blocks.

gcc-predanal.diff

Factor predicate analysis out of tree-ssa-uninit.c into its own module.

gcc/ChangeLog:

* Makefile.in (OBJS): Add gimple-predicate-analysis.o.
* tree-ssa-uninit.c (max_phi_args): Move to gimple-predicate-analysis.
(MASK_SET_BIT, MASK_TEST_BIT, MASK_EMPTY): Same.
(check_defs):
(can_skip_redundant_opnd):
(compute_uninit_opnds_pos): Adjust to namespace change.
(find_pdom): Move to gimple-predicate-analysis.cc.
(find_dom): Same.
(struct uninit_undef_val_t): New.
(is_non_loop_exit_postdominating): Move to gimple-predicate-analysis.cc.
(find_control_equiv_block): Same.
(MAX_NUM_CHAINS, MAX_CHAIN_LEN, MAX_POSTDOM_CHECK): Same.
(MAX_SWITCH_CASES): Same.
(compute_control_dep_chain): Same.
(find_uninit_use): Use predicate analyzer.
(struct pred_info): Move to gimple-predicate-analysis.
(convert_control_dep_chain_into_preds): Same.
(find_predicates): Same.
(collect_phi_def_edges): Same.
(warn_uninitialized_phi): Use predicate analyzer.
(find_def_preds): Move to gimple-predicate-analysis.
(dump_pred_info): Same.
(dump_pred_chain): Same.
(dump_predicates): Same.
(destroy_predicate_vecs): Remove.
(execute_late_warn_uninitialized): New.
(get_cmp_code): Move to gimple-predicate-analysis.
(is_value_included_in): Same.
(value_sat_pred_p): Same.
(find_matching_predicate_in_rest_chains): Same.
(is_use_properly_guarded): Same.
(prune_uninit_phi_opnds): Same.
(find_var_cmp_const): Same.
(use_pred_not_overlap_with_undef_path_pred): Same.
(pred_equal_p): Same.
(is_neq_relop_p): Same.
(is_neq_zero_form_p): Same.
(pred_expr_equal_p): Same.
(is_pred_expr_subset_of): Same.
(is_pred_chain_subset_of): Same.
(is_included_in): Same.
(is_superset_of): Same.
(pred_neg_p): Same.
(simplify_pred): Same.
(simplify_preds_2): Same.
(simplify_preds_3): Same.
(simplify_preds_4): Same.
(simplify_preds): Same.
(push_pred): Same.
(push_to_worklist): Same.
(get_pred_info_from_cmp): Same.
(is_degenerated_phi): Same.
(normalize_one_pred_1): Same.
(normalize_one_pred): Same.
(normalize_one_pred_chain): Same.
(normalize_preds): Same.
(can_one_predicate_be_invalidated_p): Same.
(can_chain_union_be_invalidated_p): Same.
(uninit_uses_cannot_happen): Same.
(pass_late_warn_uninitialized::execute): Define.
* gimple-predicate-analysis.cc: New file.
* gimple-predicate-analysis.h: New file.
Thanks for tackling this.  It's something I think we've needed for a 
long time.


I've only done a 

Re: [PATCH] rs6000: Modify the way for extra penalized cost

2021-09-17 Thread Segher Boessenkool
Hi!

On Thu, Sep 16, 2021 at 09:14:15AM +0800, Kewen.Lin wrote:
> The way with nunits * stmt_cost can get one much exaggerated
> penalized cost, such as: for V16QI on P8, it's 16 * 20 = 320,
> that's why we need one bound.  To make it scale, this patch
> doesn't use nunits * stmt_cost any more, but it still keeps
> nunits since there are actually nunits scalar loads there.  So
> it uses one cost adjusted from stmt_cost, since the current
> stmt_cost sort of considers nunits, we can stabilize the cost
> for big nunits and retain the cost for small nunits.  After
> some tries, this patch gets the adjusted cost as:
> 
> stmt_cost / (log2(nunits) * log2(nunits))

So for  V16QI it gives *16/(4*4) so *1
V8HI  it gives *8/(3*3)  so *8/9
V4SI  it gives *4/(2*2)  so *1
V2DI  it gives *2/(1*1)  so *2
and for V1TI  it gives *1/(0*0) which is UB (no, does not crash for us,
just gives wildly wrong answers; the div returns 0 on recent systems).

> For V16QI, the adjusted cost would be 1 and total penalized
> cost is 16, it isn't exaggerated.  For V2DI, the adjusted
> cost would be 2 and total penalized cost is 4, which is the
> same as before.  btw, I tried to use one single log2(nunits),
> but the penalized cost is still big enough and can't fix the
> degraded bmk blender_r.

Does it make sense to treat V2DI (and V2DF) as twice more expensive than
other vectors, which are all pretty much equal cost (except those that
end up with cost 0)?  If so, there are simpler ways to do that.

> +   int nunits_log2 = exact_log2 (nunits);
> +   gcc_assert (nunits_log2 > 0);
> +   unsigned int nunits_sq = nunits_log2 * nunits_log2;

>= 0

This of course is assuming nunits will always be a power of 2, but I'm
sure that we have many other places in the compiler assuming that
already, so that is fine.  And if one day this stops being true we will
get a nice ICE, pretty much the best we could hope for.

> +   unsigned int adjusted_cost = stmt_cost / nunits_sq;

But this can divide by 0.  Or are we somehow guaranteed that nunits
will never be 1?  Yes the log2 check above, sure, but that ICEs if this
is violated; is there anything that actually guarantees it is true?

> +   gcc_assert (adjusted_cost > 0);

I don't see how you guarantee this, either.
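
The arithmetic can be checked with a small sketch.  It uses the same
integer division as the quoted hunk; exact_log2_sketch is a stand-in
written for this example, and the -1 return for unusable inputs is an
assumption of the sketch, not what the patch does.

```cpp
// Exact log2 for powers of two; returns -1 otherwise.  A stand-in for
// GCC's exact_log2 so that the sketch is self-contained.
int exact_log2_sketch (unsigned n)
{
  if (n == 0 || (n & (n - 1)) != 0)
    return -1;
  int log = 0;
  while (n >>= 1)
    ++log;
  return log;
}

// Total penalty nunits * (stmt_cost / log2(nunits)^2).  Returns -1 when
// the divisor would be zero (nunits == 1) or nunits is not a power of two.
int penalized_cost (unsigned nunits, unsigned stmt_cost)
{
  int log = exact_log2_sketch (nunits);
  if (log <= 0)
    return -1;
  unsigned sq = static_cast<unsigned> (log * log);
  return static_cast<int> (nunits * (stmt_cost / sq));
}
```

With the figures from the patch description, penalized_cost (16, 20) is 16
and penalized_cost (2, 2) is 4.  A hypothetical input such as
penalized_cost (8, 8) yields 0, because 8 / 9 truncates to zero, which is
the case the adjusted_cost > 0 assertion would trip on.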


A magic crazy formula like this is no good.  If you want to make the
cost of everything but V2D* be the same, and that of V2D* be twice that,
that is a weird heuristic, but we can live with that perhaps.  But that
beats completely unexplained (and unexplainable) magic!

Sorry.


Segher


[Patch] Fortran/OpenMP: unconstrained/reproducible ordered modifier

2021-09-17 Thread Tobias Burnus

This patch adds Fortran support for the new OpenMP 5.1 unconstrained and
reproducible modifiers to order(concurrent).

This patch requires Jakub's patch to handle the middle-end (and C/C++) part,
which still has to be committed. The testcases are based on the C/C++ ones.
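
As a rough illustration of the clause forms being accepted, here is a
sketch of a classifier.  It is a simplification under stated assumptions:
the input is already lower-case, blanks are simply dropped, and the names
order_kind and classify_order_clause are hypothetical, not gfortran's.

```cpp
#include <string>

enum class order_kind { invalid, plain_or_reproducible, unconstrained };

// Remove all blanks so that "order ( unconstrained : concurrent )" and
// "order(unconstrained:concurrent)" compare equal.
static std::string strip_spaces (const std::string &s)
{
  std::string out;
  for (char c : s)
    if (c != ' ' && c != '\t')
      out += c;
  return out;
}

// Classify an already lower-cased clause.  A bare order(concurrent) is
// grouped with reproducible, mirroring a matcher that sets a flag only
// for the unconstrained modifier.
order_kind classify_order_clause (const std::string &clause)
{
  const std::string s = strip_spaces (clause);
  if (s == "order(concurrent)" || s == "order(reproducible:concurrent)")
    return order_kind::plain_or_reproducible;
  if (s == "order(unconstrained:concurrent)")
    return order_kind::unconstrained;
  return order_kind::invalid;
}
```

Anything else, such as an unknown modifier, falls through to invalid, which
corresponds to the error path in the matcher.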

OK?

Tobias

Fortran/OpenMP: unconstrained/reproducible ordered modifier

gcc/fortran/ChangeLog:

	* gfortran.h (gfc_omp_clauses): Add order_unconstrained.
	* dump-parse-tree.c (show_omp_clauses): Dump it.
	* openmp.c (gfc_match_omp_clauses): Match unconstrained/reproducible
	modifiers to ordered(concurrent).
	(OMP_DISTRIBUTE_CLAUSES): Accept ordered clause.
	(resolve_omp_clauses): Reject ordered + order on same directive.
	* trans-openmp.c (gfc_trans_omp_clauses, gfc_split_omp_clauses): Pass
	on unconstrained modifier of ordered(concurrent).

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/order-5.f90: New test.
	* gfortran.dg/gomp/order-6.f90: New test.
	* gfortran.dg/gomp/order-7.f90: New test.
	* gfortran.dg/gomp/order-8.f90: New test.
	* gfortran.dg/gomp/order-9.f90: New test.

 gcc/fortran/dump-parse-tree.c  |   7 +-
 gcc/fortran/gfortran.h |   3 +-
 gcc/fortran/openmp.c   |  25 +-
 gcc/fortran/trans-openmp.c |   7 +
 gcc/testsuite/gfortran.dg/gomp/order-5.f90 | 129 +
 gcc/testsuite/gfortran.dg/gomp/order-6.f90 | 436 +
 gcc/testsuite/gfortran.dg/gomp/order-7.f90 |  59 
 gcc/testsuite/gfortran.dg/gomp/order-8.f90 |  61 
 gcc/testsuite/gfortran.dg/gomp/order-9.f90 |  35 +++
 9 files changed, 756 insertions(+), 6 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index a1df47c2f82..28eb09e261d 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1630,7 +1630,12 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
   if (omp_clauses->independent)
 fputs (" INDEPENDENT", dumpfile);
   if (omp_clauses->order_concurrent)
-fputs (" ORDER(CONCURRENT)", dumpfile);
+{
+  fputs (" ORDER(", dumpfile);
+  if (omp_clauses->order_unconstrained)
+	fputs ("UNCONSTRAINED:", dumpfile);
+  fputs ("CONCURRENT)", dumpfile);
+}
   if (omp_clauses->ordered)
 {
   if (omp_clauses->orderedc)
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index fdf556eef3d..8b91225d659 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1491,7 +1491,8 @@ typedef struct gfc_omp_clauses
   unsigned inbranch:1, notinbranch:1, nogroup:1;
   unsigned sched_simd:1, sched_monotonic:1, sched_nonmonotonic:1;
   unsigned simd:1, threads:1, depend_source:1, destroy:1, order_concurrent:1;
-  unsigned capture:1, grainsize_strict:1, num_tasks_strict:1;
+  unsigned order_unconstrained:1, capture:1, grainsize_strict:1;
+  unsigned num_tasks_strict:1;
   ENUM_BITFIELD (gfc_omp_sched_kind) sched_kind:3;
   ENUM_BITFIELD (gfc_omp_device_type) device_type:2;
   ENUM_BITFIELD (gfc_omp_memorder) memorder:3;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index a64b7f5aa10..9ee52d6b0ea 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -2369,9 +2369,23 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	  break;
 	case 'o':
 	  if ((mask & OMP_CLAUSE_ORDER)
-	  && !c->order_concurrent
-	  && gfc_match ("order ( concurrent )") == MATCH_YES)
+	  && (m = gfc_match_dupl_check (!c->order_concurrent, "order ("))
+		 != MATCH_NO)
 	{
+	  if (m == MATCH_ERROR)
+		goto error;
+	  if (gfc_match (" reproducible : concurrent )") == MATCH_YES
+		  || gfc_match (" concurrent )") == MATCH_YES)
+		;
+	  else if (gfc_match (" unconstrained : concurrent )") == MATCH_YES)
+		c->order_unconstrained = true;
+	  else
+		{
+		  gfc_error ("Expected ORDER(CONCURRENT) at %C "
+			 "with optional %<reproducible%> or "
+			 "%<unconstrained%> modifier");
+		  goto error;
+		}
 	  c->order_concurrent = true;
 	  continue;
 	}
@@ -3475,7 +3489,8 @@ cleanup:
| OMP_CLAUSE_SHARED | OMP_CLAUSE_REDUCTION)
 #define OMP_DISTRIBUTE_CLAUSES \
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE		\
-   | OMP_CLAUSE_LASTPRIVATE | OMP_CLAUSE_COLLAPSE | OMP_CLAUSE_DIST_SCHEDULE)
+   | OMP_CLAUSE_LASTPRIVATE | OMP_CLAUSE_COLLAPSE | OMP_CLAUSE_DIST_SCHEDULE \
+   | OMP_CLAUSE_ORDER)
 #define OMP_SINGLE_CLAUSES \
   (omp_mask (OMP_CLAUSE_PRIVATE) | OMP_CLAUSE_FIRSTPRIVATE)
 #define OMP_ORDERED_CLAUSES \
@@ -5643,7 +5658,9 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
   if (omp_clauses->orderedc && omp_clauses->orderedc < omp_clauses->collapse)
 gfc_error ("ORDERED clause parameter is less than COLLAPSE at %L",
 	   >loc);
-
+  if 

[r12-3630 Regression] FAIL: gcc.target/i386/auto-init-3.c scan-assembler-times fldz 3 on Linux/x86_64

2021-09-17 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

896fec24c8ef59b3520f5ded69dcd5bcf643c1f9 is the first bad commit
commit 896fec24c8ef59b3520f5ded69dcd5bcf643c1f9
Author: qing zhao 
Date:   Fri Sep 17 10:33:47 2021 -0700

testsuite: Fix gcc.target/i386/auto-init-* tests.

caused

FAIL: gcc.target/i386/auto-init-3.c scan-assembler-times fldz 3

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-3630/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/auto-init-3.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/auto-init-3.c --target_board='unix{-m32\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[PATCH, committed] PR fortran/102366] [10/11/12 Regression] large arrays no longer become static

2021-09-17 Thread Harald Anlauf via Gcc-patches
The attempt to fix a misleading warning led to a regression that prevented
putting large variables in the main into static storage.  So instead of
preventing the move, we now disable the useless warning for variables in
the main.

Regtested on x86_64-pc-linux-gnu.  The patch was ok'ed in the PR by Jakub.
Pushed to mainline; will backport to affected branches.
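
In simplified form, the intended behaviour looks like the sketch below.
The struct fields and function name are hypothetical, and the real
condition in gfc_finish_var_decl checks more properties than shown here.

```cpp
// Hypothetical summary of the properties the condition looks at.
struct var_info
{
  long size_bytes;
  bool in_recursive_proc;
  bool in_main_program;
  bool automatic;
  bool has_save_attr;
};

struct placement
{
  bool make_static;
  bool warn;
};

// Decide whether a large local variable moves to static storage and
// whether -Wsurprising should mention it.  After the fix, being in the
// main program suppresses only the warning, not the move.
placement place_variable (const var_info &v, long max_stack_var_size)
{
  placement p = { false, false };
  if (v.in_recursive_proc || v.automatic || v.has_save_attr)
    return p;
  if (v.size_bytes <= max_stack_var_size)
    return p;
  p.make_static = true;
  p.warn = max_stack_var_size > 0 && !v.in_main_program;
  return p;
}
```

So a large array in the main program is moved silently, while the same
array in a non-recursive subroutine is moved with the warning.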

Note, however, that the Fortran 2018 standard has:

F2018  8.5.16  SAVE attribute

(4) A variable, common block, or procedure pointer declared in the scoping
unit of a main program, [...] implicitly has the SAVE attribute

We already have code that sets IMPLICIT_SAVE for variables e.g. in
(sub)modules, but for code such as

  real(kind=4) :: a(10)
  a=1.0
end

(with and without a PROGRAM statement) the array is currently too small to
get moved to static storage.  In decl.c::match_attr_spec I get
gfc_state_stack->state == COMP_NONE, which defeated my attempts at an
ultimate solution.

I have opened

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102390

to track this.

Thanks,
Harald

commit 51166eb2c534692c3c7779def24f83c8c3811b98
Author: Harald Anlauf 
Date:   Fri Sep 17 21:45:33 2021 +0200

Fortran - (large) arrays in the main shall be static

gcc/fortran/ChangeLog:

PR fortran/102366
* trans-decl.c (gfc_finish_var_decl): Disable the warning message
for variables moved from stack to static storage if they are
declared in the main, but allow the move to happen.

gcc/testsuite/ChangeLog:

PR fortran/102366
* gfortran.dg/pr102366.f90: New test.

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index bed61e2325d..3bd8a0fe935 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -743,7 +743,6 @@ gfc_finish_var_decl (tree decl, gfc_symbol * sym)

   /* Keep variables larger than max-stack-var-size off stack.  */
   if (!(sym->ns->proc_name && sym->ns->proc_name->attr.recursive)
-  && !(sym->ns->proc_name && sym->ns->proc_name->attr.is_main_program)
   && !sym->attr.automatic
   && sym->attr.save != SAVE_EXPLICIT
   && sym->attr.save != SAVE_IMPLICIT
@@ -757,7 +756,9 @@ gfc_finish_var_decl (tree decl, gfc_symbol * sym)
 	  || sym->attr.allocatable)
   && !DECL_ARTIFICIAL (decl))
 {
-  if (flag_max_stack_var_size > 0)
+  if (flag_max_stack_var_size > 0
+	  && !(sym->ns->proc_name
+	   && sym->ns->proc_name->attr.is_main_program))
 	gfc_warning (OPT_Wsurprising,
 		 "Array %qs at %L is larger than limit set by "
 		 "%<-fmax-stack-var-size=%>, moved from stack to static "
diff --git a/gcc/testsuite/gfortran.dg/pr102366.f90 b/gcc/testsuite/gfortran.dg/pr102366.f90
new file mode 100644
index 000..d002f64a8ae
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr102366.f90
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! { dg-options "-fdump-tree-original -Wall" }
+! { dg-final { scan-tree-dump-times "static real" 1 "original" } }
+! PR fortran/102366 - large arrays no longer become static
+
+program p
+  real(kind=4) :: a(16776325)
+  a=1.0
+end


Re: [committed] libstdc++: Fix last std::tuple constructor missing 'constexpr' [PR102270]

2021-09-17 Thread Jonathan Wakely via Gcc-patches

On 17/09/21 20:47 +0100, Jonathan Wakely wrote:

On 17/09/21 12:25 +0100, Jonathan Wakely wrote:

On 16/09/21 23:07 +0100, Jonathan Wakely wrote:

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102270
* include/std/tuple (_Head_base, _Tuple_impl): Add
_GLIBCXX20_CONSTEXPR to allocator-extended constructors.
(tuple<>::swap(tuple&)): Add _GLIBCXX20_CONSTEXPR.
* testsuite/20_util/tuple/cons/102270.C: New test.


Oops, this test has a .C extension, and so doesn't actually run (and
if I rename it to 102270.cc it fails).

Fix incoming ...


Fixed at r12-3637.

Tested x86_64-linux. Committed to trunk.


With the patch attached this time ...


commit 1fa2c5a695bb962ffcf8abed49f69cdcc59d0e61
Author: Jonathan Wakely 
Date:   Fri Sep 17 12:27:02 2021

libstdc++: Fix last std::tuple constructor missing 'constexpr' [PR102270]

Also rename the test so it actually runs.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102270
* include/std/tuple (_Tuple_impl): Add constexpr to constructor
missed in previous patch.
* testsuite/20_util/tuple/cons/102270.C: Moved to...
* testsuite/20_util/tuple/cons/102270.cc: ...here.
* testsuite/util/testsuite_allocator.h (SimpleAllocator): Add
constexpr to constructor so it can be used for C++20 tests.

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 6f0dc6346e1..120c80a2b78 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -330,6 +330,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	{ }
 
   template
+	_GLIBCXX20_CONSTEXPR
 	_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
 		const _Head& __head, const _Tail&... __tail)
 	: _Inherited(__tag, __a, __tail...),
diff --git a/libstdc++-v3/testsuite/20_util/tuple/cons/102270.C b/libstdc++-v3/testsuite/20_util/tuple/cons/102270.cc
similarity index 95%
rename from libstdc++-v3/testsuite/20_util/tuple/cons/102270.C
rename to libstdc++-v3/testsuite/20_util/tuple/cons/102270.cc
index 998329817c7..5500cacab6d 100644
--- a/libstdc++-v3/testsuite/20_util/tuple/cons/102270.C
+++ b/libstdc++-v3/testsuite/20_util/tuple/cons/102270.cc
@@ -56,6 +56,9 @@ constexpr bool construct_using_allocator()
 
   std::tuple t1a1b(std::allocator_arg, a, 1, i, 1, i);
 
+  const int c = 0;
+  std::tuple tii(std::allocator_arg, a, c, c);
+
   return true;
 }
 static_assert( construct_using_allocator() );
diff --git a/libstdc++-v3/testsuite/util/testsuite_allocator.h b/libstdc++-v3/testsuite/util/testsuite_allocator.h
index 1f7912ea6eb..b5b402858a6 100644
--- a/libstdc++-v3/testsuite/util/testsuite_allocator.h
+++ b/libstdc++-v3/testsuite/util/testsuite_allocator.h
@@ -514,7 +514,7 @@ namespace __gnu_test
 {
   typedef Tp value_type;
 
-  SimpleAllocator() noexcept { }
+  constexpr SimpleAllocator() noexcept { }
 
   template 
 SimpleAllocator(const SimpleAllocator&) { }


[HELP Needed!][PATCH] testsuite: Fix gcc.target/aarch64/auto-init-* tests.

2021-09-17 Thread Qing Zhao via Gcc-patches
Hi,

There are far fewer issues with the aarch64/auto-init-* test cases.
Different -march values (‘armv8-a’, ‘armv8.1-a’, up to ‘armv8.6-a’, and 
‘armv8-r’) do not change the pattern match.

Only two options make a difference:

1. -mabi=ilp32/lp64 impacts two of the test cases, “auto-init-2.c” and 
“auto-init-padding-5.c”.
2. -fstack-clash-protection impacts two of the test cases, “auto-init-1.c” and 
“auto-init-7.c”.

Naturally, the patch for this set is:

A. Adjust the patterns for ilp32 or lp64 in “auto-init-2.c” and 
“auto-init-padding-5.c”;
B. Add -fno-stack-protector to “auto-init-1.c” and “auto-init-7.c”.


A above fixed issue 1; however, B did not fix issue 2.

I changed “auto-init-1.c” as follows:

diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-1.c b/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
index 0fa4708..a38d91b 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
@@ -1,6 +1,6 @@
 /* Verify zero initialization for integer and pointer type automatic variables.  */
 /* { dg-do compile } */
-/* { dg-options "-ftrivial-auto-var-init=zero -fdump-rtl-expand" } */
+/* { dg-options "-ftrivial-auto-var-init=zero -fdump-rtl-expand -fno-stack-protector" } */
 
 #ifndef __cplusplus
 # define bool _Bool

So I took a look at the test log file, and found that when I ran the tests 
as:

 make check-gcc RUNTESTFLAGS='--target_board=unix\{-mabi=lp64/-fstack-clash-protection/-fstack-protector-all\} aarch64.exp=auto-init*'

In the log file, I got:

/home/qinzhao/Work/GCC/build_git/gcc/xgcc 
-B/home/qinzhao/Work/GCC/build_git/gcc/ 
/home/qinzhao/Work/GCC/latest_gcc_git/gcc/testsuite/gcc.target/aarch64/auto-init-1.c
  -fdiagnostics-plain-output  -ftrivial-auto-var-init=zero -fdump-rtl-expand 
-fno-stack-protector -ffat-lto-objects -S  -mabi=lp64 -fstack-clash-protection 
-fstack-protector-all  -o auto-init-1.s

From this we can see that the options passed through RUNTESTFLAGS 
(“-mabi=lp64 -fstack-clash-protection -fstack-protector-all”) were appended 
AFTER the options set inside the test case through “dg-options”. As a result, 
the option “-fno-stack-protector” did not have any effect at all.
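
The ordering effect can be illustrated with a small C++ sketch. It assumes 
the usual rule that, among conflicting options, the one that appears last on 
the command line wins. The helper stack_protector_enabled is hypothetical; it 
is not part of GCC or DejaGnu.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: walks the options in command-line order and reports
// whether stack protection ends up enabled when the last matching option
// wins.  Options that do not start with "-fstack-protector" (for example
// "-fstack-clash-protection") are ignored.
inline bool stack_protector_enabled(const std::vector<std::string>& args) {
  bool enabled = false;  // assumed default for this sketch
  for (const std::string& a : args) {
    if (a == "-fno-stack-protector")
      enabled = false;
    else if (a.rfind("-fstack-protector", 0) == 0)  // prefix match
      enabled = true;
  }
  return enabled;
}
```

With the order seen in the log (dg-options first, RUNTESTFLAGS options last) 
the protector stays enabled; with the two groups swapped it is disabled.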

What’s the expected order of these options? Should the options from 
RUNTESTFLAGS be added BEFORE or AFTER the options from the test cases?

For x86, the options from RUNTESTFLAGS are added BEFORE the options from the 
test cases. Therefore adding “-fno-stack-protector” has the expected result.

Is this a bug in the aarch64 test suite?

Thanks.

Qing




[committed] libstdc++: Add 'noexcept' to path::iterator members

2021-09-17 Thread Jonathan Wakely via Gcc-patches
All path::iterator operations are non-throwing.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (path::iterator): Add noexcept to all
member functions and friend functions.
(distance): Add noexcept.
(advance): Add noexcept and inline.
* include/experimental/bits/fs_path.h (path::iterator):
Add noexcept to all member functions.

Tested x86_64-linux. Committed to trunk.
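
As an illustration of what the added 'noexcept' buys callers, here is a
minimal sketch.  It uses a hypothetical stand-in type, component_iterator,
rather than the real path::iterator, so that it does not depend on a
particular libstdc++ version.  It shows that the noexcept operator can then
confirm the operations at compile time, and that a helper built only from
those operations can itself be declared noexcept.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for an iterator whose operations are all noexcept.
// This is not libstdc++ code.
struct component_iterator {
  explicit component_iterator(const int* p = nullptr) noexcept : cur(p) {}

  const int& operator*() const noexcept { return *cur; }

  component_iterator& operator++() noexcept { ++cur; return *this; }

  component_iterator operator++(int) noexcept
  { auto tmp = *this; ++*this; return tmp; }

  friend bool
  operator==(const component_iterator& a, const component_iterator& b) noexcept
  { return a.cur == b.cur; }

  friend bool
  operator!=(const component_iterator& a, const component_iterator& b) noexcept
  { return !(a == b); }

  const int* cur;
};

// The noexcept operator checks the declarations at compile time.
static_assert(noexcept(++std::declval<component_iterator&>()),
              "pre-increment is noexcept");
static_assert(noexcept(std::declval<component_iterator&>()++),
              "post-increment is noexcept");
static_assert(noexcept(*std::declval<const component_iterator&>()),
              "dereference is noexcept");

// Counts the steps from first to last.  It can be noexcept because every
// operation it uses is noexcept.
inline long steps_between(component_iterator first,
                          component_iterator last) noexcept {
  long n = 0;
  for (; first != last; ++first)
    ++n;
  return n;
}
```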

commit 42eff613d0c10f88dc7a44b14981876401a09981
Author: Jonathan Wakely 
Date:   Fri Sep 17 12:28:35 2021

libstdc++: Add 'noexcept' to path::iterator members

All path::iterator operations are non-throwing.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (path::iterator): Add noexcept to all
member functions and friend functions.
(distance): Add noexcept.
(advance): Add noexcept and inline.
* include/experimental/bits/fs_path.h (path::iterator):
Add noexcept to all member functions.

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 235d1df748f..92f7cbbe357 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -884,33 +884,42 @@ namespace __detail
 using pointer  = const path*;
 using iterator_category= std::bidirectional_iterator_tag;
 
-iterator() : _M_path(nullptr), _M_cur(), _M_at_end() { }
+iterator() noexcept : _M_path(nullptr), _M_cur(), _M_at_end() { }
 
 iterator(const iterator&) = default;
 iterator& operator=(const iterator&) = default;
 
-reference operator*() const;
-pointer   operator->() const { return std::__addressof(**this); }
+reference operator*() const noexcept;
+pointer   operator->() const noexcept { return std::__addressof(**this); }
 
-iterator& operator++();
-iterator  operator++(int) { auto __tmp = *this; ++*this; return __tmp; }
+iterator& operator++() noexcept;
 
-iterator& operator--();
-iterator  operator--(int) { auto __tmp = *this; --*this; return __tmp; }
+iterator  operator++(int) noexcept
+{ auto __tmp = *this; ++*this; return __tmp; }
 
-friend bool operator==(const iterator& __lhs, const iterator& __rhs)
+iterator& operator--() noexcept;
+
+iterator  operator--(int) noexcept
+{ auto __tmp = *this; --*this; return __tmp; }
+
+friend bool
+operator==(const iterator& __lhs, const iterator& __rhs) noexcept
 { return __lhs._M_equals(__rhs); }
 
-friend bool operator!=(const iterator& __lhs, const iterator& __rhs)
+friend bool
+operator!=(const iterator& __lhs, const iterator& __rhs) noexcept
 { return !__lhs._M_equals(__rhs); }
 
   private:
 friend class path;
 
-bool _M_is_multi() const { return _M_path->_M_type() == _Type::_Multi; }
+bool
+_M_is_multi() const noexcept
+{ return _M_path->_M_type() == _Type::_Multi; }
 
 friend difference_type
 __path_iter_distance(const iterator& __first, const iterator& __last)
+noexcept
 {
   __glibcxx_assert(__first._M_path != nullptr);
   __glibcxx_assert(__first._M_path == __last._M_path);
@@ -923,7 +932,7 @@ namespace __detail
 }
 
 friend void
-__path_iter_advance(iterator& __i, difference_type __n)
+__path_iter_advance(iterator& __i, difference_type __n) noexcept
 {
   if (__n == 1)
++__i;
@@ -938,15 +947,15 @@ namespace __detail
}
 }
 
-iterator(const path* __path, path::_List::const_iterator __iter)
+iterator(const path* __path, path::_List::const_iterator __iter) noexcept
 : _M_path(__path), _M_cur(__iter), _M_at_end()
 { }
 
-iterator(const path* __path, bool __at_end)
+iterator(const path* __path, bool __at_end) noexcept
 : _M_path(__path), _M_cur(), _M_at_end(__at_end)
 { }
 
-bool _M_equals(iterator) const;
+bool _M_equals(iterator) const noexcept;
 
 const path*_M_path;
 path::_List::const_iterator _M_cur;
@@ -1266,7 +1275,7 @@ namespace __detail
   }
 
   inline path::iterator
-  path::begin() const
+  path::begin() const noexcept
   {
 if (_M_type() == _Type::_Multi)
   return iterator(this, _M_cmpts.begin());
@@ -1274,7 +1283,7 @@ namespace __detail
   }
 
   inline path::iterator
-  path::end() const
+  path::end() const noexcept
   {
 if (_M_type() == _Type::_Multi)
   return iterator(this, _M_cmpts.end());
@@ -1282,10 +1291,10 @@ namespace __detail
   }
 
   inline path::iterator&
-  path::iterator::operator++()
+  path::iterator::operator++() noexcept
   {
 __glibcxx_assert(_M_path != nullptr);
-if (_M_path->_M_type() == _Type::_Multi)
+if (_M_is_multi())
   {
__glibcxx_assert(_M_cur != _M_path->_M_cmpts.end());
++_M_cur;
@@ -1299,10 +1308,10 @@ namespace __detail
   }
 
   inline path::iterator&
-  

[committed] libstdc++: Fix last std::tuple constructor missing 'constexpr' [PR102270]

2021-09-17 Thread Jonathan Wakely via Gcc-patches

On 17/09/21 12:25 +0100, Jonathan Wakely wrote:

On 16/09/21 23:07 +0100, Jonathan Wakely wrote:

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102270
* include/std/tuple (_Head_base, _Tuple_impl): Add
_GLIBCXX20_CONSTEXPR to allocator-extended constructors.
(tuple<>::swap(tuple&)): Add _GLIBCXX20_CONSTEXPR.
* testsuite/20_util/tuple/cons/102270.C: New test.


Oops, this test has a .C extension, and so doesn't actually run (and
if I rename it to 102270.cc it fails).

Fix incoming ...


Fixed at r12-3637.

Tested x86_64-linux. Committed to trunk.



[committed] libstdc++: Fix last std::tuple constructor missing 'constexpr' [PR102270]

2021-09-17 Thread Jonathan Wakely via Gcc-patches
Also rename the test so it actually runs.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102270
* include/std/tuple (_Tuple_impl): Add constexpr to constructor
missed in previous patch.
* testsuite/20_util/tuple/cons/102270.C: Moved to...
* testsuite/20_util/tuple/cons/102270.cc: ...here.
* testsuite/util/testsuite_allocator.h (SimpleAllocator): Add
constexpr to constructor so it can be used for C++20 tests.

Tested powerpc64le-linux. Committed to trunk.

git mailpatch 42eff613d0c10f88dc7a44b14981876401a09981
commit 1fa2c5a695bb962ffcf8abed49f69cdcc59d0e61
Author: Jonathan Wakely 
Date:   Fri Sep 17 12:27:02 2021

libstdc++: Fix last std::tuple constructor missing 'constexpr' [PR102270]

Also rename the test so it actually runs.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102270
* include/std/tuple (_Tuple_impl): Add constexpr to constructor
missed in previous patch.
* testsuite/20_util/tuple/cons/102270.C: Moved to...
* testsuite/20_util/tuple/cons/102270.cc: ...here.
* testsuite/util/testsuite_allocator.h (SimpleAllocator): Add
constexpr to constructor so it can be used for C++20 tests.

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 6f0dc6346e1..120c80a2b78 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -330,6 +330,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{ }
 
   template
+   _GLIBCXX20_CONSTEXPR
_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
const _Head& __head, const _Tail&... __tail)
: _Inherited(__tag, __a, __tail...),
diff --git a/libstdc++-v3/testsuite/20_util/tuple/cons/102270.C b/libstdc++-v3/testsuite/20_util/tuple/cons/102270.cc
similarity index 95%
rename from libstdc++-v3/testsuite/20_util/tuple/cons/102270.C
rename to libstdc++-v3/testsuite/20_util/tuple/cons/102270.cc
index 998329817c7..5500cacab6d 100644
--- a/libstdc++-v3/testsuite/20_util/tuple/cons/102270.C
+++ b/libstdc++-v3/testsuite/20_util/tuple/cons/102270.cc
@@ -56,6 +56,9 @@ constexpr bool construct_using_allocator()
 
   std::tuple t1a1b(std::allocator_arg, a, 1, i, 1, i);
 
+  const int c = 0;
+  std::tuple tii(std::allocator_arg, a, c, c);
+
   return true;
 }
 static_assert( construct_using_allocator() );
diff --git a/libstdc++-v3/testsuite/util/testsuite_allocator.h b/libstdc++-v3/testsuite/util/testsuite_allocator.h
index 1f7912ea6eb..b5b402858a6 100644
--- a/libstdc++-v3/testsuite/util/testsuite_allocator.h
+++ b/libstdc++-v3/testsuite/util/testsuite_allocator.h
@@ -514,7 +514,7 @@ namespace __gnu_test
 {
   typedef Tp value_type;
 
-  SimpleAllocator() noexcept { }
+  constexpr SimpleAllocator() noexcept { }
 
   template 
 SimpleAllocator(const SimpleAllocator&) { }


Re: [PATCH] [og10] OpenACC: Remove unnecessary barriers (gimple worker partitioning/broadcast)

2021-09-17 Thread Thomas Schwinge
Hi!

On 2020-06-29T13:17:30-0700, Julian Brown  wrote:
> This is an optimisation for middle-end worker-partitioning support (used
> to support multiple workers on AMD GCN).  At present, [...]

Thanks.  (... but I have not verified the algorithmic/behavioral
changes.)

I've removed (trivial) now-untrue 'ARG_UNUSED' markers.

Already earlier, I had settled on using "gang-private" (instead of
"gangprivate", "gang private", "ganglocal", "gang-local", "gang local",
etc.), so I've made such (trivial) changes.

Pushed to master branch commit 2961ac45b9e19523958757e607d11c5893d6368b
"openacc: Remove unnecessary barriers (gimple worker
partitioning/broadcast)", see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 2961ac45b9e19523958757e607d11c5893d6368b Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Mon, 29 Jun 2020 13:17:30 -0700
Subject: [PATCH] openacc: Remove unnecessary barriers (gimple worker
 partitioning/broadcast)

This is an optimisation for middle-end worker-partitioning support (used
to support multiple workers on AMD GCN).  At present, barriers may be
emitted in cases where they aren't needed and cannot be optimised away.
This patch stops the extraneous barriers from being emitted in the
first place.

One exception to the above (where the barrier is still needed) is for
predicated blocks of code that perform a write to gang-private shared
memory from one worker.  We must execute a barrier before other workers
read that shared memory location.

gcc/
	* config/gcn/gcn.c (gimple.h): Include.
	(gcn_fork_join): Emit barrier for worker-level joins.
	* omp-oacc-neuter-broadcast.cc (find_local_vars_to_propagate): Add
	writes_gang_private bitmap parameter. Set bit for blocks
	containing gang-private variable writes.
	(worker_single_simple): Don't emit barrier after predicated block.
	(worker_single_copy): Don't emit barrier if we're not broadcasting
	anything and the block contains no gang-private writes.
	(neuter_worker_single): Don't predicate blocks that only contain
	NOPs or internal marker functions.  Pass has_gang_private_write
	argument to worker_single_copy.
	(oacc_do_neutering): Add writes_gang_private bitmap handling.
---
 gcc/config/gcn/gcn.c |  11 ++-
 gcc/omp-oacc-neuter-broadcast.cc | 112 ---
 2 files changed, 94 insertions(+), 29 deletions(-)

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index b1bfdeac7b6..2a3fc96c1ee 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -51,6 +51,7 @@
 #include "intl.h"
 #include "rtl-iter.h"
 #include "dwarf2.h"
+#include "gimple.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -5133,10 +5134,14 @@ gcn_oacc_dim_pos (int dim)
 /* Implement TARGET_GOACC_FORK_JOIN.  */
 
 static bool
-gcn_fork_join (gcall *ARG_UNUSED (call), const int *ARG_UNUSED (dims),
-	   bool ARG_UNUSED (is_fork))
+gcn_fork_join (gcall *call, const int dims[], bool is_fork)
 {
-  /* GCN does not need to expand fork/join markers at the RTL level.  */
+  tree arg = gimple_call_arg (call, 2);
+  unsigned axis = TREE_INT_CST_LOW (arg);
+
+  if (!is_fork && axis == GOMP_DIM_WORKER && dims[axis] != 1)
+return true;
+
   return false;
 }
 
diff --git a/gcc/omp-oacc-neuter-broadcast.cc b/gcc/omp-oacc-neuter-broadcast.cc
index e0bd01311ee..aa5990ed7a1 100644
--- a/gcc/omp-oacc-neuter-broadcast.cc
+++ b/gcc/omp-oacc-neuter-broadcast.cc
@@ -769,16 +769,19 @@ static void
 find_local_vars_to_propagate (parallel_g *par, unsigned outer_mask,
 			  hash_set *partitioned_var_uses,
 			  hash_set *gang_private_vars,
+			  bitmap writes_gang_private,
 			  vec *prop_set)
 {
   unsigned mask = outer_mask | par->mask;
 
   if (par->inner)
 find_local_vars_to_propagate (par->inner, mask, partitioned_var_uses,
-  gang_private_vars, prop_set);
+  gang_private_vars, writes_gang_private,
+  prop_set);
   if (par->next)
 find_local_vars_to_propagate (par->next, outer_mask, partitioned_var_uses,
-  gang_private_vars, prop_set);
+  gang_private_vars, writes_gang_private,
+  prop_set);
 
   if (!(mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)))
 {
@@ -799,8 +802,7 @@ find_local_vars_to_propagate (parallel_g *par, unsigned outer_mask,
 		  if (!VAR_P (var)
 		  || is_global_var (var)
 		  || AGGREGATE_TYPE_P (TREE_TYPE (var))
-		  || !partitioned_var_uses->contains (var)
-		  || gang_private_vars->contains (var))
+		  || !partitioned_var_uses->contains (var))
 		continue;
 
 		  if (stmt_may_clobber_ref_p (stmt, var))
@@ -814,6 +816,14 @@ find_local_vars_to_propagate (parallel_g *par, unsigned outer_mask,
 			  fprintf (dump_file, "\n");
 			}
 
+		  if (gang_private_vars->contains (var))
+			{
+			  /* If we 

Re: [PATCH] [og10] OpenACC: Shared memory layout optimisation

2021-09-17 Thread Thomas Schwinge
Hi!

On 2020-06-29T13:16:52-0700, Julian Brown  wrote:
> This patch implements an algorithm to lay out local data-share (LDS) space.  
> It currently works for AMD GCN.  [...]

Thanks.  (... but I have not verified the algorithmic/behavioral changes
in detail.)

I've merged in PR96334 "openacc: Unshare reduction temporaries for GCN".

I've removed some (trivial) left-over/dead code.

Already earlier, I had settled on using "gang-private" (instead of
"gangprivate", "gang private", "ganglocal", "gang-local", "gang local",
etc.), so I've made such (trivial) changes, including renaming GCN target
'-mgang-local-size=[...]' to '-mgang-private-size=[...]', which is only
used here:

> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-many.c
> @@ -0,0 +1,79 @@
> +/* { dg-additional-options "-foffload=-mgang-local-size=64" } */

..., which I've clarified to:

+/* To avoid 'error: shared-memory region overflow':
+   { dg-additional-options 
"-foffload-options=amdgcn-amdhsa=-mgang-private-size=64" { target 
openacc_radeon_accel_selected } }
+*/

(Would be good to have test cases testing for the error conditions
handled in these changes here.)

The clean-up items from my earlier email remain to be done still.

Pushed to master branch commit 2a3f9f6532bb21d8ab6f16fbe9ee603f6b1405f2
"openacc: Shared memory layout optimisation", see attached.


Grüße
 Thomas


>From 2a3f9f6532bb21d8ab6f16fbe9ee603f6b1405f2 Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Mon, 29 Jun 2020 13:16:52 -0700
Subject: [PATCH] openacc: Shared memory layout optimisation

This patch implements an algorithm to lay out local data-share (LDS)
space.  It currently works for AMD GCN.  At the moment, LDS is used for
three things:

  1. Gang-private variables
  2. Reduction temporaries (accumulators)
  3. Broadcasting for worker partitioning

After the patch is applied, (2) and (3) are placed at preallocated
locations in LDS, and (1) continues to be handled by the backend (as it
is at present prior to this patch being applied). LDS now looks like this:

  +--------------+ (gang-private size + 1024, = 1536)
  | free space   |
  |     ...      |
  | - - - - - - -|
  | worker bcast |
  +--------------+
  | reductions   |
  +--------------+ <<< -mgang-private-size= (def. 512)
  | gang-private |
  |     vars     |
  +--------------+ (32)
  | low LDS vars |
  +--------------+ LDS base

So, gang-private space is fixed at a constant amount at compile time
(which can be increased with a command-line switch if necessary
for some given code). The layout algorithm takes out a slice of the
remainder of usable space for reduction vars, and uses the rest for
worker partitioning.
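
The offsets in the layout above can be worked through with a short C++
sketch.  It is only an illustration of the arithmetic: LdsLayout and
layout_lds are hypothetical names, the 1024-byte figure is read off the
diagram, and the size of the reduction slice is treated as an input because
the text does not say how it is chosen.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of the region boundaries, in bytes from LDS base.
struct LdsLayout {
  std::size_t reductions_begin;  // end of the gang-private area
  std::size_t bcast_begin;       // end of the reduction slice
  std::size_t end;               // gang-private size + 1024
};

// gang_private_size models the value of -mgang-private-size= (default 512
// according to the diagram).  reduction_size is an assumed input.
inline LdsLayout layout_lds(std::size_t gang_private_size,
                            std::size_t reduction_size) {
  const std::size_t extra_space = 1024;  // taken from the diagram
  assert(reduction_size <= extra_space);
  LdsLayout l;
  l.reductions_begin = gang_private_size;
  l.bcast_begin = gang_private_size + reduction_size;
  l.end = gang_private_size + extra_space;
  return l;
}
```

For the default of 512 and an assumed 64 bytes of reduction temporaries,
reductions start at 512, worker broadcast space starts at 576, and the
usable area ends at 1536.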

The partitioning algorithm works as follows.

 1. An "adjacency" set is built up for each basic block that might
do a broadcast. This is calculated by starting at each such block,
and doing a recursive DFS walk over successors to find the next
block (or blocks) that *also* does a broadcast
(dfs_broadcast_reachable_1).

 2. The adjacency set is inverted to get adjacent predecessor blocks also.

 3. Blocks that will perform a broadcast are sorted by size of that
broadcast: the biggest blocks are handled first.

 4. A splay tree structure is used to calculate the spans of LDS memory
that are already allocated by the blocks adjacent to this one
(merge_ranges{,_1}).

 5. The current block's broadcast space is allocated from the first free
span not allocated in the splay tree structure calculated above
(first_fit_range). This seems to work quite nicely and efficiently
with the splay tree structure.

 6. Continue with the next-biggest broadcast block until we're done.

In this way, "adjacent" broadcasts will not use the same piece of
LDS memory.
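
Steps 3 to 6 above can be sketched in C++ as follows.  This is a simplified
model and not the code in omp-oacc-neuter-broadcast.cc: it uses a sorted
std::vector of ranges in place of the splay tree, takes the adjacency sets
from steps 1 and 2 as given input, and ignores alignment and the upper bound
of the available space.  The names BroadcastBlock, first_fit and
layout_broadcasts are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::pair<std::size_t, std::size_t> Range;  // [begin, end)

struct BroadcastBlock {
  std::size_t size;                   // bytes needed for the broadcast
  std::vector<std::size_t> adjacent;  // adjacent blocks, both directions
  bool allocated;                     // has an offset been assigned yet?
  std::size_t offset;                 // assigned offset, if allocated
};

// Step 5: lowest offset where `size` bytes fit, given ranges sorted by
// their start.  Overlapping ranges are handled by tracking the furthest
// end seen so far.
inline std::size_t first_fit(const std::vector<Range>& used,
                             std::size_t size) {
  std::size_t pos = 0;
  for (const Range& r : used) {
    if (r.first >= pos + size)
      break;  // the gap before this range is large enough
    pos = std::max(pos, r.second);
  }
  return pos;
}

inline void layout_broadcasts(std::vector<BroadcastBlock>& blocks) {
  // Step 3: handle the biggest broadcasts first.
  std::vector<std::size_t> order(blocks.size());
  for (std::size_t i = 0; i < order.size(); ++i)
    order[i] = i;
  std::stable_sort(order.begin(), order.end(),
                   [&blocks](std::size_t a, std::size_t b) {
                     return blocks[a].size > blocks[b].size;
                   });

  // Step 6: continue until every block has been placed.
  for (std::size_t idx : order) {
    // Step 4: collect the spans already taken by adjacent blocks.
    std::vector<Range> used;
    for (std::size_t adj : blocks[idx].adjacent) {
      const BroadcastBlock& n = blocks[adj];
      if (n.allocated)
        used.push_back(Range(n.offset, n.offset + n.size));
    }
    std::sort(used.begin(), used.end());

    // Step 5: allocate from the first free span.
    blocks[idx].offset = first_fit(used, blocks[idx].size);
    blocks[idx].allocated = true;
  }
}
```

For a chain of three blocks of 64, 32 and 16 bytes, where only neighbours
are adjacent, the first and last blocks both get offset 0 and the middle one
gets offset 64, so adjacent broadcasts never overlap.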

PR96334 "openacc: Unshare reduction temporaries for GCN" got merged in:

The GCN backend uses tree nodes like MEM((__lds TYPE *) )
for reduction temporaries. Unlike e.g. var decls and SSA names, these
nodes cannot be shared during gimplification, but are so in some
circumstances. This is detected when appropriate --enable-checking
options are used. This patch unshares such nodes when they are reused
more than once.

gcc/
	* config/gcn/gcn-protos.h
	(gcn_goacc_create_worker_broadcast_record): Update prototype.
	* config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Use
	preallocated block of LDS memory.  Do not cache/share decls for
	reduction temporaries between invocations.
	(gcn_goacc_reduction_teardown): Unshare VAR on second use.
	(gcn_goacc_create_worker_broadcast_record): Add OFFSET parameter
	and return temporary LDS space at that offset.  Return pointer in
	"sender" case.
	* config/gcn/gcn.c (acc_lds_size, gang_private_hwm, 

Re: [PATCH] [og10] OpenACC: Turn off worker partitioning if num_workers==1

2021-09-17 Thread Thomas Schwinge
Hi!

On 2020-06-29T13:16:51-0700, Julian Brown  wrote:
> This patch turns off the middle-end worker-partitioning support if the
> number of workers for an outlined offload function is one.  In that case,
> we do not need to perform the broadcasting/neutering code transformation.

ACK, thanks.

> --- a/gcc/omp-offload.c
> +++ b/gcc/omp-offload.c
> @@ -2165,7 +2165,20 @@ public:
>/* opt_pass methods: */
>virtual bool gate (function *)
>{
> -return flag_openacc && targetm.goacc.worker_partitioning;
> +if (!flag_openacc || !targetm.goacc.worker_partitioning)
> +  return false;
> +
> +tree attr = oacc_get_fn_attrib (current_function_decl);
> +
> +if (!attr)
> +  /* Not an offloaded function.  */
> +  return false;

This last check implies that code in
'gcc/omp-oacc-neuter-broadcast.cc:execute_omp_oacc_neuter_broadcast'
ought to be simplified which currently does:

tree attr = oacc_get_fn_attrib (current_function_decl);
if (attr)

..., which now is always-true.

> +
> +int worker_dim
> +  = oacc_get_fn_dim_size (current_function_decl, GOMP_DIM_WORKER);
> +
> +/* No worker partitioning if we know the number of workers is 1.  */
> +return worker_dim != 1;
>};

Pushed to master branch commit 82792cc407d7a7ab99f37e8501d19be2e6164e50
"openacc: Turn off worker partitioning if num_workers==1", see attached.


Grüße
 Thomas


>From 82792cc407d7a7ab99f37e8501d19be2e6164e50 Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Mon, 29 Jun 2020 13:16:51 -0700
Subject: [PATCH] openacc: Turn off worker partitioning if num_workers==1

This patch turns off the middle-end worker-partitioning support if the
number of workers for an outlined offload function is one.  In that case,
we do not need to perform the broadcasting/neutering code transformation.

	gcc/
	* omp-oacc-neuter-broadcast.cc
	(pass_omp_oacc_neuter_broadcast::gate): Disable if num_workers is
	1.
	(execute_omp_oacc_neuter_broadcast): Adjust.

Co-Authored-By: Thomas Schwinge 
---
 gcc/omp-oacc-neuter-broadcast.cc | 47 +---
 1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/gcc/omp-oacc-neuter-broadcast.cc b/gcc/omp-oacc-neuter-broadcast.cc
index d48627a6940..3fe92248c4e 100644
--- a/gcc/omp-oacc-neuter-broadcast.cc
+++ b/gcc/omp-oacc-neuter-broadcast.cc
@@ -1378,18 +1378,17 @@ execute_omp_oacc_neuter_broadcast ()
 
   /* If this is a routine, calculate MASK as if the outer levels are already
  partitioned.  */
-  tree attr = oacc_get_fn_attrib (current_function_decl);
-  if (attr)
-{
-  tree dims = TREE_VALUE (attr);
-  unsigned ix;
-  for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims))
-	{
-	  tree allowed = TREE_PURPOSE (dims);
-	  if (allowed && integer_zerop (allowed))
-	mask |= GOMP_DIM_MASK (ix);
-	}
-}
+  {
+tree attr = oacc_get_fn_attrib (current_function_decl);
+tree dims = TREE_VALUE (attr);
+unsigned ix;
+for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims))
+  {
+	tree allowed = TREE_PURPOSE (dims);
+	if (allowed && integer_zerop (allowed))
+	  mask |= GOMP_DIM_MASK (ix);
+  }
+  }
 
   parallel_g *par = omp_sese_discover_pars (_stmt_map);
   populate_single_mode_bitmaps (par, worker_single, vector_single, mask, 0);
@@ -1506,11 +1505,27 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *)
+  virtual bool gate (function *fun)
   {
-return (flag_openacc
-	&& targetm.goacc.create_worker_broadcast_record);
-  };
+if (!flag_openacc)
+  return false;
+
+if (!targetm.goacc.create_worker_broadcast_record)
+  return false;
+
+/* Only relevant for OpenACC offloaded functions.  */
+tree attr = oacc_get_fn_attrib (fun->decl);
+if (!attr)
+  return false;
+
+/* Not relevant for 'num_workers(1)'.  */
+int worker_dim
+  = oacc_get_fn_dim_size (fun->decl, GOMP_DIM_WORKER);
+if (worker_dim == 1)
+  return false;
+
+return true;
+  }
 
   virtual unsigned int execute (function *)
 {
-- 
2.33.0



Re: [PATCH] [og10] OpenACC: Shared memory layout optimisation

2021-09-17 Thread Thomas Schwinge
Hi!

On 2020-06-29T13:16:52-0700, Julian Brown  wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-many.c

That one already/currently works without any code changes.

> @@ -0,0 +1,79 @@
> +/* { dg-additional-options "-foffload=-mgang-local-size=64" } */

... just without that directive, obviously.

Pushed to master branch commit 8251f90e87f67e09f5203e8edd77bfe73b68a54d
"Add 'libgomp.oacc-c-c++-common/broadcast-many.c'", see attached.


Grüße
 Thomas


>From 8251f90e87f67e09f5203e8edd77bfe73b68a54d Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Mon, 29 Jun 2020 13:16:52 -0700
Subject: [PATCH] Add 'libgomp.oacc-c-c++-common/broadcast-many.c'

libgomp/
	* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: New test.
---
 .../broadcast-many.c  | 77 +++
 1 file changed, 77 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-many.c

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-many.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-many.c
new file mode 100644
index 000..d763a754a11
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-many.c
@@ -0,0 +1,77 @@
+#include <assert.h>
+#include <stdio.h>
+
+#define LOCAL(n) double n = input;
+#define LOCALS(n) LOCAL(n##1) LOCAL(n##2) LOCAL(n##3) LOCAL(n##4) \
+		  LOCAL(n##5) LOCAL(n##6) LOCAL(n##7) LOCAL(n##8)
+#define LOCALS2(n) LOCALS(n##a) LOCALS(n##b) LOCALS(n##c) LOCALS(n##d) \
+		   LOCALS(n##e) LOCALS(n##f) LOCALS(n##g) LOCALS(n##h)
+
+#define USE(n) n
+#define USES(n,OP) USE(n##1) OP USE(n##2) OP USE(n##3) OP USE (n##4) OP \
+		   USE(n##5) OP USE(n##6) OP USE(n##7) OP USE (n##8)
+#define USES2(n,OP) USES(n##a,OP) OP USES(n##b,OP) OP USES(n##c,OP) OP \
+		USES(n##d,OP) OP USES(n##e,OP) OP USES(n##f,OP) OP \
+		USES(n##g,OP) OP USES(n##h,OP)
+
+int main (void)
+{
+  int ret;
+  int input = 1;
+
+  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
+  {
+int w = 0;
+LOCALS2(h);
+
+#pragma acc loop worker reduction(+:w)
+for (int i = 0; i < 32; i++)
+  {
+	int u = USES2(h,+);
+	w += u;
+  }
+
+printf ("w=%d\n", w);
+/* { dg-output "w=2048(\n|\r\n|\r)" } */
+
+LOCALS2(i);
+
+#pragma acc loop worker reduction(+:w)
+for (int i = 0; i < 32; i++)
+  {
+	int u = USES2(i,+);
+	w += u;
+  }
+
+printf ("w=%d\n", w);
+/* { dg-output "w=4096(\n|\r\n|\r)" } */
+
+LOCALS2(j);
+LOCALS2(k);
+
+#pragma acc loop worker reduction(+:w)
+for (int i = 0; i < 32; i++)
+  {
+	int u = USES2(j,+);
+	w += u;
+  }
+
+printf ("w=%d\n", w);
+/* { dg-output "w=6144(\n|\r\n|\r)" } */
+
+#pragma acc loop worker reduction(+:w)
+for (int i = 0; i < 32; i++)
+  {
+	int u = USES2(k,+);
+	w += u;
+  }
+
+ret = (w == 64 * 32 * 4);
+printf ("w=%d\n", w);
+/* { dg-output "w=8192(\n|\r\n|\r)" } */
+  }
+
+  assert (ret);
+
+  return 0;
+}
-- 
2.33.0



Re: [PING^2] Re: Fix 'hash_table::expand' to destruct stale Value objects

2021-09-17 Thread Jonathan Wakely via Gcc-patches
On Fri, 17 Sep 2021, 16:52 Thomas Schwinge,  wrote:

> Hi!
>
> On 2021-09-17T15:03:18+0200, Richard Biener 
> wrote:
> > On Fri, Sep 17, 2021 at 2:39 PM Jonathan Wakely 
> wrote:
> >> On Fri, 17 Sept 2021 at 13:08, Richard Biener
> >>  wrote:
> >> > On Fri, Sep 17, 2021 at 1:17 PM Thomas Schwinge <
> tho...@codesourcery.com> wrote:
> >> > > On 2021-09-10T10:00:25+0200, I wrote:
> >> > > > On 2021-09-01T19:31:19-0600, Martin Sebor via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
> >> > > >> On 8/30/21 4:46 AM, Thomas Schwinge wrote:
> >> > > >>> Ping -- we still need to plug the memory leak; see patch
> attached, [...]
>
> >> > > > Ping for formal approval (and review for using proper
> >> > > > C++ terminology in the 'gcc/hash-table.h:hash_table::expand'
> source code
> >> > > > comment that I'm adding).  Patch again attached, for easy
> reference.
>
> >> > I'm happy when a C++ literate approves the main change which I quote
> as
> >> >
> >> >   new ((void*) q) value_type (std::move (x));
> >> > +
> >> > + /* Manually invoke destructor of original object, to
> counterbalance
> >> > +object constructed via placement new.  */
> >> > + x.~value_type ();
> >> >
> >> > but I had the impression that std::move already "moves away" from the
> source?
> >>
> >> It just casts the argument to an rvalue reference, which allows the
> >> value_type constructor to steal its guts.
> >>
> >> > That said, the dance above looks iffy, there must be a nicer way to
> "move"
> >> > an object in C++?
> >>
> >> The code above is doing two things: transfer the resources from x to a
> >> new object at location *q, and then destroy x.
> >>
> >> The first part (moving its resources) has nothing to do with
> >> destruction. An object still needs to be destroyed, even if its guts
> >> have been moved to another object.
> >>
> >> The second part is destroying the object, to end its lifetime. You
> >> wouldn't usually call a destructor explicitly, because it would be
> >> done automatically at the end of scope for objects on the stack, or
> >> done by delete when you free objects on the heap. This is a special
> >> case where the object's lifetime is manually managed in storage that
> >> is manually managed.
>
> ACK, and happy that I understood this correctly.
>
> And, thanks for providing some proper C++-esque wording, which I
> summarized to replace my original source code comment.
>
> >> > What happens if the dtor is deleted btw?
> >>
> >> If the destructor is deleted you have created an unusable type that
> >> cannot be stored in containers. It can only be created using new, and
> >> then never destroyed. If you play stupid games, you win stupid prizes.
> >> Don't do that.
>
> Haha!  ;-)
>
> And, by the way, as I understood this: if the destructor is "trivial"
> (which includes POD types, for example), the explicit destructor call
> here is a no-op.
>

Right.

And you can even do x.~value_type(); for things which aren't classes and
don't have any destructor at all, not even a trivial one. So in a template
function, if the template argument T is int or char or long*, it's ok to do
t.~T(). This is called a pseudo-destructor call (because scalar types like
int don't actually have a destructor). This will also be a no-op.

This allows you to write the same template code for any types* and it will
correctly destroy them, whether they have a non-trivial destructor that
does real work, or a trivial one, or if they are not even classes and have
no destructor at all.

* Well, nearly any types... You can't do it if the destructor is deleted,
as Richi asked about, or private, and you can't do it for non-object types
(references, functions, void) but that's ok because you can't store them in
a container anyway.
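
To make the two steps concrete, here is a minimal sketch. It is not the actual hash_table::expand code; the helper `relocate`, the `tracked` type and the demo functions are made up for illustration. It shows a move-construction via placement new followed by an explicit destructor call on the moved-from object, and the same template being instantiated for `int`, where the destructor call is a pseudo-destructor call and does nothing.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical helper (not from gcc/hash-table.h): move-construct *FROM into
// the raw storage at TO, then end the lifetime of *FROM.
template <typename T>
T *relocate (void *to, T *from)
{
  // Step 1: transfer the resources into a new object in raw storage.
  T *q = new (to) T (std::move (*from));
  // Step 2: the moved-from object is still alive, so destroy it explicitly.
  // For scalar T this is a pseudo-destructor call and is a no-op.
  from->~T ();
  return q;
}

// Counts live objects so the effect of the explicit destructor is visible.
struct tracked
{
  static int live;
  std::string name;
  explicit tracked (std::string n) : name (std::move (n)) { ++live; }
  tracked (tracked &&other) noexcept : name (std::move (other.name)) { ++live; }
  ~tracked () { --live; }
};
int tracked::live = 0;

// Relocate one object between two manually managed slots and return how
// many objects were alive right after the relocation.
int relocate_tracked_demo ()
{
  alignas (tracked) unsigned char old_slot[sizeof (tracked)];
  alignas (tracked) unsigned char new_slot[sizeof (tracked)];

  tracked *x = new (old_slot) tracked ("entry");
  tracked *q = relocate (new_slot, x);
  int alive = tracked::live;
  assert (q->name == "entry");

  q->~tracked ();   // storage is manual, so destruction is manual too
  return alive;
}

// Same template, scalar type: no destructor exists, the call still compiles.
int relocate_int_demo ()
{
  int src = 42;
  alignas (int) unsigned char slot[sizeof (int)];
  return *relocate (slot, &src);
}
```

Without the `from->~T ()` line the counter in `tracked` would stay at two after the relocation, which is the kind of leak the patch plugs for non-trivial value types.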


Re: [Patch] Fortran: Fix -Wno-missing-include-dirs handling [PR55534]

2021-09-17 Thread Tobias Burnus

I seemingly messed up a bit in previous patch – corrected version attached.

OK?

Tobias

PS: Due to now enabling the missing-include-dir warning also for cpp, the
following warnings show up during build. This seems to be specific to building
libgfortran; libgomp works and real-world code also does not seem to be affected:
: Error: /x86_64-pc-linux-gnu/include: No such file or directory [-Werror=missing-include-dirs]
: Error: /x86_64-pc-linux-gnu/sys-include: No such file or directory [-Werror=missing-include-dirs]
: Error: finclude: No such file or directory [-Werror=missing-include-dirs]

The latter is due to the driver adding '-fintrinsic-modules-path finclude'
when calling f951. I think the rest is a side effect of running with -B
and other build trickery.

The warnings do not trigger when compiling the Fortran file in libgomp nor for
a quick real-world case (which uses gfortran in a normal way not with -B etc.).
Thus, I think it should be fine.
Alternatively, we could think of reducing the noisiness. Thoughts?

PPS: Besides this, the following still applies:

On 17.09.21 15:02, Tobias Burnus wrote:

Short version:
* -Wno-missing-include-dirs  had no effect as the warning was always on
* For CPP-only options like -idirafter, no warning was shown.

This patch tries to address both, i.e. cpp's include-dir diagnostics are
shown as well – and silencing the diagnostic works as well.

OK for mainline?

Tobias

PS:  BACKGROUND and LONG DESCRIPTION

C/C++ by default have disabled the -Wmissing-include-dirs warning.
Fortran by default has that warning enabled.

The value is actually stored at two places (cf. c-family/c.opt):
  Wmissing-include-dirs
  ... CPP(warn_missing_include_dirs)
Var(cpp_warn_missing_include_dirs) Init(0)

For CPP, that value always needs to be initialized – and it is used
in gcc/incpath.c as
  cpp_options *opts = cpp_get_options (pfile);
  if (opts->warn_missing_include_dirs &&
cur->user_supplied_p)
cpp_warning (pfile, CPP_W_MISSING_INCLUDE_DIRS, "%s: %s",

Additionally, there is cpp_warn_missing_include_dirs which is used by
Fortran – and which consists of
  global_options.x_cpp_warn_missing_include_dirs
  global_options_set._cpp_warn_missing_include_dirs

The flag processing happens as described in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55534#c11
in short:
  - First gfc_init_options is called
  - Then for each command-line option gfc_handle_option
  - Finally gfc_post_options

Currently:
- gfc_init_options: Sets cpp_warn_missing_include_dirs
  (unconditionally as unset)
- gfc_handle_option: Always warns about the missing include dir
- before gfc_post_options: set_option is called, which sets
  cpp_warn_missing_include_dirs – but that's too late.

Additionally, as mentioned above – pfile's warn_missing_include_dirs
is never properly set.
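
The ordering problem can be modelled with a small sketch. This is not gfortran code; `option_state`, `handle_include`, `handle_warning_flag`, `post_options` and `diagnostics_for` are hypothetical names, and directory existence is passed in rather than checked on disk. The idea shown is only the general one: record include directories while options are being handled and diagnose them after the last option has been seen, so that a later -Wno-missing-include-dirs can still take effect.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model; none of these names exist in gfortran.
struct include_dir
{
  std::string path;
  bool exists;
};

struct option_state
{
  bool warn_missing_include_dirs = true;   // Fortran default: warning enabled
  std::vector<include_dir> dirs;
  std::vector<std::string> diagnostics;

  // Called once per -I option, in command-line order.  Only records the
  // directory; diagnosing here would ignore flags that come later.
  void handle_include (const std::string &path, bool exists)
  {
    dirs.push_back ({path, exists});
  }

  // Called for -Wmissing-include-dirs / -Wno-missing-include-dirs.
  void handle_warning_flag (bool enable)
  {
    warn_missing_include_dirs = enable;
  }

  // Called after all options were handled: the final flag value is known.
  void post_options ()
  {
    if (!warn_missing_include_dirs)
      return;
    for (const include_dir &d : dirs)
      if (!d.exists)
        diagnostics.push_back ("Nonexistent include directory '" + d.path + "'");
  }
};

// Simulate "-I bar/" optionally followed by "-Wno-missing-include-dirs".
std::vector<std::string> diagnostics_for (bool silence)
{
  option_state state;
  state.handle_include ("bar/", /*exists=*/false);
  assert (!state.dirs.empty ());
  if (silence)
    state.handle_warning_flag (false);   // seen after -I, still honoured
  state.post_options ();
  return state.diagnostics;
}
```

In this model `diagnostics_for (false)` yields one message and `diagnostics_for (true)` yields none, regardless of where the flag appears relative to -I.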

 * * *

This patch fixes those issues:
* Now -Wno-missing-include-dirs does silence the warnings
* CPP now also properly does warn.

Example (initial version):
$ gfortran-trunk ../empty.f90 -c -cpp -idirafter /fdaf/ -I bar/
-Wmissing-include-dirs
f951: Warning: Nonexistent include directory ‘bar//’
[-Wmissing-include-dirs]
: Warning: /fdaf/: No such file or directory
: Warning: bar/: No such file or directory

In order to avoid the double output for -I, I disabled the Fortran output if
CPP is enabled. Additionally, I had to use the cpp_reason_option_codes to
print the flag in brackets.
Fixed/final output is:

: Warning: /fdaf/: No such file or directory
[-Wmissing-include-dirs]
: Warning: bar/: No such file or directory
[-Wmissing-include-dirs]


Fortran: Fix -Wno-missing-include-dirs handling [PR55534]

gcc/fortran/ChangeLog:

	PR fortran/55534
	* cpp.c: Define GCC_C_COMMON_C for #include "options.h" to make
	cpp_reason_option_codes available.
	(gfc_cpp_register_include_paths): Make static, set pfile's
	warn_missing_include_dirs and move before caller.
	(gfc_cpp_init_cb): New, cb code moved from ...
	(gfc_cpp_init_0): ... here.
	(gfc_cpp_post_options): Call gfc_cpp_init_cb.
	(cb_cpp_diagnostic_cpp_option): New. As implemented in c-family
	to match CppReason flags to -W... names.
	(cb_cpp_diagnostic): Use it to replace single special case.
	* cpp.h (gfc_cpp_register_include_paths): Remove as now static.
	* gfortran.h (gfc_check_include_dirs): New prototype.
	* options.c (gfc_init_options): Don't set -Wmissing-include-dirs.
	(gfc_post_options): Set it here after commandline processing.
	* scanner.c (gfc_do_check_include_dirs, gfc_check_include_dirs):
	New. Diagnostic moved from ...
	(add_path_to_list): ... here, which came before cmdline processing.
	* scanner.h (struct gfc_directorylist): Reorder for alignment issues,
	add new 'bool warn'.


[COMMITTED] Provide a relation oracle for paths.

2021-09-17 Thread Andrew MacLeod via Gcc-patches
This patch provides a path-oracle which complies with the 
relation_oracle base class and is used to track any relations that are 
found along a specific path.


There are upcoming threader changes which utilize this oracle.

As paths are walked, gimple_range_fold can be used to automatically
register any relations that are discovered on this path, allowing the
normal kind of relational folding to happen.  This also allows
equivalencies to be entered between PHI defs and the use on the PHI edge
that the path traverses.


Furthermore, it can be used in conjunction with a normal ranger
dominance-based relation oracle, which automatically makes any
relations/equivalences that are active at the start of the path
available and integrates seamlessly with the path relation list.
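
As a rough illustration of that layering, here is a simplified sketch. It is not the code from value-relation.cc: it uses std::set instead of bitmaps and obstacks, and `root_oracle_sketch` and `path_oracle_sketch` are invented names. It only mirrors the lookup order: check what was registered on the path first, then fall back to the root oracle, and otherwise treat the name as equivalent only to itself.

```cpp
#include <cassert>
#include <list>
#include <set>

using name_set = std::set<unsigned>;   // set of SSA version numbers

// Hypothetical stand-in for a root (e.g. dominance-based) oracle.
struct root_oracle_sketch
{
  std::list<name_set> sets;

  name_set equiv_set (unsigned ssa) const
  {
    for (const name_set &s : sets)
      if (s.count (ssa))
        return s;
    return name_set{ssa};   // no record: a name is equivalent to itself
  }
};

// Hypothetical path-local layer on top of an optional root oracle.
struct path_oracle_sketch
{
  const root_oracle_sketch *root = nullptr;
  std::list<name_set> path_sets;   // newest record first

  name_set equiv_set (unsigned ssa) const
  {
    // Check the path-local list first.
    for (const name_set &s : path_sets)
      if (s.count (ssa))
        return s;
    // Otherwise defer to the root oracle, if there is one.
    if (root)
      return root->equiv_set (ssa);
    return name_set{ssa};
  }

  void register_equiv (unsigned a, unsigned b)
  {
    name_set ea = equiv_set (a);
    name_set eb = equiv_set (b);
    if (ea == eb)
      return;                       // already the same set
    ea.insert (eb.begin (), eb.end ());
    path_sets.push_front (ea);      // new merged record shadows older ones
  }

  // Forget everything learned on the current path; the root is untouched.
  void reset_path () { path_sets.clear (); }
};

// Small self-check of the lookup order.
void path_oracle_self_check ()
{
  root_oracle_sketch root;
  root.sets.push_back (name_set{1u, 2u});

  path_oracle_sketch path;
  path.root = &root;
  path.register_equiv (2u, 3u);
  assert (path.equiv_set (1u) == (name_set{1u, 2u, 3u}));

  path.reset_path ();
  assert (path.equiv_set (1u) == (name_set{1u, 2u}));
}
```

Registering an equivalence on the path never modifies the root, so resetting the path restores exactly what the root oracle knew at the start.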


Bootstrapped on x86_64-pc-linux-gnu with no regressions, pushed. Client 
threader code coming shortly.


Andrew



From 534c5352a02485a41ebfb133b42edbbecba7eba3 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Fri, 17 Sep 2021 09:48:35 -0400
Subject: [PATCH 2/2] Provide a relation oracle for paths.

This provides a path_oracle class which can optionally be used in conjunction
with another oracle to track relations on a path as it is walked.

	* value-relation.cc (class equiv_chain): Move to header file.
	(path_oracle::path_oracle): New.
	(path_oracle::~path_oracle): New.
	(path_oracle::register_relation): New.
	(path_oracle::query_relation): New.
	(path_oracle::reset_path): New.
	(path_oracle::dump): New.
	* value-relation.h (class equiv_chain): Move to here.
	(class path_oracle): New.
---
 gcc/value-relation.cc | 188 +++---
 gcc/value-relation.h  |  51 +++-
 2 files changed, 224 insertions(+), 15 deletions(-)

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 3e077d38a11..d370f93d128 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -190,9 +190,6 @@ relation_transitive (relation_kind r1, relation_kind r2)
 
 // -
 
-// This class represents an equivalency set, and contains a link to the next
-// one in the list to be searched.
-
 // The very first element in the m_equiv chain is actually just a summary
 // element in which the m_names bitmap is used to indicate that an ssa_name
 // has an equivalence set in this block.
@@ -201,16 +198,6 @@ relation_transitive (relation_kind r1, relation_kind r2)
 // which has the bit for SSA_NAME set. Then scan for the equivalency set in
 // that block.   No previous lists need be searched.
 
-class equiv_chain
-{
-public:
-  bitmap m_names;		// ssa-names in equiv set.
-  basic_block m_bb;		// Block this belongs to
-  equiv_chain *m_next;		// Next in block list.
-  void dump (FILE *f) const;	// Show names in this list.
-  equiv_chain *find (unsigned ssa);
-};
-
 // If SSA has an equivalence in this list, find and return it.
 // Otherwise return NULL.
 
@@ -1172,3 +1159,178 @@ relation_oracle::debug () const
 {
   dump (stderr);
 }
+
+path_oracle::path_oracle (relation_oracle *oracle)
+{
+  m_root = oracle;
+  bitmap_obstack_initialize (&m_bitmaps);
+  obstack_init (&m_chain_obstack);
+
+  // Initialize header records.
+  m_equiv.m_names = BITMAP_ALLOC (&m_bitmaps);
+  m_equiv.m_bb = NULL;
+  m_equiv.m_next = NULL;
+  m_relations.m_names = BITMAP_ALLOC (&m_bitmaps);
+  m_relations.m_head = NULL;
+}
+
+path_oracle::~path_oracle ()
+{
+  obstack_free (&m_chain_obstack, NULL);
+  bitmap_obstack_release (&m_bitmaps);
+}
+
+// Return the equiv set for SSA, and if there isn't one, check for equivs
+// starting in block BB.
+
+const_bitmap
+path_oracle::equiv_set (tree ssa, basic_block bb)
+{
+  // Check the list first.
+  equiv_chain *ptr = m_equiv.find (SSA_NAME_VERSION (ssa));
+  if (ptr)
+return ptr->m_names;
+
+  // Otherwise defer to the root oracle.
+  if (m_root)
+return m_root->equiv_set (ssa, bb);
+
+  // Allocate a throw away bitmap if there isn't a root oracle.
+  bitmap tmp = BITMAP_ALLOC (&m_bitmaps);
+  bitmap_set_bit (tmp, SSA_NAME_VERSION (ssa));
+  return tmp;
+}
+
+// Register an equivalence between SSA1 and SSA2 resolving unkowns from
+// block BB.
+
+void
+path_oracle::register_equiv (basic_block bb, tree ssa1, tree ssa2)
+{
+  const_bitmap equiv_1 = equiv_set (ssa1, bb);
+  const_bitmap equiv_2 = equiv_set (ssa2, bb);
+
+  // Check if they are the same set, if so, we're done.
+  if (bitmap_equal_p (equiv_1, equiv_2))
+return;
+
+  // Don't mess around, simply create a new record and insert it first.
+  bitmap b = BITMAP_ALLOC (&m_bitmaps);
+  bitmap_copy (b, equiv_1);
+  bitmap_ior_into (b, equiv_2);
+
+  equiv_chain *ptr = (equiv_chain *) obstack_alloc (&m_chain_obstack,
+		sizeof (equiv_chain));
+  ptr->m_names = b;
+  ptr->m_bb = NULL;
+  ptr->m_next = m_equiv.m_next;
+  m_equiv.m_next = ptr;
+  bitmap_ior_into (m_equiv.m_names, b);
+}
+
+// Register relation K between SSA1 and SSA2, resolving unknowns by
+// querying 

[COMMITTED] Virtualize relation oracle and various cleanups.

2021-09-17 Thread Andrew MacLeod via Gcc-patches

This patch cleans up a number of things in the relation oracle.

First it virtualizes a base relation_oracle class, then standardizes
the equivalency oracle to use that API, and then restructures the
old relation oracle into a dominance-based oracle which inherits from
the equivalency oracle.  This simplifies some of the code and paves the
way for the next patch, which provides a path-based oracle which the
threader will use to maintain relations on paths.


The only externally visible functional change is that the requirements for
equivalence have been tightened up slightly, such that we will not report
an equivalence unless each ssa name is in the other's equivalence set.
There is a forthcoming patch which firms this up even more, but this is
enough for the moment.
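
The tightened rule can be sketched as follows. This is an illustration only, not the equiv_oracle implementation: the `equiv_table` map, `lookup` and `equivalent_p` are hypothetical, and the fallback of "a name with no record is equivalent only to itself" is an assumption made for the sketch.

```cpp
#include <cassert>
#include <map>
#include <set>

using name_set = std::set<unsigned>;

// Hypothetical table mapping an SSA version to its recorded equivalence set.
using equiv_table = std::map<unsigned, name_set>;

// Always return a set, never "nothing": a name without a record is treated
// as equivalent only to itself (assumption for this sketch).
name_set lookup (const equiv_table &table, unsigned ssa)
{
  auto it = table.find (ssa);
  return it != table.end () ? it->second : name_set{ssa};
}

// Report an equivalence only if it is symmetric: A must be in B's set and
// B must be in A's set.
bool equivalent_p (const equiv_table &table, unsigned a, unsigned b)
{
  return lookup (table, a).count (b) != 0
	 && lookup (table, b).count (a) != 0;
}

// Small self-check: a one-sided record is not enough.
void symmetry_self_check ()
{
  equiv_table table;
  table[1u] = name_set{1u, 2u};   // 1's set mentions 2 ...
  assert (!equivalent_p (table, 1u, 2u));   // ... but 2's set lacks 1
  table[2u] = name_set{1u, 2u};
  assert (equivalent_p (table, 1u, 2u));
}
```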


Bootstrapped on x86_64-pc-linux-gnu with no regressions, pushed.

Andrew






From 3674d8e6fc6305507ed50b501f049f25f868458a Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 15 Sep 2021 14:43:51 -0400
Subject: [PATCH 1/2] Virtualize relation oracle and various cleanups.

Standardize equiv_oracle API onto the new relation_oracle virtual base, and
then have dom_oracle inherit from that.
equiv_set always returns an equivalency set now, never NULL.
EQ_EXPR requires symmetry now.  Each SSA name must be in the other equiv set.
Shuffle some routines around, simplify.

	* gimple-range-cache.cc (ranger_cache::ranger_cache): Create a DOM
	based oracle.
	* gimple-range-fold.cc (fur_depend::register_relation): Use
	register_stmt/edge routines.
	* value-relation.cc (equiv_chain::find): Relocate from equiv_oracle.
	(equiv_oracle::equiv_oracle): Create self equivalence cache.
	(equiv_oracle::~equiv_oracle): Release same.
	(equiv_oracle::equiv_set): Return entry from self equiv cache if there
	are no equivalences.
	(equiv_oracle::find_equiv_block): Move list find to equiv_chain.
	(equiv_oracle::register_relation): Rename from register_equiv.
	(relation_chain_head::find_relation): Relocate from dom_oracle.
	(relation_oracle::register_stmt): New.
	(relation_oracle::register_edge): New.
	(dom_oracle::*): Rename from relation_oracle.
	(dom_oracle::register_relation): Adjust to call equiv_oracle.
	(dom_oracle::set_one_relation): Split from register_relation.
	(dom_oracle::register_transitives): Consolidate 2 methods.
	(dom_oracle::find_relation_block): Move core to relation_chain.
	(dom_oracle::query_relation): Rename from find_relation_dom and adjust.
	* value-relation.h (class relation_oracle): New pure virtual base.
	(class equiv_oracle): Inherit from relation_oracle and adjust.
	(class dom_oracle): Rename from old relation_oracle and adjust.
---
 gcc/gimple-range-cache.cc |   2 +-
 gcc/gimple-range-fold.cc  |   4 +-
 gcc/value-relation.cc | 316 +++---
 gcc/value-relation.h  |  62 +---
 4 files changed, 206 insertions(+), 178 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index facf981c15d..fbf0f95eef9 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -760,7 +760,7 @@ ranger_cache::ranger_cache ()
   m_temporal = new temporal_cache;
   // If DOM info is available, spawn an oracle as well.
   if (dom_info_available_p (CDI_DOMINATORS))
-  m_oracle = new relation_oracle ();
+  m_oracle = new dom_oracle ();
 else
   m_oracle = NULL;
 
diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 7cf8830fc5d..997d02dd4b9 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -195,7 +195,7 @@ void
 fur_depend::register_relation (gimple *s, relation_kind k, tree op1, tree op2)
 {
   if (m_oracle)
-m_oracle->register_relation (s, k, op1, op2);
+m_oracle->register_stmt (s, k, op1, op2);
 }
 
 // Register a relation on an edge if there is an oracle.
@@ -204,7 +204,7 @@ void
 fur_depend::register_relation (edge e, relation_kind k, tree op1, tree op2)
 {
   if (m_oracle)
-m_oracle->register_relation (e, k, op1, op2);
+m_oracle->register_edge (e, k, op1, op2);
 }
 
 // This version of fur_source will pick a range up from a list of ranges
diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index ba01d298521..3e077d38a11 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -199,7 +199,7 @@ relation_transitive (relation_kind r1, relation_kind r2)
 // This allows for much faster traversal of the DOM chain, as a search for
 // SSA_NAME simply requires walking the DOM chain until a block is found
 // which has the bit for SSA_NAME set. Then scan for the equivalency set in
-// that block.   No previous blcoks need be searched.
+// that block.   No previous lists need be searched.
 
 class equiv_chain
 {
@@ -208,8 +208,26 @@ public:
   basic_block m_bb;		// Block this belongs to
   equiv_chain *m_next;		// Next in block list.
   void dump (FILE *f) const;	// Show names in this list.
+  equiv_chain *find (unsigned ssa);
 };
 
+// If SSA has an equivalence in this list, find and return 

[PATCH] libstdc++: Implement C++20 atomic<shared_ptr> and atomic<weak_ptr>

2021-09-17 Thread Thomas Rodgers
From: Thomas Rodgers 

Signed-off-by: Thomas Rodgers 

libstdc++-v3/ChangeLog:
* config/abi/pre/gnu.ver (GLIBCXX_3.4.21): Do not match new _Sp_locker
constructor.
(GLIBCXX_3.4.30): Export _Sp_locker::_M_wait/_M_notify and new
constructor.
* include/bits/shared_ptr_atomic.h: define __cpp_lib_atomic_shared_ptr
feature test macro.
(_Sp_locker::_Sp_locker(const void*, bool)): New constructor.
(_Sp_locker::_M_wait(), _Sp_locker::_M_notify()): New methods.
(_Sp_impl): New type.
(atomic<shared_ptr<_Tp>>): New partial template specialization.
(atomic<weak_ptr<_Tp>>): New partial template specialization.
* include/std/version: define __cpp_lib_atomic_shared_ptr feature
test macro.
* src/c++11/Makefile.am: Compile src/c++11/shared_ptr.cc
-std=gnu++20.
* src/c++11/Makefile.in: Regenerate.
* src/c++11/shared_ptr.cc (_Sp_locker::_Sp_locker(const void*, bool),
_Sp_locker::_M_wait(), _Sp_locker::_M_notify()): Implement.
* testsuite/20_util/shared_ptr/atomic/4.cc: New test.
* testsuite/20_util/shared_ptr/atomic/5.cc: Likewise.
* testsuite/20_util/shared_ptr/atomic/atomic_shared_ptr.cc: Likewise.
---
 libstdc++-v3/config/abi/pre/gnu.ver   |  12 +-
 libstdc++-v3/include/bits/shared_ptr_atomic.h | 309 ++
 libstdc++-v3/include/std/version  |   1 +
 libstdc++-v3/src/c++11/Makefile.am|   6 +
 libstdc++-v3/src/c++11/Makefile.in|   6 +
 libstdc++-v3/src/c++11/shared_ptr.cc  |  86 -
 .../testsuite/20_util/shared_ptr/atomic/4.cc  |  28 ++
 .../testsuite/20_util/shared_ptr/atomic/5.cc  |  28 ++
 .../shared_ptr/atomic/atomic_shared_ptr.cc| 159 +
 9 files changed, 632 insertions(+), 3 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/shared_ptr/atomic/4.cc
 create mode 100644 libstdc++-v3/testsuite/20_util/shared_ptr/atomic/5.cc
 create mode 100644 
libstdc++-v3/testsuite/20_util/shared_ptr/atomic/atomic_shared_ptr.cc

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index 5323c7f0604..727afd2d488 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1705,8 +1705,9 @@ GLIBCXX_3.4.21 {
 # std::ctype_base::blank
 _ZNSt10ctype_base5blankE;
 
-# std::_Sp_locker::*
-_ZNSt10_Sp_locker[CD]*;
+# std::_Sp_locker:: constructors and destructors
+_ZNSt10_Sp_lockerC*[^b];
+_ZNSt10_Sp_lockerD*;
 
 # std::notify_all_at_thread_exit
 
_ZSt25notify_all_at_thread_exitRSt18condition_variableSt11unique_lockISt5mutexE;
@@ -2397,6 +2398,13 @@ GLIBCXX_3.4.29 {
 
 } GLIBCXX_3.4.28;
 
+GLIBCXX_3.4.30 {
+  # std::_Sp_locker:: wait/notify support
+  _ZNSt10_Sp_lockerC*[b];
+  _ZNSt10_Sp_locker7_M_waitEv;
+  _ZNSt10_Sp_locker9_M_notifyEv;
+} GLIBCXX_3.4.29;
+
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {
 
diff --git a/libstdc++-v3/include/bits/shared_ptr_atomic.h 
b/libstdc++-v3/include/bits/shared_ptr_atomic.h
index 6e94d83c46d..2aec3adac7c 100644
--- a/libstdc++-v3/include/bits/shared_ptr_atomic.h
+++ b/libstdc++-v3/include/bits/shared_ptr_atomic.h
@@ -32,6 +32,10 @@
 
 #include 
 
+#if __cplusplus > 201703L
+# define __cpp_lib_atomic_shared_ptr 201711L
+#endif
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -55,6 +59,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _Sp_locker(const void*, const void*) noexcept;
 ~_Sp_locker();
 
+#if __cpp_lib_atomic_shared_ptr
+// called only by notifiers, does not acquire a lock
+_Sp_locker(const void*, bool) noexcept;
+
+void
+_M_wait() noexcept;
+
+void
+_M_notify() noexcept;
+#endif
+
   private:
 unsigned char _M_key1;
 unsigned char _M_key2;
@@ -327,6 +342,300 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
   /// @}
 
+#if __cpp_lib_atomic_shared_ptr
+  template
+struct _Sp_impl
+{
+  using value_type = _Tp;
+
+  static constexpr bool
+  _M_is_always_lock_free = false;
+
+  static bool
+  _M_is_lock_free() noexcept
+  { return false; }
+
+  constexpr _Sp_impl() noexcept = default;
+  _Sp_impl(value_type __r) noexcept
+   : _M_p(move(__r))
+  { }
+
+  _Sp_impl(const _Sp_impl&) = delete;
+  void operator=(const _Sp_impl&) = delete;
+
+  value_type
+  _M_load(memory_order) const noexcept
+  {
+   _Sp_locker __lock{&_M_p};
+   return _M_p;
+  }
+
+  void
+  _M_store(value_type __r, memory_order) noexcept
+  {
+   _Sp_locker __lock{&_M_p};
+   _M_p.swap(__r); // use swap so that *__p not destroyed while lock held
+  }
+
+  value_type
+  _M_exchange(value_type __r, memory_order) noexcept
+  {
+   _Sp_locker __lock{&_M_p};
+   _M_p.swap(__r);
+   return __r;
+  }
+
+  template
+   bool
+   _M_compare_exchange_strong(value_type& __v, 

Re: Merge from trunk to gccgo branch

2021-09-17 Thread Ian Lance Taylor via Gcc-patches
I merged trunk revision 89be17a1b231ade643f28fbe616d53377e069da8 to
the gccgo branch.

Ian


Re: [PATCH] testsuite: Fix gcc.target/i386/auto-init-* tests.

2021-09-17 Thread Qing Zhao via Gcc-patches


> On Sep 17, 2021, at 11:59 AM, Jakub Jelinek  wrote:
> 
> On Fri, Sep 17, 2021 at 04:55:22PM +, Qing Zhao wrote:
>> This is the patch to fix gcc.target/i386/auto-init-* tests.
>> 
>> I have tested the change at X86_64-linux with
>> 
>> make check-gcc 
>> RUNTESTFLAGS='--target_board=unix\{-m64,-m64/-march=skylake-avx512,-m64/-fstack-protector-all,-m64/-fstack-clash-protection,-m32/-mno-sse,-m32/-mtune=bonnell,-m32/-march=bonnell,-m32/-fstack-protector-all/-fstack-clash-protection\}
>>  i386.exp=auto-init*'
>> 
>> make check-gcc 
>> RUNTESTFLAGS='--target_board=unix\{-m64,-m64/-march=skylake-avx512/-fPIC,-m64/-fstack-protector-all/-fPIC,-m64/-fstack-clash-protection/-fPIC,-m32/-mno-sse/-fPIC,-m32/-mtune=bonnell/-fPIC,-m32/-march=bonnell/-fPIC,-m32/-fstack-protector-all/-fstack-clash-protection/-fPIC\}
>>  i386.exp=auto-init*'
>> 
>> Everything works fine.
>> 
>> Okay for commit?
> 
> LGTM.

Thank you.

I will commit the change soon.

For the aarch64 tests, do you have a suggestion on which option combinations
I should test?

Qing
> 
>   Jakub
> 



Re: [PATCH] c++: fix wrong fixit hints for misspelled typedef [PR77565]

2021-09-17 Thread Michel Morin via Gcc-patches
On Fri, Sep 17, 2021 at 3:23 AM Jason Merrill  wrote:
>
> On 9/16/21 11:50, Michel Morin wrote:
> > On Thu, Sep 16, 2021 at 5:44 AM Jason Merrill  wrote:
> >>
> >> On 9/14/21 04:29, Michel Morin via Gcc-patches wrote:
> >>> On Tue, Sep 14, 2021 at 7:14 AM David Malcolm  wrote:
> 
>  On Tue, 2021-09-14 at 03:35 +0900, Michel Morin via Gcc-patches wrote:
> > Hi,
> >
> > PR77565 reports that, with the code `typdef int Int;`, GCC emits
> > "did you mean 'typeof'?" instead of "did you mean 'typedef'?".
> >
> > This happens because the typo corrector determines that `typeof` is a
> > candidate for suggestion (through
> > `cp_keyword_starts_decl_specifier_p`),
> > but `typedef` is not.
> >
> > This patch fixes the issue by adding `typedef` as a candidate. The
> > patch
> > additionally adds the `inline` specifier and cv-specifiers as a
> > candidate.
> > Here is a patch (tests `make check-gcc` pass on darwin):
> 
>  Thanks for this patch (and for reporting the bug in the first place).
> 
>  I notice that, as well as being used for fix-it hints by
>  lookup_name_fuzzy (indirectly via suggest_rid_p),
>  cp_keyword_starts_decl_specifier_p is also used by
>  cp_lexer_next_token_is_decl_specifier_keyword, which is used by
>  cp_parser_lambda_declarator_opt and cp_parser_constructor_declarator_p.
> >>>
> >>> Ah, you're right! Thank you for pointing this out.
> >>> I failed to grep those functions somehow.
> >>>
> >>> One thing that confuses me is that cp_keyword_starts_decl_specifier_p
> >>> misses many keywords that can start decl-specifiers (e.g.
> >>> typedef/inline/cv-qual and friend/explicit/virtual).
> >>> So let's wait C++ frontend maintainers ;)
> >>
> >> That is strange.  Let's add all the rest of them as well.
> >
> > Done. Thanks for your help!
> >
> > One more thing — cp_keyword_starts_decl_specifier_p includes RID_ATTRIBUTE
> > (from the beginning; see https://gcc.gnu.org/PR28261 ), but attributes are
> > not decl-specifiers. Would it be reasonable to remove this?
>
> It looks like the place that PR28261 used
> cp_lexer_next_token_is_decl_specifier_keyword specifically exempts
> attributes:
>
> >   && (!cp_lexer_next_token_is_decl_specifier_keyword (parser->lexer)
> >   /* GNU attributes can actually appear both at the start of
> >  a parameter and parenthesized declarator.
> >  S (__attribute__((unused)) int);
> >  is a constructor, but
> >  S (__attribute__((unused)) foo) (int);
> >  is a function declaration.  */
> >   || (cp_parser_allow_gnu_extensions_p (parser)
> >   && cp_next_tokens_can_be_gnu_attribute_p (parser)))
>
> So yes, let's remove RID_ATTRIBUTE and the || clause there.  I'd keep
> the comment, but move it to go with the test for C++11 attributes below.

Done. No regressions introduced.

> > One more thing — cp_keyword_starts_decl_specifier_p includes RID_ATTRIBUTE
> > (from the beginning; see https://gcc.gnu.org/PR28261 ), but attributes are
> > not decl-specifiers.

Oh, this is wrong. I thought that, since C++11 attributes are not a
decl-specifier, neither are GNU attributes. But the comment just before
cp_parser_decl_specifier_seq says that GNU attributes are considered as a
decl-specifier. So I'm not confident about the removal of RID_ATTRIBUTE in
cp_keyword_starts_decl_specifier_p...
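
For reference, the effect of the first patch (the PR77565 fix) can be illustrated with a toy model. This is not GCC's spellchecker: it uses a plain Levenshtein distance with no cutoffs, and `edit_distance` and `suggest` are made-up names. It only shows why the suggestion changes once `typedef` is part of the candidate set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Plain Levenshtein distance: insert, delete and substitute each cost 1.
std::size_t edit_distance (const std::string &a, const std::string &b)
{
  std::vector<std::size_t> prev (b.size () + 1), cur (b.size () + 1);
  for (std::size_t j = 0; j <= b.size (); ++j)
    prev[j] = j;
  for (std::size_t i = 1; i <= a.size (); ++i)
    {
      cur[0] = i;
      for (std::size_t j = 1; j <= b.size (); ++j)
	{
	  std::size_t subst = prev[j - 1] + (a[i - 1] != b[j - 1] ? 1 : 0);
	  cur[j] = std::min ({subst, prev[j] + 1, cur[j - 1] + 1});
	}
      std::swap (prev, cur);
    }
  return prev[b.size ()];
}

// Return the candidate closest to TYPO, or "" if there are no candidates
// closer than replacing the whole word.  Only listed keywords can ever be
// suggested, which is the point of the fix.
std::string suggest (const std::string &typo,
		     const std::vector<std::string> &candidates)
{
  std::string best;
  std::size_t best_dist = typo.size () + 1;
  for (const std::string &c : candidates)
    {
      std::size_t d = edit_distance (typo, c);
      if (d < best_dist)
	{
	  best_dist = d;
	  best = c;
	}
    }
  return best;
}

// The PR77565 situation in miniature.
void pr77565_example ()
{
  // Before: 'typedef' is not a candidate, so 'typeof' is the best match.
  assert (suggest ("typdef", {"typeof"}) == "typeof");
  // After: 'typedef' is a candidate and is one edit away.
  assert (suggest ("typdef", {"typeof", "typedef"}) == "typedef");
}
```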

I've split the patch into two. The first one is for adding missing keywords to
fix PR77565 and the second one is for removing the "attribute" keyword.
Here is the second patch (if this is not applied, that's no problem ;) )

==
c++: adjust the handling of RID_ATTRIBUTE.

gcc/cp/ChangeLog:

* parser.c (cp_keyword_starts_decl_specifier_p): Do not
handle RID_ATTRIBUTE.
(cp_parser_constructor_declarator_p): Remove now-redundant
checks.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 40308d0d33f..d184a3aca7e 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -1062,7 +1062,6 @@ cp_keyword_starts_decl_specifier_p (enum rid keyword)
 case RID_TYPEDEF:
 case RID_INLINE:
   /* GNU extensions.  */
-case RID_ATTRIBUTE:
 case RID_TYPEOF:
   /* C++11 extensions.  */
 case RID_DECLTYPE:
@@ -30798,23 +30797,22 @@ cp_parser_constructor_declarator_p
(cp_parser *parser, cp_parser_flags flags,
/* A parameter declaration begins with a decl-specifier,
   which is either the "attribute" keyword, a storage class
   specifier, or (usually) a type-specifier.  */
-   && (!cp_lexer_next_token_is_decl_specifier_keyword (parser->lexer)
-   /* GNU attributes can actually appear both at the start of
- a parameter and parenthesized declarator.
- S (__attribute__((unused)) int);
- is a constructor, but
- S (__attribute__((unused)) foo) (int);
- is a function declaration.  */
-   || 

Re: [PATCH] testsuite: Fix gcc.target/i386/auto-init-* tests.

2021-09-17 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 17, 2021 at 04:55:22PM +, Qing Zhao wrote:
> This is the patch to fix gcc.target/i386/auto-init-* tests.
> 
> I have tested the change at X86_64-linux with
> 
> make check-gcc 
> RUNTESTFLAGS='--target_board=unix\{-m64,-m64/-march=skylake-avx512,-m64/-fstack-protector-all,-m64/-fstack-clash-protection,-m32/-mno-sse,-m32/-mtune=bonnell,-m32/-march=bonnell,-m32/-fstack-protector-all/-fstack-clash-protection\}
>  i386.exp=auto-init*'
> 
> make check-gcc RUNTESTFLAGS='--target_board=unix\{-m64,-m64/-march=skylake-avx512/-fPIC,-m64/-fstack-protector-all/-fPIC,-m64/-fstack-clash-protection/-fPIC,-m32/-mno-sse/-fPIC,-m32/-mtune=bonnell/-fPIC,-m32/-march=bonnell/-fPIC,-m32/-fstack-protector-all/-fstack-clash-protection/-fPIC\} i386.exp=auto-init*'
> 
> Everything works fine.
> 
> Okay for commit?

LGTM.

Jakub



[PATCH] testsuite: Fix gcc.target/i386/auto-init-* tests.

2021-09-17 Thread Qing Zhao via Gcc-patches
Hi,

This is the patch to fix gcc.target/i386/auto-init-* tests.

I have tested the change at X86_64-linux with

make check-gcc RUNTESTFLAGS='--target_board=unix\{-m64,-m64/-march=skylake-avx512,-m64/-fstack-protector-all,-m64/-fstack-clash-protection,-m32/-mno-sse,-m32/-mtune=bonnell,-m32/-march=bonnell,-m32/-fstack-protector-all/-fstack-clash-protection\} i386.exp=auto-init*'

make check-gcc RUNTESTFLAGS='--target_board=unix\{-m64,-m64/-march=skylake-avx512/-fPIC,-m64/-fstack-protector-all/-fPIC,-m64/-fstack-clash-protection/-fPIC,-m32/-mno-sse/-fPIC,-m32/-mtune=bonnell/-fPIC,-m32/-march=bonnell/-fPIC,-m32/-fstack-protector-all/-fstack-clash-protection/-fPIC\} i386.exp=auto-init*'

Everything works fine.

Okay for commit?

Thanks.

Qing

**

testsuite: Fix gcc.target/i386/auto-init-* tests.

This set of tests failed on many different combinations of -march and
-mtune.  Some of them failed with -fstack-protector-all or -mno-sse, and
the pattern matches also differ between lp64 and ia32.

The reason for these failures is that the RTL or assembly level pattern
matches are only valid for -march=x86-64 -mtune=generic.

We restrict the testing to -march=x86-64 and -mtune=generic, and also
add -fno-stack-protector or -msse for some of the test cases.

gcc/testsuite/ChangeLog:

2021-09-17  qing zhao  

* gcc.target/i386/auto-init-1.c: Restrict the testing only for
-march=x86-64 and -mtune=generic. Add -fno-stack-protector.
* gcc.target/i386/auto-init-2.c: Restrict the testing only for
-march=x86-64 and -mtune=generic -msse.
* gcc.target/i386/auto-init-3.c: Likewise.
* gcc.target/i386/auto-init-4.c: Likewise.
* gcc.target/i386/auto-init-5.c: Different pattern match for lp64 and
ia32.
* gcc.target/i386/auto-init-6.c: Restrict the testing only for
-march=x86-64 and -mtune=generic -msse. Add -fno-stack-protector.
* gcc.target/i386/auto-init-7.c: Likewise.
* gcc.target/i386/auto-init-8.c: Restrict the testing only for
-march=x86-64 and -mtune=generic -msse.
* gcc.target/i386/auto-init-padding-1.c: Likewise.
* gcc.target/i386/auto-init-padding-10.c: Likewise.
* gcc.target/i386/auto-init-padding-11.c: Likewise.
* gcc.target/i386/auto-init-padding-12.c: Likewise.
* gcc.target/i386/auto-init-padding-2.c: Likewise.
* gcc.target/i386/auto-init-padding-3.c: Restrict the testing only for
-march=x86-64. Different pattern match for lp64 and ia32.
* gcc.target/i386/auto-init-padding-4.c: Restrict the testing only for
-march=x86-64 and -mtune=generic -msse.
* gcc.target/i386/auto-init-padding-5.c: Likewise.
* gcc.target/i386/auto-init-padding-6.c: Likewise.
* gcc.target/i386/auto-init-padding-7.c: Restrict the testing only for
-march=x86-64 and -mtune=generic -msse. Add -fno-stack-protector.
* gcc.target/i386/auto-init-padding-8.c: Likewise.
* gcc.target/i386/auto-init-padding-9.c: Restrict the testing only for
-march=x86-64. Different pattern match for lp64 and ia32.
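
For readers unfamiliar with how such restrictions are spelled in a test,
the following is a minimal sketch only.  It is a hypothetical file, not
one of the tests touched by this patch: the function body and the
PATTERN-64 / PATTERN-32 strings are placeholders, and the option list is
an assumption about what a restricted test might pass.

```c
/* Hypothetical test file; it is NOT copied from the patch.  */
/* { dg-do compile } */
/* { dg-options "-ftrivial-auto-var-init=zero -march=x86-64 -mtune=generic -fno-stack-protector" } */

static int sink;

/* Taking the address keeps the local variable in memory.  */
static void use (int *p)
{
  sink = (p != 0);
}

int main (void)
{
  int local;   /* left uninitialized on purpose */
  use (&local);
  return sink == 1 ? 0 : 1;
}

/* PATTERN-64 and PATTERN-32 are placeholders, not real expected output.
   The target selectors let lp64 and ia32 match different patterns.  */
/* { dg-final { scan-assembler-times "PATTERN-64" 1 { target lp64 } } } */
/* { dg-final { scan-assembler-times "PATTERN-32" 1 { target ia32 } } } */
```

Pinning -march and -mtune in dg-options keeps the generated code stable
when the testsuite is run with other --target_board flags, which is the
failure mode described above.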



From dd9902a95fb0631f5e2eecb37e76b559913484c7 Mon Sep 17 00:00:00 2001
From: Qing Zhao 
Date: Fri, 17 Sep 2021 14:49:25 +
Subject: [PATCH] Fix i386 testing cases

---
 gcc/testsuite/gcc.target/i386/auto-init-1.c  |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-2.c  |  8 +---
 gcc/testsuite/gcc.target/i386/auto-init-3.c  |  5 +++--
 gcc/testsuite/gcc.target/i386/auto-init-4.c  | 10 ++
 gcc/testsuite/gcc.target/i386/auto-init-5.c  |  5 +++--
 gcc/testsuite/gcc.target/i386/auto-init-6.c  |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-7.c  |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-8.c  |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-padding-1.c  |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-padding-10.c |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-padding-11.c |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-padding-12.c |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-padding-2.c  |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-padding-3.c  |  8 +---
 gcc/testsuite/gcc.target/i386/auto-init-padding-4.c  |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-padding-5.c  |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-padding-6.c  |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-padding-7.c  |  5 +++--
 gcc/testsuite/gcc.target/i386/auto-init-padding-8.c  |  7 +++
 gcc/testsuite/gcc.target/i386/auto-init-padding-9.c  |  7 +--
 20 files changed, 45 insertions(+), 34 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/auto-init-1.c 
b/gcc/testsuite/gcc.target/i386/auto-init-1.c
index b7690df..3391be1 100644
--- a/gcc/testsuite/gcc.target/i386/auto-init-1.c
+++ b/gcc/testsuite/gcc.target/i386/auto-init-1.c
@@ -1,6 +1,6 @@
 /* Verify zero initialization for integer and pointer type 

Re: [PATCH] tree-optimization/65206 - dependence analysis on mixed pointer/array

2021-09-17 Thread Richard Biener via Gcc-patches
On September 17, 2021 6:36:10 PM GMT+02:00, Richard Sandiford 
 wrote:
>Richard Biener via Gcc-patches  writes:
>> On Fri, 17 Sep 2021, Richard Biener wrote:
>>
>>> On Fri, 17 Sep 2021, Richard Sandiford wrote:
>>> 
>>> > Richard Biener  writes:
>>> > > This adds the capability to analyze the dependence of mixed
>>> > > pointer/array accesses.  The example is from where using a masked
>>> > > load/store creates the pointer-based access when an otherwise
>>> > > unconditional access is array based.  Other examples would include
>>> > > accesses to an array mixed with accesses from inlined helpers
>>> > > that work on pointers.
>>> > >
>>> > > The idea is quite simple and old - analyze the data-ref indices
>>> > > as if the reference was pointer-based.  The following change does
>>> > > this by changing dr_analyze_indices to work on the indices
>>> > > sub-structure and storing an alternate indices substructure in
>>> > > each data reference.  That alternate set of indices is analyzed
>>> > > lazily by initialize_data_dependence_relation when it fails to
>>> > > match-up the main set of indices of two data references.
>>> > > initialize_data_dependence_relation is refactored into a head
>>> > > and a tail worker and changed to work on one of the indices
>>> > > structures and thus away from using DR_* access macros which
>>> > > continue to reference the main indices substructure.
>>> > >
>>> > > There are quite some vectorization and loop distribution opportunities
>>> > > unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
>>> > > 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
>>> > > 544.nab_r see amendments in what they report with -fopt-info-loop while
>>> > > the rest of the specrate set sees no changes there.  Measuring runtime
>>> > > for the set where changes were reported reveals nothing off-noise
>>> > > besides 511.povray_r which seems to regress slightly for me
>>> > > (on a Zen2 machine with -Ofast -march=native).
>>> > >
>>> > > Changes from the [RFC] version are properly handling bitfields
>>> > > that we cannot take the address of and optimization of refs
>>> > > that already are MEM_REFs and thus won't see any change.  I've
>>> > > also elided changing the set of vect_masked_stores targets in
>>> > > favor of explicitly listing avx (but I did not verify if the
>>> > > testcase passes on aarch64-sve or amdgcn).
>>> > >
>>> > > This improves cases like the following from Povray:
>>> > >
>>> > >for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
>>> > >  {
>>> > > VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
>>> > > Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
>>> > >  }
>>> > >
>>> > > where there is a plain array access mixed with abstraction
>>> > > using T[] or T* arguments.  That should be a not too uncommon
>>> > > situation in the wild.  The loop above is now vectorized and was not
>>> > > without the change.
>>> > >
>>> > > Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
>>> > > built and run SPEC CPU 2017 successfully.
>>> > >
>>> > > OK?
>>> > 
>>> > Took a while to page this stuff back in :-/
>>> > 
>>> > I guess if we're adding alt_indices to the main data_reference,
>>> > we'll need to free the access_fns in free_data_ref.  It looked like
>>> > the patch as posted might have a memory leak.
>>> 
>>> Doh, yes - thanks for noticing.
>>> 
>>> > Perhaps instead we could use local indices structures for the
>>> > alt_indices and pass them in to initialize_data_dependence_relation?
>>> > Not that that's very elegant…
>>> 
>>> Yeah, I had that but then since for N data references we possibly
>>> call initialize_data_dependence_relation N^2 times we'd do this
>>> alternate analysis N^2 times at worst instead of N so it looked worth
>>> caching it in the data reference.  Since we have no idea why the
>>> first try fails we're going to try this fallback in the majority
>>> of cases that we cannot figure out otherwise so I didn't manage
>>> to hand-wave the quadraticness away ;)  OTOH one might argue
>>> it's just a constant factor on top of the number of
>>> initialize_data_dependence_relation invocations.
>>> 
>>> So I can probably be convinced either way - guess I'll gather
>>> some statistics.
>>
>> I built SPEC 2017 CPU rate with -Ofast -march=znver2, overall there
>> are
>>
>>  4433976 calls to the first stage initialize_data_dependence_relation
>>  (skipping the cases dr_may_alias returned false)
>>  360248 (8%) ended up recursing with a set of alt_indices
>>  83512   times we computed alt_indices of a DR (that's with the cache)
>>  14905 (0.3%) times the recursive invocation ended with != chrec_dont_know
>>
>> thus when not doing the caching we'd compute alt_indices about 10 times
>> more often.  I didn't collect the number of distinct DRs (that's difficult
>> at this point), but I'd estimate from the above that we have 3 times
>> more "unused" alt_indices 

[PATCH v4] c++: Fix cp_tree_equal for template value args using dependent sizeof/alignof/noexcept expressions

2021-09-17 Thread Barrett Adair via Gcc-patches
I think the patch is in good shape now, thanks for the help.

canon-type*.C fail with trunk and pass with patch, dependent-name*.C are
regression tests that pass with both. I removed the dg-ice from
constexpr-52830.C. I didn't dig much into the churn history there, but the
test code looks valid to me and clang agrees.

I also returned the copyright assignment form yesterday to ass...@gnu.org.
From 9bea055c84d307ade7095fa2cdf7432b42ee0897 Mon Sep 17 00:00:00 2001
From: Barrett Adair 
Date: Wed, 15 Sep 2021 15:26:22 -0500
Subject: [PATCH] Fix template instantiation comparison outside function body

---
 gcc/cp/pt.c| 18 ++
 gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C   |  1 -
 gcc/testsuite/g++.dg/template/canon-type-15.C  |  7 +++
 gcc/testsuite/g++.dg/template/canon-type-16.C  |  6 ++
 gcc/testsuite/g++.dg/template/canon-type-17.C  |  5 +
 gcc/testsuite/g++.dg/template/canon-type-18.C  |  6 ++
 .../g++.dg/template/dependent-name15.C | 18 ++
 .../g++.dg/template/dependent-name16.C | 14 ++
 8 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-15.C
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-16.C
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-17.C
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-18.C
 create mode 100644 gcc/testsuite/g++.dg/template/dependent-name15.C
 create mode 100644 gcc/testsuite/g++.dg/template/dependent-name16.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 224dd9ebd2b..597422045d3 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -27766,6 +27766,19 @@ dependent_template_arg_p (tree arg)
 return value_dependent_expression_p (arg);
 }
 
+/* Identify any expressions that use function parms */
+static tree
+find_parm_usage_r (tree *tp, int *walk_subtrees, void*)
+{
+  tree t = *tp;
+  if (PACK_EXPANSION_P (t) || (TREE_CODE (t) == PARM_DECL))
+{
+  *walk_subtrees = 0;
+  return t;
+}
+  return NULL_TREE;
+}
+
 /* Returns true if ARGS (a collection of template arguments) contains
any types that require structural equality testing.  */
 
@@ -27810,6 +27823,11 @@ any_template_arguments_need_structural_equality_p (tree args)
 	  else if (!TYPE_P (arg) && TREE_TYPE (arg)
 		   && TYPE_STRUCTURAL_EQUALITY_P (TREE_TYPE (arg)))
 		return true;
+	  else if (!current_function_decl
+		   && dependent_template_arg_p (arg)
+		   && cp_walk_tree_without_duplicates (,
+		find_parm_usage_r, NULL))
+		return true;
 	}
 	}
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C
index eae0d8c377b..d6057f13497 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-52830.C
@@ -1,7 +1,6 @@
 // PR c++/52830
 // { dg-do compile { target c++11 } }
 // { dg-additional-options "-fchecking" }
-// { dg-ice "comptypes" }
 
 template struct eif { typedef void type; };
 template<>   struct eif {};
diff --git a/gcc/testsuite/g++.dg/template/canon-type-15.C b/gcc/testsuite/g++.dg/template/canon-type-15.C
new file mode 100644
index 000..b001b5c841d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/canon-type-15.C
@@ -0,0 +1,7 @@
+// { dg-do compile { target c++11 } }
+template struct size_c{ static constexpr unsigned value = u; };
+namespace g {
+template auto return_size(T t) -> size_c;
+template auto return_size(T t) -> size_c;
+}
+static_assert(decltype(g::return_size('a'))::value == 1u, "");
diff --git a/gcc/testsuite/g++.dg/template/canon-type-16.C b/gcc/testsuite/g++.dg/template/canon-type-16.C
new file mode 100644
index 000..99361cbac30
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/canon-type-16.C
@@ -0,0 +1,6 @@
+// { dg-do compile { target c++11 } }
+template struct bool_c{ static constexpr bool value = u; };
+template auto noexcepty(T t) -> bool_c;
+template auto noexcepty(T t) -> bool_c;
+struct foo { void operator()() noexcept; };
+static_assert(decltype(noexcepty(foo{}))::value, "");
diff --git a/gcc/testsuite/g++.dg/template/canon-type-17.C b/gcc/testsuite/g++.dg/template/canon-type-17.C
new file mode 100644
index 000..0555c8d0a42
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/canon-type-17.C
@@ -0,0 +1,5 @@
+// { dg-do compile { target c++11 } }
+template struct size_c{ static constexpr unsigned value = u; };
+template auto return_size(T... t) -> size_c;
+template auto return_size(T... t) -> size_c;
+static_assert(decltype(return_size('a'))::value == 1u, "");
diff --git a/gcc/testsuite/g++.dg/template/canon-type-18.C b/gcc/testsuite/g++.dg/template/canon-type-18.C
new file mode 100644
index 000..2510181725c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/canon-type-18.C
@@ -0,0 +1,6 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wno-pedantic" }
+template struct size_c{ static constexpr 

Re: [PATCH] tree-optimization/65206 - dependence analysis on mixed pointer/array

2021-09-17 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Fri, 17 Sep 2021, Richard Biener wrote:
>
>> On Fri, 17 Sep 2021, Richard Sandiford wrote:
>> 
>> > Richard Biener  writes:
>> > > This adds the capability to analyze the dependence of mixed
>> > > pointer/array accesses.  The example is from where using a masked
>> > > load/store creates the pointer-based access when an otherwise
>> > > unconditional access is array based.  Other examples would include
>> > > accesses to an array mixed with accesses from inlined helpers
>> > > that work on pointers.
>> > >
>> > > The idea is quite simple and old - analyze the data-ref indices
>> > > as if the reference was pointer-based.  The following change does
>> > > this by changing dr_analyze_indices to work on the indices
>> > > sub-structure and storing an alternate indices substructure in
>> > > each data reference.  That alternate set of indices is analyzed
>> > > lazily by initialize_data_dependence_relation when it fails to
>> > > match-up the main set of indices of two data references.
>> > > initialize_data_dependence_relation is refactored into a head
>> > > and a tail worker and changed to work on one of the indices
>> > > structures and thus away from using DR_* access macros which
>> > > continue to reference the main indices substructure.
>> > >
>> > > There are quite some vectorization and loop distribution opportunities
>> > > unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
>> > > 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
>> > > 544.nab_r see amendments in what they report with -fopt-info-loop while
>> > > the rest of the specrate set sees no changes there.  Measuring runtime
>> > > for the set where changes were reported reveals nothing off-noise
>> > > besides 511.povray_r which seems to regress slightly for me
>> > > (on a Zen2 machine with -Ofast -march=native).
>> > >
>> > > Changes from the [RFC] version are properly handling bitfields
>> > > that we cannot take the address of and optimization of refs
>> > > that already are MEM_REFs and thus won't see any change.  I've
>> > > also elided changing the set of vect_masked_stores targets in
>> > > favor of explicitly listing avx (but I did not verify if the
>> > > testcase passes on aarch64-sve or amdgcn).
>> > >
>> > > This improves cases like the following from Povray:
>> > >
>> > >for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
>> > >  {
>> > > VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
>> > > Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
>> > >  }
>> > >
>> > > where there is a plain array access mixed with abstraction
>> > > using T[] or T* arguments.  That should be a not too uncommon
>> > > situation in the wild.  The loop above is now vectorized and was not
>> > > without the change.
>> > >
>> > > Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
>> > > built and run SPEC CPU 2017 successfully.
>> > >
>> > > OK?
>> > 
>> > Took a while to page this stuff back in :-/
>> > 
>> > I guess if we're adding alt_indices to the main data_reference,
>> > we'll need to free the access_fns in free_data_ref.  It looked like
>> > the patch as posted might have a memory leak.
>> 
>> Doh, yes - thanks for noticing.
>> 
>> > Perhaps instead we could use local indices structures for the
>> > alt_indices and pass them in to initialize_data_dependence_relation?
>> > Not that that's very elegant…
>> 
>> Yeah, I had that but then since for N data references we possibly
>> call initialize_data_dependence_relation N^2 times we'd do this
>> alternate analysis N^2 times at worst instead of N so it looked worth
>> caching it in the data reference.  Since we have no idea why the
>> first try fails we're going to try this fallback in the majority
>> of cases that we cannot figure out otherwise so I didn't manage
>> to hand-wave the quadraticness away ;)  OTOH one might argue
>> it's just a constant factor on top of the number of
>> initialize_data_dependence_relation invocations.
>> 
>> So I can probably be convinced either way - guess I'll gather
>> some statistics.
>
> I built SPEC 2017 CPU rate with -Ofast -march=znver2, overall there
> are
>
>  4433976 calls to the first stage initialize_data_dependence_relation
>  (skipping the cases dr_may_alias returned false)
>  360248 (8%) ended up recursing with a set of alt_indices
>  83512   times we computed alt_indices of a DR (that's with the cache)
>  14905 (0.3%) times the recursive invocation ended with != chrec_dont_know
>
> thus when not doing the caching we'd compute alt_indices about 10 times
> more often.  I didn't collect the number of distinct DRs (that's difficult
> at this point), but I'd estimate from the above that we have 3 times
> more "unused" alt_indices than used.
>
> OK, so that didn't really help me avoid flipping a coin ;)

Sounds like a good justification for keeping the caching to me :-)

Richard
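
The caching trade-off discussed in this thread can be sketched roughly as
follows.  This is a toy model, not the patch: struct data_ref,
analyze_alt_indices, get_alt_indices and count_analyses are hypothetical
names, and it assumes the worst case in which every pair of references
falls back to the alternate indices.

```c
#include <assert.h>

#define MAX_REFS 64

/* Hypothetical, much simplified data reference.  */
struct data_ref
{
  int have_alt_indices;  /* nonzero once the alternate indices are cached */
  int alt_indices;       /* placeholder for the analyzed result */
};

static int analyses_run;

/* Stand-in for the expensive alternate-indices analysis.  */
static int analyze_alt_indices (int id)
{
  analyses_run++;
  return id * 2;
}

/* Compute the alternate indices lazily and cache them in the reference.  */
static int get_alt_indices (struct data_ref *drs, int id)
{
  if (!drs[id].have_alt_indices)
    {
      drs[id].alt_indices = analyze_alt_indices (id);
      drs[id].have_alt_indices = 1;
    }
  return drs[id].alt_indices;
}

/* Visit every pair of N references and return how often the analysis ran.  */
int count_analyses (int n, int use_cache)
{
  struct data_ref drs[MAX_REFS] = { { 0, 0 } };
  assert (n >= 0 && n <= MAX_REFS);
  analyses_run = 0;
  for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
      {
        if (use_cache)
          {
            get_alt_indices (drs, i);
            get_alt_indices (drs, j);
          }
        else
          {
            analyze_alt_indices (i);
            analyze_alt_indices (j);
          }
      }
  return analyses_run;
}
```

In this model, ten references cost 10 analyses with the cache and 90
without it, which is the N versus N^2 behaviour mentioned above.  The
price is one extra cached structure per reference that has to be freed,
even for references whose alternate indices never lead to a useful
answer.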


Re: [PATCH] rs6000: Modify the way for extra penalized cost

2021-09-17 Thread Bill Schmidt via Gcc-patches

Hi Kewen,

On 9/15/21 8:14 PM, Kewen.Lin wrote:

Hi,

This patch follows the discussion here[1], where Segher pointed
out the existing way to guard the extra penalized cost for
strided/elementwise loads with a magic bound doesn't scale.

Using nunits * stmt_cost can produce a much exaggerated penalized
cost: for V16QI on P8 it is 16 * 20 = 320, which is why a bound was
needed.  To make it scale, this patch no longer uses
nunits * stmt_cost, but it still keeps nunits since there are
actually nunits scalar loads there.  Instead it uses a cost adjusted
from stmt_cost: since the current stmt_cost already accounts for
nunits to some extent, we can stabilize the cost for big nunits and
retain the cost for small nunits.  After some tries, this patch gets
the adjusted cost as:

 stmt_cost / (log2(nunits) * log2(nunits))

For V16QI, the adjusted cost would be 1 and the total penalized
cost is 16, which isn't exaggerated.  For V2DI, the adjusted
cost would be 2 and the total penalized cost is 4, which is the
same as before.  By the way, I tried using a single log2(nunits),
but the penalized cost was still too big and could not fix the
degraded benchmark blender_r.
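
The arithmetic above can be checked with a small standalone sketch.  It
is not the rs6000 code itself: exact_log2_sketch is a stand-in for GCC's
exact_log2, old_extra_cost and new_extra_cost are hypothetical helper
names, and the V2DI stmt_cost of 2 is inferred from the adjusted cost
quoted above rather than stated in the mail.

```c
#include <assert.h>

/* Stand-in for GCC's exact_log2: log2 of a power of two, -1 otherwise.  */
int exact_log2_sketch (unsigned int n)
{
  int log = 0;
  if (n == 0 || (n & (n - 1)) != 0)
    return -1;
  while (n > 1)
    {
      n >>= 1;
      log++;
    }
  return log;
}

/* Old scheme: nunits * stmt_cost, clamped by the magic bound of 12.  */
unsigned int old_extra_cost (unsigned int stmt_cost, unsigned int nunits)
{
  const unsigned int max_penalized_cost_for_ctor = 12;
  unsigned int extra_cost = nunits * stmt_cost;
  if (extra_cost > max_penalized_cost_for_ctor)
    extra_cost = max_penalized_cost_for_ctor;
  return extra_cost;
}

/* New scheme: nunits * (stmt_cost / log2(nunits)^2).  */
unsigned int new_extra_cost (unsigned int stmt_cost, unsigned int nunits)
{
  int nunits_log2 = exact_log2_sketch (nunits);
  assert (nunits_log2 > 0);
  unsigned int nunits_sq = (unsigned int) (nunits_log2 * nunits_log2);
  unsigned int adjusted_cost = stmt_cost / nunits_sq;
  assert (adjusted_cost > 0);
  return nunits * adjusted_cost;
}
```

With the figures quoted above, new_extra_cost (20, 16) evaluates to 16
and new_extra_cost (2, 2) to 4, while old_extra_cost (20, 16) is clamped
to 12 and old_extra_cost (2, 2) stays at 4.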

Separate SPEC2017 evaluations on Power8, Power9 and Power10
at option sets O2-vect and Ofast-unroll showed that this change is
neutral (that is, the same effect as before).

Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.

Is it ok for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html

BR,
Kewen
-
gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_update_target_cost_per_stmt): Adjust
the way to compute extra penalized cost.

---
  gcc/config/rs6000/rs6000.c | 28 +---
  1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4ab23b0ab33..e08b94c0447 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5454,17 +5454,23 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data 
*data,
{
  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
  unsigned int nunits = vect_nunits_for_cost (vectype);
- unsigned int extra_cost = nunits * stmt_cost;
- /* As function rs6000_builtin_vectorization_cost shows, we have
-priced much on V16QI/V8HI vector construction as their units,
-if we penalize them with nunits * stmt_cost, it can result in
-an unreliable body cost, eg: for V16QI on Power8, stmt_cost
-is 20 and nunits is 16, the extra cost is 320 which looks
-much exaggerated.  So let's use one maximum bound for the
-extra penalized cost for vector construction here.  */
- const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
- if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
-   extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
+ /* As function rs6000_builtin_vectorization_cost shows, we
+have priced much on V16QI/V8HI vector construction by
+considering their units, if we penalize them with nunits
+* stmt_cost here, it can result in an unreliable body cost,


This might be confusing to the reader, since you have deleted the 
calculation of nunits * stmt_cost.  Could you instead write this to 
indicate that we used to adjust in this way, and it had this particular 
downside, so that's why you're choosing this heuristic? It's a minor 
thing but I think people reading the code will be confused otherwise.


I think the heuristic is generally reasonable, and certainly better than 
what we had before!


LGTM with adjusted commentary, so recommend maintainers approve.

Thanks for the patch!
Bill

+eg: for V16QI on Power8, stmt_cost is 20 and nunits is 16,
+the penalty will be 320 which looks much exaggerated.  But
+there are actually nunits scalar loads, so we try to adopt
+one reasonable penalized cost for each load rather than
+stmt_cost.  Here, with stmt_cost dividing by log2(nunits)^2,
+we can still retain the necessary penalty for small nunits
+meanwhile stabilize the penalty for big nunits.  */
+ int nunits_log2 = exact_log2 (nunits);
+ gcc_assert (nunits_log2 > 0);
+ unsigned int nunits_sq = nunits_log2 * nunits_log2;
+ unsigned int adjusted_cost = stmt_cost / nunits_sq;
+ gcc_assert (adjusted_cost > 0);
+ unsigned int extra_cost = nunits * adjusted_cost;
  data->extra_ctor_cost += extra_cost;
}
  }
--
2.25.1




Re: [PATCH] rs6000: Parameterize some const values for density test

2021-09-17 Thread Bill Schmidt via Gcc-patches

Hi Kewen,

On 9/15/21 3:52 AM, Kewen.Lin wrote:

Hi,

This patch follows the discussion here[1], where Segher suggested
parameterizing those exact magic constants for the density heuristics,
to make them easier to tweak if needed.

Since these heuristics are quite internal, I have made these parameters
undocumented; they are mainly intended for developers.

The change here should be "No Functional Change".  I still verified
it with SPEC2017 at option sets O2-vect and Ofast-unroll on Power8,
and the result is neutral as expected.
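
As a rough illustration of the first density check that these parameters
feed, here is a standalone sketch.  It only mirrors the arithmetic
visible in the diff below: struct density_params and
density_adjusted_cost are hypothetical names, and the values 85, 70 and
10 are the constants the patch replaces.

```c
#include <assert.h>

/* Hypothetical container for three of the new params.  */
struct density_params
{
  int pct_threshold;   /* rs6000-density-pct-threshold, default 85 */
  int size_threshold;  /* rs6000-density-size-threshold, default 70 */
  int penalty;         /* rs6000-density-penalty, default 10 */
};

/* Return the vector body cost after the density check.  */
int density_adjusted_cost (int vec_cost, int not_vec_cost,
                           struct density_params p)
{
  int total = vec_cost + not_vec_cost;
  assert (total > 0);
  int density_pct = (vec_cost * 100) / total;
  /* Penalize only dense and sufficiently large loop bodies.  */
  if (density_pct > p.pct_threshold && total > p.size_threshold)
    return vec_cost * (100 + p.penalty) / 100;
  return vec_cost;
}
```

With the default values, a body with vec_cost 90 and not_vec_cost 10 has
density 90% and total cost 100, so its cost is raised to 99.  A body
with vec_cost 60 and not_vec_cost 40 stays at 60 because its density is
below the threshold.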

Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.

Is it ok for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html

BR,
Kewen
-
gcc/ChangeLog:

* config/rs6000/rs6000.opt (rs6000-density-pct-threshold,
rs6000-density-size-threshold, rs6000-density-penalty,
rs6000-density-load-pct-threshold,
rs6000-density-load-num-threshold): New parameter.
* config/rs6000/rs6000.c (rs6000_density_test): Adjust with
corresponding parameters.

---
  gcc/config/rs6000/rs6000.c   | 22 +++---
  gcc/config/rs6000/rs6000.opt | 21 +
  2 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9bc826e3a50..4ab23b0ab33 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5284,9 +5284,6 @@ struct rs6000_cost_data
  static void
  rs6000_density_test (rs6000_cost_data *data)
  {
-  const int DENSITY_PCT_THRESHOLD = 85;
-  const int DENSITY_SIZE_THRESHOLD = 70;
-  const int DENSITY_PENALTY = 10;
struct loop *loop = data->loop_info;
basic_block *bbs = get_loop_body (loop);
int nbbs = loop->num_nodes;
@@ -5322,26 +5319,21 @@ rs6000_density_test (rs6000_cost_data *data)
free (bbs);
density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);

-  if (density_pct > DENSITY_PCT_THRESHOLD
-  && vec_cost + not_vec_cost > DENSITY_SIZE_THRESHOLD)
+  if (density_pct > rs6000_density_pct_threshold
+  && vec_cost + not_vec_cost > rs6000_density_size_threshold)
  {
-  data->cost[vect_body] = vec_cost * (100 + DENSITY_PENALTY) / 100;
+  data->cost[vect_body] = vec_cost * (100 + rs6000_density_penalty) / 100;
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
 "density %d%%, cost %d exceeds threshold, penalizing "
-"loop body cost by %d%%\n", density_pct,
-vec_cost + not_vec_cost, DENSITY_PENALTY);
+"loop body cost by %u%%\n", density_pct,
+vec_cost + not_vec_cost, rs6000_density_penalty);
  }

/* Check whether we need to penalize the body cost to account
   for excess strided or elementwise loads.  */
if (data->extra_ctor_cost > 0)
  {
-  /* Threshold for load stmts percentage in all vectorized stmts.  */
-  const int DENSITY_LOAD_PCT_THRESHOLD = 45;
-  /* Threshold for total number of load stmts.  */
-  const int DENSITY_LOAD_NUM_THRESHOLD = 20;
-
gcc_assert (data->nloads <= data->nstmts);
unsigned int load_pct = (data->nloads * 100) / data->nstmts;

@@ -5355,8 +5347,8 @@ rs6000_density_test (rs6000_cost_data *data)
  the loads.
 One typical case is the innermost loop of the hotspot of SPEC2017
 503.bwaves_r without loop interchange.  */
-  if (data->nloads > DENSITY_LOAD_NUM_THRESHOLD
- && load_pct > DENSITY_LOAD_PCT_THRESHOLD)
+  if (data->nloads > (unsigned int) rs6000_density_load_num_threshold
+ && load_pct > (unsigned int) rs6000_density_load_pct_threshold)
{
  data->cost[vect_body] += data->extra_ctor_cost;
  if (dump_enabled_p ())
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 0538db387dc..563983f3269 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -639,3 +639,24 @@ Enable instructions that guard against return-oriented 
programming attacks.
  mprivileged
  Target Var(rs6000_privileged) Init(0)
  Generate code that will run in privileged state.
+
+-param=rs6000-density-pct-threshold=
+Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) 
IntegerRange(0, 99) Param
+When costing for loop vectorization, we probably need to penalize the loop 
body cost if the existing cost model may not adequately reflect delays from 
unavailable vector resources.  We collect the cost for vectorized statements 
and non-vectorized statements separately, check the proportion of vec_cost to 
total cost of vec_cost and non vec_cost, and penalize only if the proportion 
exceeds the threshold specified by this parameter.  The default value is 85.
+
+-param=rs6000-density-size-threshold=
+Target Undocumented Joined UInteger Var(rs6000_density_size_threshold) 
Init(70) IntegerRange(0, 99) Param


I think 99 is not a sufficient upper bound.  This 

Re: [PATCH v2] C++: add type checking for static local vector variable in template

2021-09-17 Thread wangpc via Gcc-patches

OK, I see: it's because the redundant code checks a declaration twice.

On 2021/9/17 22:30, wangpc wrote:


I have tested this patch on AArch64 and RISC-V by running the
testsuites; the diagnostic message seems to be right.

One thing to note is that the error message is reported twice, as
below:


static-template.cpp: In instantiation of 'void f1() [with int a = 2]':
static-template.cpp:29:11:   required from here
static-template.cpp:11:24: error: RVV type 'vuint16m1_t' does not have a fixed size
11 | static vuint16m1_t v = vmv_v_x_u16m1(a, gvl);
   |^
static-template.cpp:11:24: error: RVV type 'vuint16m1_t' does not have a fixed size

I haven't figured out why; is this normal behavior?

On 2021/9/17 21:34, Jason Merrill wrote:

On 9/17/21 03:58, wangpc wrote:
This patch moves verify_type_context from start_decl_1 to 
cp_finish_decl
and adds type checking for static local vector variable in C++ 
template.


How have you tested this patch?
https://gcc.gnu.org/contribute.html#testing


2021-08-06  wangpc 

gcc/cp/ChangeLog

 * decl.c (start_decl_1): Remove verify_type_context.
    (cp_finish_decl): Add more type checking.

gcc/testsuite/ChangeLog

 * g++.target/aarch64/sve/static-var-in-template.C: New test.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 90111e4c786..d411963896a 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -5491,13 +5491,6 @@ start_decl_1 (tree decl, bool initialized)
    cp_apply_type_quals_to_decl (cp_type_quals (type), decl);
  }
  -  if (is_global_var (decl))
-    {
-  type_context_kind context = (DECL_THREAD_LOCAL_P (decl)
-   ? TCTX_THREAD_STORAGE
-   : TCTX_STATIC_STORAGE);
-  verify_type_context (input_location, context, TREE_TYPE (decl));
-    }
    if (initialized)
  /* Is it valid for this decl to have an initializer at all?  */
  {
@@ -7520,6 +7513,22 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,

    && DECL_INITIALIZED_IN_CLASS_P (decl))
  check_static_variable_definition (decl, type);
  +  if (!processing_template_decl && VAR_P (decl))
+    {
+  if (is_global_var (decl))
+    {
+  type_context_kind context = (DECL_THREAD_LOCAL_P (decl)
+  ? TCTX_THREAD_STORAGE
+  : TCTX_STATIC_STORAGE);
+  verify_type_context (input_location, context, TREE_TYPE (decl));
+    }
+
+  if (DECL_FUNCTION_SCOPE_P (decl)
+  && TREE_STATIC (decl))
+    verify_type_context (DECL_SOURCE_LOCATION (decl),
+ TCTX_STATIC_STORAGE, type);


This is redundant; is_global_var is true for a local static. Which 
makes the name confusing, but that's the intended behavior.



+    }
+
    if (init && TREE_CODE (decl) == FUNCTION_DECL)
  {
    tree clone;
diff --git 
a/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C 
b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C

new file mode 100644
index 000..c2395d18d50
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+#include 
+
+template 
+void f()
+{
+    static svbool_t pg = svwhilelt_b64(0, N);
+}
+
+int main(int argc, char **argv)
+{
+    f<2>();
+    return 0;
+}
+
+/* { dg-error {SVE type 'svbool_t' does not have a fixed size} } */
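
The review remark about is_global_var can be illustrated with a small stand-alone model. This is only a sketch: ToyDecl and the toy_* helpers are hypothetical names, not GCC code, and the model assumes (as the comment in the review suggests) that is_global_var means "storage is not automatic", i.e. the decl is static or external.

```cpp
#include <cassert>

// Hypothetical stand-in for a variable declaration; a real GCC tree node
// carries far more state than these three flags.
struct ToyDecl {
  bool tree_static;     // models TREE_STATIC: storage is allocated statically
  bool decl_external;   // models DECL_EXTERNAL: defined in another unit
  bool function_scope;  // models DECL_FUNCTION_SCOPE_P: declared in a function
};

// Assumed model of is_global_var: "global" refers to storage duration,
// not to namespace scope.
inline bool toy_is_global_var(const ToyDecl &d) {
  return d.tree_static || d.decl_external;
}

// Models the second condition from the patch under review.
inline bool toy_is_local_static(const ToyDecl &d) {
  return d.function_scope && d.tree_static;
}

// True only if the second check would fire where the first one does not.
inline bool second_check_adds_coverage(const ToyDecl &d) {
  return toy_is_local_static(d) && !toy_is_global_var(d);
}

// Walks all eight flag combinations and confirms the second check never
// catches anything the first one misses.
inline void check_redundancy() {
  for (int bits = 0; bits < 8; ++bits) {
    ToyDecl d{(bits & 1) != 0, (bits & 2) != 0, (bits & 4) != 0};
    assert(!second_check_adds_coverage(d));
  }
}
```

Under that assumption, every function-scope static already satisfies the first condition, which is why the extra DECL_FUNCTION_SCOPE_P branch adds nothing.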



[RFC] c++: Allow parm use outside function body for constexpr members

2021-09-17 Thread Barrett Adair via Gcc-patches
The WIP attached patch attempts to enable usage of parameters' constexpr
members before the function body begins (see new tests dependent-expr11.C
and dependent-expr12.C).

Unfortunately, I hit a "vexing" snag: gcc/testsuite/g++.dg/parse/ambig7.C
breaks with this patch. After several hours of debugging, here's what I
think happens:

In trunk, ambig7.C's testFail line compiles because the tentative parse --
where "testFail" is a function and int(A) is a parameter type -- fails only
when seeing that function uses "parameter outside function body" (the
error removed by this patch). This occurs when parsing the template
argument list, before calling finish_template_type.

After applying the patch, the tentative parse still fails because
function is not a type, but this only occurs after finish_template_type
is called, which causes the incorrect template-id for function to be
cached with broken arguments, such that function remains "unresolved
overloaded function type":

ambig7.C:16:36: error: no matching function for call to
‘Helper::Helper(int, )’
   16 | Helper testFail(int(A), function);

Any suggestions on how to go about fixing this? For instance, is it
possible to repair or roll back the cached template instantiation?
From 609cbe53b758afaf344f6798be776cce0d00a3fa Mon Sep 17 00:00:00 2001
From: Barrett Adair 
Date: Fri, 17 Sep 2021 10:46:46 -0500
Subject: [PATCH] Allow parm use outside function body for constexpr members

---
 gcc/cp/pt.c | 11 +++
 gcc/cp/semantics.c  | 13 ++---
 gcc/testsuite/g++.dg/cpp0x/noexcept34.C |  2 +-
 .../g++.dg/parse/parameter-declaration-2.C  |  2 +-
 gcc/testsuite/g++.dg/template/dependent-expr11.C|  6 ++
 gcc/testsuite/g++.dg/template/dependent-expr12.C| 10 ++
 gcc/testsuite/g++.dg/template/memfriend19.C | 12 
 gcc/testsuite/g++.dg/template/memfriend20.C | 11 +++
 8 files changed, 50 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/dependent-expr11.C
 create mode 100644 gcc/testsuite/g++.dg/template/dependent-expr12.C
 create mode 100644 gcc/testsuite/g++.dg/template/memfriend19.C
 create mode 100644 gcc/testsuite/g++.dg/template/memfriend20.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 12c8812d8b2..3e864c9fd8e 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15133,9 +15133,13 @@ tsubst_function_type (tree t,
 	  inject_this_parameter (this_type, cp_type_quals (this_type));
 	}
 
+  begin_scope (sk_function_parms, in_decl);
+
   /* Substitute the return type.  */
   return_type = tsubst (TREE_TYPE (t), args, complain, in_decl);
 
+  finish_scope ();
+
   if (do_inject)
 	{
 	  current_class_ptr = save_ccp;
@@ -16594,10 +16598,9 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 	  if (DECL_NAME (t) == this_identifier && current_class_ptr)
 	return current_class_ptr;
 
-	  /* This can happen for a parameter name used later in a function
-	 declaration (such as in a late-specified return type).  Just
-	 make a dummy decl, since it's only used for its type.  */
-	  gcc_assert (cp_unevaluated_operand != 0);
+	  gcc_assert (cp_unevaluated_operand != 0
+	  || current_binding_level->kind == sk_function_parms);
+
 	  r = tsubst_decl (t, args, complain);
 	  /* Give it the template pattern as its context; its true context
 	 hasn't been instantiated yet and this is good enough for
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 35a7b9f7b83..d0987166361 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3742,7 +3742,8 @@ outer_var_p (tree decl)
 	  && DECL_FUNCTION_SCOPE_P (decl)
 	  /* Don't get confused by temporaries.  */
 	  && DECL_NAME (decl)
-	  && (DECL_CONTEXT (decl) != current_function_decl
+	  && ((current_function_decl != NULL
+	&& DECL_CONTEXT (decl) != current_function_decl)
 	  || parsing_nsdmi ()));
 }
 
@@ -4002,16 +4003,6 @@ finish_id_expression_1 (tree id_expression,
 	  if (decl == error_mark_node)
 	return error_mark_node;
 	}
-
-  /* Also disallow uses of function parameters outside the function
-	 body, except inside an unevaluated context (i.e. decltype).  */
-  if (TREE_CODE (decl) == PARM_DECL
-	  && DECL_CONTEXT (decl) == NULL_TREE
-	  && !cp_unevaluated_operand)
-	{
-	  *error_msg = G_("use of parameter outside function body");
-	  return error_mark_node;
-	}
 }
 
   /* If we didn't find anything, or what we found was a type,
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept34.C b/gcc/testsuite/g++.dg/cpp0x/noexcept34.C
index dce35652ef5..822a8597f63 100644
--- a/gcc/testsuite/g++.dg/cpp0x/noexcept34.C
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept34.C
@@ -10,7 +10,7 @@ template struct A
   void g () noexcept (f()) { } // { dg-error "use of parameter" }
   void g2 () noexcept (this->f()) { } // { dg-error "use of parameter" }
   void g3 () noexcept (b) { } // 

Re: [PATCH, OpenMP, Fortran] Support in_reduction for Fortran

2021-09-17 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 17, 2021 at 07:57:38PM +0800, Chung-Lin Tang wrote:
> 2021-09-17  Chung-Lin Tang  
> 
> gcc/fortran/ChangeLog:
> 
>   * openmp.c (gfc_match_omp_clause_reduction): Add 'openmp_target' default
>   false parameter. Add 'always,tofrom' map for OMP_LIST_IN_REDUCTION case.
>   (gfc_match_omp_clauses): Add 'openmp_target' default false parameter,
>   adjust call to gfc_match_omp_clause_reduction.
>   (match_omp): Adjust call to gfc_match_omp_clauses
>   * trans-openmp.c (gfc_trans_omp_taskgroup): Add call to
>   gfc_match_omp_clause, create and return block.
> 
> gcc/ChangeLog:
> 
>   * omp-low.c (scan_sharing_clauses): Place in_reduction copy of variable
>   in outer ctx if it exists. Check if non-existent in field_map before
>   installing OMP_CLAUSE_IN_REDUCTION decl.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gfortran.dg/gomp/reduction4.f90: Adjust 'omp target in_reduction' scan
>   pattern.
> 
> libgomp/ChangeLog:
> 
>   * testsuite/libgomp.fortran/target-in-reduction-1.f90: New test.

> @@ -3496,7 +3509,8 @@ static match
>  match_omp (gfc_exec_op op, const omp_mask mask)
>  {
>gfc_omp_clauses *c;
> -  if (gfc_match_omp_clauses (, mask) != MATCH_YES)
> +  if (gfc_match_omp_clauses (, mask, true, true, false,
> +  (op == EXEC_OMP_TARGET)) != MATCH_YES)

The ()s around op == EXEC_OMP_TARGET are unnecessary.

> --- a/gcc/fortran/trans-openmp.c
> +++ b/gcc/fortran/trans-openmp.c
> @@ -6391,12 +6391,17 @@ gfc_trans_omp_task (gfc_code *code)
>  static tree
>  gfc_trans_omp_taskgroup (gfc_code *code)
>  {
> +  stmtblock_t block;
> +  gfc_start_block ();
>tree body = gfc_trans_code (code->block->next);
>tree stmt = make_node (OMP_TASKGROUP);
>TREE_TYPE (stmt) = void_type_node;
>OMP_TASKGROUP_BODY (stmt) = body;
> -  OMP_TASKGROUP_CLAUSES (stmt) = NULL_TREE;
> -  return stmt;
> +  OMP_TASKGROUP_CLAUSES (stmt) = gfc_trans_omp_clauses (,
> + code->ext.omp_clauses,
> + code->loc);
> +  gfc_add_expr_to_block (, stmt);

If this was missing, then I'm afraid we lack a lot of testsuite coverage for
Fortran task reductions.  It doesn't need to be covered in this patch, but it
would be good to cover it incrementally, because the above means that nothing
using taskgroup with task_reduction clause(s) could work properly at runtime.

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -1317,9 +1317,13 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> if (is_omp_target (ctx->stmt))
>   {
> tree at = decl;
> +   omp_context *scan_ctx = ctx;
> if (ctx->outer)
> - scan_omp_op (, ctx->outer);
> -   tree nt = omp_copy_decl_1 (at, ctx);
> + {
> +   scan_omp_op (, ctx->outer);
> +   scan_ctx = ctx->outer;
> + }
> +   tree nt = omp_copy_decl_1 (at, scan_ctx);
> splay_tree_insert (ctx->field_map,
>(splay_tree_key) _CONTEXT (decl),
>(splay_tree_value) nt);

You're right that the var remembered with _CONTEXT (whatever) key is
used outside of the target construct rather than inside of it.
So, if ctx->outer is non-NULL, it seems right to create the var in that
outer context.  But, if ctx->outer is NULL, which can happen if the
target construct is orphaned, consider e.g.
extern int 
extern int 

void
foo ()
{
  #pragma omp target in_reduction (+: x, y)
  {
x = x + 8;
y = y + 16;
  }
}

void
bar ()
{
  #pragma omp taskgroup task_reduction (+: x, y)
  foo ();
}
then those artificial decls (copies of x and y) should appear
to be at the function scope and not inside of the target region.

Therefore, I wonder if omp_copy_decl_2 shouldn't do the
  DECL_CONTEXT (copy) = current_function_decl;
  DECL_CHAIN (copy) = ctx->block_vars;
  ctx->block_vars = copy;
(the last one can be moved next to the others) only if ctx != NULL
and otherwise call gimple_add_tmp_var (copy); instead
and then just call omp_copy_decl_1 at that spot with unconditional
ctx->outer.

Also, this isn't the only place that should have such a change,
there is also
  if (ctx->outer)
scan_omp_op (, ctx->outer);
  tree nt = omp_copy_decl_1 (at, ctx);
  splay_tree_insert (ctx->field_map,
 (splay_tree_key) _CONTEXT (t),
 (splay_tree_value) nt);
a few lines above this and I'd expect that it should be (at, ctx->outer)
as well.

> @@ -1339,7 +1343,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> if (!is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx)))
>   {
> by_ref = use_pointer_for_field (decl, ctx);
> -   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IN_REDUCTION)
> +   

Re: [PATCH] warn for more impossible null pointer tests [PR102103]

2021-09-17 Thread Martin Sebor via Gcc-patches

On 9/8/21 2:06 PM, Jason Merrill wrote:

On 9/2/21 7:53 PM, Martin Sebor wrote:
@@ -4622,28 +4622,94 @@ warn_for_null_address (location_t location, tree op, tsubst_flags_t complain)
    if (!warn_address
    || (complain & tf_warning) == 0
    || c_inhibit_evaluation_warnings != 0
-  || warning_suppressed_p (op, OPT_Waddress))
+  || warning_suppressed_p (op, OPT_Waddress)
+  || processing_template_decl != 0)


Completely suppressing this warning in templates seems like a 
regression;  I'd think we could recognize many relevant cases before 
instantiation.  You just can't assume that ADDR_EXPR has the default 
meaning if it has unknown type (i.e. because op0 is type-dependent).


I added the suppression to keep g++.dg/warn/pr101219.C from failing
but in hindsight I should have questioned the reasoning behind
the "no warning emitted here (no instantiation)" comment in the test.

I agree that it would be helpful to diagnose the type-independent
subset of the problem even in uninstantiated templates.  Current
trunk doesn't (it never has), but with my patch and the suppression
above removed it does.  I've updated the tests to expect it.

Please see the attached revision.

Martin

PS There are still more opportunities to issue -Waddress in templates
that this patch doesn't handle, e.g.,:

  template  bool f (T *p) { return  == 0; }

Handling this will take more surgery.
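
For illustration, here is a small sketch of the kind of comparison the enhanced -Waddress is meant to flag: the address of an array or of a data member of a valid object compared against null. Record and the two helper functions are hypothetical names chosen for this example; whether a given compiler version actually warns on each line is not claimed here.

```cpp
#include <cassert>

// Hypothetical type, used only for this illustration.
struct Record {
  int values[4];
  int count;
};

// For any valid p the array member decays to a non-null pointer, so this
// comparison is always false.
inline bool array_member_is_null(const Record *p) {
  assert(p != nullptr);
  return p->values == nullptr;
}

// The address of a data member could only be null if p itself were null,
// and accessing a member through a null pointer would be invalid anyway.
inline bool scalar_member_is_null(const Record *p) {
  assert(p != nullptr);
  return &p->count == nullptr;
}
```

Both functions return false for every valid object, which is what makes the tests "impossible" and worth a diagnostic.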

PPS It seems that most other warnings (and even some errors, like
-Wnarrowing) are suppressed in uninstantiated templates as well,
even for non-dependent expressions.  In the few test cases I looked
at Clang does better.  It sounds like you'd like to see improvements
in this direction not just for -Waddress but in general.  Just for
the avoidance of doubt, can you confirm that for future reference?



Jason



Enhance -Waddress to detect more suspicious expressions [PR102103].

Resolves:
PR c/102103 - missing warning comparing array address to null


gcc/ChangeLog:

	* doc/invoke.texi (-Waddress): Update.
	* gcc/gengtype.c (write_types): Avoid -Waddress.

gcc/c-family/ChangeLog:

	* c-common.c (decl_with_nonnull_addr_p): Handle members.
	Check and perform warning suppression.
	(c_common_truthvalue_conversion): Enhance warning suppression.

gcc/c/ChangeLog:

	* c-typeck.c (maybe_warn_for_null_address): New function.
	(build_binary_op): Call it.

gcc/cp/ChangeLog:

	* typeck.c (warn_for_null_address): Enhance.
	(cp_build_binary_op): Call it also for member pointers.

gcc/fortran/ChangeLog:

	* gcc/fortran/array.c: Remove an unnecessary test.
	* gcc/fortran/trans-array.c: Same.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/constexpr-array-ptr10.C: Suppress a valid warning.
	* g++.dg/warn/Wreturn-local-addr-6.C: Correct a cast.
	* gcc.dg/Waddress.c: Expect a warning.
	* c-c++-common/Waddress-3.c: New test.
	* c-c++-common/Waddress-4.c: New test.
	* g++.dg/warn/Waddress-5.C: New test.
	* g++.dg/warn/Waddress-6.C: New test.
	* g++.dg/warn/pr101219.C: Expect a warning.

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index c6757f093ac..249da7c7f0f 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -3393,14 +3393,16 @@ c_wrap_maybe_const (tree expr, bool non_const)
   return expr;
 }
 
-/* Return whether EXPR is a declaration whose address can never be
-   NULL.  */
+/* Return whether EXPR is a declaration whose address can never be NULL.
+   The address of the first struct member could be NULL only if it were
+   accessed through a NULL pointer, and such an access would be invalid.  */
 
 bool
 decl_with_nonnull_addr_p (const_tree expr)
 {
   return (DECL_P (expr)
-	  && (TREE_CODE (expr) == PARM_DECL
+	  && (TREE_CODE (expr) == FIELD_DECL
+	  || TREE_CODE (expr) == PARM_DECL
 	  || TREE_CODE (expr) == LABEL_DECL
 	  || !DECL_WEAK (expr)));
 }
@@ -3488,13 +3490,17 @@ c_common_truthvalue_conversion (location_t location, tree expr)
 case ADDR_EXPR:
   {
  	tree inner = TREE_OPERAND (expr, 0);
-	if (decl_with_nonnull_addr_p (inner))
+	if (decl_with_nonnull_addr_p (inner)
+	/* Check both EXPR and INNER for suppression.  */
+	&& !warning_suppressed_p (expr, OPT_Waddress)
+	&& !warning_suppressed_p (inner, OPT_Waddress))
 	  {
-	/* Common Ada programmer's mistake.  */
+	/* Common Ada programmer's mistake.	 */
 	warning_at (location,
 			OPT_Waddress,
 			"the address of %qD will always evaluate as %",
 			inner);
+	suppress_warning (inner, OPT_Waddress);
 	return truthvalue_true_node;
 	  }
 	break;
@@ -3627,8 +3633,17 @@ c_common_truthvalue_conversion (location_t location, tree expr)
 	  break;
 	/* If this isn't narrowing the argument, we can ignore it.  */
 	if (TYPE_PRECISION (totype) >= TYPE_PRECISION (fromtype))
-	  return c_common_truthvalue_conversion (location,
-		 TREE_OPERAND (expr, 0));
+	  {
+	tree op0 = TREE_OPERAND (expr, 0);
+	if ((TREE_CODE (fromtype) == POINTER_TYPE
+		 && TREE_CODE (totype) == INTEGER_TYPE)
+		|| 

Re: [PING^2] Re: Fix 'hash_table::expand' to destruct stale Value objects

2021-09-17 Thread Thomas Schwinge
Hi!

On 2021-09-17T15:03:18+0200, Richard Biener  wrote:
> On Fri, Sep 17, 2021 at 2:39 PM Jonathan Wakely  wrote:
>> On Fri, 17 Sept 2021 at 13:08, Richard Biener
>>  wrote:
>> > On Fri, Sep 17, 2021 at 1:17 PM Thomas Schwinge wrote:
>> > > On 2021-09-10T10:00:25+0200, I wrote:
>> > > > On 2021-09-01T19:31:19-0600, Martin Sebor via Gcc-patches wrote:
>> > > >> On 8/30/21 4:46 AM, Thomas Schwinge wrote:
>> > > >>> Ping -- we still need to plug the memory leak; see patch attached, [...]

>> > > > Ping for formal approval (and review for using proper
>> > > > C++ terminology in the 'gcc/hash-table.h:hash_table::expand' source code
>> > > > comment that I'm adding).  Patch again attached, for easy reference.

>> > I'm happy when a C++ literate approves the main change which I quote as
>> >
>> >   new ((void*) q) value_type (std::move (x));
>> > +
>> > + /* Manually invoke destructor of original object, to counterbalance
>> > +object constructed via placement new.  */
>> > + x.~value_type ();
>> >
>> > but I had the impression that std::move already "moves away" from the source?
>>
>> It just casts the argument to an rvalue reference, which allows the
>> value_type constructor to steal its guts.
>>
>> > That said, the dance above looks iffy, there must be a nicer way to "move"
>> > an object in C++?
>>
>> The code above is doing two things: transfer the resources from x to a
>> new object at location *q, and then destroy x.
>>
>> The first part (moving its resources) has nothing to do with
>> destruction. An object still needs to be destroyed, even if its guts
>> have been moved to another object.
>>
>> The second part is destroying the object, to end its lifetime. You
>> wouldn't usually call a destructor explicitly, because it would be
>> done automatically at the end of scope for objects on the stack, or
>> done by delete when you free objects on the heap. This is a special
>> case where the object's lifetime is manually managed in storage that
>> is manually managed.

ACK, and happy that I understood this correctly.

And, thanks for providing some proper C++-esque wording, which I
summarized to replace my original source code comment.

>> > What happens if the dtor is deleted btw?
>>
>> If the destructor is deleted you have created an unusable type that
>> cannot be stored in containers. It can only be created using new, and
>> then never destroyed. If you play stupid games, you win stupid prizes.
>> Don't do that.

Haha!  ;-)

And, by the way, as I understood this: if the destructor is "trivial"
(which includes POD types, for example), the explicit destructor call
here is a no-op.
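
To make the discussion above concrete, here is a minimal stand-alone sketch of the same pattern: move-construct into raw storage with placement new, then explicitly destroy the moved-from source. Tracked, relocate and live_after_relocation are hypothetical names for this example and are not taken from gcc/hash-table.h.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Counts live objects so that a missed destructor call shows up as a
// non-zero balance.
struct Tracked {
  static int live;
  std::string payload;
  explicit Tracked(std::string s) : payload(std::move(s)) { ++live; }
  Tracked(Tracked &&other) noexcept : payload(std::move(other.payload)) {
    ++live;
  }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

// Moves count objects from old_slots into raw storage at new_storage and
// ends the lifetime of each source object explicitly.
template <typename T>
void relocate(T *old_slots, void *new_storage, std::size_t count) {
  T *dst = static_cast<T *>(new_storage);
  for (std::size_t i = 0; i < count; ++i) {
    T &x = old_slots[i];
    // Transfer the resources of x into a new object in the new storage.
    new (static_cast<void *>(dst + i)) T(std::move(x));
    // The moved-from object is still alive and must be destroyed.
    x.~T();
  }
}

// Builds two objects in manually managed storage, relocates them, and
// returns how many objects were alive right after the relocation.
inline int live_after_relocation() {
  alignas(Tracked) unsigned char old_buf[2 * sizeof(Tracked)];
  alignas(Tracked) unsigned char new_buf[2 * sizeof(Tracked)];
  Tracked *old_slots = reinterpret_cast<Tracked *>(old_buf);
  new (static_cast<void *>(old_slots)) Tracked("first");
  new (static_cast<void *>(old_slots + 1)) Tracked("second");

  relocate(old_slots, new_buf, 2);
  int live = Tracked::live;

  // Clean up the relocated objects, again by hand.
  Tracked *fresh = reinterpret_cast<Tracked *>(new_buf);
  fresh[0].~Tracked();
  fresh[1].~Tracked();
  return live;
}
```

Without the explicit x.~T() call the counter would stay at four after relocating two objects, which is the kind of imbalance the leak reports show when the value type owns heap memory.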

>> I haven't read the rest of the patch, but the snippet above looks fine.
>
> OK, thanks for clarifying.
>
> The patch is OK then.

Thanks, pushed to master branch
commit 89be17a1b231ade643f28fbe616d53377e069da8
"Fix 'hash_table::expand' to destruct stale Value objects".

Should this be backported to release branches, after a while?


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 89be17a1b231ade643f28fbe616d53377e069da8 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 13 Aug 2021 18:03:38 +0200
Subject: [PATCH] Fix 'hash_table::expand' to destruct stale Value objects

Thus plugging potential memory leaks if these have non-trivial
constructor/destructor.

See

and others.

As one example, compilation of 'g++.dg/warn/Wmismatched-tags.C' per
'valgrind --leak-check=full' improves as follows:

 [...]
-
-104 bytes in 1 blocks are definitely lost in loss record 399 of 519
-   at 0x483DFAF: realloc (vg_replace_malloc.c:836)
-   by 0x223B62C: xrealloc (xmalloc.c:179)
-   by 0xA8D848: void va_heap::reserve(vec*&, unsigned int, bool) (vec.h:290)
-   by 0xA8B373: vec::reserve(unsigned int, bool) (vec.h:1858)
-   by 0xA8B277: vec::safe_push(class_decl_loc_t::class_key_loc_t const&) (vec.h:1967)
-   by 0xA57481: class_decl_loc_t::add_or_diag_mismatched_tag(tree_node*, tag_types, bool, bool) (parser.c:32967)
-   by 0xA573E1: class_decl_loc_t::add(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32941)
-   by 0xA56C52: cp_parser_check_class_key(cp_parser*, unsigned int, tag_types, tree_node*, bool, bool) (parser.c:32819)
-   by 0xA3AD12: cp_parser_elaborated_type_specifier(cp_parser*, bool, bool) (parser.c:20227)
-   by 0xA37EF2: cp_parser_type_specifier(cp_parser*, int, cp_decl_specifier_seq*, bool, int*, bool*) (parser.c:18942)
-   by 0xA31CDD: cp_parser_decl_specifier_seq(cp_parser*, int, cp_decl_specifier_seq*, int*) 

[PATCH] nvptx: Adds uses of -misa=sm_75 and -misa=sm_80

2021-09-17 Thread Roger Sayle

This patch builds upon my previous patch to prototype HFmode support on
nvptx, which includes adding new target macros TARGET_SM75 and TARGET_SM80.
Tobias Burnus has questioned "whether it makes sense to add those
flags if no use is made of those flags".  I had hoped that it might
be possible to split these patch submissions into smaller parts to
assist the review process, but failing that, here's part 2, that
adds support for __builtin_tanhf, HFmode exp2/tanh and also
for HFmode min/max, controlled by TARGET_SM75 and TARGET_SM80 respectively.

The following has been tested on nvptx-none, hosted on x86_64-pc-linux-gnu
(on top of my previous patch) with a "make" and "make -k check" with no
new failures.  Please ignore the hunks in the git diff that were described
in the previous patch (hopefully I'll be able to resume submitting
patches sequentially in future).  Are both parts Ok for mainline?


2020-09-17  Roger Sayle  

gcc/ChangeLog
* config/nvptx/nvptx.md (define_c_enum "unspec"): New UNSPEC_TANH.
(define_mode_iterator HSFM): New iterator for HFmode and SFmode.
(exp2hf2): New define_insn controlled by TARGET_SM75.
(tanh2): New define_insn controlled by TARGET_SM75.
(sminhf3, smaxhf3): New define_insns controlled by TARGET_SM80.

gcc/testsuite/ChangeLog
* gcc.target/nvptx/float16-2.c: New test case.
* gcc.target/nvptx/tanh-1.c: New test case.

Roger
--


-Original Message-
From: Tobias Burnus  
Sent: 17 September 2021 09:25
To: Roger Sayle ; 'GCC Patches'
; Tom de Vries 
Subject: Re: [PATCH] nvptx: Add (experimental) support for HFmode with
-misa=sm_53

Hi Roger,

some more generic remarks not specific to using new ISA features.

On 17.09.21 00:53, Roger Sayle wrote:

> Whilst there I also added -misa=sm_75 and -misa=sm_80 which are points 
> where other useful instructions were added to the ISA.

First, my impression was that already sm_70 added lots of useful stuff, but
granted sm_75 adds some more. In any case, the question is whether it makes
sense to add those flags if no use is made of those flags.

In particular, sm_80 is according to the following webpage only supported
with PTX ISA 7.0 of CUDA 11.0. But GCC currently only supports
-mptx=3.6 (default) and -mptx=6.3 (= CUDA 10).
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes

Note that you did not update gcc/config/nvptx/t-omp-device for the new
sm_* values, and likewise the "-misa=@var{ISA-string}" section in
gcc/doc/invoke.texi.

Additionally, I wonder whether the preprocessor macros __nvptx__,
__nvptx_softstack__, __nvptx_unisimt__ and __PTX_SM__  should be documented
somewhere as well. As all but one are related to command-line options, I
wonder whether the respective section in invoke.texi would be a good place
for them.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201,
80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer:
Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München;
Registergericht München, HRB 106955
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 108de1c..1d0a197 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -26,6 +26,7 @@
UNSPEC_EXP2
UNSPEC_SIN
UNSPEC_COS
+   UNSPEC_TANH
 
UNSPEC_FPINT_FLOOR
UNSPEC_FPINT_BTRUNC
@@ -196,6 +197,7 @@
 (define_mode_iterator QHIM [QI HI])
 (define_mode_iterator QHSIM [QI HI SI])
 (define_mode_iterator SDFM [SF DF])
+(define_mode_iterator HSFM [HF SF])
 (define_mode_iterator SDCM [SC DC])
 (define_mode_iterator BITS [SI SF])
 (define_mode_iterator BITD [DI DF])
@@ -273,6 +275,48 @@
 }
   [(set_attr "subregs_ok" "true")])
 
+(define_insn "*movhf_insn"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=R,R,m")
+   (match_operand:HF 1 "nonimmediate_operand" "R,m,R"))]
+  "!MEM_P (operands[0]) || REG_P (operands[1])"
+  "@
+   %.\\tmov.b16\\t%0, %1;
+   %.\\tld.b16\\t%0, %1;
+   %.\\tst.b16\\t%0, %1;")
+
+(define_expand "movhf"
+  [(set (match_operand:HF 0 "nonimmediate_operand" "")
+   (match_operand:HF 1 "nonimmediate_operand" ""))]
+  ""
+{
+  /* Load HFmode constants as SFmode with an explicit FLOAT_TRUNCATE.  */
+  if (CONST_DOUBLE_P (operands[1]))
+{
+  rtx tmp1 = gen_reg_rtx (SFmode);
+  REAL_VALUE_TYPE d = *CONST_DOUBLE_REAL_VALUE (operands[1]);
+  real_convert (, SFmode, );
+  emit_move_insn (tmp1, const_double_from_real_value (d, SFmode));
+
+  if (!REG_P (operands[0]))
+   {
+ rtx tmp2 = gen_reg_rtx (HFmode);
+ emit_insn (gen_truncsfhf2 (tmp2, tmp1));
+ emit_move_insn (operands[0], tmp2);
+   }
+  else
+emit_insn (gen_truncsfhf2 (operands[0], tmp1));
+  DONE;
+}
+ 
+  if (MEM_P (operands[0]) && !REG_P (operands[1]))
+{
+  rtx tmp = gen_reg_rtx (HFmode);
+  emit_move_insn (tmp, operands[1]);
+  emit_move_insn (operands[0], tmp);
+  DONE;
+}

[PATCH 2/3][vect] Consider outside costs earlier for epilogue loops

2021-09-17 Thread Andre Vieira (lists) via Gcc-patches

Hi,

This patch changes the order in which we check outside and inside costs 
for epilogue loops.  This makes a predicated epilogue more likely to be 
picked over an unpredicated one, since it saves having to enter a scalar 
epilogue loop.


gcc/ChangeLog:

    * tree-vect-loop.c (vect_better_loop_vinfo_p): Change how
    epilogue loop costs are compared.
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 14f8150d7c262b9422784e0e997ca4387664a20a..038af13a91d43c9f09186d042cf415020ea73a38 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2881,17 +2881,75 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
return new_simdlen_p;
 }
 
+  loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo);
+  if (main_loop)
+{
+  poly_uint64 main_poly_vf = LOOP_VINFO_VECT_FACTOR (main_loop);
+  unsigned HOST_WIDE_INT main_vf;
+  unsigned HOST_WIDE_INT old_factor, new_factor, old_cost, new_cost;
+  /* If we can determine how many iterations are left for the epilogue
+loop, that is if both the main loop's vectorization factor and number
+of iterations are constant, then we use them to calculate the cost of
+the epilogue loop together with a 'likely value' for the epilogues
+vectorization factor.  Otherwise we use the main loop's vectorization
+factor and the maximum poly value for the epilogue's.  If the target
+has not provided with a sensible upper bound poly vectorization
+factors are likely to be favored over constant ones.  */
+  if (main_poly_vf.is_constant (_vf)
+ && LOOP_VINFO_NITERS_KNOWN_P (main_loop))
+   {
+ unsigned HOST_WIDE_INT niters
+   = LOOP_VINFO_INT_NITERS (main_loop) % main_vf;
+ HOST_WIDE_INT old_likely_vf
+   = estimated_poly_value (old_vf, POLY_VALUE_LIKELY);
+ HOST_WIDE_INT new_likely_vf
+   = estimated_poly_value (new_vf, POLY_VALUE_LIKELY);
+
+ /* If the epilogue is using partial vectors we account for the
+partial iteration here too.  */
+ old_factor = niters / old_likely_vf;
+ if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo)
+ && niters % old_likely_vf != 0)
+   old_factor++;
+
+ new_factor = niters / new_likely_vf;
+ if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo)
+ && niters % new_likely_vf != 0)
+   new_factor++;
+   }
+  else
+   {
+ unsigned HOST_WIDE_INT main_vf_max
+   = estimated_poly_value (main_poly_vf, POLY_VALUE_MAX);
+
+ old_factor = main_vf_max / estimated_poly_value (old_vf,
+  POLY_VALUE_MAX);
+ new_factor = main_vf_max / estimated_poly_value (new_vf,
+  POLY_VALUE_MAX);
+
+ /* If the loop is not using partial vectors then it will iterate one
+time less than one that does.  It is safe to subtract one here,
+because the main loop's vf is always at least 2x bigger than that
+of an epilogue.  */
+ if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo))
+   old_factor -= 1;
+ if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo))
+   new_factor -= 1;
+   }
+
+  /* Compute the costs by multiplying the inside costs with the factor and
+add the outside costs for a more complete picture.  The factor is the
+amount of times we are expecting to iterate this epilogue.  */
+  old_cost = old_loop_vinfo->vec_inside_cost * old_factor;
+  new_cost = new_loop_vinfo->vec_inside_cost * new_factor;
+  old_cost += old_loop_vinfo->vec_outside_cost;
+  new_cost += new_loop_vinfo->vec_outside_cost;
+  return new_cost < old_cost;
+}
+
   /* Limit the VFs to what is likely to be the maximum number of iterations,
  to handle cases in which at least one loop_vinfo is fully-masked.  */
-  HOST_WIDE_INT estimated_max_niter;
-  loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo);
-  unsigned HOST_WIDE_INT main_vf;
-  if (main_loop
-  && LOOP_VINFO_NITERS_KNOWN_P (main_loop)
-  && LOOP_VINFO_VECT_FACTOR (main_loop).is_constant (_vf))
-estimated_max_niter = LOOP_VINFO_INT_NITERS (main_loop) % main_vf;
-  else
-estimated_max_niter = likely_max_stmt_executions_int (loop);
+  HOST_WIDE_INT estimated_max_niter = likely_max_stmt_executions_int (loop);
   if (estimated_max_niter != -1)
 {
   if (known_le (estimated_max_niter, new_vf))


[PATCH 1/3][vect] Add main vectorized loop unrolling

2021-09-17 Thread Andre Vieira (lists) via Gcc-patches

Hi all,

This patch adds the ability to define a target hook to unroll the main 
vectorized loop. It also introduces the --params vect-unroll and 
vect-unroll-reductions to control this from the command line. I found 
these useful for experimenting and believe they can help when tuning, so 
I decided to leave them in.
We only unroll the main loop and have disabled unrolling of epilogues for 
now. We also do not support unrolling any loop that has a negative 
step, nor any loop with a reduction other than a TREE_CODE_REDUCTION.


Bootstrapped and regression tested on aarch64-linux-gnu as part of the 
series.


gcc/ChangeLog:

    * doc/tm.texi: Document TARGET_VECTORIZE_UNROLL_FACTOR
    and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
    * doc/tm.texi.in: Add entries for target hooks above.
    * params.opt: Add vect-unroll and vect-unroll-reductions parameters.
    * target.def: Define hooks TARGET_VECTORIZE_UNROLL_FACTOR
    and TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL.
    * targhooks.c (default_add_stmt_cost_for_unroll): New.
    (default_unroll_factor): Likewise.
    * targhooks.h (default_add_stmt_cost_for_unroll): Likewise.
    (default_unroll_factor): Likewise.
    * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
    par_unrolling_factor.
    (vect_update_vf_for_slp): Use unrolling factor to update vectorization
    factor.
    (vect_determine_partial_vectors_and_peeling): Account for unrolling.
    (vect_determine_unroll_factor): Determine how much to unroll vectorized
    main loop.
    (vect_analyze_loop_2): Call vect_determine_unroll_factor.
    (vect_analyze_loop): Allow for epilogue vectorization when unrolling
    and rewalk vector_mode array for the epilogues.
    (vectorizable_reduction): Disable single_defuse_cycle when unrolling.
    * tree-vectorizer.h (vect_unroll_value): Declare par_unrolling_factor
    as a member of loop_vec_info.
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f68f42638a112bed8396fd634bd3fd3c44ce848a..3bc9694d2162055d3db165ef888f35deb676548b 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6283,6 +6283,19 @@ allocated by TARGET_VECTORIZE_INIT_COST.  The default releases the
 accumulator.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL (class vec_info *@var{vinfo}, class _stmt_vec_info *@var{stmt_info}, void *@var{data})
+This hook should update the target-specific @var{data} relative
+to the statement represented by @var{stmt_vinfo} to be used
+later to determine the unrolling factor for this loop using the current
+vectorization factor.
+@end deftypefn
+
+@deftypefn {Target Hook} unsigned TARGET_VECTORIZE_UNROLL_FACTOR (class vec_info *@var{vinfo}, void *@var{data})
+This hook should return the desired vector unrolling factor for a loop with
+@var{vinfo} based on the target-specific @var{data}. The default returns one,
+which means no unrolling will be performed.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_GATHER (const_tree @var{mem_vectype}, const_tree @var{index_type}, int @var{scale})
 Target builtin that implements vector gather operation.  @var{mem_vectype}
 is the vector type of the load and @var{index_type} is scalar type of
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index fdf16b901c537e6a02f630a80a2213d2dcb6d5d6..40f4cb02c34f575439f35070301855ddaf82a21a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4195,6 +4195,10 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_DESTROY_COST_DATA
 
+@hook TARGET_VECTORIZE_ADD_STMT_COST_FOR_UNROLL
+
+@hook TARGET_VECTORIZE_UNROLL_FACTOR
+
 @hook TARGET_VECTORIZE_BUILTIN_GATHER
 
 @hook TARGET_VECTORIZE_BUILTIN_SCATTER
diff --git a/gcc/params.opt b/gcc/params.opt
index f414dc1a61cfa9d5b9ded75e96560fc1f73041a5..00f92d4484797df0dbbad052f45205469cbb2c49 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1117,4 +1117,12 @@ Controls how loop vectorizer uses partial vectors.  0 means never, 1 means only
 Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) IntegerRange(1, 1) Param Optimization
 The maximum factor which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized.
 
+-param=vect-unroll=
+Common Joined UInteger Var(param_vect_unroll) Init(0) IntegerRange(0, 32) Param Optimization
+Controls how many times the vectorizer tries to unroll loops.  Also see vect-unroll-reductions.
+
+-param=vect-unroll-reductions=
+Common Joined UInteger Var(param_vect_unroll_reductions) Init(0) IntegerRange(0, 32) Param Optimization
+Controls how many times the vectorizer tries to unroll loops that contain associative reductions.  0 means that such loops should be unrolled vect-unroll times.
+
 ; This comment is to ensure we retain the blank 

[PATCH 0/3][vect] Enable vector unrolling of main loop

2021-09-17 Thread Andre Vieira (lists) via Gcc-patches

Hi all,

This patch series enables unrolling of an unpredicated main vectorized 
loop based on a target hook. The epilogue loop will have (at least) half 
the VF of the main loop and can be predicated.


Andre Vieira (3):
[vect] Add main vectorized loop unrolling
[vect] Consider outside costs earlier for epilogue loops
[AArch64] Implement vect_unroll backend hook
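
The loop shape described in the cover letter can be pictured with a scalar
sketch. This is only an illustration under stated assumptions: kVF and kUnroll
are hypothetical constants standing in for the vectorization factor and the
unroll factor a target hook might return, and the inner "lane" loops stand in
for vector instructions. The real vectorizer works on GIMPLE and is not
structured like this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: kVF models the vectorization factor and kUnroll the
// unroll factor requested by a target.  Neither name comes from GCC.
constexpr std::size_t kVF = 4;
constexpr std::size_t kUnroll = 2;

// Sum the input the way an unrolled main loop plus epilogues would: the main
// loop consumes kVF * kUnroll elements per trip, the first epilogue consumes
// kVF elements (half of that), and a scalar tail handles what is left.
long sum_unrolled (const std::vector<int> &v)
{
  const std::size_t n = v.size ();
  std::size_t i = 0;
  long acc[kUnroll] = {};   // one accumulator per unrolled copy

  // Main loop: kUnroll independent "vector" iterations per trip.
  for (; i + kVF * kUnroll <= n; i += kVF * kUnroll)
    for (std::size_t u = 0; u < kUnroll; ++u)
      for (std::size_t lane = 0; lane < kVF; ++lane)
        acc[u] += v[i + u * kVF + lane];

  // Combine the per-copy accumulators once, after the main loop.
  long total = 0;
  for (std::size_t u = 0; u < kUnroll; ++u)
    total += acc[u];

  // Epilogue with half the effective VF of the main loop.
  for (; i + kVF <= n; i += kVF)
    for (std::size_t lane = 0; lane < kVF; ++lane)
      total += v[i + lane];

  // Scalar tail; a predicated epilogue could absorb this instead.
  for (; i < n; ++i)
    total += v[i];

  return total;
}
```

With 23 elements the main loop covers 16, the epilogue 4, and the tail 3.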



[PATCH] darwin: support aarch64-darwin host

2021-09-17 Thread Yuta Saito via Gcc-patches
Hi,

Currently, building gcc for aarch64-darwin host fails due to missing
host_hooks definition.

This patch adds host_hooks definition for aarch64-darwin.
aarch64-darwin is not supported as a target yet, but this allows using
gcc cross-compiler on aarch64-darwin.

I confirmed that linking gcc-cross succeeds on aarch64-darwin.

gcc/ChangeLog:

* config.host: Add aarch64-darwin host support.
* config/aarch64/host-aarch64-darwin.c: New file.
* config/aarch64/x-darwin: Ditto.

Signed-off-by: Yuta Saito 
---
 gcc/config.host  |  4 +++
 gcc/config/aarch64/host-aarch64-darwin.c | 32 
 gcc/config/aarch64/x-darwin  |  3 +++
 3 files changed, 39 insertions(+)
 create mode 100644 gcc/config/aarch64/host-aarch64-darwin.c
 create mode 100644 gcc/config/aarch64/x-darwin

diff --git a/gcc/config.host b/gcc/config.host
index 0a02c33cc80..f419ee7c94c 100644
--- a/gcc/config.host
+++ b/gcc/config.host
@@ -263,6 +263,10 @@ case ${host} in
 out_host_hook_obj="${out_host_hook_obj} host-ppc64-darwin.o"
 host_xmake_file="${host_xmake_file} rs6000/x-darwin64"
 ;;
+  aarch64-*-darwin*)
+out_host_hook_obj="${out_host_hook_obj} host-aarch64-darwin.o"
+host_xmake_file="${host_xmake_file} aarch64/x-darwin"
+;;
   rs6000-ibm-aix* | powerpc-ibm-aix*)
 host_xmake_file="${host_xmake_file} rs6000/x-aix"
 ;;
diff --git a/gcc/config/aarch64/host-aarch64-darwin.c b/gcc/config/aarch64/host-aarch64-darwin.c
new file mode 100644
index 000..388a8ebcc49
--- /dev/null
+++ b/gcc/config/aarch64/host-aarch64-darwin.c
@@ -0,0 +1,32 @@
+/* aarch64-darwin host-specific hook definitions.
+   Copyright (C) 2006-2021 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "hosthooks.h"
+#include "hosthooks-def.h"
+#include "config/host-darwin.h"
+
+/* Darwin doesn't do anything special for aarch64 hosts; this file exists just
+   to include config/host-darwin.h.  */
+
+const struct host_hooks host_hooks = HOST_HOOKS_INITIALIZER;
diff --git a/gcc/config/aarch64/x-darwin b/gcc/config/aarch64/x-darwin
new file mode 100644
index 000..6d788d5e89c
--- /dev/null
+++ b/gcc/config/aarch64/x-darwin
@@ -0,0 +1,3 @@
+host-aarch64-darwin.o : $(srcdir)/config/aarch64/host-aarch64-darwin.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)


Re: [PATCH 04/18] rs6000: Handle some recent MMA builtin changes

2021-09-17 Thread Bill Schmidt via Gcc-patches
Thanks!  I'll remove the elses in the committed patch, along with a TODO 
comment for the additional factoring opportunity for when I get to that 
stage.


Thanks for all the reviews!
Bill

On 9/16/21 6:38 PM, Segher Boessenkool wrote:

Hi!

On Wed, Sep 01, 2021 at 11:13:40AM -0500, Bill Schmidt wrote:

Peter Bergner recently added two new builtins __builtin_vsx_lxvp and
__builtin_vsx_stxvp.  These happened to break a pattern in MMA builtins that
I had been using to automate gimple folding of MMA builtins.  Previously,
every MMA function that could be folded had an associated internal function
that it was folded into.  The LXVP/STXVP builtins are just folded directly
into memory operations.

Instead of relying on this pattern, this patch adds a new attribute to
builtins called "mmaint," which is set for all MMA builtins that have an
associated internal builtin.  The naming convention that adds _INTERNAL to
the builtin index name remains.

The rest of the patch is just duplicating Peter's patch, using the new
builtin infrastructure.
* config/rs6000/rs6000-call.c
(rs6000_gimple_fold_new_mma_builtin): Handle RS6000_BIF_LXVP and
RS6000_BIF_STXVP.

It is fine to end a changelog line in a colon.


+  else if (fncode == RS6000_BIF_LXVP)
+{
+  push_gimplify_context (true);
+  tree offset = gimple_call_arg (stmt, 0);
+  tree ptr = gimple_call_arg (stmt, 1);
+  tree lhs = gimple_call_lhs (stmt);
+  if (TREE_TYPE (TREE_TYPE (ptr)) != vector_pair_type_node)
+   ptr = build1 (VIEW_CONVERT_EXPR,
+ build_pointer_type (vector_pair_type_node), ptr);
+  tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR,
+  TREE_TYPE (ptr), ptr, offset));
+  gimplify_assign (lhs, mem, &new_seq);
+  pop_gimplify_context (NULL);
+  gsi_replace_with_seq (gsi, new_seq, true);
+  return true;
+}

Fwiw, all those cases return, so those "else" are not needed.  Also it
would be nice if this could be factored a bit better, hrm.

Is that "if" in there useful?  Maybe add a helper function for it, then?

Anyway: okay for trunk.  Thanks!


Segher




Re: [PATCH v2] C++: add type checking for static local vector variable in template

2021-09-17 Thread wangpc via Gcc-patches
I have tested this patch on AArch64 and RISC-V by running the testsuites, and
the diagnostic message looks right.


One thing to note is that the error message is reported twice, as shown
below:


static-template.cpp: In instantiation of 'void f1() [with int a = 2]':
static-template.cpp:29:11:   required from here
static-template.cpp:11:24: error: RVV type 'vuint16m1_t' does not have a fixed size
   11 | static vuint16m1_t v = vmv_v_x_u16m1(a, gvl);
  |^
static-template.cpp:11:24: error: RVV type 'vuint16m1_t' does not have a fixed size

I haven't figured out why yet.  Is this normal behavior?

On 2021/9/17 21:34, Jason Merrill wrote:

On 9/17/21 03:58, wangpc wrote:

This patch moves verify_type_context from start_decl_1 to cp_finish_decl
and adds type checking for static local vector variable in C++ template.


How have you tested this patch?
https://gcc.gnu.org/contribute.html#testing


2021-08-06  wangpc 

gcc/cp/ChangeLog

 * decl.c (start_decl_1): Remove verify_type_context.
    (cp_finish_decl): Add more type checking.

gcc/testsuite/ChangeLog

 * g++.target/aarch64/sve/static-var-in-template.C: New test.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 90111e4c786..d411963896a 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -5491,13 +5491,6 @@ start_decl_1 (tree decl, bool initialized)
    cp_apply_type_quals_to_decl (cp_type_quals (type), decl);
  }
  -  if (is_global_var (decl))
-    {
-  type_context_kind context = (DECL_THREAD_LOCAL_P (decl)
-   ? TCTX_THREAD_STORAGE
-   : TCTX_STATIC_STORAGE);
-  verify_type_context (input_location, context, TREE_TYPE (decl));
-    }
    if (initialized)
  /* Is it valid for this decl to have an initializer at all?  */
  {
@@ -7520,6 +7513,22 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,

    && DECL_INITIALIZED_IN_CLASS_P (decl))
  check_static_variable_definition (decl, type);
  +  if (!processing_template_decl && VAR_P (decl))
+    {
+  if (is_global_var (decl))
+    {
+  type_context_kind context = (DECL_THREAD_LOCAL_P (decl)
+  ? TCTX_THREAD_STORAGE
+  : TCTX_STATIC_STORAGE);
+  verify_type_context (input_location, context, TREE_TYPE (decl));
+    }
+
+  if (DECL_FUNCTION_SCOPE_P (decl)
+  && TREE_STATIC (decl))
+    verify_type_context (DECL_SOURCE_LOCATION (decl),
+ TCTX_STATIC_STORAGE, type);


This is redundant; is_global_var is true for a local static. Which 
makes the name confusing, but that's the intended behavior.



+    }
+
    if (init && TREE_CODE (decl) == FUNCTION_DECL)
  {
    tree clone;
diff --git 
a/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C 
b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C

new file mode 100644
index 000..c2395d18d50
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+#include 
+
+template 
+void f()
+{
+    static svbool_t pg = svwhilelt_b64(0, N);
+}
+
+int main(int argc, char **argv)
+{
+    f<2>();
+    return 0;
+}
+
+/* { dg-error {SVE type 'svbool_t' does not have a fixed size} } */



Re: [PATCH] [og10] OpenACC: Shared memory layout optimisation

2021-09-17 Thread Thomas Schwinge
Hi Julian!

Three quick questions here:

On 2020-06-29T13:16:52-0700, Julian Brown  wrote:
> This patch implements an algorithm to lay out local data-share (LDS) space.  
> It currently works for AMD GCN.  [...]

Thanks!


> @@ -1449,8 +1634,120 @@ oacc_do_neutering (void)

> +  addr_range ar
> + = first_fit_range (conflicts, size, align, _shm_bounds);
> +
> +  splay_tree_delete (conflicts);
> +
> +  if (ar.invalid ())
> + {
> +   unsigned HOST_WIDE_INT base;
> +   base = bounds_lo + random () % 512;
> +   base = (base + align - 1) & ~(align - 1);
> +   if (base + size > bounds_hi)
> + error_at (UNKNOWN_LOCATION, "shared-memory region overflow");

My dice doesn't have 512 faces -- what am I to read into the expression
'random () % 512' here?  (Surely this must offend the folks of
.)  ;-)

> +   auto base_inrng = std::make_pair (base, false);
> +   blk_offset_map.put (BASIC_BLOCK_FOR_FN (cfun, blkno), base_inrng);
> + }
> +  else
> + {
> +   splay_tree_node old = splay_tree_lookup (used_ranges[blkno],
> +(splay_tree_key) );
> +   if (old)
> + {
> +   fprintf (stderr, "trying to map [%d..%d] but [%d..%d] is "
> +"already mapped in block %d\n", (int) ar.lo,
> +(int) ar.hi, (int) ((addr_range *) old->key)->lo,
> +(int) ((addr_range *) old->key)->hi, blkno);
> +   abort ();
> + }

Here, in a moment of inattention, I wondered if accidentally I had slid
into libgomp code; 'libgomp/target.c:gomp_map_vars_existing' has a very
similar "is already mapped" 'gomp_fatal'.  ;-) Should the whole
'if (old) { [...] }' just turn into 'gcc_assert (!old);' or should we
preserve the 'fprintf' in some way?


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-many.c

> +  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
> +  {
> +int w = 0;
> +LOCALS2(h);
> +
> +#pragma acc loop worker reduction(+:w)
> +for (int i = 0; i < 32; i++)
> +  {
> + int u = USES2(h,+);
> + w += u;
> +  }
> +
> +printf ("w=%d\n", w);
> +/* { dg-output "w=2048(\n|\r\n|\r)" } */

Is there a reason for device-side 'printf' plus 'dg-output' matching
instead of 'assert (w == 2048);' or 'if (w != 2048) __builtin_abort ();'
(still device-side)?  (Same for the following instances.)


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
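
For readers unfamiliar with the rounding idiom in the quoted fallback path
('base = (base + align - 1) & ~(align - 1)'), here is a small standalone
sketch. The function names align_up and fits are hypothetical and exist only
for this illustration; the idiom assumes that align is a nonzero power of two.

```cpp
#include <cassert>
#include <cstdint>

// Round BASE up to the next multiple of ALIGN.  ALIGN must be a nonzero
// power of two, otherwise the mask trick below is not valid.
inline std::uint64_t align_up (std::uint64_t base, std::uint64_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (base + align - 1) & ~(align - 1);
}

// Hypothetical helper mirroring the quoted overflow check: does a region of
// SIZE bytes starting at the aligned BASE still end at or below BOUNDS_HI?
inline bool fits (std::uint64_t base, std::uint64_t size,
                  std::uint64_t align, std::uint64_t bounds_hi)
{
  return align_up (base, align) + size <= bounds_hi;
}
```

Because the aligned base can be up to align - 1 bytes above the original
value, the bounds check has to run after the rounding, as it does in the
quoted code.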


Re: [PATCH v3] ipa-inline: Add target info into fn summary [PR102059]

2021-09-17 Thread Segher Boessenkool
On Fri, Sep 17, 2021 at 05:42:38PM +0800, Kewen.Lin wrote:
> Against v2 [2], this v3 addressed Martin's review comments:
>   - Replace HWI auto_vec with unsigned int for target_info
> to avoid overkill (also Segher's comments), adjust some
> places need to be updated for this change.

I'd have used a single HWI (always 64 bits), but an int (always at least
32 bits in GCC) is fine as well, sure.

>   * config/rs6000/rs6000-internal.h
>   (rs6000_fn_has_any_of_these_mask_bits): New declare.

You can break that after the ":"...  Just :-)

>   * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
>   hook.

That easily fits one line.  (Many more examples here btw).

> +bool
> +rs6000_fn_has_any_of_these_mask_bits (enum rs6000_builtins code,
> +   HOST_WIDE_INT mask)
> +{
> +  gcc_assert (code < RS6000_BUILTIN_COUNT);

We don't have this assert anywhere else, so lose it here as well?

If we want such checking we should make an inline accessor function for
this, and check it there.  But we already do a check in
rs6000_builtin_decl (and one in def_builtin, but that one has an
off-by-one error in it).

> +extern bool rs6000_fn_has_any_of_these_mask_bits (enum rs6000_builtins code,
> +   HOST_WIDE_INT mask);

The huge unwieldy name suggests it might not be the best abstraction you
could use, btw ;-)

> +static bool
> +rs6000_update_ipa_fn_target_info (unsigned int , const gimple *stmt)
> +{
> +  /* Assume inline asm can use any instruction features.  */
> +  if (gimple_code (stmt) == GIMPLE_ASM)

This should be fine for HTM, but it may be a bit *too* pessimistic for
other features.  We'll see when we get there :-)

> +@deftypefn {Target Hook} bool TARGET_NEED_IPA_FN_TARGET_INFO (const_tree @var{decl}, unsigned int& @var{info})
> +Allow target to check early whether it is necessary to analyze all gimple
> +statements in the given function to update target specific information for
> +inlining.  See hook @code{update_ipa_fn_target_info} for usage example of
[ ... ]
> +The default version of this hook returns false.

And that is really the only reason to have this premature optimisation:
targets that do not care do not have to pay the price, however trivial
that price may be, which is a good idea politically ;-)

> +/* { dg-final { scan-tree-dump-times "Inlining foo/\[0-9\]* " 1 "einline"} } */

If you use {} instead of "" you don't need the backslashes.

> +default_update_ipa_fn_target_info (uint16_t &, const gimple *)

I'm surprised the compiler didn't warn about this btw.

The rs6000 parts are okay for trunk (with the trivial cleanups please).
Thanks!


Segher
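
The suggestion above, to put the range check into one inline accessor rather
than repeating an assert at each use, could look roughly like the sketch
below. Everything in it is hypothetical (the toy_* names, the enum, and the
mask table); it does not reflect the real rs6000 data structures.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for a builtin enumeration and its attribute masks.
enum toy_builtin { TOY_BIF_A, TOY_BIF_B, TOY_BIF_C, TOY_BIF_COUNT };

constexpr std::uint64_t TOY_MASK_HTM = 1u << 0;
constexpr std::uint64_t TOY_MASK_MMA = 1u << 1;

static const std::uint64_t toy_builtin_masks[TOY_BIF_COUNT] = {
  TOY_MASK_HTM,                 // TOY_BIF_A
  TOY_MASK_MMA,                 // TOY_BIF_B
  TOY_MASK_HTM | TOY_MASK_MMA,  // TOY_BIF_C
};

// Single accessor that owns the bounds check, so callers need not repeat it.
inline std::uint64_t toy_builtin_mask (toy_builtin code)
{
  assert (code < TOY_BIF_COUNT);
  return toy_builtin_masks[code];
}

// The "has any of these mask bits" query, built on top of the accessor.
inline bool toy_has_any_mask_bits (toy_builtin code, std::uint64_t mask)
{
  return (toy_builtin_mask (code) & mask) != 0;
}
```

Keeping the check in the accessor means every table lookup is covered, while
the query function itself stays a one-liner.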


[committed] Fortran: Prefer GCC internal macros to float.h in ISO_Fortran_binding.h (was: [PATCH, Fortran] Revert to non-multilib-specific ISO_Fortran_binding.h)

2021-09-17 Thread Tobias Burnus

On 17.09.21 08:03, Gerald Pfeifer wrote:


On Tue, 14 Sep 2021, Gerald Pfeifer wrote:

And, related, does the following make sense and fix the issue?

--- a/libgfortran/ISO_Fortran_binding.h
+++ b/libgfortran/ISO_Fortran_binding.h
@@ -228,5 +228,5 @@ extern int CFI_setpointer (CFI_cdesc_t *, CFI_cdesc_t *,
const CFI_index_t []);

  /* This is the 80-bit encoding on x86; Fortran assigns it kind 10.  */
-#elif (LDBL_MANT_DIG == 64 \
+#elif ((LDBL_MANT_DIG == 64 || LDBL_MANT_DIG == 53) \
 && LDBL_MIN_EXP == -16381 \
 && LDBL_MAX_EXP == 16384)

Yes, with this patch (on top of current trunk) i586-freebsd-* is back
in bootstrap land. :)

Neither this (which fixes the bootstrap) nor Sandra's rewrite (which
does not, but seemed generally liked) has been committed.

I have now committed the attached patch as r12-3621. It includes the
patch by Sandra
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579372.html
(approved 3 days ago) plus adding the "== 53" similar to above.

Hopefully, we can now close this issue.

Tobias

commit 654187d05376f08667c8ba88309073e0345431c2
Author: Tobias Burnus 
Date:   Fri Sep 17 15:43:30 2021 +0200

Fortran: Prefer GCC internal macros to float.h in ISO_Fortran_binding.h.

2021-09-17  Sandra Loosemore  
Tobias Burnus  

libgfortran/
* ISO_Fortran_binding.h: Only include float.h if the C compiler
doesn't have predefined __LDBL_* and __DBL_* macros. Handle
LDBL_MANT_DIG == 53 for FreeBSD.

diff --git a/libgfortran/ISO_Fortran_binding.h b/libgfortran/ISO_Fortran_binding.h
index 9c42464affa..50b02d27c9c 100644
--- a/libgfortran/ISO_Fortran_binding.h
+++ b/libgfortran/ISO_Fortran_binding.h
@@ -32,7 +32,6 @@ extern "C" {
 
 #include   /* Standard ptrdiff_t tand size_t. */
 #include   /* Integer types. */
-#include   /* Macros for floating-point type characteristics.  */
 
 /* Constants, defined as macros. */
 #define CFI_VERSION 1
@@ -217,40 +216,82 @@ extern int CFI_setpointer (CFI_cdesc_t *, CFI_cdesc_t *, const CFI_index_t []);
 #endif
 
 /* The situation with long double support is more complicated; we need to
-   examine the type in more detail to figure out its kind.  */
+   examine the type in more detail to figure out its kind.
+   GCC and some other compilers predefine the __LDBL* macros; otherwise
+   get the parameters we need from float.h.  */
+
+#if (defined (__LDBL_MANT_DIG__) \
+ && defined (__LDBL_MIN_EXP__) \
+ && defined (__LDBL_MAX_EXP__) \
+ && defined (__DBL_MANT_DIG__) \
+ && defined (__DBL_MIN_EXP__) \
+ && defined (__DBL_MAX_EXP__))
+#define __CFI_LDBL_MANT_DIG__ __LDBL_MANT_DIG__
+#define __CFI_LDBL_MIN_EXP__ __LDBL_MIN_EXP__
+#define __CFI_LDBL_MAX_EXP__ __LDBL_MAX_EXP__
+#define __CFI_DBL_MANT_DIG__ __DBL_MANT_DIG__
+#define __CFI_DBL_MIN_EXP__ __DBL_MIN_EXP__
+#define __CFI_DBL_MAX_EXP__ __DBL_MAX_EXP__
+
+#else
+#include 
+
+#if (defined (LDBL_MANT_DIG) \
+ && defined (LDBL_MIN_EXP) \
+ && defined (LDBL_MAX_EXP) \
+ && defined (DBL_MANT_DIG) \
+ && defined (DBL_MIN_EXP) \
+ && defined (DBL_MAX_EXP))
+#define __CFI_LDBL_MANT_DIG__ LDBL_MANT_DIG
+#define __CFI_LDBL_MIN_EXP__ LDBL_MIN_EXP
+#define __CFI_LDBL_MAX_EXP__ LDBL_MAX_EXP
+#define __CFI_DBL_MANT_DIG__ DBL_MANT_DIG
+#define __CFI_DBL_MIN_EXP__ DBL_MIN_EXP
+#define __CFI_DBL_MAX_EXP__ DBL_MAX_EXP
+
+#else
+#define CFI_no_long_double 1
+
+#endif  /* Definitions from float.h.  */
+#endif  /* Definitions from compiler builtins.  */
+
+/* Can't determine anything about long double support?  */
+#if (defined (CFI_no_long_double))
+#define CFI_type_long_double -2
+#define CFI_type_long_double_Complex -2
 
 /* Long double is the same kind as double.  */
-#if (LDBL_MANT_DIG == DBL_MANT_DIG \
- && LDBL_MIN_EXP == DBL_MIN_EXP \
- && LDBL_MAX_EXP == DBL_MAX_EXP)
+#elif (__CFI_LDBL_MANT_DIG__ == __CFI_DBL_MANT_DIG__ \
+ && __CFI_LDBL_MIN_EXP__ == __CFI_DBL_MIN_EXP__ \
+ && __CFI_LDBL_MAX_EXP__ == __CFI_DBL_MAX_EXP__)
 #define CFI_type_long_double CFI_type_double
 #define CFI_type_long_double_Complex CFI_type_double_Complex
 
 /* This is the 80-bit encoding on x86; Fortran assigns it kind 10.  */
-#elif (LDBL_MANT_DIG == 64 \
-   && LDBL_MIN_EXP == -16381 \
-   && LDBL_MAX_EXP == 16384)
+#elif ((__CFI_LDBL_MANT_DIG__ == 64 || __CFI_LDBL_MANT_DIG__ == 53) \
+   && __CFI_LDBL_MIN_EXP__ == -16381 \
+   && __CFI_LDBL_MAX_EXP__ == 16384)
 #define CFI_type_long_double (CFI_type_Real + (10 << CFI_type_kind_shift))
 #define CFI_type_long_double_Complex (CFI_type_Complex + (10 << CFI_type_kind_shift))
 
 /* This is the 96-bit encoding on m68k; Fortran assigns it kind 10.  */
-#elif (LDBL_MANT_DIG == 64 \
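
The first two branches of the classification chain in this patch can be
restated as an ordinary function, which makes the FreeBSD case (53-bit
mantissa with the x87 exponent range) easier to see. This is a sketch only:
ldbl_kind, classify and host_ldbl_kind are hypothetical names, and every
format the visible part of the patch does not cover is lumped into "other".

```cpp
#include <cassert>
#include <cfloat>

// Hypothetical labels for this sketch; the real header defines CFI_type_*
// codes instead.
enum class ldbl_kind { same_as_double, x87_extended, other };

// Classify a long double format from its mantissa width and exponent range,
// testing the conditions in the same order as the #elif chain.
inline ldbl_kind classify (int ldbl_mant, int ldbl_min_exp, int ldbl_max_exp,
                           int dbl_mant, int dbl_min_exp, int dbl_max_exp)
{
  if (ldbl_mant == dbl_mant
      && ldbl_min_exp == dbl_min_exp
      && ldbl_max_exp == dbl_max_exp)
    return ldbl_kind::same_as_double;

  // 80-bit x87 encoding; 53 covers hosts that report a reduced mantissa
  // while keeping the extended exponent range.
  if ((ldbl_mant == 64 || ldbl_mant == 53)
      && ldbl_min_exp == -16381
      && ldbl_max_exp == 16384)
    return ldbl_kind::x87_extended;

  return ldbl_kind::other;
}

// The host's own format, taken from <cfloat>.
inline ldbl_kind host_ldbl_kind ()
{
  return classify (LDBL_MANT_DIG, LDBL_MIN_EXP, LDBL_MAX_EXP,
                   DBL_MANT_DIG, DBL_MIN_EXP, DBL_MAX_EXP);
}
```

Note that the "same as double" test comes first, so a 53-bit mantissa only
reaches the x87 branch when the exponent range differs from that of double.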

Re: [PATCH] tree-optimization/65206 - dependence analysis on mixed pointer/array

2021-09-17 Thread Richard Biener via Gcc-patches
On Fri, 17 Sep 2021, Richard Biener wrote:

> On Fri, 17 Sep 2021, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > This adds the capability to analyze the dependence of mixed
> > > pointer/array accesses.  The example is from where using a masked
> > > load/store creates the pointer-based access when an otherwise
> > > unconditional access is array based.  Other examples would include
> > > accesses to an array mixed with accesses from inlined helpers
> > > that work on pointers.
> > >
> > > The idea is quite simple and old - analyze the data-ref indices
> > > as if the reference was pointer-based.  The following change does
> > > this by changing dr_analyze_indices to work on the indices
> > > sub-structure and storing an alternate indices substructure in
> > > each data reference.  That alternate set of indices is analyzed
> > > lazily by initialize_data_dependence_relation when it fails to
> > > match-up the main set of indices of two data references.
> > > initialize_data_dependence_relation is refactored into a head
> > > and a tail worker and changed to work on one of the indices
> > > structures and thus away from using DR_* access macros which
> > > continue to reference the main indices substructure.
> > >
> > > There are quite some vectorization and loop distribution opportunities
> > > unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
> > > 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
> > > 544.nab_r see amendments in what they report with -fopt-info-loop while
> > > the rest of the specrate set sees no changes there.  Measuring runtime
> > > for the set where changes were reported reveals nothing off-noise
> > > besides 511.povray_r which seems to regress slightly for me
> > > (on a Zen2 machine with -Ofast -march=native).
> > >
> > > Changes from the [RFC] version are properly handling bitfields
> > > that we cannot take the address of and optimization of refs
> > > that already are MEM_REFs and thus won't see any change.  I've
> > > also elided changing the set of vect_masked_stores targets in
> > > favor of explicitly listing avx (but I did not verify if the
> > > testcase passes on aarch64-sve or amdgcn).
> > >
> > > This improves cases like the following from Povray:
> > >
> > >for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
> > >  {
> > > VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
> > > Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
> > >  }
> > >
> > > where there is a plain array access mixed with abstraction
> > > using T[] or T* arguments.  That should be a not too uncommon
> > > situation in the wild.  The loop above is now vectorized and was not
> > > without the change.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
> > > built and run SPEC CPU 2017 successfully.
> > >
> > > OK?
> > 
> > Took a while to page this stuff back in :-/
> > 
> > I guess if we're adding alt_indices to the main data_reference,
> > we'll need to free the access_fns in free_data_ref.  It looked like
> > the patch as posted might have a memory leak.
> 
> Doh, yes - thanks for noticing.
> 
> > Perhaps instead we could use local indices structures for the
> > alt_indices and pass them in to initialize_data_dependence_relation?
> > Not that that's very elegant…
> 
> Yeah, I had that but then since for N data references we possibly
> call initialize_data_dependence_relation N^2 times we'd do this
> alternate analysis N^2 times at worst instead of N so it looked worth
> caching it in the data reference.  Since we have no idea why the
> first try fails we're going to try this fallback in the majority
> of cases that we cannot figure out otherwise so I didn't manage
> to hand-wave the quadraticness away ;)  OTOH one might argue
> it's just a constant factor on top of the number of
> initialize_data_dependence_relation invocations.
> 
> So I can probably be convinced either way - guess I'll gather
> some statistics.

I built SPEC 2017 CPU rate with -Ofast -march=znver2, overall there
are

 4433976 calls to the first stage initialize_data_dependence_relation
 (skipping the cases dr_may_alias returned false)
 360248 (8%) ended up recursing with a set of alt_indices
 83512   times we computed alt_indices of a DR (that's with the cache)
 14905 (0.3%) times the recursive invocation ended with != chrec_dont_know

thus when not doing the caching we'd compute alt_indices about 10 times
more often.  I didn't collect the number of distinct DRs (that's difficult
at this point), but I'd estimate from the above that we have 3 times
more "unused" alt_indices than used.

OK, so that didn't really help me avoid flipping a coin ;)

Richard.
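
The trade-off behind these numbers (compute the alternate indices once per
data reference and cache them, or recompute them for every pair that needs
the fallback) can be shown with a toy model. All names below are hypothetical
and the "analysis" is a placeholder; this is not the tree-data-ref.c code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy "data reference" that can lazily compute and cache an alternate
// analysis result.
struct toy_data_ref
{
  int base = 0;
  bool alt_valid = false;   // has alt_indices been computed yet?
  int alt_indices = 0;      // cached result, meaningful only if alt_valid
};

struct toy_stats
{
  std::size_t pair_fallbacks = 0;    // pairs that needed the fallback
  std::size_t alt_computations = 0;  // times the alternate analysis ran
};

// Return the cached alternate analysis, computing it on first use.
inline int get_alt_indices (toy_data_ref &dr, toy_stats &stats)
{
  if (!dr.alt_valid)
    {
      ++stats.alt_computations;
      dr.alt_indices = dr.base * 2;   // placeholder for the real analysis
      dr.alt_valid = true;
    }
  return dr.alt_indices;
}

// Visit every unordered pair and pretend the primary analysis always fails,
// so each pair falls back to the alternate indices of both references.
inline toy_stats analyze_all_pairs (std::vector<toy_data_ref> &drs)
{
  toy_stats stats;
  for (std::size_t i = 0; i < drs.size (); ++i)
    for (std::size_t j = i + 1; j < drs.size (); ++j)
      {
        ++stats.pair_fallbacks;
        (void) get_alt_indices (drs[i], stats);
        (void) get_alt_indices (drs[j], stats);
      }
  return stats;
}
```

For ten references this model runs 45 pairwise fallbacks but only ten
alternate analyses; without the cache it would run 90. The cost is that each
reference carries storage for a result that may never be used.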

> Richard.
> 
> > 
> > > Thanks,
> > > Richard.
> > >
> > > 2021-09-08  Richard Biener  
> > >
> > >   PR tree-optimization/65206
> > >   * tree-data-ref.h (struct data_reference): Add alt_indices,
> > >   order it last.
> > >   * 

Re: [PATCH v2] C++: add type checking for static local vector variable in template

2021-09-17 Thread Jason Merrill via Gcc-patches

On 9/17/21 03:58, wangpc wrote:

This patch moves verify_type_context from start_decl_1 to cp_finish_decl
and adds type checking for static local vector variable in C++ template.


How have you tested this patch?
https://gcc.gnu.org/contribute.html#testing


2021-08-06  wangpc  

gcc/cp/ChangeLog

 * decl.c (start_decl_1): Remove verify_type_context.
(cp_finish_decl): Add more type checking.

gcc/testsuite/ChangeLog

 * g++.target/aarch64/sve/static-var-in-template.C: New test.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 90111e4c786..d411963896a 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -5491,13 +5491,6 @@ start_decl_1 (tree decl, bool initialized)
cp_apply_type_quals_to_decl (cp_type_quals (type), decl);
  }
  
-  if (is_global_var (decl))

-{
-  type_context_kind context = (DECL_THREAD_LOCAL_P (decl)
-  ? TCTX_THREAD_STORAGE
-  : TCTX_STATIC_STORAGE);
-  verify_type_context (input_location, context, TREE_TYPE (decl));
-}
if (initialized)
  /* Is it valid for this decl to have an initializer at all?  */
  {
@@ -7520,6 +7513,22 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
&& DECL_INITIALIZED_IN_CLASS_P (decl))
  check_static_variable_definition (decl, type);
  
+  if (!processing_template_decl && VAR_P (decl))

+{
+  if (is_global_var (decl))
+   {
+ type_context_kind context = (DECL_THREAD_LOCAL_P (decl)
+ ? TCTX_THREAD_STORAGE
+ : TCTX_STATIC_STORAGE);
+ verify_type_context (input_location, context, TREE_TYPE (decl));
+   }
+
+  if (DECL_FUNCTION_SCOPE_P (decl)
+ && TREE_STATIC (decl))
+   verify_type_context (DECL_SOURCE_LOCATION (decl),
+TCTX_STATIC_STORAGE, type);


This is redundant; is_global_var is true for a local static.  Which 
makes the name confusing, but that's the intended behavior.



+}
+
if (init && TREE_CODE (decl) == FUNCTION_DECL)
  {
tree clone;
diff --git a/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C 
b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
new file mode 100644
index 000..c2395d18d50
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+#include 
+
+template 
+void f()
+{
+static svbool_t pg = svwhilelt_b64(0, N);
+}
+
+int main(int argc, char **argv)
+{
+f<2>();
+return 0;
+}
+
+/* { dg-error {SVE type 'svbool_t' does not have a fixed size} } */
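
In plain C++ terms, the variable in the test case is a function-local static
inside a template: it has static storage duration, and it only becomes a
concrete object once the template is instantiated. The sketch below shows
that behavior with an ordinary int instead of an SVE type; the name counter
is hypothetical and nothing here touches GCC internals.

```cpp
#include <cassert>

// Each instantiation of counter owns a distinct function-local static with
// static storage duration; the object exists only for instantiated N.
template <int N>
int &counter ()
{
  static int value = N;   // initialized on first call of this instantiation
  return value;
}
```

Since the concrete type of such a variable is only known at instantiation
time in general, a storage-duration check that runs while the template
definition is still being parsed cannot be complete, which is consistent with
the '!processing_template_decl' guard in the patch.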





Re: [PING^2] Re: Fix 'hash_table::expand' to destruct stale Value objects

2021-09-17 Thread Richard Biener via Gcc-patches
On Fri, Sep 17, 2021 at 2:39 PM Jonathan Wakely  wrote:
>
> On Fri, 17 Sept 2021 at 13:08, Richard Biener
>  wrote:
> >
> > On Fri, Sep 17, 2021 at 1:17 PM Thomas Schwinge  
> > wrote:
> > >
> > > Hi!
> > >
> > > On 2021-09-10T10:00:25+0200, I wrote:
> > > > On 2021-09-01T19:31:19-0600, Martin Sebor via Gcc-patches 
> > > >  wrote:
> > > >> On 8/30/21 4:46 AM, Thomas Schwinge wrote:
> > > >>> Ping -- we still need to plug the memory leak; see patch attached, 
> > > >>> and/or
> > > >>> long discussion here:
> > > >>
> > > >> Thanks for answering my questions.  I have no concerns with going
> > > >> forward with the patch as is.
> > > >
> > > > Thanks, Martin.  Ping for formal approval (and review for using proper
> > > > C++ terminology in the 'gcc/hash-table.h:hash_table::expand' source code
> > > > comment that I'm adding).  Patch again attached, for easy reference.
> > >
> > > Ping, once again.
> >
> > I'm happy when a C++ literate approves the main change which I quote as
> >
> >   new ((void*) q) value_type (std::move (x));
> > +
> > + /* Manually invoke destructor of original object, to counterbalance
> > +object constructed via placement new.  */
> > + x.~value_type ();
> >
> > but I had the impression that std::move already "moves away" from the 
> > source?
>
> It just casts the argument to an rvalue reference, which allows the
> value_type constructor to steal its guts.
>
> > That said, the dance above looks iffy, there must be a nicer way to "move"
> > an object in C++?
>
> The code above is doing two things: transfer the resources from x to a
> new object at location *q, and then destroy x.
>
> The first part (moving its resources) has nothing to do with
> destruction. An object still needs to be destroyed, even if its guts
> have been moved to another object.
>
> The second part is destroying the object, to end its lifetime. You
> wouldn't usually call a destructor explicitly, because it would be
> done automatically at the end of scope for objects on the stack, or
> done by delete when you free objects on the heap. This is a special
> case where the object's lifetime is manually managed in storage that
> is manually managed.
>
> >
> > What happens if the dtor is deleted btw?
>
> If the destructor is deleted you have created an unusable type that
> cannot be stored in containers. It can only be created using new, and
> then never destroyed. If you play stupid games, you win stupid prizes.
> Don't do that.
>
> > Shouldn't you use sth
> > like a placement 'delete' instead of invoking a DTOR?
>
> No, there is no placement delete. This is exactly the right way to
> destroy an object in-place.
>
> I haven't read the rest of the patch, but the snippet above looks fine.

OK, thanks for clarifying.

The patch is OK then.

Thanks,
Richard.
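
The two steps explained in this thread (move-construct into raw storage with
placement new, then explicitly destroy the moved-from object) can be tried
out in isolation. The sketch below is not the hash_table code; tracked,
relocate and demo_relocate are hypothetical names, and the live counter
exists only to make the effect of the destructor call observable.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical value type that counts live objects.
struct tracked
{
  static int live;
  std::string payload;

  explicit tracked (std::string s) : payload (std::move (s)) { ++live; }
  tracked (tracked &&other) noexcept
    : payload (std::move (other.payload)) { ++live; }
  ~tracked () { --live; }
};

int tracked::live = 0;

// Move *SRC into raw storage at DST, then end the lifetime of *SRC.
// Both locations are manually managed storage.
inline tracked *relocate (void *dst, tracked *src)
{
  tracked *moved = new (dst) tracked (std::move (*src));
  src->~tracked ();   // the moved-from object still needs destroying
  return moved;
}

// Build an object in one buffer, relocate it to another, and return the
// number of live objects right after the relocation (-1 if the payload was
// lost).  The relocated object is destroyed before returning.
inline int demo_relocate ()
{
  alignas (tracked) unsigned char old_slot[sizeof (tracked)];
  alignas (tracked) unsigned char new_slot[sizeof (tracked)];

  tracked *x = new (old_slot) tracked ("guts");
  tracked *q = relocate (new_slot, x);
  int alive_after_move = tracked::live;
  bool payload_ok = (q->payload == "guts");
  q->~tracked ();
  return payload_ok ? alive_after_move : -1;
}
```

Without the explicit 'src->~tracked ()' call the counter would stay at two
after the move, which is the kind of leak the patch addresses.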


[Patch] Fortran: Fix -Wno-missing-include-dirs handling [PR55534]

2021-09-17 Thread Tobias Burnus

Short version:
* -Wno-missing-include-dirs  had no effect as the warning was always on
* For CPP-only options like -idirafter, no warning was shown.

This patch tries to address both, i.e. cpp's include-dir diagnostics are
shown as well – and silencing the diagnostic works as well.

OK for mainline?

Tobias

PS:  BACKGROUND and LONG DESCRIPTION

C/C++ by default have disabled the -Wmissing-include-dirs warning.
Fortran by default has that warning enabled.

The value is actually stored in two places (cf. c-family/c.opt):
  Wmissing-include-dirs
  ... CPP(warn_missing_include_dirs) Var(cpp_warn_missing_include_dirs) Init(0)

For CPP, that value always needs to be initialized – and it is used
in gcc/incpath.c as
  cpp_options *opts = cpp_get_options (pfile);
  if (opts->warn_missing_include_dirs && cur->user_supplied_p)
cpp_warning (pfile, CPP_W_MISSING_INCLUDE_DIRS, "%s: %s",

Additionally, there is cpp_warn_missing_include_dirs which is used by
Fortran – and which consists of
  global_options.x_cpp_warn_missing_include_dirs
  global_options_set.x_cpp_warn_missing_include_dirs

The flag processing happens as described in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55534#c11
in short:
  - First gfc_init_options is called
  - Then, for each command-line option, gfc_handle_option
  - Finally gfc_post_options

Currently:
- gfc_init_options: Sets cpp_warn_missing_include_dirs
  (unconditionally as unset)
- gfc_handle_option: Always warns about the missing include dir
- before gfc_post_options: set_option is called, which sets
  cpp_warn_missing_include_dirs – but that's too late.

Additionally, as mentioned above – pfile's warn_missing_include_dirs
is never properly set.
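
The ordering problem described above can be illustrated with a minimal C++ sketch. This is not gfortran code: Options, handle_option, post_options and run are hypothetical names, and the directory check is a stub that treats every directory as missing. It only shows why the diagnostic has to be deferred until every option has been seen, so that a later -Wno-missing-include-dirs can still silence it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for front-end option state (not GCC's data structures).
struct Options
{
  bool warn_missing = true;                 // Fortran-like default: warning on
  std::vector<std::string> include_dirs;    // directories collected from -I
};

// Phase 2: called once per command-line option.  It only records state;
// warning here would be too early, because a later option may change
// warn_missing.
void
handle_option (Options &opts, const std::string &arg)
{
  if (arg == "-Wno-missing-include-dirs")
    opts.warn_missing = false;
  else if (arg == "-Wmissing-include-dirs")
    opts.warn_missing = true;
  else if (arg.rfind ("-I", 0) == 0)
    opts.include_dirs.push_back (arg.substr (2));
}

// Phase 3: runs after all options were handled, so warn_missing is final.
std::vector<std::string>
post_options (const Options &opts, bool (*exists) (const std::string &))
{
  std::vector<std::string> warnings;
  if (!opts.warn_missing)
    return warnings;
  for (const std::string &dir : opts.include_dirs)
    if (!exists (dir))
      warnings.push_back ("Nonexistent include directory '" + dir + "'");
  return warnings;
}

// Drives the phases; the stub lambda pretends that no directory exists.
std::vector<std::string>
run (const std::vector<std::string> &args)
{
  Options opts;                             // phase 1: defaults
  for (const std::string &arg : args)
    handle_option (opts, arg);
  return post_options (opts, [] (const std::string &) { return false; });
}
```

With this structure the position of -Wno-missing-include-dirs relative to -I on the command line no longer matters.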

 * * *

This patch fixes those issues:
* Now -Wno-missing-include-dirs does silence the warnings
* CPP now also properly does warn.

Example (initial version):
$ gfortran-trunk ../empty.f90 -c -cpp -idirafter /fdaf/ -I bar/ -Wmissing-include-dirs
f951: Warning: Nonexistent include directory ‘bar//’ [-Wmissing-include-dirs]
: Warning: /fdaf/: No such file or directory
: Warning: bar/: No such file or directory

In order to avoid the double output for -I, I disabled the Fortran output if
CPP is enabled. Additionally, I had to use the cpp_reason_option_codes to
print the flag in brackets.
Fixed/final output is:

: Warning: /fdaf/: No such file or directory [-Wmissing-include-dirs]
: Warning: bar/: No such file or directory [-Wmissing-include-dirs]

Fortran: Fix -Wno-missing-include-dirs handling [PR55534]

gcc/fortran/ChangeLog:

	PR fortran/55534
	* cpp.c: Define GCC_C_COMMON_C for #include "options.h" to make
	cpp_reason_option_codes available.
	(gfc_cpp_register_include_paths): Make static, set pfile's
	warn_missing_include_dirs and move before caller.
	(gfc_cpp_init_cb): New, cb code moved from ...
	(gfc_cpp_init_0): ... here.
	(gfc_cpp_post_options): Call gfc_cpp_init_cb.
	(cb_cpp_diagnostic_cpp_option): New. As implemented in c-family
	to match CppReason flags to -W... names.
	(cb_cpp_diagnostic): Use it to replace single special case.
	* cpp.h (gfc_cpp_register_include_paths): Remove as now static.
	* gfortran.h (gfc_check_include_dirs): New prototype.
	* options.c (gfc_init_options): Don't set -Wmissing-include-dirs.
	(gfc_post_options): Set it here after commandline processing.
	* scanner.c (gfc_do_check_include_dirs, gfc_check_include_dirs):
	New. Diagnostic moved from ...
	(add_path_to_list): ... here, which came before cmdline processing.
	* scanner.h (struct gfc_directorylist): Reorder for alignment issues,
	add new 'bool warn'.

 gcc/fortran/cpp.c  | 106 ++---
 gcc/fortran/cpp.h  |   2 -
 gcc/fortran/gfortran.h |   1 +
 gcc/fortran/options.c  |  14 +++
 gcc/fortran/scanner.c  |  58 +++
 gcc/fortran/scanner.h  |   2 +-
 6 files changed, 116 insertions(+), 67 deletions(-)

diff --git a/gcc/fortran/cpp.c b/gcc/fortran/cpp.c
index 83c4517acdb..3ff895455e9 100644
--- a/gcc/fortran/cpp.c
+++ b/gcc/fortran/cpp.c
@@ -19,11 +19,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
+
+#define GCC_C_COMMON_C
+#include "options.h"  /* For cpp_reason_option_codes. */
+#undef GCC_C_COMMON_C
+
 #include "target.h"
 #include "gfortran.h"
 #include "diagnostic.h"
 
-
 #include "toplev.h"
 
 #include "../../libcpp/internal.h"
@@ -240,6 +244,18 @@ gfc_cpp_temporary_file (void)
   return gfc_cpp_option.temporary_filename;
 }
 
+static void
+gfc_cpp_register_include_paths (void)
+{
+  int cxx_stdinc = 0;
+  cpp_get_options (cpp_in)->warn_missing_include_dirs
+= global_options.x_cpp_warn_missing_include_dirs;

Re: [PING^2] Re: Fix 'hash_table::expand' to destruct stale Value objects

2021-09-17 Thread Jonathan Wakely via Gcc-patches
On Fri, 17 Sept 2021 at 13:08, Richard Biener
 wrote:
>
> On Fri, Sep 17, 2021 at 1:17 PM Thomas Schwinge  
> wrote:
> >
> > Hi!
> >
> > On 2021-09-10T10:00:25+0200, I wrote:
> > > On 2021-09-01T19:31:19-0600, Martin Sebor via Gcc-patches 
> > >  wrote:
> > >> On 8/30/21 4:46 AM, Thomas Schwinge wrote:
> > >>> Ping -- we still need to plug the memory leak; see patch attached, 
> > >>> and/or
> > >>> long discussion here:
> > >>
> > >> Thanks for answering my questions.  I have no concerns with going
> > >> forward with the patch as is.
> > >
> > > Thanks, Martin.  Ping for formal approval (and review for using proper
> > > C++ terminology in the 'gcc/hash-table.h:hash_table::expand' source code
> > > comment that I'm adding).  Patch again attached, for easy reference.
> >
> > Ping, once again.
>
> I'm happy when a C++ literate approves the main change which I quote as
>
>   new ((void*) q) value_type (std::move (x));
> +
> + /* Manually invoke destructor of original object, to counterbalance
> +object constructed via placement new.  */
> + x.~value_type ();
>
> but I had the impression that std::move already "moves away" from the source?

It just casts the argument to an rvalue reference, which allows the
value_type constructor to steal its guts.

> That said, the dance above looks iffy, there must be a nicer way to "move"
> an object in C++?

The code above is doing two things: transfer the resources from x to a
new object at location *q, and then destroy x.

The first part (moving its resources) has nothing to do with
destruction. An object still needs to be destroyed, even if its guts
have been moved to another object.

The second part is destroying the object, to end its lifetime. You
wouldn't usually call a destructor explicitly, because it would be
done automatically at the end of scope for objects on the stack, or
done by delete when you free objects on the heap. This is a special
case where the object's lifetime is manually managed in storage that
is manually managed.

>
> What happens if the dtor is deleted btw?

If the destructor is deleted you have created an unusable type that
cannot be stored in containers. It can only be created using new, and
then never destroyed. If you play stupid games, you win stupid prizes.
Don't do that.

> Shouldn't you use sth
> like a placement 'delete' instead of invoking a DTOR?

No, there is no placement delete. This is exactly the right way to
destroy an object in-place.

I haven't read the rest of the patch, but the snippet above looks fine.
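
As a self-contained illustration of the two steps discussed above (move-construct into new storage, then explicitly destroy the source), here is a minimal sketch. It is not the hash_table code: Tracked, relocate and relocate_demo are hypothetical names, and the live counter exists only to show that constructor and destructor calls balance.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Hypothetical value type owning a heap resource; 'live' counts objects
// whose lifetime has started but not yet ended.
struct Tracked
{
  static int live;
  int *payload;

  explicit Tracked (int v) : payload (new int (v)) { ++live; }
  Tracked (Tracked &&other) noexcept : payload (other.payload)
  {
    other.payload = nullptr;   // steal the guts; the source stays alive
    ++live;
  }
  Tracked (const Tracked &) = delete;
  ~Tracked () { delete payload; --live; }
};

int Tracked::live = 0;

// Move *src into the raw storage at dst, then end the lifetime of the
// moved-from source with an explicit destructor call.
Tracked *
relocate (void *dst, Tracked *src)
{
  Tracked *q = new (dst) Tracked (std::move (*src));
  src->~Tracked ();
  return q;
}

// Returns the number of live objects after a relocate/destroy cycle,
// or -1 if the payload did not survive the move.
int
relocate_demo ()
{
  alignas (Tracked) unsigned char old_slot[sizeof (Tracked)];
  alignas (Tracked) unsigned char new_slot[sizeof (Tracked)];

  Tracked *x = new (old_slot) Tracked (42);
  Tracked *q = relocate (new_slot, x);
  int value = *q->payload;
  q->~Tracked ();              // manually managed storage: destroy by hand
  return value == 42 ? Tracked::live : -1;
}
```

Without the explicit src->~Tracked () call in relocate, the counter would end at 1 instead of 0, the analogue of the stale object that the patch's added destructor call in hash_table::expand cleans up.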


Re: [PATCH] tree-optimization/65206 - dependence analysis on mixed pointer/array

2021-09-17 Thread Richard Biener via Gcc-patches
On Fri, 17 Sep 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > This adds the capability to analyze the dependence of mixed
> > pointer/array accesses.  The example is from where using a masked
> > load/store creates the pointer-based access when an otherwise
> > unconditional access is array based.  Other examples would include
> > accesses to an array mixed with accesses from inlined helpers
> > that work on pointers.
> >
> > The idea is quite simple and old - analyze the data-ref indices
> > as if the reference was pointer-based.  The following change does
> > this by changing dr_analyze_indices to work on the indices
> > sub-structure and storing an alternate indices substructure in
> > each data reference.  That alternate set of indices is analyzed
> > lazily by initialize_data_dependence_relation when it fails to
> > match-up the main set of indices of two data references.
> > initialize_data_dependence_relation is refactored into a head
> > and a tail worker and changed to work on one of the indices
> > structures and thus away from using DR_* access macros which
> > continue to reference the main indices substructure.
> >
> > There are quite some vectorization and loop distribution opportunities
> > unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
> > 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
> > 544.nab_r see amendments in what they report with -fopt-info-loop while
> > the rest of the specrate set sees no changes there.  Measuring runtime
> > for the set where changes were reported reveals nothing off-noise
> > besides 511.povray_r which seems to regress slightly for me
> > (on a Zen2 machine with -Ofast -march=native).
> >
> > Changes from the [RFC] version are properly handling bitfields
> > that we cannot take the address of and optimization of refs
> > that already are MEM_REFs and thus won't see any change.  I've
> > also elided changing the set of vect_masked_stores targets in
> > favor of explicitly listing avx (but I did not verify if the
> > testcase passes on aarch64-sve or amdgcn).
> >
> > This improves cases like the following from Povray:
> >
> >for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
> >  {
> > VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
> > Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
> >  }
> >
> > where there is a plain array access mixed with abstraction
> > using T[] or T* arguments.  That should be a not too uncommon
> > situation in the wild.  The loop above is now vectorized and was not
> > without the change.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
> > built and run SPEC CPU 2017 successfully.
> >
> > OK?
> 
> Took a while to page this stuff back in :-/
> 
> I guess if we're adding alt_indices to the main data_reference,
> we'll need to free the access_fns in free_data_ref.  It looked like
> the patch as posted might have a memory leak.

Doh, yes - thanks for noticing.

> Perhaps instead we could use local indices structures for the
> alt_indices and pass them in to initialize_data_dependence_relation?
> Not that that's very elegant…

Yeah, I had that but then since for N data references we possibly
call initialize_data_dependence_relation N^2 times we'd do this
alternate analysis N^2 times at worst instead of N so it looked worth
caching it in the data reference.  Since we have no idea why the
first try fails we're going to try this fallback in the majority
of cases that we cannot figure out otherwise so I didn't manage
to hand-wave the quadraticness away ;)  OTOH one might argue
it's just a constant factor on top of the number of
initialize_data_dependence_relation invocations.

So I can probably be convinced either way - guess I'll gather
some statistics.
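
To make the N versus N^2 argument concrete, here is a rough sketch of the two strategies. It is not the tree-data-ref.c implementation: DataRef, Indices, analyze_as_pointer, alt_indices and count_analyses are hypothetical names, and it pessimistically assumes that every pair of references needs the fallback analysis.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for a data reference and its alternate indices.
struct Indices { int base; };

struct DataRef
{
  int id;
  bool have_alt;   // has the alternate analysis been cached?
  Indices alt;     // cached result, valid only when have_alt is true
};

static int analyses = 0;   // counts how often the costly analysis runs

// The costly pointer-based re-analysis of one reference.
static Indices
analyze_as_pointer (const DataRef &dr)
{
  ++analyses;
  return Indices { dr.id };
}

// Lazy accessor: analyze on first use, then reuse the cached result.
static const Indices &
alt_indices (DataRef &dr)
{
  if (!dr.have_alt)
    {
      dr.alt = analyze_as_pointer (dr);
      dr.have_alt = true;
    }
  return dr.alt;
}

// Visit every pair of N references and return the number of analyses run,
// either caching per reference or recomputing locally for each pair.
int
count_analyses (int n, bool cache)
{
  analyses = 0;
  std::vector<DataRef> drs;
  for (int i = 0; i < n; ++i)
    drs.push_back (DataRef { i, false, Indices { 0 } });

  for (int i = 0; i < n; ++i)
    for (int j = i + 1; j < n; ++j)
      {
        if (cache)
          {
            (void) alt_indices (drs[i]);
            (void) alt_indices (drs[j]);
          }
        else
          {
            (void) analyze_as_pointer (drs[i]);
            (void) analyze_as_pointer (drs[j]);
          }
      }
  return analyses;
}
```

For 10 references this gives 10 analyses with caching and 90 without. The cost of caching is that the cached structure must be released together with its data reference, which is the leak noted earlier in this thread.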

Richard.

> 
> > Thanks,
> > Richard.
> >
> > 2021-09-08  Richard Biener  
> >
> > PR tree-optimization/65206
> > * tree-data-ref.h (struct data_reference): Add alt_indices,
> > order it last.
> > * tree-data-ref.c (dr_analyze_indices): Work on
> > struct indices and get DR_REF as tree.
> > (create_data_ref): Adjust.
> > (initialize_data_dependence_relation): Split into head
> > and tail.  When the base objects fail to match up try
> > again with pointer-based analysis of indices.
> > * tree-vectorizer.c (vec_info_shared::check_datarefs): Do
> > not compare the lazily computed alternate set of indices.
> >
> > * gcc.dg/torture/20210916.c: New testcase.
> > * gcc.dg/vect/pr65206.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.dg/torture/20210916.c |  20 +++
> >  gcc/testsuite/gcc.dg/vect/pr65206.c |  22 +++
> >  gcc/tree-data-ref.c | 173 
> >  gcc/tree-data-ref.h |   9 +-
> >  gcc/tree-vectorizer.c   |   3 +-
> >  5 files changed, 167 insertions(+), 60 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/torture/20210916.c
> >  

Re: [PATCH 05/18] rs6000: Support for vectorizing built-in functions

2021-09-17 Thread Segher Boessenkool
On Wed, Sep 01, 2021 at 11:13:41AM -0500, Bill Schmidt wrote:
> This patch just duplicates a couple of functions and adjusts them to use the
> new builtin names.  There's no logical change otherwise.

> +/* Returns a function decl for a vectorized version of the builtin function
> +   with builtin function code FN and the result vector type TYPE, or NULL_TREE
> +   if it is not available.  */
> +
> +static tree
> +rs6000_new_builtin_vectorized_function (unsigned int fn, tree type_out,
> + tree type_in)
> +{
> +  machine_mode in_mode, out_mode;
> +  int in_n, out_n;
> +
> +  if (TARGET_DEBUG_BUILTIN)
> +fprintf (stderr, "rs6000_new_builtin_vectorized_function (%s, %s, %s)\n",
> +  combined_fn_name (combined_fn (fn)),
> +  GET_MODE_NAME (TYPE_MODE (type_out)),
> +  GET_MODE_NAME (TYPE_MODE (type_in)));
> +
> +  if (TREE_CODE (type_out) != VECTOR_TYPE
> +  || TREE_CODE (type_in) != VECTOR_TYPE)
> +return NULL_TREE;

This is not described in the function comment.  Should it be?  Should this
be here at all, should it be an assert instead?

It also should say it implements the
TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION macro?

> +static tree
> +rs6000_new_builtin_md_vectorized_function (tree fndecl, tree type_out,
> +tree type_in)
> +{
> +  machine_mode in_mode, out_mode;
> +  int in_n, out_n;
> +
> +  if (TARGET_DEBUG_BUILTIN)
> +fprintf (stderr,
> +  "rs6000_new_builtin_md_vectorized_function (%s, %s, %s)\n",
> +  IDENTIFIER_POINTER (DECL_NAME (fndecl)),
> +  GET_MODE_NAME (TYPE_MODE (type_out)),
> +  GET_MODE_NAME (TYPE_MODE (type_in)));
> +
> +  if (TREE_CODE (type_out) != VECTOR_TYPE
> +  || TREE_CODE (type_in) != VECTOR_TYPE)
> +return NULL_TREE;

Here it definitely should be an assert, the documentation of this hook
says so.

Other than that this is fine of course (or not worse than what there
already was, anyway ;-) ).  So put this on the big "one day we will
clean this up" list?

Okay for trunk.  Thanks!


Segher


Re: [PING^2] Re: Fix 'hash_table::expand' to destruct stale Value objects

2021-09-17 Thread Richard Biener via Gcc-patches
On Fri, Sep 17, 2021 at 1:17 PM Thomas Schwinge  wrote:
>
> Hi!
>
> On 2021-09-10T10:00:25+0200, I wrote:
> > On 2021-09-01T19:31:19-0600, Martin Sebor via Gcc-patches 
> >  wrote:
> >> On 8/30/21 4:46 AM, Thomas Schwinge wrote:
> >>> Ping -- we still need to plug the memory leak; see patch attached, and/or
> >>> long discussion here:
> >>
> >> Thanks for answering my questions.  I have no concerns with going
> >> forward with the patch as is.
> >
> > Thanks, Martin.  Ping for formal approval (and review for using proper
> > C++ terminology in the 'gcc/hash-table.h:hash_table::expand' source code
> > comment that I'm adding).  Patch again attached, for easy reference.
>
> Ping, once again.

I'm happy when a C++ literate approves the main change which I quote as

  new ((void*) q) value_type (std::move (x));
+
+ /* Manually invoke destructor of original object, to counterbalance
+object constructed via placement new.  */
+ x.~value_type ();

but I had the impression that std::move already "moves away" from the source?
That said, the dance above looks iffy, there must be a nicer way to "move"
an object in C++?

What happens if the dtor is deleted btw?  Shouldn't you use sth
like a placement 'delete' instead of invoking a DTOR?

But the above clearly shows I know nothing of C++ :P

Richard.

>
> Grüße
>  Thomas
>
>
> >> Just a suggestion/request: unless
> >> this patch fixes all the outstanding problems you know of or suspect
> >> in this area (leaks/missing dtor calls) and unless you plan to work
> >> on those in the near future, please open a bug for them with a brain
> >> dump of what you learned.  That should save us time when the day
> >> comes to tackle those.
> >
> > ACK.  I'm not aware of any additional known problems.  (In our email
> > discussion, we did have some "vague ideas" for opportunities of
> > clarification/clean-up, but these aren't worth filing PRs for; needs
> > someone to gain understanding, taking a look.)
> >
> >
> > Grüße
> >  Thomas
> >
> >
> >>> On 2021-08-16T14:10:00-0600, Martin Sebor  wrote:
>  On 8/16/21 6:44 AM, Thomas Schwinge wrote:
> > On 2021-08-12T17:15:44-0600, Martin Sebor via Gcc  
> > wrote:
> >> On 8/6/21 10:57 AM, Thomas Schwinge wrote:
> >>> So I'm trying to do some C++...  ;-)
> >>>
> >>> Given:
> >>>
> >>>/* A map from SSA names or var decls to record fields.  */
> >>>typedef hash_map field_map_t;
> >>>
> >>>/* For each propagation record type, this is a map from SSA 
> >>> names or var decls
> >>>   to propagate, to the field in the record type that should 
> >>> be used for
> >>>   transmission and reception.  */
> >>>typedef hash_map record_field_map_t;
> >>>
> >>> Thus, that's a 'hash_map>'.  (I may do 
> >>> that,
> >>> right?)  Looking through GCC implementation files, very most of all 
> >>> uses
> >>> of 'hash_map' boil down to pointer key ('tree', for example) and
> >>> pointer/integer value.
> >>
> >> Right.  Because most GCC containers rely exclusively on GCC's own
> >> uses for testing, if your use case is novel in some way, chances
> >> are it might not work as intended in all circumstances.
> >>
> >> I've wrestled with hash_map a number of times.  A use case that's
> >> close to yours (i.e., a non-trivial value type) is in cp/parser.c:
> >> see class_to_loc_map_t.
> >
> > Indeed, at the time you sent this email, I already had started looking
> > into that one!  (The Fortran test cases that I originally analyzed, 
> > which
> > triggered other cases of non-POD/non-trivial destructor, all didn't
> > result in a memory leak, because the non-trivial constructor doesn't
> > actually allocate any resources dynamically -- that's indeed different 
> > in
> > this case here.)  ..., and indeed:
> >
> >> (I don't remember if I tested it for leaks
> >> though.  It's used to implement -Wmismatched-tags so compiling
> >> a few tests under Valgrind should show if it does leak.)
> >
> > ... it does leak memory at present.  :-| (See attached commit log for
> > details for one example.)
> >>>
> >>> (Attached "Fix 'hash_table::expand' to destruct stale Value objects"
> >>> again.)
> >>>
> > To that effect, to document the current behavior, I propose to
> > "Add more self-tests for 'hash_map' with Value type with non-trivial
> > constructor/destructor"
> >>>
> >>> (We've done that in commit e4f16e9f357a38ec702fb69a0ffab9d292a6af9b
> >>> "Add more self-tests for 'hash_map' with Value type with non-trivial
> >>> constructor/destructor", quickly followed by bug fix
> >>> commit bb04a03c6f9bacc890118b9e12b657503093c2f8
> >>> "Make 'gcc/hash-map-tests.c:test_map_of_type_with_ctor_and_dtor_expand'
> >>> work on 32-bit architectures [PR101959]".
> >>>
> > (Also cherry-pick into release 

[PATCH, OpenMP, Fortran] Support in_reduction for Fortran

2021-09-17 Thread Chung-Lin Tang

Hi Jakub, and Fortran folks,
this patch does the required adjustments to let 'in_reduction' work for Fortran.
Not just for the target directive actually, task directive is also working after
this patch.

There is a little bit of adjustment in omp-low.c:scan_sharing_clauses:
RTL expand of the copy of the OMP_CLAUSE_IN_REDUCTION decl was failing
for Fortran by-reference arguments, which seems to work after placing them
under the outer ctx (when it exists). This also now needs checking the field_map
for existence of the field before inserting.

Tested without regressions on mainline trunk, is this okay?

(testing for devel/omp/gcc-11 is in progress)

Thanks,
Chung-Lin

2021-09-17  Chung-Lin Tang  

gcc/fortran/ChangeLog:

* openmp.c (gfc_match_omp_clause_reduction): Add 'openmp_target' default
false parameter. Add 'always,tofrom' map for OMP_LIST_IN_REDUCTION case.
(gfc_match_omp_clauses): Add 'openmp_target' default false parameter,
adjust call to gfc_match_omp_clause_reduction.
(match_omp): Adjust call to gfc_match_omp_clauses.
* trans-openmp.c (gfc_trans_omp_taskgroup): Add call to
gfc_trans_omp_clauses, create and return block.

gcc/ChangeLog:

* omp-low.c (scan_sharing_clauses): Place in_reduction copy of variable
in outer ctx if it exists. Check if non-existent in field_map before
installing OMP_CLAUSE_IN_REDUCTION decl.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/reduction4.f90: Adjust 'omp target in_reduction' scan
pattern.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/target-in-reduction-1.f90: New test.
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index a64b7f5aa10..8179b5aa8bc 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1138,7 +1138,7 @@ failed:
 
 static match
 gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses *c, bool openacc,
-   bool allow_derived)
+   bool allow_derived, bool openmp_target = false)
 {
   if (pc == 'r' && gfc_match ("reduction ( ") != MATCH_YES)
 return MATCH_NO;
@@ -1285,6 +1285,19 @@ gfc_match_omp_clause_reduction (char pc, gfc_omp_clauses *c, bool openacc,
n->u2.udr = gfc_get_omp_namelist_udr ();
n->u2.udr->udr = udr;
  }
+   if (openmp_target && list_idx == OMP_LIST_IN_REDUCTION)
+ {
+   gfc_omp_namelist *p = gfc_get_omp_namelist (), **tl;
+   p->sym = n->sym;
+   p->where = n->where;
+   p->u.map_op = OMP_MAP_ALWAYS_TOFROM;
+
+   tl = &c->lists[OMP_LIST_MAP];
+   while (*tl)
+ tl = &((*tl)->next);
+   *tl = p;
+   p->next = NULL;
+ }
  }
   return MATCH_YES;
 }
@@ -1353,7 +1366,7 @@ gfc_match_dupl_atomic (bool not_dupl, const char *name)
 static match
 gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
   bool first = true, bool needs_space = true,
-  bool openacc = false)
+  bool openacc = false, bool openmp_target = false)
 {
   bool error = false;
   gfc_omp_clauses *c = gfc_get_omp_clauses ();
@@ -2057,8 +2070,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
  goto error;
}
  if ((mask & OMP_CLAUSE_IN_REDUCTION)
- && gfc_match_omp_clause_reduction (pc, c, openacc,
-allow_derived) == MATCH_YES)
+ && gfc_match_omp_clause_reduction (pc, c, openacc, allow_derived,
+openmp_target) == MATCH_YES)
continue;
  if ((mask & OMP_CLAUSE_INBRANCH)
  && (m = gfc_match_dupl_check (!c->inbranch && !c->notinbranch,
@@ -3496,7 +3509,8 @@ static match
 match_omp (gfc_exec_op op, const omp_mask mask)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, mask) != MATCH_YES)
+  if (gfc_match_omp_clauses (&c, mask, true, true, false,
+(op == EXEC_OMP_TARGET)) != MATCH_YES)
 return MATCH_ERROR;
   new_st.op = op;
   new_st.ext.omp_clauses = c;
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index e55e0c81868..08483951066 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -6391,12 +6391,17 @@ gfc_trans_omp_task (gfc_code *code)
 static tree
 gfc_trans_omp_taskgroup (gfc_code *code)
 {
+  stmtblock_t block;
+  gfc_start_block (&block);
   tree body = gfc_trans_code (code->block->next);
   tree stmt = make_node (OMP_TASKGROUP);
   TREE_TYPE (stmt) = void_type_node;
   OMP_TASKGROUP_BODY (stmt) = body;
-  OMP_TASKGROUP_CLAUSES (stmt) = NULL_TREE;
-  return stmt;
+  OMP_TASKGROUP_CLAUSES (stmt) = gfc_trans_omp_clauses (&block,
+   code->ext.omp_clauses,
+   code->loc);
+  gfc_add_expr_to_block (&block, stmt);
+  return 

[PATCH] Revert no longer needed fix for PR95539

2021-09-17 Thread Richard Biener via Gcc-patches
The workaround is no longer necessary since we maintain alignment
info on the DR group leader only.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-09-17  Richard Biener  

* tree-vect-stmts.c (vectorizable_load): Do not frob
stmt_info for SLP.
---
 gcc/tree-vect-stmts.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 4e0b2adf1dc..ce79d883dbf 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -8515,17 +8515,6 @@ vectorizable_load (vec_info *vinfo,
   if (!STMT_VINFO_DATA_REF (stmt_info))
 return false;
 
-  /* ???  Alignment analysis for SLP looks at SLP_TREE_SCALAR_STMTS[0]
- for unpermuted loads but we get passed SLP_TREE_REPRESENTATIVE
- which can be different when reduction chains were re-ordered.
- Now that we figured we're a dataref reset stmt_info back to
- SLP_TREE_SCALAR_STMTS[0].  When we're SLP only things should be
- refactored in a way to maintain the dr_vec_info pointer for the
- relevant access explicitely.  */
-  stmt_vec_info orig_stmt_info = stmt_info;
-  if (slp_node)
-stmt_info = SLP_TREE_SCALAR_STMTS (slp_node)[0];
-
   tree mask = NULL_TREE, mask_vectype = NULL_TREE;
   if (gassign *assign = dyn_cast <gassign *> (stmt_info->stmt))
 {
@@ -8768,7 +8757,7 @@ vectorizable_load (vec_info *vinfo,
dump_printf_loc (MSG_NOTE, vect_location,
 "Vectorizing an unaligned access.\n");
 
-  STMT_VINFO_TYPE (orig_stmt_info) = load_vec_info_type;
+  STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
   vect_model_load_cost (vinfo, stmt_info, ncopies, vf, memory_access_type,
_info, slp_node, cost_vec);
   return true;
-- 
2.31.1


[committed] libstdc++: Rename tests with incorrect extension

2021-09-17 Thread Jonathan Wakely via Gcc-patches
The libstdc++ testsuite only runs .cc files, so these two old tests have
never been run.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/26_numerics/valarray/dr630-3.C: Moved to...
* testsuite/26_numerics/valarray/dr630-3.cc: ...here.
* testsuite/27_io/basic_iostream/cons/16251.C: Moved to...
* testsuite/27_io/basic_iostream/cons/16251.cc: ...here.

Tested x86_64-linux. Committed to trunk.

commit 749c31b345c2a37106b57ce805ea46a6d4765e09
Author: Jonathan Wakely 
Date:   Fri Sep 17 12:25:40 2021

libstdc++: Rename tests with incorrect extension

The libstdc++ testsuite only runs .cc files, so these two old tests have
never been run.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* testsuite/26_numerics/valarray/dr630-3.C: Moved to...
* testsuite/26_numerics/valarray/dr630-3.cc: ...here.
* testsuite/27_io/basic_iostream/cons/16251.C: Moved to...
* testsuite/27_io/basic_iostream/cons/16251.cc: ...here.

diff --git a/libstdc++-v3/testsuite/26_numerics/valarray/dr630-3.C 
b/libstdc++-v3/testsuite/26_numerics/valarray/dr630-3.cc
similarity index 100%
rename from libstdc++-v3/testsuite/26_numerics/valarray/dr630-3.C
rename to libstdc++-v3/testsuite/26_numerics/valarray/dr630-3.cc
diff --git a/libstdc++-v3/testsuite/27_io/basic_iostream/cons/16251.C 
b/libstdc++-v3/testsuite/27_io/basic_iostream/cons/16251.cc


[PATCH] configure: Update --help output for --with-multilib-list

2021-09-17 Thread Jonathan Wakely via Gcc-patches
The list of architectures that support the option is incomplete.

gcc/ChangeLog:

* configure.ac: Fix --with-multilib-list description.
* configure: Regenerate.

OK for trunk?


commit 630dc3085cbd87a224129177870103e2c4fbf22a
Author: Jonathan Wakely 
Date:   Fri Sep 17 12:34:22 2021

configure: Update --help output for --with-multilib-list

The list of architectures that support the option is incomplete.

gcc/ChangeLog:

* configure.ac: Fix --with-multilib-list description.
* configure: Regenerate.

diff --git a/gcc/configure.ac b/gcc/configure.ac
index fadd34dbbb6..838d2d6d122 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1169,7 +1169,7 @@ if test "x$enable_offload_defaulted" = xyes; then
 fi
 
 AC_ARG_WITH(multilib-list,
-[AS_HELP_STRING([--with-multilib-list], [select multilibs (AArch64, SH and x86-64 only)])],
+[AS_HELP_STRING([--with-multilib-list], [select multilibs (AArch64, ARM, RISC-V, SH and x86-64 only)])],
 :,
 with_multilib_list=default)
 


Re: [PATCH v3] ipa-inline: Add target info into fn summary [PR102059]

2021-09-17 Thread Martin Jambor
Hi,

On Fri, Sep 17 2021, Kewen.Lin wrote:
>
[...]
>
> Against v2 [2], this v3 addressed Martin's review comments:
>   - Replace HWI auto_vec with unsigned int for target_info
> to avoid overkill (also Segher's comments), adjust some
> places need to be updated for this change.
>   - Annotate target_info won't be streamed for offloading
> target compilers.
>   - Scan for all gimple statements instead of those with
> non-zero size/time weights.
>
> Against v1 [1], the v2 addressed Richi's and Segher's review
> comments, mainly consists of:
>   - Extend it to cover non always_inline.
>   - Exclude the case for offload streaming.
>   - Some function naming and formatting issues.
>   - Adjust rs6000_can_inline_p.
>   - Add new cases.
>
> Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.
>
> Any comments are highly appreciated!
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578555.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579045.html
> --
> gcc/ChangeLog:
>
>   PR ipa/102059
>   * config/rs6000/rs6000-call.c (rs6000_fn_has_any_of_these_mask_bits):
>   New function.
>   * config/rs6000/rs6000-internal.h
>   (rs6000_fn_has_any_of_these_mask_bits): New declare.
>   * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro.
>   (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
>   (rs6000_need_ipa_fn_target_info): New function.
>   (rs6000_update_ipa_fn_target_info): Likewise.
>   (rs6000_can_inline_p): Adjust for ipa function summary target info.
>   * config/rs6000/rs6000.h (RS6000_FN_TARGET_INFO_HTM): New macro.
>   * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
>   summary target info.
>   (analyze_function_body): Adjust for ipa function summary target
>   info and call hook rs6000_need_ipa_fn_target_info and
>   rs6000_update_ipa_fn_target_info.
>   (ipa_merge_fn_summary_after_inlining): Adjust for ipa function
>   summary target info.
>   (inline_read_section): Likewise.
>   (ipa_fn_summary_write): Likewise.
>   * ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
>   * doc/tm.texi: Regenerate.
>   * doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
>   hook.
>   (TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
>   * target.def (update_ipa_fn_target_info): New hook.
>   (need_ipa_fn_target_info): Likewise.
>   * targhooks.c (default_need_ipa_fn_target_info): New function.
>   (default_update_ipa_fn_target_info): Likewise.
>   * targhooks.h (default_update_ipa_fn_target_info): New declare.
>   (default_need_ipa_fn_target_info): Likewise.
>

[...]

> diff --git a/gcc/target.def b/gcc/target.def
> index 28a34f1d51b..28ff639684b 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -6600,6 +6600,41 @@ specific target options and the caller does not use 
> the same options.",
>   bool, (tree caller, tree callee),
>   default_target_can_inline_p)
>  
> +DEFHOOK
> +(update_ipa_fn_target_info,
> + "Allow target to analyze all gimple statements for the given function to\n\
> +record and update some target specific information for inlining.  A 
> typical\n\
> +example is that a caller with one isa feature disabled is normally not\n\
> +allowed to inline a callee with that same isa feature enabled even which 
> is\n\
> +attributed by always_inline, but with the conservative analysis on all\n\
> +statements of the callee if we are able to guarantee the callee does not\n\
> +exploit any instructions from the mismatch isa feature, it would be safe 
> to\n\
> +allow the caller to inline the callee.\n\
> +@var{info} is one @code{unsigned int} value to record information in which\n\
> +one set bit indicates one corresponding feature is detected in the 
> analysis,\n\
> +@var{stmt} is the statement being analyzed.  Return true if target still\n\
> +need to analyze the subsequent statements, otherwise return false to stop\n\
> +subsequent analysis.\n\
> +The default version of this hook returns false.",
> + bool, (unsigned int& info, const gimple* stmt),
> + default_update_ipa_fn_target_info)
> +
> +DEFHOOK
> +(need_ipa_fn_target_info,
> + "Allow target to check early whether it is necessary to analyze all 
> gimple\n\
> +statements in the given function to update target specific information for\n\
> +inlining.  See hook @code{update_ipa_fn_target_info} for usage example of\n\
> +target specific information.  This hook is expected to be invoked ahead of\n\
> +the iterating with hook @code{update_ipa_fn_target_info}.\n\
> +@var{decl} is the function being analyzed, @var{info} is the same as what\n\
> +in hook @code{update_ipa_fn_target_info}, target can do one time update\n\
> +into @var{info} without iterating for some case.  Return true if target\n\
> +decides to analyze all gimple statements to collect information, otherwise\n\
> +return false.\n\
> +The default version of this 

Re: [PATCH v2] ipa-inline: Add target info into fn summary [PR102059]

2021-09-17 Thread Martin Jambor
Hi,

On Fri, Sep 17 2021, Kewen.Lin wrote:
> on 2021/9/16 9:19 PM, Martin Jambor wrote:
>> On Thu, Sep 16 2021, Kewen.Lin wrote:
>>> on 2021/9/15 8:51 PM, Martin Jambor wrote:
>>>> On Wed, Sep 08 2021, Kewen.Lin wrote:
>>>>>
>>>>
>>>> [...]
>>>>
>>>>> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
>>>>> index 78399b0b9bb..300b8da4507 100644
>>>>> --- a/gcc/ipa-fnsummary.h
>>>>> +++ b/gcc/ipa-fnsummary.h
>>>>> @@ -193,6 +194,9 @@ public:
>>>>>    vec *loop_strides;
>>>>>    /* Parameters tested by builtin_constant_p.  */
>>>>>    vec GTY((skip)) builtin_constant_p_parms;
>>>>> +  /* Like fp_expressions, but it's to hold some target specific
>>>>> information,
>>>>> + such as some target specific isa flags.  */
>>>>> +  auto_vec GTY((skip)) target_info;
>>>>>    /* Estimated growth for inlining all copies of the function before
>>>>> start
>>>>>   of small functions inlining.
>>>>>   This value will get out of date as the callers are duplicated, but
>>>>
>>>> Segher already wrote in the first thread that a vector of HOST_WIDE_INTs
>>>> is an overkill and I agree.  So at least make the new field just a
>>>> HOST_WIDE_INT or better yet, an unsigned int.  But I would even go
>>>> further and make target_info only a 16-bit bit-field, place it after the
>>>> other bit-fields in class ipa_fn_summary and pass it to the hooks as
>>>> uint16_t.  Unless you have plans which require more space, I think we
>>>> should be conservative here.
>>>>
>>>
>>> OK, yeah, my consideration was mainly for the scenario where a target has
>>> a few bits to care about.  I just realized that, to avoid inefficient
>>> bitwise operations for mapping target info bits to isa_flag bits, the
>>> target can rearrange the sparse bits in isa_flag, so it's not a big deal.
>>> Thanks for re-raising this!  I'll use a 16-bit bit-field in v3 as you
>>> suggested.  If you don't mind, I will put it before the existing
>>> bit-fields to get good alignment.
>> 
>> All right.
>> 
>
> Sorry, I failed to use a 16-bit bit-field for this: I found that
> bit-fields cannot have their address taken or be passed by non-const
> reference.  gengtype also failed to recognize uint16_t when I used it
> directly in ipa-fnsummary.h.  In the end I used unsigned int instead.
>

well, you could have used:

  unsigned int target_info : 16;

for the field (and uint16_t when passed to hooks).

But I am not sure if it is that crucial.

Martin
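
For readers following along, here is a minimal sketch of the workaround
suggested above: keep the field as a 16-bit bit-field, copy it into a local
uint16_t, pass the local to the hook by reference, and write the result back.
The names fn_summary_sketch, update_info_sketch and scan_one_stmt, and the
meaning of bit 0, are invented for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ipa_fn_summary; only the fields needed here.
struct fn_summary_sketch
{
  unsigned int inlinable : 1;
  unsigned int fp_expressions : 1;
  unsigned int target_info : 16;   // the 16-bit bit-field discussed above
};

// Hypothetical hook taking the info by non-const reference.  A bit-field
// cannot be bound to this parameter directly.  Bit 0 is an invented
// "feature seen" flag.
static bool
update_info_sketch (uint16_t &info, bool stmt_uses_feature)
{
  if (stmt_uses_feature)
    info |= 1u;
  return !stmt_uses_feature;   // false means: stop scanning
}

// Copy the bit-field into a local, let the hook update the local, then
// store the result back into the bit-field.
static bool
scan_one_stmt (fn_summary_sketch &s, bool stmt_uses_feature)
{
  uint16_t tmp = s.target_info;
  bool keep_going = update_info_sketch (tmp, stmt_uses_feature);
  s.target_info = tmp;
  return keep_going;
}
```

The cost is one extra copy in each direction per call; whether that is worth
the saved space is the judgment call discussed in the thread.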


Re: [committed] libstdc++: Add missing 'constexpr' to std::tuple [PR102270]

2021-09-17 Thread Jonathan Wakely via Gcc-patches

On 16/09/21 23:07 +0100, Jonathan Wakely wrote:

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/102270
* include/std/tuple (_Head_base, _Tuple_impl): Add
_GLIBCXX20_CONSTEXPR to allocator-extended constructors.
(tuple<>::swap(tuple&)): Add _GLIBCXX20_CONSTEXPR.
* testsuite/20_util/tuple/cons/102270.C: New test.


Oops, this test has a .C extension, and so doesn't actually run (and
if I rename it to 102270.cc it fails).

Fix incoming ...




[PING^2] Re: Fix 'hash_table::expand' to destruct stale Value objects

2021-09-17 Thread Thomas Schwinge
Hi!

On 2021-09-10T10:00:25+0200, I wrote:
> On 2021-09-01T19:31:19-0600, Martin Sebor via Gcc-patches 
>  wrote:
>> On 8/30/21 4:46 AM, Thomas Schwinge wrote:
>>> Ping -- we still need to plug the memory leak; see patch attached, and/or
>>> long discussion here:
>>
>> Thanks for answering my questions.  I have no concerns with going
>> forward with the patch as is.
>
> Thanks, Martin.  Ping for formal approval (and review for using proper
> C++ terminology in the 'gcc/hash-table.h:hash_table::expand' source code
> comment that I'm adding).  Patch again attached, for easy reference.

Ping, once again.


Regards
 Thomas


>> Just a suggestion/request: unless
>> this patch fixes all the outstanding problems you know of or suspect
>> in this area (leaks/missing dtor calls) and unless you plan to work
>> on those in the near future, please open a bug for them with a brain
>> dump of what you learned.  That should save us time when the day
>> comes to tackle those.
>
> ACK.  I'm not aware of any additional known problems.  (In our email
> discussion, we did have some "vague ideas" for opportunities of
> clarification/clean-up, but these aren't worth filing PRs for; needs
> someone to gain understanding, taking a look.)
>
>
> Regards
>  Thomas
>
>
>>> On 2021-08-16T14:10:00-0600, Martin Sebor  wrote:
 On 8/16/21 6:44 AM, Thomas Schwinge wrote:
> On 2021-08-12T17:15:44-0600, Martin Sebor via Gcc  
> wrote:
>> On 8/6/21 10:57 AM, Thomas Schwinge wrote:
>>> So I'm trying to do some C++...  ;-)
>>>
>>> Given:
>>>
>>>/* A map from SSA names or var decls to record fields.  */
>>>typedef hash_map field_map_t;
>>>
>>>/* For each propagation record type, this is a map from SSA 
>>> names or var decls
>>>   to propagate, to the field in the record type that should be 
>>> used for
>>>   transmission and reception.  */
>>>typedef hash_map record_field_map_t;
>>>
>>> Thus, that's a 'hash_map>'.  (I may do that,
>>> right?)  Looking through GCC implementation files, very most of all uses
>>> of 'hash_map' boil down to pointer key ('tree', for example) and
>>> pointer/integer value.
>>
>> Right.  Because most GCC containers rely exclusively on GCC's own
>> uses for testing, if your use case is novel in some way, chances
>> are it might not work as intended in all circumstances.
>>
>> I've wrestled with hash_map a number of times.  A use case that's
>> close to yours (i.e., a non-trivial value type) is in cp/parser.c:
>> see class_to_loc_map_t.
>
> Indeed, at the time you sent this email, I already had started looking
> into that one!  (The Fortran test cases that I originally analyzed, which
> triggered other cases of non-POD/non-trivial destructor, all didn't
> result in a memory leak, because the non-trivial constructor doesn't
> actually allocate any resources dynamically -- that's indeed different in
> this case here.)  ..., and indeed:
>
>> (I don't remember if I tested it for leaks
>> though.  It's used to implement -Wmismatched-tags so compiling
>> a few tests under Valgrind should show if it does leak.)
>
> ... it does leak memory at present.  :-| (See attached commit log for
> details for one example.)
>>>
>>> (Attached "Fix 'hash_table::expand' to destruct stale Value objects"
>>> again.)
>>>
> To that effect, to document the current behavior, I propose to
> "Add more self-tests for 'hash_map' with Value type with non-trivial
> constructor/destructor"
>>>
>>> (We've done that in commit e4f16e9f357a38ec702fb69a0ffab9d292a6af9b
>>> "Add more self-tests for 'hash_map' with Value type with non-trivial
>>> constructor/destructor", quickly followed by bug fix
>>> commit bb04a03c6f9bacc890118b9e12b657503093c2f8
>>> "Make 'gcc/hash-map-tests.c:test_map_of_type_with_ctor_and_dtor_expand'
>>> work on 32-bit architectures [PR101959]".
>>>
> (Also cherry-pick into release branches, eventually?)
>>>
>>> Then:
>>>
>>>record_field_map_t field_map ([...]); // see below
>>>for ([...])
>>>  {
>>>tree record_type = [...];
>>>[...]
>>>bool existed;
>>>field_map_t &fields
>>>  = field_map.get_or_insert (record_type, &existed);
>>>gcc_checking_assert (!existed);
>>>[...]
>>>for ([...])
>>>  fields.put ([...], [...]);
>>>[...]
>>>  }
>>>[stuff that looks up elements from 'field_map']
>>>field_map.empty ();
>>>
>>> This generally works.
>>>
>>> If I instantiate 'record_field_map_t field_map (40);', Valgrind is 
>>> happy.
>>> If however I instantiate 'record_field_map_t field_map (13);' (where 
>>> '13'
>>> would be the default for 'hash_map'), Valgrind 

[PING^2] Re: [Committed] [PATCH 2/4] (v4) On-demand locations within string-literals

2021-09-17 Thread Thomas Schwinge
Hi!

On 2021-09-10T09:48:56+0200, I wrote:
> Ping.  My patches again attached, for easy reference.

Ping once again.


Regards
 Thomas


> On 2021-09-03T18:33:37+0200, I wrote:
>> Hi!
>>
>> On 2021-09-02T21:09:54+0200, I wrote:
>>> On 2021-09-02T15:59:14+0200, I wrote:
 On 2016-08-05T14:16:58-0400, David Malcolm  wrote:
> Committed to trunk as r239175; I'm attaching the final version of the
> patch for reference.

 David, you've added here 'gcc/input.h:struct location_hash' (see quoted
 below), which will be useful elsewhere, so:

> --- a/gcc/input.c
> +++ b/gcc/input.c

> +/* Internal function.  Canonicalize LOC into a form suitable for
> +   use as a key within the database, stripping away macro expansion,
> +   ad-hoc information, and range information, using the location of
> +   the start of LOC within an ordinary linemap.  */
> +
> +location_t
> +string_concat_db::get_key_loc (location_t loc)
> +{
> +  loc = linemap_resolve_location (line_table, loc, LRK_SPELLING_LOCATION,
> + NULL);
> +
> +  loc = get_range_from_loc (line_table, loc).m_start;
> +
> +  return loc;
> +}

 OK to push the attached
 "Harden 'gcc/input.c:string_concat_db::get_key_loc'"?  (This fell out of
 my analysis for development work elsewhere.)
>>>
>>> My suggested patch was:
>>>
>>> --- a/gcc/input.c
>>> +++ b/gcc/input.c
>>> @@ -1483,6 +1483,9 @@ string_concat_db::get_key_loc (location_t loc)
>>>
>>>loc = get_range_from_loc (line_table, loc).m_start;
>>>
>>> +  /* Ascertain that 'loc' is valid as a key in 'm_table'.  */
>>> +  gcc_checking_assert (!RESERVED_LOCATION_P (loc));
>>> +
>>>return loc;
>>>  }
>>>
>>> Uh, I should've looked at the correct test logs...  This change actually
>>> does regress 'c-c++-common/substring-location-PR-87721.c' and
>>> 'gcc.dg/plugin/diagnostic-test-string-literals-1.c': for these, we do see
>>> 'BUILTINS_LOCATION' (via 'string_concat_db::record_string_concatenation').
>>> Unless someone tells me that's unexpected (I'm completely lost in this
>>> code...)
>>
>> I think I convinced myself that the current code doesn't have stable
>> behavior, so...
>>
>>> I shall change/generalize my changes to provide both a
>>> 'location_hash' only using 'UNKNOWN_LOCATION' as a spare value for
>>> 'Empty' (as currently used here) and another variant additionally using
>>> 'BUILTINS_LOCATION' as spare value for 'Deleted'.
>>
>> ... I didn't do this, but instead would like to push the attached
>> "Don't record string concatenation data for 'RESERVED_LOCATION_P'"
>> (replacing "Harden 'gcc/input.c:string_concat_db::get_key_loc'" as
>> originally proposed).  OK?
>>
>>
>> ... and then re:
>>
> --- a/gcc/input.h
> +++ b/gcc/input.h

> +struct location_hash : int_hash  { };
> +
> +class GTY(()) string_concat_db
> +{
> +[...]
> +  hash_map  *m_table;
> +};

 OK to push the attached
 "Generalize 'gcc/input.h:struct location_hash'"?
>>
>> Attached again.
>>
>>
>> Regards
>>  Thomas


From 9f1066fcb770397d6e791aa0594f067a755e2ed6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 3 Sep 2021 18:25:10 +0200
Subject: [PATCH] Don't record string concatenation data for
 'RESERVED_LOCATION_P'

'RESERVED_LOCATION_P' means 'UNKNOWN_LOCATION' or 'BUILTINS_LOCATION'.
We're using 'UNKNOWN_LOCATION' as a spare value for 'Empty', so should
ascertain that we don't use it as a key additionally.  Similarly for
'BUILTINS_LOCATION' that we'd later like to use as a spare value for
'Deleted'.

As discussed in the source code comment added, for these we didn't have
stable behavior anyway.

Follow-up to r239175 (commit 88faa309e5d6c6171b957daaf2f800920869)
"On-demand locations within string-literals".

	gcc/
	* input.c (string_concat_db::record_string_concatenation)
	(string_concat_db::get_string_concatenation): Skip for
	'RESERVED_LOCATION_P'.
	gcc/testsuite/
	* gcc.dg/plugin/diagnostic-test-string-literals-1.c: Adjust
	expected error diagnostics.
---
 gcc/input.c  | 9 +
 .../gcc.dg/plugin/diagnostic-test-string-literals-1.c| 4 ++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/gcc/input.c b/gcc/input.c
index 4b809862e02..dd753decfa0 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -1437,6 +1437,11 @@ string_concat_db::record_string_concatenation (int num, location_t *locs)
   gcc_assert (locs);
 
   location_t key_loc = get_key_loc (locs[0]);
+  /* We don't record data for 'RESERVED_LOCATION_P (key_loc)' key values:
+ any data now recorded under key 'key_loc' would be 

[committed] libgomp: Spelling error fix in OpenMP 5.1 conformance section

2021-09-17 Thread Jakub Jelinek via Gcc-patches
Hi!

Fix spelling of OpenMP directive declare variant.

2021-09-17  Jakub Jelinek  

* libgomp.texi (OpenMP 5.1): Spelling fix,
declare variante -> declare variant.

--- libgomp/libgomp.texi.jj 2021-09-08 09:55:29.014718647 +0200
+++ libgomp/libgomp.texi 2021-09-17 12:30:57.518214360 +0200
@@ -276,9 +276,9 @@ The OpenMP 4.5 specification is fully su
 @item @code{omp_all_memory} reserved locator @tab N @tab
 @item @emph{target_device trait} in OpenMP Context @tab N @tab
 @item @code{target_device} selector set in context selectors @tab N @tab
-@item C/C++'s @code{declare variante} directive: elision support of
+@item C/C++'s @code{declare variant} directive: elision support of
   preprocessed code @tab N @tab
-@item @code{declare variante}: new clauses @code{adjust_args} and
+@item @code{declare variant}: new clauses @code{adjust_args} and
   @code{append_args} @tab N @tab
 @item @code{dispatch} construct @tab N @tab
 @item device-specific ICV settings the environment variables @tab N @tab

Jakub



Re: [PATCH] ipa-fnsummary: Remove inconsistent bp_pack_value

2021-09-17 Thread Richard Biener via Gcc-patches
On Fri, Sep 17, 2021 at 12:03 PM Richard Biener
 wrote:
>
> On Fri, Sep 17, 2021 at 11:43 AM Kewen.Lin  wrote:
> >
> > Hi,
> >
> > While changing target_info to a bit-field, I happened to find this
> > inconsistency between streaming out and streaming in.  We have the
> > streaming out:
> >
> >   bp_pack_value (&bp, info->inlinable, 1);
> >   bp_pack_value (&bp, false, 1);
> >   bp_pack_value (&bp, info->fp_expressions, 1);
> >
> > while the streaming in:
> >
> >   info->inlinable = bp_unpack_value (&bp, 1);
> >   info->fp_expressions = bp_unpack_value (&bp, 1);
> >
> > The cleanup of Cilk Plus support seems to have removed the bit from
> > the streaming in, but only changed the streaming out to pack false.
> >
> > After hacking fp_expression_p to always return true, I can see that
> > the wrong fp_expressions value (false) is read back in the wpa dump.
> >
> > Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.
> >
> > Is it ok for trunk?
>
> OK for trunk and all affected branches (note we need to bump the
> LTO minor version there).  The issue comes from the removal
> of cilk+ in r8-4956 which removed the bp_unpack but replaced
> the bp_pack ...
>
> It's a correctness issue as we'll read fp_expressions as always 'false'

Btw, on branches we could also simply unpack a dummy bit to avoid
changing the format.

> Thanks,
> Richard.
>
> >
> > BR,
> > Kewen
> > -
> > gcc/ChangeLog:
> >
> > * ipa-fnsummary.c (ipa_fn_summary_write): Remove inconsistent
> > bitfield streaming out.
> >
> > diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
> > index 2470937460f..31199919405 100644
> > --- a/gcc/ipa-fnsummary.c
> > +++ b/gcc/ipa-fnsummary.c
> > @@ -4652,7 +4652,6 @@ ipa_fn_summary_write (void)
> >info->time.stream_out (ob);
> >bp = bitpack_create (ob->main_stream);
> >bp_pack_value (&bp, info->inlinable, 1);
> > -  bp_pack_value (&bp, false, 1);
> >bp_pack_value (&bp, info->fp_expressions, 1);
> >streamer_write_bitpack (&bp);
> >streamer_write_uhwi (ob, vec_safe_length (info->conds));


Re: [PATCH] ipa-fnsummary: Remove inconsistent bp_pack_value

2021-09-17 Thread Richard Biener via Gcc-patches
On Fri, Sep 17, 2021 at 11:43 AM Kewen.Lin  wrote:
>
> Hi,
>
> While changing target_info to a bit-field, I happened to find this
> inconsistency between streaming out and streaming in.  We have the
> streaming out:
>
>   bp_pack_value (&bp, info->inlinable, 1);
>   bp_pack_value (&bp, false, 1);
>   bp_pack_value (&bp, info->fp_expressions, 1);
>
> while the streaming in:
>
>   info->inlinable = bp_unpack_value (&bp, 1);
>   info->fp_expressions = bp_unpack_value (&bp, 1);
>
> The cleanup of Cilk Plus support seems to have removed the bit from
> the streaming in, but only changed the streaming out to pack false.
>
> After hacking fp_expression_p to always return true, I can see that
> the wrong fp_expressions value (false) is read back in the wpa dump.
>
> Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.
>
> Is it ok for trunk?

OK for trunk and all affected branches (note we need to bump the
LTO minor version there).  The issue comes from the removal
of cilk+ in r8-4956 which removed the bp_unpack but replaced
the bp_pack ...

It's a correctness issue as we'll read fp_expressions as always 'false'

Thanks,
Richard.

>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * ipa-fnsummary.c (ipa_fn_summary_write): Remove inconsistent
> bitfield streaming out.
>
> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
> index 2470937460f..31199919405 100644
> --- a/gcc/ipa-fnsummary.c
> +++ b/gcc/ipa-fnsummary.c
> @@ -4652,7 +4652,6 @@ ipa_fn_summary_write (void)
>info->time.stream_out (ob);
>bp = bitpack_create (ob->main_stream);
>bp_pack_value (&bp, info->inlinable, 1);
> -  bp_pack_value (&bp, false, 1);
>bp_pack_value (&bp, info->fp_expressions, 1);
>streamer_write_bitpack (&bp);
>streamer_write_uhwi (ob, vec_safe_length (info->conds));


Re: [PATCH 1/N] Rename asm_out_file function arguments.

2021-09-17 Thread Richard Biener via Gcc-patches
On Fri, Sep 17, 2021 at 11:42 AM Iain Sandoe  wrote:
>
> Hi Folks,
>
> > On 17 Sep 2021, at 09:23, Richard Biener  wrote:
> >
> > On Thu, Sep 16, 2021 at 3:52 PM Iain Sandoe  wrote:
> >>
> >>
> >>> On 16 Sep 2021, at 11:00, Martin Liška  wrote:
> >>>
> >>> As preparation for a new global object that will encapsulate
> >>> asm_out_file, we would need to live with a macro that will
> >>> define asm_out_file as casm->out_file and thus the name
> >>> can't be used in function arguments.
> >>
> >> So, if I understand correctly, the motivation is to be able to switch
> >> between output file streams for different categories of content?
> >>
> >> Darwin, actually already does this (manually) with a separate
> >> lto_asm_out_name for lto data (so a general solution would
> >> be great).
> >>
> >> What is the reason for associating the section pointers with the
> >> casm object?
> >>
> >> * I can understand that each instance of a casm object would have
> >> potentially a different current section (“in_section”), but it seems that
> >> as things stand the section pointers would be duplicates.
> >>
> >> * In the case that there’s reason that the sections could be different
> >>  between casm instances, then would it make sense to have a
> >>  target hook so that target-specific sections can be added to the
> >>  local list (via some indirection, I’d assume)?
> >
> > Yes, casm likely will end up with target specific state.  Note the main
> > motivation of the exercise is to develop an alternate way of funneling
> > the early debug data through the LTO pipeline, eliding the need for
> > the simple-object copying and section renaming dance.
>
> great ;-)
>
> FWIW, I did an implementation for Darwin, but not yet presented/committed
> because…. the dependencies on the debug linker (dsymutil) mean that it
> is of limited value until I can find time to fix that up to understand the 
> input.
>
> > The idea is that at dwarf2out_early_finish time (which runs at
> > the compile stage) we write a regular pure debug-info object
> > with unmangled section names to an alternate assembler file
> > which we then assemble and include as raw byte blob in the
> > LTO IR + debug object file.  At link time the debug object
> > byte blobs can either be re-instantiated as separate input
> > object file or the linker can be taught to pick them up from
> > the existing file at a byte offset (internally it does this for
> > AR archive members for example) avoiding the extra I/O.
>
> … the more dependencies on external tool behaviour we have
> the harder it gets for non-ELF-binutils targets; I’d like to think
> about how to implement this in Mach-O…
>
> I don’t think there’s any mechanism for Mach-O to include an
> arbitrary blob in a _regular_ mach-o file, of course one could make
> it FAT in some way - but that would mean a lot of changes to the
> back end tools..

Nah, it would be GCC itself opening the object and copying the
data byte-by-byte into a special LTO data section which of course
means emitting a lot of .data in the assembler ... but well.

We could do without the copying but then we produce two files
for each object we compile - the LTO IR .o and the object with
the debug info, say, .debug.o.  The debug info will be linked into
the final object without further changes.  But we have to be
able to emit those two objects from a single cc1 invocation - that's
what this change is about (yeah, emit assembly).

The copying would just involve assembling the alternate file
and including it as data in the main assembly (and thus object)
in a way that makes it easy to either re-materialize the alternate
object file at link time or include it by reference.

> … so the path of least resistance is to do something like we do
> with the LTO already - abstract the info into a blob section with
> an index and a name table (we funnel the LTO off into a second
> file and then re-include that in finish_asm_file).
>
> not sure what xcoff etc. can do...
>
> > So for this alternate assembler file we'd have a different set
> > of sections.
>
> can you give me an example ?
> (I’m not succeeding in visualizing this yet).
>
> Note that we already have categories of sections:
>  generic (e.g. .text, .data, etc. supported by all file formats)
>  language-related (for at least C++, D, ObjC…)
>  debug
>  lto
>  back-end-specific
>
> I was wondering which of those needed to be cloned per casm
> and if it includes any of the language/back-end ones how we figure
> a mechanism to include those (I suppose the usual style of a macro-
> programmed .def would work?)

In principle all of them - we are really emitting a completely separate
and different object file (as assembler).  For the LTO debug use
we'd need all DWARF .debug_* sections.

I understand that all the special backend sections are eventually
created on-demand (there's also targetm.asm_file_start/end ...)

> >> —
> >>
> >> (of course, it would be great if one day we could abstract the asm out
> >> such 

Re: [PATCH v2] ipa-inline: Add target info into fn summary [PR102059]

2021-09-17 Thread Kewen.Lin via Gcc-patches
Hi Martin,

on 2021/9/16 9:19 PM, Martin Jambor wrote:
> Hi,
> 
> On Thu, Sep 16 2021, Kewen.Lin wrote:
>> Hi Martin,
>>
>> Thanks for the review comments!
>>
>> on 2021/9/15 8:51 PM, Martin Jambor wrote:
>>> Hi,
>>>
>>> since this is inlining-related, I would somewhat prefer Honza to have a
>>> look too, but I have the following comments:
>>>
>>> On Wed, Sep 08 2021, Kewen.Lin wrote:

>>>
>>> [...]
>>>
>>>> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
>>>> index 78399b0b9bb..300b8da4507 100644
>>>> --- a/gcc/ipa-fnsummary.h
>>>> +++ b/gcc/ipa-fnsummary.h
>>>> @@ -193,6 +194,9 @@ public:
>>>>    vec *loop_strides;
>>>>    /* Parameters tested by builtin_constant_p.  */
>>>>    vec GTY((skip)) builtin_constant_p_parms;
>>>> +  /* Like fp_expressions, but it's to hold some target specific
>>>> information,
>>>> + such as some target specific isa flags.  */
>>>> +  auto_vec GTY((skip)) target_info;
>>>>    /* Estimated growth for inlining all copies of the function before start
>>>>   of small functions inlining.
>>>>   This value will get out of date as the callers are duplicated, but
>>>
>>> Segher already wrote in the first thread that a vector of HOST_WIDE_INTs
>>> is an overkill and I agree.  So at least make the new field just a
>>> HOST_WIDE_INT or better yet, an unsigned int.  But I would even go
>>> further and make target_info only a 16-bit bit-field, place it after the
>>> other bit-fields in class ipa_fn_summary and pass it to the hooks as
>>> uint16_t.  Unless you have plans which require more space, I think we
>>> should be conservative here.
>>>
>>
>> OK, yeah, my consideration was mainly for the scenario where a target has
>> a few bits to care about.  I just realized that, to avoid inefficient
>> bitwise operations for mapping target info bits to isa_flag bits, the
>> target can rearrange the sparse bits in isa_flag, so it's not a big deal.
>> Thanks for re-raising this!  I'll use a 16-bit bit-field in v3 as you
>> suggested.  If you don't mind, I will put it before the existing
>> bit-fields to get good alignment.
> 
> All right.
> 

Sorry, I failed to use a 16-bit bit-field for this: I found that
bit-fields cannot have their address taken or be passed by non-const
reference.  gengtype also failed to recognize uint16_t when I used it
directly in ipa-fnsummary.h.  In the end I used unsigned int instead.


>>>>    /* When optimizing and analyzing for IPA inliner, initialize loop
>>>> optimizer
>>>>   so we can produce proper inline hints.
>>>> @@ -2659,6 +2669,12 @@ analyze_function_body (struct cgraph_node *node,
>>>> bool early)
>>>>   bb_predicate,
>>>>   bb_predicate);
>>>>
>>>> +  /* Only look for target information for inlinable functions.  */
>>>> +  bool scan_for_target_info =
>>>> +info->inlinable
>>>> +&& targetm.target_option.need_ipa_fn_target_info (node->decl,
>>>> +info->target_info);
>>>> +
>>>>    if (fbi.info)
>>>>  compute_bb_predicates (&fbi, node, info, params_summary);
>>>>    const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>>>> @@ -2876,6 +2892,10 @@ analyze_function_body (struct cgraph_node *node,
>>>> bool early)
>>>>  if (dump_file)
>>>>    fprintf (dump_file, "   fp_expression set\n");
>>>>    }
>>>> +if (scan_for_target_info)
>>>> +  scan_for_target_info =
>>>> +targetm.target_option.update_ipa_fn_target_info
>>>> +(info->target_info, stmt);
>>>>    }
>>>
>>> Practically it probably does not matter, but why is this in the "if
>>> (this_time || this_size)" block?  Although I can see that setting
>>> fp_expression is also done that way... but it seems like copying a
>>> mistake to me.
>>
>> Yeah, I felt target info scanning is similar to fp_expression scanning,
>> so I just followed the same approach.  If I read it right, the case
>> !(this_time || this_size) means the STMT won't be weighted as any RTL
>> insn from either the time or the size perspective, so the guard seems to
>> avoid unnecessary scanning.  I assumed that target bifs and inline asm
>> would not be evaluated as zero cost, so it seems safe so far for HTM usage.
>>
>> Are you worried about some special STMT that is weighted as zero but
>> still needs to be checked for target info in the long term?
>> If so, I'll move it out in v3.
> 
> It seems that gimple_call_internal_p statements are always costed to
> zero and I am wondering whether those are something that targets would
> want to look out for in the future.
> 
> But hopefully anyone implementing that in the future would come up with
> a testcase and would need to fix this to have the testcase pass.
> 

Thanks for confirming, I guess targets are very likely to have the need
to scan the IFNs in future.  I've moved it out of the block in V3.
Thanks for noticing this!

V3: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579658.html


[PATCH] ipa-fnsummary: Remove inconsistent bp_pack_value

2021-09-17 Thread Kewen.Lin via Gcc-patches
Hi,

While changing target_info to a bit-field, I happened to find this
inconsistency between streaming out and streaming in.  We have the
streaming out:

  bp_pack_value (&bp, info->inlinable, 1);
  bp_pack_value (&bp, false, 1);
  bp_pack_value (&bp, info->fp_expressions, 1);

while the streaming in:

  info->inlinable = bp_unpack_value (&bp, 1);
  info->fp_expressions = bp_unpack_value (&bp, 1);

The cleanup of Cilk Plus support seems to have removed the bit from
the streaming in, but only changed the streaming out to pack false.

After hacking fp_expression_p to always return true, I can see that
the wrong fp_expressions value (false) is read back in the wpa dump.

Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

* ipa-fnsummary.c (ipa_fn_summary_write): Remove inconsistent
bitfield streaming out.

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 2470937460f..31199919405 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -4652,7 +4652,6 @@ ipa_fn_summary_write (void)
   info->time.stream_out (ob);
   bp = bitpack_create (ob->main_stream);
   bp_pack_value (&bp, info->inlinable, 1);
-  bp_pack_value (&bp, false, 1);
   bp_pack_value (&bp, info->fp_expressions, 1);
   streamer_write_bitpack (&bp);
   streamer_write_uhwi (ob, vec_safe_length (info->conds));
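
To illustrate why the stale bit matters, here is a small self-contained
sketch of the mismatch.  It uses a toy bit packer, not GCC's bitpack_d, and
the names toy_bitpack, toy_pack, toy_unpack, write_flags and read_flags are
invented for this example.  The writer can optionally emit the leftover
constant-false bit; the reader always expects two bits.

```cpp
#include <cassert>
#include <cstdint>

// Toy bit packer (illustrative only; assumes nbits < 64 and that the
// total number of packed bits fits in one 64-bit word).
struct toy_bitpack
{
  uint64_t word = 0;
  unsigned pos = 0;
};

static void
toy_pack (toy_bitpack &bp, uint64_t val, unsigned nbits)
{
  uint64_t mask = (uint64_t (1) << nbits) - 1;
  bp.word |= (val & mask) << bp.pos;
  bp.pos += nbits;
}

static uint64_t
toy_unpack (toy_bitpack &bp, unsigned nbits)
{
  uint64_t mask = (uint64_t (1) << nbits) - 1;
  uint64_t val = (bp.word >> bp.pos) & mask;
  bp.pos += nbits;
  return val;
}

struct flags
{
  bool inlinable;
  bool fp_expressions;
};

// Writer.  WITH_STALE_BIT reproduces the leftover "pack false" between
// the two real flags.
static toy_bitpack
write_flags (const flags &f, bool with_stale_bit)
{
  toy_bitpack bp;
  toy_pack (bp, f.inlinable, 1);
  if (with_stale_bit)
    toy_pack (bp, false, 1);
  toy_pack (bp, f.fp_expressions, 1);
  return bp;
}

// Reader.  It expects exactly two consecutive bits.
static flags
read_flags (toy_bitpack bp)
{
  bp.pos = 0;   // rewind to the start of the word
  flags f;
  f.inlinable = toy_unpack (bp, 1);
  f.fp_expressions = toy_unpack (bp, 1);
  return f;
}
```

With the stale bit present, the reader's second unpack picks up the constant
false instead of fp_expressions, which matches the symptom described above.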


[PATCH v3] ipa-inline: Add target info into fn summary [PR102059]

2021-09-17 Thread Kewen.Lin via Gcc-patches
Hi!

Power ISA 2.07 (Power8) introduces the transactional memory (HTM)
feature, but ISA 3.1 (Power10) removes it.  This exposes a
troublesome issue, as PR102059 shows.  A user defines a function
with target pragma cpu=power10, and that function calls another
function with attribute always_inline, which inherits the command
line option cpu=power8 and thus enables HTM implicitly.  The
current isa_flags check doesn't allow this inlining due to "target
specific option mismatch", and an error message is emitted.

Normally, the callee function isn't intended to exploit the HTM
feature, but the default flag setting makes it look as if it does.
As Richi raised in the PR, we have the fp_expressions flag in the
function summary, which allows us to check whether the function
actually contains any floating point expressions, to avoid
overkill.  This patch follows a similar idea but is more target
specific: for this rs6000 port specific requirement on the HTM
feature check, we would like to check for rs6000 specific HTM
built-in functions and inline assembly, so the patch allows targets
to do their own customized checks and updates.

It introduces two target hooks, need_ipa_fn_target_info and
update_ipa_fn_target_info.  The former allows the target to do
an early check and decide whether to collect target specific
information for this function.  In some special cases, it can
predict the analysis result and set it early without any
scanning.  The latter allows analyze_function_body to pass
gimple stmts down, just like the fp_expressions handling, so the
target can do its own analysis.  I initially put them in one hook
with a boolean to indicate whether it was the initial call, but
the code looked a bit ugly; separating them seems more readable.
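
As a rough illustration of how the two hooks are meant to cooperate, here
is a self-contained sketch.  It does not use GCC types: stmt_kind, fn_sketch,
INFO_HTM, need_target_info_sketch, update_target_info_sketch and
analyze_sketch are all invented names, and treating any inline asm as
possibly using HTM is an assumption made for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for gimple statements and a function body.
enum stmt_kind { PLAIN_STMT, HTM_BUILTIN_CALL, INLINE_ASM };

const unsigned int INFO_HTM = 1u << 0;   // invented "HTM is used" bit

struct fn_sketch
{
  bool htm_enabled;                // do the function's options enable HTM?
  std::vector<stmt_kind> stmts;
};

// Plays the role of need_ipa_fn_target_info: decide up front whether the
// statements need to be scanned at all.
static bool
need_target_info_sketch (const fn_sketch &fn, unsigned int &info)
{
  info = 0;
  return fn.htm_enabled;   // HTM not enabled: nothing to look for
}

// Plays the role of update_ipa_fn_target_info: inspect one statement and
// return false once further scanning cannot change the result.
static bool
update_target_info_sketch (unsigned int &info, stmt_kind stmt)
{
  if (stmt == HTM_BUILTIN_CALL || stmt == INLINE_ASM)
    {
      info |= INFO_HTM;
      return false;
    }
  return true;
}

// Plays the role of the loop in analyze_function_body.
static unsigned int
analyze_sketch (const fn_sketch &fn)
{
  unsigned int info = 0;
  bool scan = need_target_info_sketch (fn, info);
  for (stmt_kind stmt : fn.stmts)
    {
      if (!scan)
        break;
      scan = update_target_info_sketch (info, stmt);
    }
  return info;
}
```

An inlining check could then compare the callee's collected bits against
the features the caller lacks, rather than comparing raw option flags.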

Against v2 [2], this v3 addresses Martin's review comments:
  - Replace the HWI auto_vec with unsigned int for target_info
    to avoid overkill (also Segher's comment), and adjust the
    places that need updating for this change.
  - Annotate that target_info won't be streamed for offloading
    target compilers.
  - Scan all gimple statements instead of only those with
    non-zero size/time weights.

Against v1 [1], v2 addressed Richi's and Segher's review
comments, mainly:
  - Extend it to cover non always_inline.
  - Exclude the case for offload streaming.
  - Some function naming and formatting issues.
  - Adjust rs6000_can_inline_p.
  - Add new cases.

Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.

Any comments are highly appreciated!

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578555.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579045.html
--
gcc/ChangeLog:

PR ipa/102059
* config/rs6000/rs6000-call.c (rs6000_fn_has_any_of_these_mask_bits):
New function.
* config/rs6000/rs6000-internal.h
(rs6000_fn_has_any_of_these_mask_bits): New declare.
* config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro.
(TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
(rs6000_need_ipa_fn_target_info): New function.
(rs6000_update_ipa_fn_target_info): Likewise.
(rs6000_can_inline_p): Adjust for ipa function summary target info.
* config/rs6000/rs6000.h (RS6000_FN_TARGET_INFO_HTM): New macro.
* ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
summary target info.
(analyze_function_body): Adjust for ipa function summary target
info and call hook rs6000_need_ipa_fn_target_info and
rs6000_update_ipa_fn_target_info.
(ipa_merge_fn_summary_after_inlining): Adjust for ipa function
summary target info.
(inline_read_section): Likewise.
(ipa_fn_summary_write): Likewise.
* ipa-fnsummary.h (ipa_fn_summary::target_info): New member.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_UPDATE_IPA_FN_TARGET_INFO): Document new
hook.
(TARGET_NEED_IPA_FN_TARGET_INFO): Likewise.
* target.def (update_ipa_fn_target_info): New hook.
(need_ipa_fn_target_info): Likewise.
* targhooks.c (default_need_ipa_fn_target_info): New function.
(default_update_ipa_fn_target_info): Likewise.
* targhooks.h (default_update_ipa_fn_target_info): New declare.
(default_need_ipa_fn_target_info): Likewise.

gcc/testsuite/ChangeLog:

PR ipa/102059
* gcc.dg/lto/pr102059-1_0.c: New test.
* gcc.dg/lto/pr102059-1_1.c: New test.
* gcc.dg/lto/pr102059-1_2.c: New test.
* gcc.dg/lto/pr102059-2_0.c: New test.
* gcc.dg/lto/pr102059-2_1.c: New test.
* gcc.dg/lto/pr102059-2_2.c: New test.
* gcc.target/powerpc/pr102059-5.c: New test.
* gcc.target/powerpc/pr102059-6.c: New test.
* gcc.target/powerpc/pr102059-7.c: New test.

---
 gcc/config/rs6000/rs6000-call.c   | 12 +++
 gcc/config/rs6000/rs6000-internal.h   |  3 +
 gcc/config/rs6000/rs6000.c| 87 +--
 

Re: [PATCH 1/N] Rename asm_out_file function arguments.

2021-09-17 Thread Iain Sandoe
Hi Folks,

> On 17 Sep 2021, at 09:23, Richard Biener  wrote:
> 
> On Thu, Sep 16, 2021 at 3:52 PM Iain Sandoe  wrote:
>> 
>> 
>>> On 16 Sep 2021, at 11:00, Martin Liška  wrote:
>>> 
>>> As preparation for a new global object that will encapsulate
>>> asm_out_file, we would need to live with a macro that will
>>> define asm_out_file as casm->out_file and thus the name
>>> can't be used in function arguments.
>> 
>> So, if I understand correctly, the motivation is to be able to switch
>> between output file streams for different categories of content?
>> 
>> Darwin, actually already does this (manually) with a separate
>> lto_asm_out_name for lto data (so a general solution would
>> be great).
>> 
>> What is the reason for associating the section pointers with the
>> casm object?
>> 
>> * I can understand that each instance of a casm object would have
>> potentially a different current section (“in_section”), but it seems that
>> as things stand the section pointers would be duplicates.
>> 
>> * In the case that there’s reason that the sections could be different
>>  between casm instances, then would it make sense to have a
>>  target hook so that target-specific sections can be added to the
>>  local list (via some indirection, I’d assume)?
> 
> Yes, casm likely will end up with target specific state.  Note the main
> motivation of the exercise is to develop an alternate way of funneling
> the early debug data through the LTO pipeline, eliding the need for
> the simple-object copying and section renaming dance.

great ;-)

FWIW, I did an implementation for Darwin, but have not yet presented/committed
it because the dependencies on the debug linker (dsymutil) mean that it
is of limited value until I can find time to fix that up to understand the
input.

> The idea is that at dwarf2out_early_finish time (which runs at
> the compile stage) we write a regular pure debug-info object
> with unmangled section names to an alternate assembler file
> which we then assemble and include as raw byte blob in the
> LTO IR + debug object file.  At link time the debug object
> byte blobs can either be re-instantiated as separate input
> object file or the linker can be taught to pick them up from
> the existing file at a byte offset (internally it does this for
> AR archive members for example) avoiding the extra I/O.

… the more dependencies on external tool behaviour we have
the harder it gets for non-ELF-binutils targets; I’d like to think
about how to implement this in Mach-O…

I don’t think there’s any mechanism for Mach-O to include an
arbitrary blob in a _regular_ mach-o file, of course one could make
it FAT in some way - but that would mean a lot of changes to the
back end tools..

… so the path of least resistance is to do something like we do
with the LTO already - abstract the info into a blob section with
an index and a name table (we funnel the LTO off into a second
file and then re-include that in finish_asm_file).

not sure what xcoff etc. can do...

> So for this alternate assembler file we'd have a different set
> of sections.

can you give me an example?
(I’m not succeeding in visualizing this yet).

Note that we already have categories of sections:
 generic (e.g. .text, .data, etc. supported by all file formats)
 language-related (for at least C++, D, ObjC…)
 debug
 lto
 back-end-specific 

I was wondering which of those needed to be cloned per casm
and if it includes any of the language/back-end ones how we figure
a mechanism to include those (I suppose the usual style of a macro-
programmed .def would work?)

>> —
>> 
>> (of course, it would be great if one day we could abstract the asm out
>> such that we could switch to a direct-to-object implementation)
> 
> Small steps ;)

Yes - but it’s easier to see if the small steps are in the direction we want,
with some idea of the finishing post, right? :)

I was wondering about a conceptual scenario like:

 casm ==> target_asm state
 this also conceptually “owns” the TARGET_ASM macros.
 one could consider migrating those macros to add casm as a first argument
 this would allow a second migration when target chose to implement the macros
 as inline functions taking the state as a first argument...

 .. or to implement casm as an abstract base class, where the target macros
 become virtual methods, with a default impl. that can be overridden by the
 target.

 My guess is that people will say “the second one is too much overhead because
 it incurs an indirection instead of the direct inline” … I suppose that
 depends on how well we devirtualize ….
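
To make the two options above concrete, here is a small sketch under stated assumptions.  Nothing in it is GCC code: asm_state_base, p2align_target_state, output_align and asm_output_align are hypothetical names, and a std::string stands in for the output stream.  The free function shows the "state as first argument" call shape; the class hierarchy shows the "virtual method with a default impl." variant.

```cpp
#include <string>

// Hypothetical model of a per-output assembler state object.
struct asm_state_base {
  std::string out;  // stands in for the output FILE
  virtual ~asm_state_base() = default;

  // Default implementation of what is a target macro today.
  virtual void output_align(unsigned log2_bytes) {
    out += "\t.align " + std::to_string(1u << log2_bytes) + "\n";
  }
};

// A target that prefers a different directive overrides the default.
struct p2align_target_state : asm_state_base {
  void output_align(unsigned log2_bytes) override {
    out += "\t.p2align " + std::to_string(log2_bytes) + "\n";
  }
};

// Call sites pass the state explicitly as the first argument, so the
// same call works for any number of independent output streams.
inline void asm_output_align(asm_state_base &state, unsigned log2_bytes) {
  state.output_align(log2_bytes);
}
```

The indirection concern raised above applies to the virtual call inside asm_output_align; whether it matters depends on how often the compiler can see the dynamic type.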

>>> I've built all cross compilers with the change and
>>> can bootstrap on x86_64-linux-gnu and survives regression tests.
>> 
>> A native bootstrap fails early in stage1 for x86_64-darwin (I’ll take a look
>> at fixing the issues once the patch series settles down)

JFTR, I applied the first two patches and then a couple of tweaks and it did
bootstrap on Darwin.

thanks
Iain




Re: [PATCH] tree-optimization/65206 - dependence analysis on mixed pointer/array

2021-09-17 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> This adds the capability to analyze the dependence of mixed
> pointer/array accesses.  The example is from where using a masked
> load/store creates the pointer-based access when an otherwise
> unconditional access is array based.  Other examples would include
> accesses to an array mixed with accesses from inlined helpers
> that work on pointers.
>
> The idea is quite simple and old - analyze the data-ref indices
> as if the reference was pointer-based.  The following change does
> this by changing dr_analyze_indices to work on the indices
> sub-structure and storing an alternate indices substructure in
> each data reference.  That alternate set of indices is analyzed
> lazily by initialize_data_dependence_relation when it fails to
> match-up the main set of indices of two data references.
> initialize_data_dependence_relation is refactored into a head
> and a tail worker and changed to work on one of the indices
> structures and thus away from using DR_* access macros which
> continue to reference the main indices substructure.
>
> There are quite some vectorization and loop distribution opportunities
> unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
> 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
> 544.nab_r see amendments in what they report with -fopt-info-loop while
> the rest of the specrate set sees no changes there.  Measuring runtime
> for the set where changes were reported reveals nothing off-noise
> besides 511.povray_r which seems to regress slightly for me
> (on a Zen2 machine with -Ofast -march=native).
>
> Changes from the [RFC] version are properly handling bitfields
> that we cannot take the address of and optimization of refs
> that already are MEM_REFs and thus won't see any change.  I've
> also elided changing the set of vect_masked_stores targets in
> favor of explicitly listing avx (but I did not verify if the
> testcase passes on aarch64-sve or amdgcn).
>
> This improves cases like the following from Povray:
>
>for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
>  {
> VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
> Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
>  }
>
> where there is a plain array access mixed with abstraction
> using T[] or T* arguments.  That should be a not too uncommon
> situation in the wild.  The loop above is now vectorized and was not
> without the change.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
> built and run SPEC CPU 2017 successfully.
>
> OK?
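
For readers without the Povray sources, the quoted loop can be reduced to a self-contained sketch.  The types and names below (sphere, sweep, scale3, scale_sweep) are hypothetical stand-ins modelled on the quoted fragment; the sketch only shows the access pattern (a helper taking a pointer mixed with a direct array access) and makes no claim about what any particular compiler version does with it.

```cpp
// Hypothetical reduction of the quoted loop; not Povray code.
struct sphere {
  double center[3];
  double radius;
};

struct sweep {
  int num;
  sphere spheres[8];
};

// Helper that works on a T* argument, in the role of VScaleEq.
inline void scale3(double *v, double k) {
  v[0] *= k;
  v[1] *= k;
  v[2] *= k;
}

void scale_sweep(sweep &s, double k) {
  for (int i = 0; i < s.num; i++) {
    scale3(s.spheres[i].center, k);  // pointer-based access once inlined
    s.spheres[i].radius *= k;        // plain array-based access
  }
}
```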

Took a while to page this stuff back in :-/

I guess if we're adding alt_indices to the main data_reference,
we'll need to free the access_fns in free_data_ref.  It looked like
the patch as posted might have a memory leak.

Perhaps instead we could use local indices structures for the
alt_indices and pass them in to initialize_data_dependence_relation?
Not that that's very elegant…

> Thanks,
> Richard.
>
> 2021-09-08  Richard Biener  
>
>   PR tree-optimization/65206
>   * tree-data-ref.h (struct data_reference): Add alt_indices,
>   order it last.
>   * tree-data-ref.c (dr_analyze_indices): Work on
>   struct indices and get DR_REF as tree.
>   (create_data_ref): Adjust.
>   (initialize_data_dependence_relation): Split into head
>   and tail.  When the base objects fail to match up try
>   again with pointer-based analysis of indices.
>   * tree-vectorizer.c (vec_info_shared::check_datarefs): Do
>   not compare the lazily computed alternate set of indices.
>
>   * gcc.dg/torture/20210916.c: New testcase.
>   * gcc.dg/vect/pr65206.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/torture/20210916.c |  20 +++
>  gcc/testsuite/gcc.dg/vect/pr65206.c |  22 +++
>  gcc/tree-data-ref.c | 173 
>  gcc/tree-data-ref.h |   9 +-
>  gcc/tree-vectorizer.c   |   3 +-
>  5 files changed, 167 insertions(+), 60 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/20210916.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr65206.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c 
> b/gcc/testsuite/gcc.dg/torture/20210916.c
> new file mode 100644
> index 000..0ea6d45e463
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/20210916.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +
> +typedef union tree_node *tree;
> +struct tree_base {
> +  unsigned : 1;
> +  unsigned lang_flag_2 : 1;
> +};
> +struct tree_type {
> +  tree main_variant;
> +};
> +union tree_node {
> +  struct tree_base base;
> +  struct tree_type type;
> +};
> +tree finish_struct_t, finish_struct_x;
> +void finish_struct()
> +{
> +  for (; finish_struct_t->type.main_variant;)
> +finish_struct_x->base.lang_flag_2 = 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c 
> b/gcc/testsuite/gcc.dg/vect/pr65206.c
> new file mode 100644
> index 

[committed] openmp: Add support for OpenMP 5.1 atomics for C++

2021-09-17 Thread Jakub Jelinek via Gcc-patches
Hi!

Besides the C++ FE changes, I've noticed that the C FE didn't reject
  #pragma omp atomic capture compare
  { v = x; x = y; }
and other forms of atomic swap, this patch fixes that too.  And the
c-family/ routine needed quite a few changes so that the new code
in it works fine with both FEs.
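
As a rough illustration of the accepted forms (as opposed to the rejected swap above), the sketch below uses two statement shapes from my reading of the OpenMP 5.1 atomic construct.  The function names are hypothetical.  The updates are only atomic when built with OpenMP enabled (for example -fopenmp with a compiler that supports OpenMP 5.1 atomics); otherwise the pragmas are ignored and the code runs as ordinary serial code.

```cpp
// Store D into X if X currently equals E; report whether it happened.
bool compare_and_store(int &x, int e, int d) {
  int r;
  #pragma omp atomic compare capture
  { r = x == e; if (r) { x = d; } }
  return r != 0;
}

// Raise X to Y if Y is larger (an atomic max).
void atomic_max(int &x, int y) {
  #pragma omp atomic compare
  if (x < y) { x = y; }
}
```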

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-09-17  Jakub Jelinek  

gcc/c-family/
* c-omp.c (c_finish_omp_atomic): Avoid creating
TARGET_EXPR if test is true, use create_tmp_var_raw instead of
create_tmp_var and add a zero initializer to TARGET_EXPRs that
had NULL initializer.  When omitting operands after v = x,
use type of v rather than type of x.  Fix type of vtmp
TARGET_EXPR.
gcc/c/
* c-parser.c (c_parser_omp_atomic): Reject atomic swap if capture
is true.
gcc/cp/
* cp-tree.h (finish_omp_atomic): Add r and weak arguments.
* parser.c (cp_parser_omp_atomic): Update function comment for
OpenMP 5.1 atomics, parse OpenMP 5.1 atomics and fail, compare and
weak clauses.
* semantics.c (finish_omp_atomic): Add r and weak arguments, handle
them, handle COND_EXPRs.
* pt.c (tsubst_expr): Adjust for COND_EXPR forms that
finish_omp_atomic can now produce.
gcc/testsuite/
* c-c++-common/gomp/atomic-18.c: Expect same diagnostics in C++ as in
C.
* c-c++-common/gomp/atomic-25.c: Drop c effective target.
* c-c++-common/gomp/atomic-26.c: Likewise.
* c-c++-common/gomp/atomic-27.c: Likewise.
* c-c++-common/gomp/atomic-28.c: Likewise.
* c-c++-common/gomp/atomic-29.c: Likewise.
* c-c++-common/gomp/atomic-30.c: Likewise.  Adjust expected diagnostics
for C++ when it differs from C.
(foo): Change return type from double to void.
* g++.dg/gomp/atomic-5.C: Adjust expected diagnostics wording.
* g++.dg/gomp/atomic-20.C: New test.
libgomp/
* testsuite/libgomp.c-c++-common/atomic-19.c: Drop c effective target.
Use /* */ comments instead of //.
* testsuite/libgomp.c-c++-common/atomic-20.c: Likewise.
* testsuite/libgomp.c-c++-common/atomic-21.c: Likewise.
* testsuite/libgomp.c++/atomic-16.C: New test.
* testsuite/libgomp.c++/atomic-17.C: New test.

--- gcc/c-family/c-omp.c.jj 2021-09-11 09:33:37.735334112 +0200
+++ gcc/c-family/c-omp.c2021-09-16 19:20:21.349157878 +0200
@@ -376,7 +376,7 @@ c_finish_omp_atomic (location_t loc, enu
return error_mark_node;
   gcc_assert (TREE_CODE (rhs1) == EQ_EXPR);
   tree cmptype = TREE_TYPE (TREE_OPERAND (rhs1, 0));
-  if (SCALAR_FLOAT_TYPE_P (cmptype))
+  if (SCALAR_FLOAT_TYPE_P (cmptype) && !test)
{
  bool clear_padding = false;
  if (BITS_PER_UNIT == 8 && CHAR_BIT == 8)
@@ -443,12 +443,14 @@ c_finish_omp_atomic (location_t loc, enu
}
}
}
-  if (r)
+  if (r && test)
+   rtmp = rhs1;
+  else if (r)
{
- tree var = create_tmp_var (boolean_type_node);
+ tree var = create_tmp_var_raw (boolean_type_node);
  DECL_CONTEXT (var) = current_function_decl;
  rtmp = build4 (TARGET_EXPR, boolean_type_node, var,
-NULL, NULL, NULL);
+boolean_false_node, NULL, NULL);
  save = in_late_binary_op;
  in_late_binary_op = true;
  x = build_modify_expr (loc, var, NULL_TREE, NOP_EXPR,
@@ -529,14 +531,11 @@ c_finish_omp_atomic (location_t loc, enu
}
}
   if (blhs)
+   x = build3_loc (loc, BIT_FIELD_REF, TREE_TYPE (blhs), x,
+   bitsize_int (bitsize), bitsize_int (bitpos));
+  if (r && !test)
{
- x = build3_loc (loc, BIT_FIELD_REF, TREE_TYPE (blhs), x,
- bitsize_int (bitsize), bitsize_int (bitpos));
- type = TREE_TYPE (blhs);
-   }
-  if (r)
-   {
- vtmp = create_tmp_var (TREE_TYPE (x));
+ vtmp = create_tmp_var_raw (TREE_TYPE (x));
  DECL_CONTEXT (vtmp) = current_function_decl;
}
   else
@@ -545,10 +544,11 @@ c_finish_omp_atomic (location_t loc, enu
 loc, x, NULL_TREE);
   if (x == error_mark_node)
return error_mark_node;
-  if (r)
+  type = TREE_TYPE (x);
+  if (r && !test)
{
- vtmp = build4 (TARGET_EXPR, boolean_type_node, vtmp,
-NULL, NULL, NULL);
+ vtmp = build4 (TARGET_EXPR, TREE_TYPE (vtmp), vtmp,
+build_zero_cst (TREE_TYPE (vtmp)), NULL, NULL);
  gcc_assert (TREE_CODE (x) == MODIFY_EXPR
  && TREE_OPERAND (x, 0) == TARGET_EXPR_SLOT (vtmp));
  TREE_OPERAND (x, 0) = vtmp;
--- gcc/c/c-parser.c.jj 2021-09-11 09:33:37.738334069 +0200
+++ gcc/c/c-parser.c2021-09-16 18:18:30.366878199 +0200
@@ -18500,7 

Re: [PATCH, Fortran] Use _Float128 rather than __float128 for c_float128 kind

2021-09-17 Thread Tobias Burnus

On 17.09.21 06:02, Sandra Loosemore wrote:

On 9/5/21 11:20 PM, Sandra Loosemore wrote:

Unless the aarch64 maintainers think it is a bug that __float128 is
not supported, I think the right solution here is [...] to tie the
C_FLOAT128 kind to _Float128 rather than __float128, [...]


Here's a new patch that does this.  I've tested it on
x86_64-linux-gnu, powerpc64le-linux-gnu, and aarch64-linux-gnu, and it
does fix the previously reported failure compiling
gfortran.dg/PR100914.c on aarch64.  OK to commit?


LGTM – thanks!

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] nvptx: Add (experimental) support for HFmode with -misa=sm_53

2021-09-17 Thread Tobias Burnus

Hi Roger,

some more generic remarks not specific to using new ISA features.

On 17.09.21 00:53, Roger Sayle wrote:


Whilst there I also added -misa=sm_75 and -misa=sm_80 which are points
where other useful instructions were added to the ISA.


First, my impression was that sm_70 already added lots of useful stuff,
but granted sm_75 adds some more. In any case, the question is whether
it makes sense to add those flags if no use is made of those flags.

In particular, sm_80 is according to the following webpage only
supported with PTX ISA 7.0 of CUDA 11.0. But GCC currently only supports
-mptx=3.6 (default) and -mptx=6.3 (= CUDA 10).
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes

Note that you forgot to update gcc/config/nvptx/t-omp-device for the new
sm_*  and likewise the "-misa=@var{ISA-string}" section in
gcc/gcc/doc/invoke.texi.

Additionally, I wonder whether the preprocessor macros __nvptx__,
__nvptx_softstack__, __nvptx_unisimt__ and __PTX_SM__  should be
documented somewhere as well. As all but one are related to command-line
options, I wonder whether the respective section in invoke.texi would be
a good place for them.

Tobias



Re: [PATCH 1/N] Rename asm_out_file function arguments.

2021-09-17 Thread Richard Biener via Gcc-patches
On Thu, Sep 16, 2021 at 3:52 PM Iain Sandoe  wrote:
>
>
> Hi Martin,
>
> > On 16 Sep 2021, at 11:00, Martin Liška  wrote:
> >
> > As preparation for a new global object that will encapsulate
> > asm_out_file, we would need to live with a macro that will
> > define asm_out_file as casm->out_file and thus the name
> > can't be used in function arguments.
>
> So, if I understand correctly, the motivation is to be able to switch
> between output file streams for different categories of content?
>
> Darwin, actually already does this (manually) with a separate
> lto_asm_out_name for lto data (so a general solution would
> be great).
>
> What is the reason for associating the section pointers with the
> casm object?
>
> * I can understand that each instance of a casm object would have
>  potentially a different current section (“in_section”), but it seems that
>  as things stand the section pointers would be duplicates.
>
> * In the case that there’s reason that the sections could be different
>   between casm instances, then would it make sense to have a
>   target hook so that target-specific sections can be added to the
>   local list (via some indirection, I’d assume)?

Yes, casm likely will end up with target specific state.  Note the main
motivation of the exercise is to develop an alternate way of funneling
the early debug data through the LTO pipeline, eliding the need for
the simple-object copying and section renaming dance.

The idea is that at dwarf2out_early_finish time (which runs at
the compile stage) we write a regular pure debug-info object
with unmangled section names to an alternate assembler file
which we then assemble and include as raw byte blob in the
LTO IR + debug object file.  At link time the debug object
byte blobs can either be re-instantiated as separate input
object file or the linker can be taught to pick them up from
the existing file at a byte offset (internally it does this for
AR archive members for example) avoiding the extra I/O.

So for this alternate assembler file we'd have a different set
of sections.

> —
>
> (of course, it would be great if one day we could abstract the asm out
>  such that we could switch to a direct-to-object implementation)

Small steps ;)

> > I've built all cross compilers with the change and
> > can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> A native bootstrap fails early in stage1 for x86_64-darwin (I’ll take a look
> at fixing the issues once the patch series settles down)
>
> ---
>
> /src-local/gcc-master/gcc/dwarf2asm.c: In function ‘void 
> dw2_asm_output_nstring(const char*, size_t, const char*, ...)’:
> /src-local/gcc-master/gcc/output.h:387:26: error: expected initializer before 
> ‘->’ token
>  #define asm_out_file casm->out_file
>   ^
> /src-local/gcc-master/gcc/defaults.h:68:13: note: in expansion of macro 
> ‘asm_out_file’
>FILE *asm_out_file = _hide_asm_out_file;  \
>  ^~~~
> /src-local/gcc-master/gcc/dwarf2asm.c:414:7: note: in expansion of macro 
> ‘ASM_OUTPUT_ASCII’
>ASM_OUTPUT_ASCII (asm_out_file, str, len);
>^~~~
> In file included from ./tm.h:42:0,
>
> --
>
> /src-local/gcc-master/gcc/config/i386/darwin.h:219:6: error: ‘in_section’ was 
> not declared in this scope
>   if (in_section == text_section)   \
>   ^
> /src-local/gcc-master/gcc/dwarf2out.c:677:3: note: in expansion of macro 
> ‘ASM_OUTPUT_ALIGN’
>ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (PTR_SIZE));
>
>


Re: [PATCH V2] Set bound/cmp/control for until wrap loop.

2021-09-17 Thread Richard Biener via Gcc-patches
On Thu, 2 Sep 2021, Jiufu Guo wrote:

> 
> Changes on V1:
> * Add more test cases
> * Add comment for exit-condition transform
> * Removing duplicate setting on niter->control
> 
> This patch resets niter->control, niter->bound and niter->cmp in
> number_of_iterations_until_wrap.
> 
> Bootstrap and test pass on ppc64 and x86, and pass the test cases
> in PR.  Is this ok for trunk?

OK.

Thanks,
Richard.

> One thing: in this patch, the IVbase is still kept biased by 1 step.
> 
> 
> BR.
> Jiufu Guo
> 
> gcc/ChangeLog:
> 
> 2021-09-02  Jiufu Guo  
> 
> PR tree-optimization/102087
> * tree-ssa-loop-niter.c (number_of_iterations_until_wrap):
> Update bound/cmp/control for niter.
> 
> gcc/testsuite/ChangeLog:
> 
> 2021-09-02  Jiufu Guo  
> 
> PR tree-optimization/102087
> * gcc.dg/pr102087.c: New test.
> 
> ---
>  gcc/tree-ssa-loop-niter.c   | 16 ++-
>  gcc/testsuite/gcc.dg/pr102087.c | 35 +
>  2 files changed, 50 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr102087.c
> 
> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index 7af92d1c893..75109407124 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -1482,7 +1482,7 @@ number_of_iterations_until_wrap (class loop *, tree 
> type, affine_iv *iv0,
>affine_iv *iv1, class tree_niter_desc *niter)
>  {
>tree niter_type = unsigned_type_for (type);
> -  tree step, num, assumptions, may_be_zero;
> +  tree step, num, assumptions, may_be_zero, span;
>wide_int high, low, max, min;
>  
>may_be_zero = fold_build2 (LE_EXPR, boolean_type_node, iv1->base, 
> iv0->base);
> @@ -1557,6 +1557,20 @@ number_of_iterations_until_wrap (class loop *, tree 
> type, affine_iv *iv0,
>  
>niter->control.no_overflow = false;
>  
> +  /* Update bound and exit condition as:
> + bound = niter * STEP + (IVbase - STEP).
> + { IVbase - STEP, +, STEP } != bound
> + Here, biasing IVbase by 1 step makes 'bound' be the value before wrap.
> + */
> +  niter->control.base = fold_build2 (MINUS_EXPR, niter_type,
> +  niter->control.base, niter->control.step);
> +  span = fold_build2 (MULT_EXPR, niter_type, niter->niter,
> +   fold_convert (niter_type, niter->control.step));
> +  niter->bound = fold_build2 (PLUS_EXPR, niter_type, span,
> +   fold_convert (niter_type, niter->control.base));
> +  niter->bound = fold_convert (type, niter->bound);
> +  niter->cmp = NE_EXPR;
> +
>return true;
>  }
>  
> diff --git a/gcc/testsuite/gcc.dg/pr102087.c b/gcc/testsuite/gcc.dg/pr102087.c
> new file mode 100644
> index 000..fc60cbda066
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr102087.c
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +unsigned __attribute__ ((noinline))
> +foo (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
> +{
> +  while (n < ++l)
> +*a++ = *b++ + 1;
> +  return l;
> +}
> +
> +volatile int a[1];
> +unsigned b;
> +int c;
> +
> +int
> +check ()
> +{
> +  int d;
> +  for (; b > 1; b++)
> +for (c = 0; c < 2; c++)
> +  for (d = 0; d < 2; d++)
> + a[0];
> +  return 0;
> +}
> +
> +char **Gif_ClipImage_gfi_0;
> +int Gif_ClipImage_y, Gif_ClipImage_shift;
> +void
> +Gif_ClipImage ()
> +{
> +  for (; Gif_ClipImage_y >= Gif_ClipImage_shift; Gif_ClipImage_y++)
> +Gif_ClipImage_gfi_0[Gif_ClipImage_shift]
> +  = Gif_ClipImage_gfi_0[Gif_ClipImage_y];
> +}
> 
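
The bound formula in the quoted comment can be checked on a scaled-down example.  The sketch below is an assumption-laden model, not the GCC code: it uses 8-bit unsigned arithmetic so the wrap happens quickly, fixes STEP at 1, and takes niter to be the number of times the body of `while (n < ++l)` runs (255 - l when the loop is entered).  The function names are hypothetical.

```cpp
#include <cstdint>

// Reference: run the until-wrap loop directly in 8-bit arithmetic.
unsigned count_until_wrap(std::uint8_t l, std::uint8_t n) {
  unsigned iterations = 0;
  while (n < ++l)  // stops when l wraps to 0 (or n >= l on entry)
    ++iterations;
  return iterations;
}

// Same count via bound = niter * STEP + (IVbase - STEP) and a != exit
// test on the IV biased by one step.
unsigned count_with_bound(std::uint8_t l, std::uint8_t n) {
  const std::uint8_t step = 1;
  const std::uint8_t iv_base = static_cast<std::uint8_t>(l + step);
  if (n >= iv_base)
    return 0;  // the "may be zero" case: body never runs
  // Assumed iteration count for STEP == 1 in this 8-bit model.
  const std::uint8_t niter = static_cast<std::uint8_t>(255 - l);
  std::uint8_t control = static_cast<std::uint8_t>(iv_base - step);
  const std::uint8_t bound =
      static_cast<std::uint8_t>(niter * step + control);  // value before wrap
  unsigned iterations = 0;
  for (; control != bound;
       control = static_cast<std::uint8_t>(control + step))
    ++iterations;
  return iterations;
}
```

For l = 250 and n = 3 both functions count 5 iterations, and bound evaluates to 255, the last value before the wrap.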

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH #2] PR c/102245: Disable sign-changing optimization for shifts by zero.

2021-09-17 Thread Richard Biener via Gcc-patches
On Tue, Sep 14, 2021 at 9:44 AM Roger Sayle  wrote:
>
>
> Respecting Jakub's suggestion that it may be better to warn-on-valid for
> "if (x << 0)" as the author might have intended "if (x < 0)" [which will
> also warn when x is _Bool], the simplest way to resolve this regression
> is to disable the recently added fold transformation for shifts by zero;
> these will be optimized later (elsewhere).  Guarding against integer_zerop
> is the simplest of three alternatives; the second being to only apply
> this transformation to GIMPLE and not GENERIC, and the third (potentially)
> being to explicitly handle shifts by zero here, with an (if cond then else),
> optimizing the expression to a convert, but awkwardly duplicating the
> more general transformation earlier in match.pd's shift simplifications.
>
> This patch has been tested (against a recent snapshot without build issues)
> on x86_64-pc-linux-gnu with "make bootstrap" and "make -k check" with no
> new failures.  Note that test1 in the new testcase is changed from
> dg-bogus to dg-warning compared with version #1.  Ok for mainline?

OK.

Thanks,
Richard.

> 2021-09-14  Roger Sayle  
>
> gcc/ChangeLog
> PR c/102245
> * match.pd (shift optimizations): Disable recent sign-changing
> optimization for shifts by zero, these will be folded later.
>
> gcc/testsuite/ChangeLog
> PR c/102245
> * gcc.dg/Wint-in-bool-context-4.c: New test case.
>
>
> Roger
>
> -Original Message-
> From: Jakub Jelinek 
> Sent: 13 September 2021 11:58
> To: Roger Sayle 
> Cc: 'GCC Patches' 
> Subject: Re: [PATCH] PR c/102245: Don't warn that ((_Bool)x<<0) isn't a 
> truthvalue.
>
> On Mon, Sep 13, 2021 at 11:42:08AM +0100, Roger Sayle wrote:
> > gcc/c-family/ChangeLog
> >   PR c/102245
> >   * c-common.c (c_common_truthvalue_conversion) [LSHIFT_EXPR]:
> >   Special case (optimize) shifts by zero.
> >
> > gcc/testsuite/ChangeLog
> >   PR c/102245
> >   * gcc.dg/Wint-in-bool-context-4.c: New test case.
> >
> > Roger
> > --
> >
>
> > diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index
> > 017e415..44b5fcc 100644
> > --- a/gcc/c-family/c-common.c
> > +++ b/gcc/c-family/c-common.c
> > @@ -3541,6 +3541,10 @@ c_common_truthvalue_conversion (location_t location, 
> > tree expr)
> >break;
> >
> >  case LSHIFT_EXPR:
> > +  /* Treat shifts by zero as a special case.  */
> > +  if (integer_zerop (TREE_OPERAND (expr, 1)))
> > + return c_common_truthvalue_conversion (location,
> > +TREE_OPERAND (expr, 0));
> >/* We will only warn on signed shifts here, because the majority of
> >false positive warnings happen in code where unsigned arithmetic
> >was used in anticipation of a possible overflow.
>
> > /* PR c/102245 */
> > /* { dg-options "-Wint-in-bool-context" } */
> > /* { dg-do compile } */
> >
> > _Bool test1(_Bool x)
> > {
> >   return !(x << 0);  /* { dg-bogus "boolean context" } */ }
>
> While this exact case is unlikely a misspelling of !(x < 0) as no _Bool is 
> less than zero and hopefully we get a warning for !(x < 0), what about _Bool 
> test1a(int x) {
>   return !(x << 0);
> }
> ?  I think there is a non-zero chance this was meant to be !(x < 0) and the 
> current
> pr102245.c: In function ‘test1a’:
> pr102245.c:3:14: warning: ‘<<’ in boolean context, did you mean ‘<’? 
> [-Wint-in-bool-context]
> 3 |   return !(x << 0);
>   |   ~~~^
> warning seems to be useful.
>
> Jakub
>


Re: [RFC] ldist: Recognize rawmemchr loop patterns

2021-09-17 Thread Richard Biener via Gcc-patches
On Mon, Sep 13, 2021 at 4:53 PM Stefan Schulze Frielinghaus
 wrote:
>
> On Mon, Sep 06, 2021 at 11:56:21AM +0200, Richard Biener wrote:
> > On Fri, Sep 3, 2021 at 10:01 AM Stefan Schulze Frielinghaus
> >  wrote:
> > >
> > > On Fri, Aug 20, 2021 at 12:35:58PM +0200, Richard Biener wrote:
> > > [...]
> > > > > >
> > > > > > +  /* Handle strlen like loops.  */
> > > > > > +  if (store_dr == NULL
> > > > > > +  && integer_zerop (pattern)
> > > > > > +  && TREE_CODE (reduction_iv.base) == INTEGER_CST
> > > > > > +  && TREE_CODE (reduction_iv.step) == INTEGER_CST
> > > > > > +  && integer_onep (reduction_iv.step)
> > > > > > +  && (types_compatible_p (TREE_TYPE (reduction_var), 
> > > > > > size_type_node)
> > > > > > + || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var
> > > > > > +{
> > > > > >
> > > > > > I wonder what goes wrong with a larger or smaller wrapping IV type?
> > > > > > The iteration
> > > > > > only stops when you load a NUL and the increments just wrap along 
> > > > > > (you're
> > > > > > using the pointer IVs to compute the strlen result).  Can't you 
> > > > > > simply truncate?
> > > > >
> > > > > I think truncation is enough as long as no overflow occurs in strlen 
> > > > > or
> > > > > strlen_using_rawmemchr.
> > > > >
> > > > > > For larger than size_type_node (actually larger than ptr_type_node 
> > > > > > would matter
> > > > > > I guess), the argument is that since pointer wrapping would be 
> > > > > > undefined anyway
> > > > > > the IV cannot wrap either.  Now, the correct check here would IMHO 
> > > > > > be
> > > > > >
> > > > > >   TYPE_PRECISION (TREE_TYPE (reduction_var)) < TYPE_PRECISION
> > > > > > (ptr_type_node)
> > > > > >|| TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (pointer-iv-var))
> > > > > >
> > > > > > ?
> > > > >
> > > > > Regarding the implementation which makes use of rawmemchr:
> > > > >
> > > > > We can count at most PTRDIFF_MAX many bytes without an overflow.  
> > > > > Thus,
> > > > > the maximal length we can determine of a string where each character 
> > > > > has
> > > > > size S is PTRDIFF_MAX / S without an overflow.  Since an overflow for
> > > > > ptrdiff type is undefined we have to make sure that if an overflow
> > > > > occurs, then an overflow occurs for reduction variable, too, and that
> > > > > this is undefined, too.  However, I'm not sure anymore whether we want
> > > > > to respect overflows in all cases.  If TYPE_PRECISION (ptr_type_node)
> > > > > equals TYPE_PRECISION (ptrdiff_type_node) and an overflow occurs, then
> > > > > this would mean that a single string consumes more than half of the
> > > > > virtual addressable memory.  At least for architectures where
> > > > > TYPE_PRECISION (ptrdiff_type_node) == 64 holds, I think it is 
> > > > > reasonable
> > > > > to neglect the case where computing pointer difference may overflow.
> > > > > Otherwise we are talking about strings with lengths of multiple
> > > > > pebibytes.  For other architectures we might have to be more precise
> > > > > and make sure that reduction variable overflows first and that this is
> > > > > undefined.
> > > > >
> > > > > > Thus a conservative condition would be the following (I assumed that
> > > > > > the size of any integral type is a power of two, though I'm not sure
> > > > > > this really holds; IIRC the C standard requires only that the
> > > > > > alignment is a power of two but not necessarily the size, so I might
> > > > > > need to change this):
> > > > >
> > > > > > /* Compute precision (reduction_var) < (precision (ptrdiff_type) - 1 - log2 (sizeof (load_type)))
> > > > > >    or in other words return true if reduction variable overflows first
> > > > > >    and false otherwise.  */
> > > > > >
> > > > > > static bool
> > > > > > reduction_var_overflows_first (tree reduction_var, tree load_type)
> > > > > > {
> > > > > >   unsigned precision_ptrdiff = TYPE_PRECISION (ptrdiff_type_node);
> > > > > >   unsigned precision_reduction_var = TYPE_PRECISION (TREE_TYPE (reduction_var));
> > > > > >   unsigned size_exponent = wi::exact_log2 (wi::to_wide (TYPE_SIZE_UNIT (load_type)));
> > > > > >   return wi::ltu_p (precision_reduction_var, precision_ptrdiff - 1 - size_exponent);
> > > > > > }
> > > > > >
> > > > > > TYPE_PRECISION (ptrdiff_type_node) == 64
> > > > > > || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))
> > > > > >     && reduction_var_overflows_first (reduction_var, load_type))
> > > > > Regarding the implementation which makes use of strlen:
> > > > >
> > > > > > I'm not sure what it means if strlen is called for a string with a
> > > > > > length greater than SIZE_MAX.  Therefore, similar to the
> > > > > > implementation using rawmemchr where we neglect the case of an
> > > > > > overflow for 64bit architectures, a conservative condition would be:
> > > > >
> > > > > TYPE_PRECISION (size_type_node) == 64
> > > > > || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))
> > > > > 

[PATCH] C++: add type checking for static local vector variable in template

2021-09-17 Thread wangpc via Gcc-patches

Thanks for your advice; I had misunderstood what you meant.

I have sent a second version of the patch; please review whether it is OK.

On 2021/9/16 23:19, Jason Merrill wrote:

On 9/16/21 05:11, wangpc via Gcc-patches wrote:

This patch adds type checking for static local vector variables in
C++ templates.  Both AArch64 SVE and RISC-V RVV types are sizeless,
and both have this issue.

2021-08-06  wangpc  

gcc/cp/ChangeLog

 * decl.c (cp_finish_decl): Add type checking.

gcc/testsuite/ChangeLog

 * g++.target/aarch64/sve/static-var-in-template.C: New test.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 90111e4c786..e3a06ea0858 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7520,6 +7520,12 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
       && DECL_INITIALIZED_IN_CLASS_P (decl))
     check_static_variable_definition (decl, type);
 
+  if (VAR_P (decl)
+      && DECL_FUNCTION_SCOPE_P (decl)
+      && TREE_STATIC (decl))
+    verify_type_context (DECL_SOURCE_LOCATION (decl),
+                         TCTX_STATIC_STORAGE, type);


I was thinking of moving the verify_type_context code from start_decl,
which handles more cases:



  if (is_global_var (decl))
    {
  type_context_kind context = (DECL_THREAD_LOCAL_P (decl)
   ? TCTX_THREAD_STORAGE
   : TCTX_STATIC_STORAGE);
  verify_type_context (input_location, context, TREE_TYPE (decl));
    }


Jason


Re: [PATCH] Fix PR 67102: Add libstdc++ dependancy to libffi

2021-09-17 Thread Andrew Pinski via Gcc-patches
On Fri, Sep 17, 2021 at 12:46 AM Thomas Schwinge
 wrote:
>
> Hi Andrew!
>
> First, I appreciate you working through all these old PRs!
>
>
> On 2021-09-15T13:56:37-0700, apinski--- via Gcc-patches 
>  wrote:
> > The error message is obvious: -funconfigured-libstdc++-v3 is used
> > on the g++ command line.  So we just add the dependency.
>
> > --- a/Makefile.def
> > +++ b/Makefile.def
> > @@ -592,6 +592,7 @@ dependencies = { module=configure-target-fastjar; on=configure-target-zlib; };
> >  dependencies = { module=all-target-fastjar; on=all-target-zlib; };
> >  dependencies = { module=configure-target-libgo; on=configure-target-libffi; };
> >  dependencies = { module=configure-target-libgo; on=all-target-libstdc++-v3; };
> > +dependencies = { module=configure-target-libffi; on=all-target-libstdc++-v3; };
> >  dependencies = { module=all-target-libgo; on=all-target-libbacktrace; };
> >  dependencies = { module=all-target-libgo; on=all-target-libffi; };
> >  dependencies = { module=all-target-libgo; on=all-target-libatomic; };
>
> I'm confused, because given that this 'Makefile.def' change only has the
> following effect:
>
> > --- a/Makefile.in
> > +++ b/Makefile.in
> > @@ -61261,6 +61261,7 @@ all-bison: maybe-all-intl
> >  all-flex: maybe-all-intl
> >  all-m4: maybe-all-intl
> >  configure-target-libgo: maybe-all-target-libstdc++-v3
> > +configure-target-libffi: maybe-all-target-libstdc++-v3
> >  configure-target-liboffloadmic: maybe-configure-target-libgomp
> >  all-target-liboffloadmic: maybe-all-target-libgomp
> >  configure-target-newlib: maybe-all-binutils
>
> ... isn't that actually a no-op, because we already had such a dependency
> listed?  Now twice:
>
> $ grep -n -F 'configure-target-libffi: maybe-all-target-libstdc++-v3' -- Makefile.in
> 61264:configure-target-libffi: maybe-all-target-libstdc++-v3
> 61372:configure-target-libffi: maybe-all-target-libstdc++-v3
>
> Compared to the existing one, the one you've added is additionally
> restricted by '@unless gcc-bootstrap'.
>
> I noticed this as I remembered that on our og[...] development branches
> we have a patch in the opposite direction: get rid of this dependency via
> removing 'lang_env_dependencies = { module=libffi; cxx=true; };' from
> 'Makefile.def'.  See
> 
> "Disable libstdc++ dependency for libffi".  (Maciej CCed in case you have
> any further thoughts on that.)

Oh, I see what happened now: the old bug was actually fixed by r6-5415,
which added cxx=true.
So yes, my patch is actually not needed and can be reverted.
I tried to check whether the dependency was already there, but for
some reason I did not see it.
Also, it looks like the OpenACC changes never went to the trunk either.

>
>
> Grüße
>  Thomas
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955


Re: [PATCH 30/62] AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.

2021-09-17 Thread Hongtao Liu via Gcc-patches
I'm going to check in 10 patches.

[PATCH 30/62] AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.
[PATCH 31/62] AVX512FP16: Add testcase for
vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.
[PATCH 32/62] AVX512FP16: Add
vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2qq/vcvttph2udq/vcvttph2uqq
[PATCH 33/62] AVX512FP16: Add testcase for
vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2udq/vcvttph2qq/vcvttph2uqq
[PATCH 34/62] AVX512FP16: Add vcvttsh2si/vcvttsh2usi
[PATCH 35/62] AVX512FP16: Add vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx
[PATCH 36/62] AVX512FP16: Add testcase for
vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx
[PATCH 37/62] AVX512FP16: Add vcvtsh2ss/vcvtsh2sd/vcvtss2sh/vcvtsd2sh.
[PATCH 38/62] AVX512FP16: Add testcase for
vcvtsh2sd/vcvtsh2ss/vcvtsd2sh/vcvtss2sh
[PATCH 39/62] AVX512FP16: Add intrinsics for casting between vector
float16 and vector float32/float64/integer.

  Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
  Newly added runtime testcase passed on sde{-m32,}.


On Thu, Jul 1, 2021 at 2:17 PM liuhongt  wrote:
>
> gcc/ChangeLog:
>
> * config/i386/avx512fp16intrin.h (_mm_cvtsh_i32): New intrinsic.
> (_mm_cvtsh_u32): Likewise.
> (_mm_cvt_roundsh_i32): Likewise.
> (_mm_cvt_roundsh_u32): Likewise.
> (_mm_cvtsh_i64): Likewise.
> (_mm_cvtsh_u64): Likewise.
> (_mm_cvt_roundsh_i64): Likewise.
> (_mm_cvt_roundsh_u64): Likewise.
> (_mm_cvti32_sh): Likewise.
> (_mm_cvtu32_sh): Likewise.
> (_mm_cvt_roundi32_sh): Likewise.
> (_mm_cvt_roundu32_sh): Likewise.
> (_mm_cvti64_sh): Likewise.
> (_mm_cvtu64_sh): Likewise.
> (_mm_cvt_roundi64_sh): Likewise.
> (_mm_cvt_roundu64_sh): Likewise.
> * config/i386/i386-builtin-types.def: Add corresponding builtin types.
> * config/i386/i386-builtin.def: Add corresponding new builtins.
> * config/i386/i386-expand.c (ix86_expand_round_builtin):
> Handle new builtin types.
> * config/i386/sse.md
> 
> (avx512fp16_vcvtsh2si):
> New define_insn.
> (avx512fp16_vcvtsh2si_2): 
> Likewise.
> (avx512fp16_vcvtsi2sh): 
> Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx-1.c: Add test for new builtins.
> * gcc.target/i386/sse-13.c: Ditto.
> * gcc.target/i386/sse-23.c: Ditto.
> * gcc.target/i386/sse-14.c: Add test for new intrinsics.
> * gcc.target/i386/sse-22.c: Ditto.
> ---
>  gcc/config/i386/avx512fp16intrin.h | 158 +
>  gcc/config/i386/i386-builtin-types.def |   8 ++
>  gcc/config/i386/i386-builtin.def   |   8 ++
>  gcc/config/i386/i386-expand.c  |   8 ++
>  gcc/config/i386/sse.md |  46 +++
>  gcc/testsuite/gcc.target/i386/avx-1.c  |   8 ++
>  gcc/testsuite/gcc.target/i386/sse-13.c |   8 ++
>  gcc/testsuite/gcc.target/i386/sse-14.c |  10 ++
>  gcc/testsuite/gcc.target/i386/sse-22.c |  10 ++
>  gcc/testsuite/gcc.target/i386/sse-23.c |   8 ++
>  10 files changed, 272 insertions(+)
>
> diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
> index bd801942365..7524a8d6a5b 100644
> --- a/gcc/config/i386/avx512fp16intrin.h
> +++ b/gcc/config/i386/avx512fp16intrin.h
> @@ -3529,6 +3529,164 @@ _mm512_maskz_cvt_roundepu16_ph (__mmask32 __A, __m512i __B, int __C)
>
>  #endif /* __OPTIMIZE__ */
>
> +/* Intrinsics vcvtsh2si, vcvtsh2us.  */
> +extern __inline int
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtsh_i32 (__m128h __A)
> +{
> +  return (int) __builtin_ia32_vcvtsh2si32_round (__A, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline unsigned
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtsh_u32 (__m128h __A)
> +{
> +  return (int) __builtin_ia32_vcvtsh2usi32_round (__A,
> + _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +#ifdef __OPTIMIZE__
> +extern __inline int
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvt_roundsh_i32 (__m128h __A, const int __R)
> +{
> +  return (int) __builtin_ia32_vcvtsh2si32_round (__A, __R);
> +}
> +
> +extern __inline unsigned
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvt_roundsh_u32 (__m128h __A, const int __R)
> +{
> +  return (int) __builtin_ia32_vcvtsh2usi32_round (__A, __R);
> +}
> +
> +#else
> +#define _mm_cvt_roundsh_i32(A, B)  \
> +  ((int)__builtin_ia32_vcvtsh2si32_round ((A), (B)))
> +#define _mm_cvt_roundsh_u32(A, B)  \
> +  ((int)__builtin_ia32_vcvtsh2usi32_round ((A), (B)))
> +
> +#endif /* __OPTIMIZE__ */
> +
> +#ifdef __x86_64__
> +extern __inline long long
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtsh_i64 (__m128h __A)
> +{
> +  return (long long)
> +__builtin_ia32_vcvtsh2si64_round (__A, _MM_FROUND_CUR_DIRECTION);
> +}
> +
> +extern __inline unsigned 

Re: [PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-09-17 Thread Hongtao Liu via Gcc-patches
On Fri, Sep 17, 2021 at 3:47 PM Richard Biener  wrote:
>
> On Fri, 17 Sep 2021, Hongtao Liu wrote:
>
> > On Thu, Sep 16, 2021 at 8:31 PM Richard Biener  wrote:
> > >
> > > On Thu, 16 Sep 2021, Hongtao Liu wrote:
> > >
> > > > On Thu, Sep 16, 2021 at 4:23 PM Richard Biener via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > On Thu, 16 Sep 2021, liuhongt wrote:
> > > > >
> > > > > > Ping
> > > > > > rebased on latest trunk.
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >   * common.opt (ftree-vectorize): Add Var(flag_tree_vectorize).
> > > > > >   * doc/invoke.texi (Options That Control Optimization): Update
> > > > > >   documents.
> > > > > >   * opts.c (default_options_table): Enable auto-vectorization at
> > > > > >   O2 with very-cheap cost model.
> > > > > >   (finish_options): Use cheap cost model for
> > > > > >   explicit -ftree{,-loop}-vectorize.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > >   * c-c++-common/Wstringop-overflow-2.c: Adjust testcase.
> > > > > >   * g++.dg/tree-ssa/pr81408.C: Ditto.
> > > > > >   * g++.dg/warn/Wuninitialized-13.C: Ditto.
> > > > > >   * gcc.dg/Warray-bounds-51.c: Ditto.
> > > > > >   * gcc.dg/Warray-parameter-3.c: Ditto.
> > > > > >   * gcc.dg/Wstringop-overflow-13.c: Ditto.
> > > > > >   * gcc.dg/Wstringop-overflow-14.c: Ditto.
> > > > > >   * gcc.dg/Wstringop-overflow-21.c: Ditto.
> > > > > >   * gcc.dg/Wstringop-overflow-68.c: Ditto.
> > > > > >   * gcc.dg/gomp/pr46032-2.c: Ditto.
> > > > > >   * gcc.dg/gomp/pr46032-3.c: Ditto.
> > > > > >   * gcc.dg/gomp/simd-2.c: Ditto.
> > > > > >   * gcc.dg/gomp/simd-3.c: Ditto.
> > > > > >   * gcc.dg/graphite/fuse-1.c: Ditto.
> > > > > >   * gcc.dg/pr67089-6.c: Ditto.
> > > > > >   * gcc.dg/pr82929-2.c: Ditto.
> > > > > >   * gcc.dg/pr82929.c: Ditto.
> > > > > >   * gcc.dg/store_merging_1.c: Ditto.
> > > > > >   * gcc.dg/store_merging_11.c: Ditto.
> > > > > >   * gcc.dg/store_merging_15.c: Ditto.
> > > > > >   * gcc.dg/store_merging_16.c: Ditto.
> > > > > >   * gcc.dg/store_merging_19.c: Ditto.
> > > > > >   * gcc.dg/store_merging_24.c: Ditto.
> > > > > >   * gcc.dg/store_merging_25.c: Ditto.
> > > > > >   * gcc.dg/store_merging_28.c: Ditto.
> > > > > >   * gcc.dg/store_merging_30.c: Ditto.
> > > > > >   * gcc.dg/store_merging_5.c: Ditto.
> > > > > >   * gcc.dg/store_merging_7.c: Ditto.
> > > > > >   * gcc.dg/store_merging_8.c: Ditto.
> > > > > >   * gcc.dg/strlenopt-85.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/dump-6.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/pr19210-1.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/pr47059.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/pr86017.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/pr91482.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/predcom-1.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/predcom-dse-3.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/prefetch-3.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/prefetch-6.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/prefetch-8.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/prefetch-9.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/ssa-dse-18.c: Ditto.
> > > > > >   * gcc.dg/tree-ssa/ssa-dse-19.c: Ditto.
> > > > > >   * gcc.dg/uninit-40.c: Ditto.
> > > > > >   * gcc.dg/unroll-7.c: Ditto.
> > > > > >   * gcc.misc-tests/help.exp: Ditto.
> > > > > >   * gcc.target/i386/avx512vpopcntdqvl-vpopcntd-1.c: Ditto.
> > > > > >   * gcc.target/i386/pr22141.c: Ditto.
> > > > > >   * gcc.target/i386/pr34012.c: Ditto.
> > > > > >   * gcc.target/i386/pr49781-1.c: Ditto.
> > > > > >   * gcc.target/i386/pr95798-1.c: Ditto.
> > > > > >   * gcc.target/i386/pr95798-2.c: Ditto.
> > > > > >   * gfortran.dg/pr77498.f: Ditto.
> > > > > > ---
> > > > > >  gcc/common.opt |  2 +-
> > > > > >  gcc/doc/invoke.texi|  8 +---
> > > > > >  gcc/opts.c | 18 
> > > > > > +++---
> > > > > >  .../c-c++-common/Wstringop-overflow-2.c|  2 +-
> > > > > >  gcc/testsuite/g++.dg/tree-ssa/pr81408.C|  2 +-
> > > > > >  gcc/testsuite/g++.dg/warn/Wuninitialized-13.C  |  2 +-
> > > > > >  gcc/testsuite/gcc.dg/Warray-bounds-51.c|  2 +-
> > > > > >  gcc/testsuite/gcc.dg/Warray-parameter-3.c  |  2 +-
> > > > > >  gcc/testsuite/gcc.dg/Wstringop-overflow-13.c   |  2 +-
> > > > > >  gcc/testsuite/gcc.dg/Wstringop-overflow-14.c   |  2 +-
> > > > > >  gcc/testsuite/gcc.dg/Wstringop-overflow-21.c   |  2 +-
> > > > > >  gcc/testsuite/gcc.dg/Wstringop-overflow-68.c   |  2 +-
> > > > > >  gcc/testsuite/gcc.dg/gomp/pr46032-2.c  |  2 +-
> > > > > >  gcc/testsuite/gcc.dg/gomp/pr46032-3.c  |  2 +-
> > > > > >  gcc/testsuite/gcc.dg/gomp/simd-2.c |  2 +-
> > > > > >  gcc/testsuite/gcc.dg/gomp/simd-3.c 

[PATCH v2] C++: add type checking for static local vector variable in template

2021-09-17 Thread wangpc via Gcc-patches
This patch moves verify_type_context from start_decl_1 to cp_finish_decl
and adds type checking for static local vector variable in C++ template.

2021-08-06  wangpc  

gcc/cp/ChangeLog

* decl.c (start_decl_1): Remove verify_type_context.
(cp_finish_decl): Add more type checking.

gcc/testsuite/ChangeLog

* g++.target/aarch64/sve/static-var-in-template.C: New test.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 90111e4c786..d411963896a 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -5491,13 +5491,6 @@ start_decl_1 (tree decl, bool initialized)
   cp_apply_type_quals_to_decl (cp_type_quals (type), decl);
 }
 
-  if (is_global_var (decl))
-{
-  type_context_kind context = (DECL_THREAD_LOCAL_P (decl)
-  ? TCTX_THREAD_STORAGE
-  : TCTX_STATIC_STORAGE);
-  verify_type_context (input_location, context, TREE_TYPE (decl));
-}
   if (initialized)
 /* Is it valid for this decl to have an initializer at all?  */
 {
@@ -7520,6 +7513,22 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
   && DECL_INITIALIZED_IN_CLASS_P (decl))
 check_static_variable_definition (decl, type);
 
+  if (!processing_template_decl && VAR_P (decl))
+{
+  if (is_global_var (decl))
+   {
+ type_context_kind context = (DECL_THREAD_LOCAL_P (decl)
+ ? TCTX_THREAD_STORAGE
+ : TCTX_STATIC_STORAGE);
+ verify_type_context (input_location, context, TREE_TYPE (decl));
+   }
+
+  if (DECL_FUNCTION_SCOPE_P (decl)
+ && TREE_STATIC (decl))
+   verify_type_context (DECL_SOURCE_LOCATION (decl),
+TCTX_STATIC_STORAGE, type);
+}
+
   if (init && TREE_CODE (decl) == FUNCTION_DECL)
 {
   tree clone;
diff --git a/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
new file mode 100644
index 000..c2395d18d50
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/static-var-in-template.C
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+#include 
+
+template 
+void f()
+{
+static svbool_t pg = svwhilelt_b64(0, N);
+}
+
+int main(int argc, char **argv)
+{
+f<2>();
+return 0;
+}
+
+/* { dg-error {SVE type 'svbool_t' does not have a fixed size} } */
-- 
2.33.0.windows.1
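
The guard added to cp_finish_decl in this version can be read as a small decision function. The sketch below is a simplified model in plain C++, not GCC code: DeclModel, TypeContext and context_to_verify are hypothetical names, the boolean fields stand in for VAR_P, is_global_var, DECL_THREAD_LOCAL_P and processing_template_decl, and only the first of the two checks in the hunk is modelled.

```cpp
#include <cassert>

// Hypothetical stand-in for type_context_kind, plus a "no check" value.
enum class TypeContext { None, ThreadStorage, StaticStorage };

// Hypothetical flattened view of the properties the guard looks at.
struct DeclModel {
  bool is_var;              // models VAR_P (decl)
  bool is_global_var;       // models is_global_var (decl)
  bool is_thread_local;     // models DECL_THREAD_LOCAL_P (decl)
  bool processing_template; // models processing_template_decl
};

// Returns which storage context would be verified, or None when the guard
// skips the declaration.
inline TypeContext context_to_verify(const DeclModel &d) {
  if (d.processing_template || !d.is_var || !d.is_global_var)
    return TypeContext::None;
  return d.is_thread_local ? TypeContext::ThreadStorage
                           : TypeContext::StaticStorage;
}
```

In this model a static variable seen while a template is still being processed yields None, and the same variable outside template processing yields StaticStorage. Whether that second visit corresponds to instantiation in the real front end is an assumption of the sketch.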



Re: [PATCH] Enable auto-vectorization at O2 with very-cheap cost model.

2021-09-17 Thread Richard Biener via Gcc-patches
On Fri, 17 Sep 2021, Hongtao Liu wrote:

> On Thu, Sep 16, 2021 at 8:31 PM Richard Biener  wrote:
> >
> > On Thu, 16 Sep 2021, Hongtao Liu wrote:
> >
> > > On Thu, Sep 16, 2021 at 4:23 PM Richard Biener via Gcc-patches
> > >  wrote:
> > > >
> > > > On Thu, 16 Sep 2021, liuhongt wrote:
> > > >
> > > > > Ping
> > > > > rebased on latest trunk.
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >   * common.opt (ftree-vectorize): Add Var(flag_tree_vectorize).
> > > > >   * doc/invoke.texi (Options That Control Optimization): Update
> > > > >   documents.
> > > > >   * opts.c (default_options_table): Enable auto-vectorization at
> > > > >   O2 with very-cheap cost model.
> > > > >   (finish_options): Use cheap cost model for
> > > > >   explicit -ftree{,-loop}-vectorize.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >   * c-c++-common/Wstringop-overflow-2.c: Adjust testcase.
> > > > >   * g++.dg/tree-ssa/pr81408.C: Ditto.
> > > > >   * g++.dg/warn/Wuninitialized-13.C: Ditto.
> > > > >   * gcc.dg/Warray-bounds-51.c: Ditto.
> > > > >   * gcc.dg/Warray-parameter-3.c: Ditto.
> > > > >   * gcc.dg/Wstringop-overflow-13.c: Ditto.
> > > > >   * gcc.dg/Wstringop-overflow-14.c: Ditto.
> > > > >   * gcc.dg/Wstringop-overflow-21.c: Ditto.
> > > > >   * gcc.dg/Wstringop-overflow-68.c: Ditto.
> > > > >   * gcc.dg/gomp/pr46032-2.c: Ditto.
> > > > >   * gcc.dg/gomp/pr46032-3.c: Ditto.
> > > > >   * gcc.dg/gomp/simd-2.c: Ditto.
> > > > >   * gcc.dg/gomp/simd-3.c: Ditto.
> > > > >   * gcc.dg/graphite/fuse-1.c: Ditto.
> > > > >   * gcc.dg/pr67089-6.c: Ditto.
> > > > >   * gcc.dg/pr82929-2.c: Ditto.
> > > > >   * gcc.dg/pr82929.c: Ditto.
> > > > >   * gcc.dg/store_merging_1.c: Ditto.
> > > > >   * gcc.dg/store_merging_11.c: Ditto.
> > > > >   * gcc.dg/store_merging_15.c: Ditto.
> > > > >   * gcc.dg/store_merging_16.c: Ditto.
> > > > >   * gcc.dg/store_merging_19.c: Ditto.
> > > > >   * gcc.dg/store_merging_24.c: Ditto.
> > > > >   * gcc.dg/store_merging_25.c: Ditto.
> > > > >   * gcc.dg/store_merging_28.c: Ditto.
> > > > >   * gcc.dg/store_merging_30.c: Ditto.
> > > > >   * gcc.dg/store_merging_5.c: Ditto.
> > > > >   * gcc.dg/store_merging_7.c: Ditto.
> > > > >   * gcc.dg/store_merging_8.c: Ditto.
> > > > >   * gcc.dg/strlenopt-85.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/dump-6.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/pr19210-1.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/pr47059.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/pr86017.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/pr91482.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/predcom-1.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/predcom-dse-3.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/prefetch-3.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/prefetch-6.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/prefetch-8.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/prefetch-9.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/ssa-dse-18.c: Ditto.
> > > > >   * gcc.dg/tree-ssa/ssa-dse-19.c: Ditto.
> > > > >   * gcc.dg/uninit-40.c: Ditto.
> > > > >   * gcc.dg/unroll-7.c: Ditto.
> > > > >   * gcc.misc-tests/help.exp: Ditto.
> > > > >   * gcc.target/i386/avx512vpopcntdqvl-vpopcntd-1.c: Ditto.
> > > > >   * gcc.target/i386/pr22141.c: Ditto.
> > > > >   * gcc.target/i386/pr34012.c: Ditto.
> > > > >   * gcc.target/i386/pr49781-1.c: Ditto.
> > > > >   * gcc.target/i386/pr95798-1.c: Ditto.
> > > > >   * gcc.target/i386/pr95798-2.c: Ditto.
> > > > >   * gfortran.dg/pr77498.f: Ditto.
> > > > > ---
> > > > >  gcc/common.opt |  2 +-
> > > > >  gcc/doc/invoke.texi|  8 +---
> > > > >  gcc/opts.c | 18 
> > > > > +++---
> > > > >  .../c-c++-common/Wstringop-overflow-2.c|  2 +-
> > > > >  gcc/testsuite/g++.dg/tree-ssa/pr81408.C|  2 +-
> > > > >  gcc/testsuite/g++.dg/warn/Wuninitialized-13.C  |  2 +-
> > > > >  gcc/testsuite/gcc.dg/Warray-bounds-51.c|  2 +-
> > > > >  gcc/testsuite/gcc.dg/Warray-parameter-3.c  |  2 +-
> > > > >  gcc/testsuite/gcc.dg/Wstringop-overflow-13.c   |  2 +-
> > > > >  gcc/testsuite/gcc.dg/Wstringop-overflow-14.c   |  2 +-
> > > > >  gcc/testsuite/gcc.dg/Wstringop-overflow-21.c   |  2 +-
> > > > >  gcc/testsuite/gcc.dg/Wstringop-overflow-68.c   |  2 +-
> > > > >  gcc/testsuite/gcc.dg/gomp/pr46032-2.c  |  2 +-
> > > > >  gcc/testsuite/gcc.dg/gomp/pr46032-3.c  |  2 +-
> > > > >  gcc/testsuite/gcc.dg/gomp/simd-2.c |  2 +-
> > > > >  gcc/testsuite/gcc.dg/gomp/simd-3.c |  2 +-
> > > > >  gcc/testsuite/gcc.dg/graphite/fuse-1.c |  2 +-
> > > > >  gcc/testsuite/gcc.dg/pr67089-6.c   |  2 +-
> > > > >  gcc/testsuite/gcc.dg/pr82929-2.c   |  2 +-
> > > > >  gcc/testsuite/gcc.dg/pr82929.c   

Re: [PATCH] Fix PR 67102: Add libstdc++ dependancy to libffi

2021-09-17 Thread Thomas Schwinge
Hi Andrew!

First, I appreciate you working through all these old PRs!


On 2021-09-15T13:56:37-0700, apinski--- via Gcc-patches 
 wrote:
> The error message is obvious: -funconfigured-libstdc++-v3 is used
> on the g++ command line.  So we just add the dependency.

> --- a/Makefile.def
> +++ b/Makefile.def
> @@ -592,6 +592,7 @@ dependencies = { module=configure-target-fastjar; on=configure-target-zlib; };
>  dependencies = { module=all-target-fastjar; on=all-target-zlib; };
>  dependencies = { module=configure-target-libgo; on=configure-target-libffi; };
>  dependencies = { module=configure-target-libgo; on=all-target-libstdc++-v3; };
> +dependencies = { module=configure-target-libffi; on=all-target-libstdc++-v3; };
>  dependencies = { module=all-target-libgo; on=all-target-libbacktrace; };
>  dependencies = { module=all-target-libgo; on=all-target-libffi; };
>  dependencies = { module=all-target-libgo; on=all-target-libatomic; };

I'm confused, because given that this 'Makefile.def' change only has the
following effect:

> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -61261,6 +61261,7 @@ all-bison: maybe-all-intl
>  all-flex: maybe-all-intl
>  all-m4: maybe-all-intl
>  configure-target-libgo: maybe-all-target-libstdc++-v3
> +configure-target-libffi: maybe-all-target-libstdc++-v3
>  configure-target-liboffloadmic: maybe-configure-target-libgomp
>  all-target-liboffloadmic: maybe-all-target-libgomp
>  configure-target-newlib: maybe-all-binutils

... isn't that actually a no-op, because we already had such a dependency
listed?  Now twice:

$ grep -n -F 'configure-target-libffi: maybe-all-target-libstdc++-v3' -- Makefile.in
61264:configure-target-libffi: maybe-all-target-libstdc++-v3
61372:configure-target-libffi: maybe-all-target-libstdc++-v3

Compared to the existing one, the one you've added is additionally
restricted by '@unless gcc-bootstrap'.

I noticed this as I remembered that on our og[...] development branches
we have a patch in the opposite direction: get rid of this dependency via
removing 'lang_env_dependencies = { module=libffi; cxx=true; };' from
'Makefile.def'.  See

"Disable libstdc++ dependency for libffi".  (Maciej CCed in case you have
any further thoughts on that.)


Grüße
 Thomas


Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS

2021-09-17 Thread Uros Bizjak via Gcc-patches
On Fri, Sep 17, 2021 at 5:15 AM Cui, Lili  wrote:
>
>
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Thursday, September 16, 2021 2:28 PM
> > To: Cui, Lili 
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; H. J. Lu
> > 
> > Subject: Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle
> > USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS
> >
> > On Wed, Sep 15, 2021 at 10:10 AM  wrote:
> > >
> > > From: "H.J. Lu" 
> > >
> > > Check TARGET_USE_VECTOR_FP_CONVERTS or
> > TARGET_USE_VECTOR_CONVERTS when
> > > handling avx_partial_xmm_update attribute.  Don't convert AVX partial
> > > XMM register update if vector packed SSE conversion should be used.
> > >
> > > gcc/
> > >
> > > PR target/101900
> > > * config/i386/i386-features.c (remove_partial_avx_dependency):
> > > Check TARGET_USE_VECTOR_FP_CONVERTS and
> > TARGET_USE_VECTOR_CONVERTS
> > > before generating vxorps.
> > >
> > > gcc/
> > >
> > > PR target/101900
> > > * testsuite/gcc.target/i386/pr101900-1.c: New test.
> > > * testsuite/gcc.target/i386/pr101900-2.c: Likewise.
> > > * testsuite/gcc.target/i386/pr101900-3.c: Likewise.
> > > ---
> > > >  gcc/config/i386/i386-features.c            | 21 ++---
> > > >  gcc/testsuite/gcc.target/i386/pr101900-1.c | 18 ++
> > > >  gcc/testsuite/gcc.target/i386/pr101900-2.c | 18 ++
> > > >  gcc/testsuite/gcc.target/i386/pr101900-3.c | 19 +++
> > > >  4 files changed, 73 insertions(+), 3 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-2.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101900-3.c
> > > >
> > > > diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
> > > > index 5a99ea7c046..ae5ea02a002 100644
> > > --- a/gcc/config/i386/i386-features.c
> > > +++ b/gcc/config/i386/i386-features.c
> > > @@ -2210,15 +2210,30 @@ remove_partial_avx_dependency (void)
> > >   != AVX_PARTIAL_XMM_UPDATE_TRUE)
> > > continue;
> > >
> > > - if (!v4sf_const0)
> > > -   v4sf_const0 = gen_reg_rtx (V4SFmode);
> > > -
> > >   /* Convert PARTIAL_XMM_UPDATE_TRUE insns, DF -> SF, SF -> DF,
> > >  SI -> SF, SI -> DF, DI -> SF, DI -> DF, to vec_dup and
> > >  vec_merge with subreg.  */
> > >   rtx src = SET_SRC (set);
> > >   rtx dest = SET_DEST (set);
> > >   machine_mode dest_mode = GET_MODE (dest);
> > > + machine_mode src_mode;
> > > +
> > > + if (TARGET_USE_VECTOR_FP_CONVERTS)
> > > +   {
> > > + src_mode = GET_MODE (XEXP (src, 0));
> > > + if (src_mode == E_SFmode || src_mode == E_DFmode)
> > > +   continue;
> > > +   }
> > > +
> > > + if (TARGET_USE_VECTOR_CONVERTS)
> > > +   {
> > > + src_mode = GET_MODE (XEXP (src, 0));
> > > + if (src_mode == E_SImode || src_mode == E_DImode)
> > > +   continue;
> > > +   }
> > > +
> > > + if (!v4sf_const0)
> > > +   v4sf_const0 = gen_reg_rtx (V4SFmode);
> >
> > Please move the initialization of src_mode to the top of the new hunk,
> > like:
> >
> > machine_mode src_mode = GET_MODE (XEXP (src, 0));
> > switch (src_mode) {
> >   case E_SFmode:
> >   case E_DFmode:
> > if (TARGET_USE_VECTOR_FP_CONVERTS)
> >   continue;
> > break;
> >   case E_SImode:
> >   case E_DImode:
> > if (TARGET_USE_VECTOR_CONVERTS)
> >   continue;
> > break;
> >   default:
> > break;
> > }
> >
> > or something like the above.
>
> Done, thanks for the good advice.  I also rebased patch 4/4, since it is
> based on patch 3/4.

OK.

Thanks,
Uros.
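
The control flow agreed on in this exchange can be sketched outside GCC as a small predicate: compute the source mode once, then decide per mode class whether the insn is skipped. Everything named here (Mode, Tuning, skip_partial_xmm_rewrite) is a hypothetical stand-in for illustration; the enum values model E_SFmode, E_DFmode, E_SImode and E_DImode, and the two flags model TARGET_USE_VECTOR_FP_CONVERTS and TARGET_USE_VECTOR_CONVERTS.

```cpp
#include <cassert>

// Hypothetical stand-ins for the machine modes named in the thread.
enum class Mode { SF, DF, SI, DI, Other };

// Hypothetical stand-ins for the two tuning macros.
struct Tuning {
  bool use_vector_fp_converts;
  bool use_vector_converts;
};

// Returns true where the quoted loop would `continue`, that is, where the
// insn is left alone instead of being rewritten.
inline bool skip_partial_xmm_rewrite(Mode src_mode, const Tuning &tuning) {
  switch (src_mode) {
  case Mode::SF:
  case Mode::DF:
    return tuning.use_vector_fp_converts;
  case Mode::SI:
  case Mode::DI:
    return tuning.use_vector_converts;
  default:
    return false;
  }
}
```

Compared with two separate if blocks that each recompute the source mode, the switch reads the mode once and makes it visible that floating-point and integer sources are governed by different flags.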

>
> Changed it to:
>
> + machine_mode src_mode = GET_MODE (XEXP (src, 0));
> +
> + switch (src_mode)
> +   {
> +   case E_SFmode:
> +   case E_DFmode:
> + if (TARGET_USE_VECTOR_FP_CONVERTS)
> +   continue;
> + break;
> +   case E_SImode:
> +   case E_DImode:
> + if (TARGET_USE_VECTOR_CONVERTS)
> +   continue;
> + break;
> +   default:
> + break;
> +   }
> + if (!v4sf_const0)
> +   v4sf_const0 = gen_reg_rtx (V4SFmode);
>
> Thanks,
> Lili.
>
> >
> > Uros.
> >
> > >
> > >   rtx zero;
> > >   machine_mode dest_vecmode;
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr101900-1.c
> > > b/gcc/testsuite/gcc.target/i386/pr101900-1.c
> > > new file mode 100644
> > > index 000..0a45f8e340a
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr101900-1.c
> > > @@ -0,0 +1,18 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -march=skylake -mfpmath=sse
> > > +-mtune-ctrl=use_vector_fp_converts" } */
> > > +
> > > +extern float f;
> > > +extern double d;
> > > 
