[r12-7693 Regression] FAIL: gcc.target/i386/pr90356.c scan-assembler pxor on Linux/x86_64

2022-03-17 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

c482c28ba4c549006deb70dead90fe8ab34dcbcf is the first bad commit
commit c482c28ba4c549006deb70dead90fe8ab34dcbcf
Author: Roger Sayle 
Date:   Thu Mar 17 21:56:32 2022 +

PR 90356: Use xor to load const_double 0.0 on SSE (always)

caused

FAIL: gcc.target/i386/pr86722.c scan-assembler-not orpd
FAIL: gcc.target/i386/pr90356.c scan-assembler pxor

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-7693/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr86722.c --target_board='unix{-m64\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr90356.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr90356.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[PATCH] x86: Correct march=sapphirerapids to base on icelake server

2022-03-17 Thread Cui,Lili via Gcc-patches
Hi Hongtao,

This patch is to correct march=sapphirerapids to base on icelake server.
and update sapphirerapids in the documentation.

OK for master and backport to GCC 11?


gcc/Changelog:

PR target/104963
* config/i386/i386.h (PTA_SAPPHIRERAPIDS): change it to base on ICX.
* doc/invoke.texi: Update documents for Intel sapphirerapids.

gcc/testsuite/ChangeLog

PR target/104963
* gcc.target/i386/pr104963.c: New test case.
---
 gcc/config/i386/i386.h   |  5 +++--
 gcc/doc/invoke.texi  | 11 ++-
 gcc/testsuite/gcc.target/i386/pr104963.c | 12 
 3 files changed, 21 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104963.c

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 37b523cea4f..b92955177fe 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2323,10 +2323,11 @@ constexpr wide_int_bitmask PTA_ICELAKE_SERVER = 
PTA_ICELAKE_CLIENT
   | PTA_PCONFIG | PTA_WBNOINVD | PTA_CLWB;
 constexpr wide_int_bitmask PTA_TIGERLAKE = PTA_ICELAKE_CLIENT | PTA_MOVDIRI
   | PTA_MOVDIR64B | PTA_CLWB | PTA_AVX512VP2INTERSECT | PTA_KL | PTA_WIDEKL;
-constexpr wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_COOPERLAKE | PTA_MOVDIRI
+constexpr wide_int_bitmask PTA_SAPPHIRERAPIDS = PTA_ICELAKE_SERVER | 
PTA_MOVDIRI
   | PTA_MOVDIR64B | PTA_AVX512VP2INTERSECT | PTA_ENQCMD | PTA_CLDEMOTE
   | PTA_PTWRITE | PTA_WAITPKG | PTA_SERIALIZE | PTA_TSXLDTRK | PTA_AMX_TILE
-  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI | PTA_AVX512FP16;
+  | PTA_AMX_INT8 | PTA_AMX_BF16 | PTA_UINTR | PTA_AVXVNNI | PTA_AVX512FP16
+  | PTA_AVX512BF16;
 constexpr wide_int_bitmask PTA_KNL = PTA_BROADWELL | PTA_AVX512PF
   | PTA_AVX512ER | PTA_AVX512F | PTA_AVX512CD | PTA_PREFETCHWT1;
 constexpr wide_int_bitmask PTA_BONNELL = PTA_CORE2 | PTA_MOVBE;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d65979bba3f..59baa5e5747 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -31288,11 +31288,12 @@ AVX512VP2INTERSECT and KEYLOCKER instruction set 
support.
 Intel sapphirerapids CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3,
 SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE,
 RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MOVBE, HLE, RDSEED, ADCX, PREFETCHW,
-AES, CLFLUSHOPT, XSAVEC, XSAVES, SGX, AVX512F, CLWB, AVX512VL, AVX512BW,
-AVX512DQ, AVX512CD, AVX512VNNI, AVX512BF16 MOVDIRI, MOVDIR64B,
-AVX512VP2INTERSECT, ENQCMD, CLDEMOTE, PTWRITE, WAITPKG, SERIALIZE, TSXLDTRK,
-UINTR, AMX-BF16, AMX-TILE, AMX-INT8, AVX-VNNI and AVX512FP16 instruction set
-support.
+AES, CLFLUSHOPT, XSAVEC, XSAVES, SGX, AVX512F, AVX512VL, AVX512BW, AVX512DQ,
+AVX512CD, PKU, AVX512VBMI, AVX512IFMA, SHA, AVX512VNNI, GFNI, VAES, AVX512VBMI2
+VPCLMULQDQ, AVX512BITALG, RDPID, AVX512VPOPCNTDQ, PCONFIG, WBNOINVD, CLWB,
+MOVDIRI, MOVDIR64B, AVX512VP2INTERSECT, ENQCMD, CLDEMOTE, PTWRITE, WAITPKG,
+SERIALIZE, TSXLDTRK, UINTR, AMX-BF16, AMX-TILE, AMX-INT8, AVX-VNNI and
+AVX512FP16 instruction set support.
 
 @item alderlake
 Intel Alderlake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3,
diff --git a/gcc/testsuite/gcc.target/i386/pr104963.c 
b/gcc/testsuite/gcc.target/i386/pr104963.c
new file mode 100644
index 000..19000671ebf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104963.c
@@ -0,0 +1,12 @@
+/* PR target/104963 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=sapphirerapids" } */
+
+#include
+
+__m512i
+foo (__m512i a, __m512i b)
+{
+return _mm512_permutexvar_epi8(a, b);
+}
+
-- 
2.17.1

Thanks.


Re: [PATCH] Ignore (possible) signed zeros in operands of FP comparisons.

2022-03-17 Thread Andrew MacLeod via Gcc-patches

On 3/17/22 19:27, Jeff Law via Gcc-patches wrote:


On 3/15/2022 2:03 AM, Roger Sayle wrote:




Speaking of tree-ssa passes that could be improved, I was wondering 
whether
you could review my EVRP patch to fix regression PR/102950. Pretty 
please?

https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589569.html


I forwarded this to Aldy & Andrew.  I suspect they missed it.


I saw it originally, and was happy to not have to figure out the bits 
myself :-)  The more people pitching in to range-ops the better!!


I'm certainly fine with it, but the gory details of the bit twiddling it 
beyond my scope of expertise. Figured Jakub or richi were better shots 
at that part.


Andrew





Re: [PATCH] Ignore (possible) signed zeros in operands of FP comparisons.

2022-03-17 Thread Jeff Law via Gcc-patches



On 3/15/2022 2:03 AM, Roger Sayle wrote:

-Original Message-
From: Richard Biener 
Sent: 15 March 2022 07:29
To: Roger Sayle 
Cc: GCC Patches 
Subject: Re: [PATCH] Ignore (possible) signed zeros in operands of FP
comparisons.

On Mon, Mar 14, 2022 at 8:26 PM Roger Sayle
 wrote:


I've been wondering about the possible performance/missed-optimization
impact of my patch for PR middle-end/98420 and similar IEEE
correctness fixes that disable constant folding optimizations when worrying

about -0.0.

In the common situation where the floating point result is used by a
FP comparison, there's no distinction between +0.0 and -0.0, so some
HONOR_SIGNED_ZEROS optimizations that we'd usually disable, are safe.

Consider the following interesting example:

int foo(int x, double y) {
 return (x * 0.0) < y;
}

Although we know that x (when converted to double) can't be NaN or
Inf, we still worry that for negative values of x that (x * 0.0) may
be -0.0 and so perform the multiplication at run-time.  But in this
case, the result of the comparison (-0.0 < y) will be exactly the same
as (+0.0 < y) for any y, hence the above may be safely constant folded to "0.0 <

y"

avoiding the multiplication at run-time.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures, and allows GCC to continue to
optimize cases that we optimized in GCC 11 (without regard to correctness).
Ok for mainline?

Isn't that something that gimple-ssa-backprop.c is designed to handle?  I wonder
if you can see whether the signed zero speciality can be retrofitted there?
It currently tracks "sign does not matter", so possibly another state, "sign of
zero does not matter" could be introduced there.

Two questions. Would adding tracking of "sign of zero does not matter" to
gimple-ssa-backprop.c be suitable for stage4?  Secondly, even if 
gimple-ssa-backprop.c
performed this kind of optimization, would that be a reason not to support
these transformations in match.pd?  Perhaps someone could open a missed
optimization PR for backprop in Bugzilla, but the above patch still needs to be
reviewed on its own merits.


Can't see how it's appropriate for stage4, but definitely interesting 
for gcc-13.


It'd fit well into some of the Ranger plans too -- Aldy and Andrew have 
been talking about tracking the special FP values in Ranger.   This is 
related, though not exactly the same since rather than tracking the 
special value, you're tracking if those special values actually matter.  
If you're going to do more work in this space, you might want to reach 
out to Aldy and Andrew to see if there's space for collaboration.





Speaking of tree-ssa passes that could be improved, I was wondering whether
you could review my EVRP patch to fix regression PR/102950.  Pretty please?
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589569.html


I forwarded this to Aldy & Andrew.  I suspect they missed it.




Thanks (as always),


No, thank you.  I'm so happy to see you contributing to GCC regularly again!


Jeff





[committed] analyzer: fixes to -fdump-analyzer-state-purge

2022-03-17 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7695-g79e210f0c8e1fa

gcc/analyzer/ChangeLog:
* state-purge.cc (state_purge_annotator::add_node_annotations):
Avoid duplicate before-supernode annotations when returning from
an interprocedural call.  Show after-supernode annotations.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/state-purge.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/analyzer/state-purge.cc b/gcc/analyzer/state-purge.cc
index e99c9cb593e..c37234ff16a 100644
--- a/gcc/analyzer/state-purge.cc
+++ b/gcc/analyzer/state-purge.cc
@@ -499,11 +499,12 @@ state_purge_annotator::add_node_annotations (graphviz_out 
*gv,
/* Different in-edges mean different names need purging.
   Determine which points to dump.  */
auto_vec points;
-   if (n.entry_p ())
+   if (n.entry_p () || n.m_returning_call)
  points.safe_push (function_point::before_supernode (, NULL));
else
  for (auto inedge : n.m_preds)
points.safe_push (function_point::before_supernode (, inedge));
+   points.safe_push (function_point::after_supernode ());
 
for (auto & point : points)
  {
-- 
2.26.3



[committed] analyzer: fix program_point::get_next for PK_BEFORE_STMT

2022-03-17 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r12-7694-g2b3404357a1f99.

gcc/analyzer/ChangeLog:
* program-point.cc (program_point::get_next): Fix missing
increment of index.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/program-point.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/analyzer/program-point.cc b/gcc/analyzer/program-point.cc
index 5318dce1490..9e264b14735 100644
--- a/gcc/analyzer/program-point.cc
+++ b/gcc/analyzer/program-point.cc
@@ -645,7 +645,7 @@ program_point::get_next () const
return after_supernode (get_supernode (), get_call_string ());
 case PK_BEFORE_STMT:
   {
-   unsigned next_idx = get_stmt_idx ();
+   unsigned next_idx = get_stmt_idx () + 1;
if (next_idx < get_supernode ()->m_stmts.length ())
  return before_stmt (get_supernode (), next_idx, get_call_string ());
else
-- 
2.26.3



Re: [PATCHv2, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-03-17 Thread will schmidt via Gcc-patches
On Thu, 2022-03-17 at 13:35 +0800, HAO CHEN GUI via Gcc-patches wrote:
> Hi,
>This patch adds V1TI mode into a new mode iterator used in vector
> comparison expands.With the patch, both built-ins and direct
> comparison
> could generate P10 new V1TI comparison instructions.

Hi,


-/* We deliberately omit RS6000_BIF_CMPGE_1TI ...
-   for now, because gimple folding produces worse code for 128-bit
-   compares.  */


I assume it is the case, but don't see a before/after example to
clarify the situation.   A clear statement that the 'worse code'
situation has been resolved with this addition of TI modes into the
iterators, would be good.

Otherwise lgtm.  :-)

Thanks,
-Will


> 
>Bootstrapped and tested on ppc64 Linux BE and LE with no
> regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-03-16 Haochen Gui 
> 
> gcc/
>   PR target/103316
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
>   gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
>   RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
>   RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
>   * config/rs6000/vector.md (VEC_IC): Define. Add support for new Power10
>   V1TI instructions.
>   (vec_cmp): Set mode iterator to VEC_IC.
>   (vec_cmpu): Likewise.
> 
> gcc/testsuite/
>   PR target/103316
>   * gcc.target/powerpc/pr103316.c: New.
>   * gcc.target/powerpc/fold-vec-cmp-int128.c: New cases for vector
>   __int128.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 5d34c1bcfc9..fac7f43f438 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -1994,16 +1994,14 @@ rs6000_gimple_fold_builtin
> (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPEQUH:
>  case RS6000_BIF_VCMPEQUW:
>  case RS6000_BIF_VCMPEQUD:
> -/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because
> gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPEQUT:
>fold_compare_helper (gsi, EQ_EXPR, stmt);
>return true;
> 
>  case RS6000_BIF_VCMPNEB:
>  case RS6000_BIF_VCMPNEH:
>  case RS6000_BIF_VCMPNEW:
> -/* We deliberately omit RS6000_BIF_VCMPNET for now, because
> gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPNET:
>fold_compare_helper (gsi, NE_EXPR, stmt);
>return true;
> 
> @@ -2015,9 +2013,8 @@ rs6000_gimple_fold_builtin
> (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPGE_U4SI:
>  case RS6000_BIF_CMPGE_2DI:
>  case RS6000_BIF_CMPGE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPGE_1TI and
> RS6000_BIF_CMPGE_U1TI
> -   for now, because gimple folding produces worse code for 128-
> bit
> -   compares.  */
> +case RS6000_BIF_CMPGE_1TI:
> +case RS6000_BIF_CMPGE_U1TI:
>fold_compare_helper (gsi, GE_EXPR, stmt);
>return true;
> 
> @@ -2029,9 +2026,8 @@ rs6000_gimple_fold_builtin
> (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPGTUW:
>  case RS6000_BIF_VCMPGTUD:
>  case RS6000_BIF_VCMPGTSD:
> -/* We deliberately omit RS6000_BIF_VCMPGTUT and
> RS6000_BIF_VCMPGTST
> -   for now, because gimple folding produces worse code for 128-
> bit
> -   compares.  */
> +case RS6000_BIF_VCMPGTUT:
> +case RS6000_BIF_VCMPGTST:
>fold_compare_helper (gsi, GT_EXPR, stmt);
>return true;
> 
> @@ -2043,9 +2039,8 @@ rs6000_gimple_fold_builtin
> (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPLE_U4SI:
>  case RS6000_BIF_CMPLE_2DI:
>  case RS6000_BIF_CMPLE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPLE_1TI and
> RS6000_BIF_CMPLE_U1TI
> -   for now, because gimple folding produces worse code for 128-
> bit
> -   compares.  */
> +case RS6000_BIF_CMPLE_1TI:
> +case RS6000_BIF_CMPLE_U1TI:
>fold_compare_helper (gsi, LE_EXPR, stmt);
>return true;
> 
> diff --git a/gcc/config/rs6000/vector.md
> b/gcc/config/rs6000/vector.md
> index b87a742cca8..d88869cc8d0 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,6 +26,9 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> +;; Vector int modes for comparison
> +(define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI
> "TARGET_POWER10")])
> +
>  ;; 128-bit int modes
>  (define_mode_iterator VEC_TI [V1TI TI])
> 
> @@ -533,10 +536,10 @@ (define_expand "vcond_mask_"
> 
>  ;; For signed integer vectors comparison.
>  (define_expand "vec_cmp"
> -  [(set (match_operand:VEC_I 0 "vint_operand")
> +  [(set (match_operand:VEC_IC 0 "vint_operand")
>   (match_operator 1 "signed_or_equality_comparison_operator"
> -   [(match_operand:VEC_I 2 "vint_operand")
> -(match_operand:VEC_I 3 "vint_operand")]))]
> +  

Re: [PATCH] libstdc++: Reduce header dependencies from PSTL headers [PR92546]

2022-03-17 Thread Thomas Rodgers via Gcc-patches
Looks ok to me. I just am curious, does the change to src/c++17/fs_path.cc
need to be part of this change (It's not obvious to me that it is related
to the other changes in the patch).

On Thu, Mar 17, 2022 at 12:01 PM Jonathan Wakely  wrote:

> Tested x86_64-linux. Tom, any objection?
>
> -- >8 --
>
> This avoids including the whole of  in , as the
>  header only actually needs std::pair.
>
> This also avoids including  in , which only
> needs , std::bad_alloc, and std::terminate (which can be
> repalced with std::__terminate). This matters less, because
>  is only included by the  headers and they
> all use  anyway, and are only included by .
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/92546
> * include/pstl/glue_algorithm_defs.h: Replace  with
> .
> * include/pstl/utils.h: Replace  with .
> (__pstl::__internal::__except_handler): Use std::__terminate
> instead of std::terminate.
> * src/c++17/fs_path.cc: Include .
> * testsuite/25_algorithms/adjacent_find/constexpr.cc: Include
> .
> * testsuite/25_algorithms/binary_search/constexpr.cc: Likewise.
> * testsuite/25_algorithms/clamp/constrained.cc: Likewise.
> * testsuite/25_algorithms/equal/constrained.cc: Likewise.
> * testsuite/25_algorithms/for_each/constrained.cc: Likewise.
> * testsuite/25_algorithms/includes/constrained.cc: Likewise.
> * testsuite/25_algorithms/is_heap/constexpr.cc: Likewise.
> * testsuite/25_algorithms/is_heap_until/constexpr.cc: Likewise.
> * testsuite/25_algorithms/is_permutation/constrained.cc: Include
> .
> * testsuite/25_algorithms/is_sorted/constexpr.cc: Include
> .
> * testsuite/25_algorithms/is_sorted_until/constexpr.cc:
> Likewise.
> * testsuite/25_algorithms/lexicographical_compare/constexpr.cc:
> Likewise.
> * testsuite/25_algorithms/lexicographical_compare/constrained.cc:
> Likewise.
> * testsuite/25_algorithms/lexicographical_compare_three_way/1.cc:
> Include .
> * testsuite/25_algorithms/lower_bound/constexpr.cc: Include
> .
> * testsuite/25_algorithms/max/constrained.cc: Likewise.
> * testsuite/25_algorithms/max_element/constrained.cc: Likewise.
> * testsuite/25_algorithms/min/constrained.cc: Likewise.
> * testsuite/25_algorithms/min_element/constrained.cc: Likewise.
> * testsuite/25_algorithms/minmax_element/constrained.cc:
> Likewise.
> * testsuite/25_algorithms/mismatch/constexpr.cc: Likewise.
> * testsuite/25_algorithms/move/93872.cc: Likewise.
> * testsuite/25_algorithms/move_backward/93872.cc: Include
> .
> * testsuite/25_algorithms/nth_element/constexpr.cc: Include
> .
> * testsuite/25_algorithms/partial_sort/constexpr.cc: Likewise.
> * testsuite/25_algorithms/partial_sort_copy/constexpr.cc:
> Likewise.
> * testsuite/25_algorithms/search/constexpr.cc: Likewise.
> * testsuite/25_algorithms/search_n/constrained.cc: Likewise.
> * testsuite/25_algorithms/set_difference/constexpr.cc: Likewise.
> * testsuite/25_algorithms/set_difference/constrained.cc:
> Likewise.
> * testsuite/25_algorithms/set_intersection/constexpr.cc:
> Likewise.
> * testsuite/25_algorithms/set_intersection/constrained.cc:
> Likewise.
> * testsuite/25_algorithms/set_symmetric_difference/constexpr.cc:
> Likewise.
> * testsuite/25_algorithms/set_union/constexpr.cc: Likewise.
> * testsuite/25_algorithms/set_union/constrained.cc: Likewise.
> * testsuite/25_algorithms/sort/constexpr.cc: Likewise.
> * testsuite/25_algorithms/sort_heap/constexpr.cc: Likewise.
> * testsuite/25_algorithms/transform/constrained.cc: Likewise.
> * testsuite/25_algorithms/unique/constexpr.cc: Likewise.
> * testsuite/25_algorithms/unique/constrained.cc: Likewise.
> * testsuite/25_algorithms/unique_copy/constexpr.cc: Likewise.
> * testsuite/25_algorithms/upper_bound/constexpr.cc: Likewise.
> * testsuite/std/ranges/adaptors/elements.cc: Include .
> * testsuite/std/ranges/adaptors/lazy_split.cc: Likewise.
> * testsuite/std/ranges/adaptors/split.cc: Likewise.
> ---
>  libstdc++-v3/include/pstl/glue_algorithm_defs.h   | 2 +-
>  libstdc++-v3/include/pstl/utils.h | 4 ++--
>  libstdc++-v3/src/c++17/fs_path.cc | 1 +
>  .../testsuite/25_algorithms/adjacent_find/constexpr.cc| 1 +
>  .../testsuite/25_algorithms/binary_search/constexpr.cc| 1 +
>  libstdc++-v3/testsuite/25_algorithms/clamp/constrained.cc | 1 +
>  libstdc++-v3/testsuite/25_algorithms/equal/constrained.cc | 1 +
>  libstdc++-v3/testsuite/25_algorithms/for_each/constrained.cc  | 1 +
>  

Re: [PATCH] PR tree-optimization/102943 - Always use dominators in the cache when available.

2022-03-17 Thread Richard Biener via Gcc-patches



> Am 17.03.2022 um 18:35 schrieb Andrew MacLeod :
> 
> As discussed in the PR, this patch adjusts rangers cache query to *always* 
> use dominators to lookup ranges in the cache.
> 
> It use to give up if it encountered a block with outgoing edge ranges. This 
> patch changes that to continue looking back until a value is found, then 
> applies any outgoing ranges encountered along the way.
> 
> This prevents us from filling the on-entry cache in blocks which don't really 
> affect the outcome, and causes significant memory reduction is large 
> functions at a nominal calculation cost.
> 
> I have tweaked the debug output as well as a couple of other little things 
> from the original patch attached in the PR. Confirmed that the performance 
> results are similar, and generated reams of debug output to ensure there are 
> no issues there.
> 
> Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk?

Ok.

Thanks,
Richard 

> 
> Andrew
> <943.patch>


Re: [x86_64 PATCH] PR 90356: Use xor to load const_double 0.0 on SSE (always)

2022-03-17 Thread Uros Bizjak via Gcc-patches
On Thu, Mar 17, 2022 at 8:50 PM Uros Bizjak  wrote:
>
> On Thu, Mar 17, 2022 at 8:41 PM Roger Sayle  
> wrote:
> >
> >
> > Implementations of the x87 floating point instruction set have always
> > had some pretty strange characteristics.  For example on the original
> > Intel Pentium the FLDPI instruction (to load 3.14159... into a register)
> > took 5 cycles, and the FLDZ instruction (to load 0.0) took 2 cycles,
> > when a regular FLD (load from memory) took just 1 cycle!?  Given that
> > back then memory latencies were much lower (relatively) than they are
> > today, these instructions were all but useless except when optimizing
> > for size (impressively FLDZ/FLDPI require only two bytes).
> >
> > Such was the world back in 2006 when Uros Bizjak first added support for
> > fldz https://gcc.gnu.org/pipermail/gcc-patches/2006-November/202589.html
> > and then shortly after sensibly disabled them for !optimize_size with
> > https://gcc.gnu.org/pipermail/gcc-patches/2006-November/204405.html
> > [which was very expertly reviewed and approved here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2006-November/204487.html ]
> >
> > "And some things that should not have been forgotten were lost.
> > History became legend.  Legend became myth." -- Lord of the Rings
> >
> > Alas this vestigial logic still persists in the compiler today,
> > so for example on x86_64 for the following function:
> >
> > double foo(double x) { return x + 0.0; }
> >
> > generates with -O2
> >
> > foo:addsd   .LC0(%rip), %xmm0
> > ret
> > .LC0:   .long   0
> > .long   0
> >
> > preferring to read the constant 0.0 from memory [the constant pool],
> > except when optimizing for size.  With -Os we get:
> >
> > foo:xorps   %xmm1, %xmm1
> > addsd   %xmm1, %xmm0
> > ret
> >
> > Which is not only smaller (the two instructions require seven bytes vs.
> > eight for the original addsd from mem, even without considering the
> > constant pool) but is also faster on modern hardware.  The latter code
> > sequence is generated by both clang and msvc with -O2.  Indeed Agner
> > Fogg documents the set of floating point/SSE constants that it's
> > cheaper to materialize than to load from memory.
> >
> > This patch shuffles the conditions on the i386 backend's *movtf_internal,
> > *movdf_internal and *movsf_internal define_insns to untangle the newer
> > TARGET_SSE_MATH clauses from the historical standard_80387_constant_p
> > conditions.  Amongst the benefits of this are that it improves the code
> > generated for PR tree-optimization/90356 and resolves PR target/86722.
> > Many thanks to Hongtao whose approval of my PR 94680 "movq" patch
> > unblocked this one.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -check with no new failures.  Ok for mainline?
> >
> >
> > 2022-03-17  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/86722
> > PR tree-optimization/90356
> > * config/i386/i386.md (*movtf_internal): Don't guard
> > standard_sse_constant_p clause by optimize_function_for_size_p.
> > (*movdf_internal): Likewise.
> > (*movsf_internal): Likewise.
> >
> > gcc/testsuite/ChangeLog
> > PR target/86722
> > PR tree-optimization/90356
> > * gcc.target/i386/pr86722.c: New test case.
> > * gcc.target/i386/pr90356.c: New test case.
>
> OK, and based on your analysis, even obvious.

Maybe a little improvement for tests:

+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -msse" } */

You could add "-msse2 -mfpmath=sse" to dg-options, so it will also
compile for the ia32 target.

Uros.


Re: [x86_64 PATCH] PR 90356: Use xor to load const_double 0.0 on SSE (always)

2022-03-17 Thread Uros Bizjak via Gcc-patches
On Thu, Mar 17, 2022 at 8:41 PM Roger Sayle  wrote:
>
>
> Implementations of the x87 floating point instruction set have always
> had some pretty strange characteristics.  For example on the original
> Intel Pentium the FLDPI instruction (to load 3.14159... into a register)
> took 5 cycles, and the FLDZ instruction (to load 0.0) took 2 cycles,
> when a regular FLD (load from memory) took just 1 cycle!?  Given that
> back then memory latencies were much lower (relatively) than they are
> today, these instructions were all but useless except when optimizing
> for size (impressively FLDZ/FLDPI require only two bytes).
>
> Such was the world back in 2006 when Uros Bizjak first added support for
> fldz https://gcc.gnu.org/pipermail/gcc-patches/2006-November/202589.html
> and then shortly after sensibly disabled them for !optimize_size with
> https://gcc.gnu.org/pipermail/gcc-patches/2006-November/204405.html
> [which was very expertly reviewed and approved here:
> https://gcc.gnu.org/pipermail/gcc-patches/2006-November/204487.html ]
>
> "And some things that should not have been forgotten were lost.
> History became legend.  Legend became myth." -- Lord of the Rings
>
> Alas this vestigial logic still persists in the compiler today,
> so for example on x86_64 for the following function:
>
> double foo(double x) { return x + 0.0; }
>
> generates with -O2
>
> foo:addsd   .LC0(%rip), %xmm0
> ret
> .LC0:   .long   0
> .long   0
>
> preferring to read the constant 0.0 from memory [the constant pool],
> except when optimizing for size.  With -Os we get:
>
> foo:xorps   %xmm1, %xmm1
> addsd   %xmm1, %xmm0
> ret
>
> Which is not only smaller (the two instructions require seven bytes vs.
> eight for the original addsd from mem, even without considering the
> constant pool) but is also faster on modern hardware.  The latter code
> sequence is generated by both clang and msvc with -O2.  Indeed Agner
> Fogg documents the set of floating point/SSE constants that it's
> cheaper to materialize than to load from memory.
>
> This patch shuffles the conditions on the i386 backend's *movtf_internal,
> *movdf_internal and *movsf_internal define_insns to untangle the newer
> TARGET_SSE_MATH clauses from the historical standard_80387_constant_p
> conditions.  Amongst the benefits of this are that it improves the code
> generated for PR tree-optimization/90356 and resolves PR target/86722.
> Many thanks to Hongtao whose approval of my PR 94680 "movq" patch
> unblocked this one.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -check with no new failures.  Ok for mainline?
>
>
> 2022-03-17  Roger Sayle  
>
> gcc/ChangeLog
> PR target/86722
> PR tree-optimization/90356
> * config/i386/i386.md (*movtf_internal): Don't guard
> standard_sse_constant_p clause by optimize_function_for_size_p.
> (*movdf_internal): Likewise.
> (*movsf_internal): Likewise.
>
> gcc/testsuite/ChangeLog
> PR target/86722
> PR tree-optimization/90356
> * gcc.target/i386/pr86722.c: New test case.
> * gcc.target/i386/pr90356.c: New test case.

OK, and based on your analysis, even obvious.

Thanks,
Uros.


[x86_64 PATCH] PR 90356: Use xor to load const_double 0.0 on SSE (always)

2022-03-17 Thread Roger Sayle

Implementations of the x87 floating point instruction set have always
had some pretty strange characteristics.  For example on the original
Intel Pentium the FLDPI instruction (to load 3.14159... into a register)
took 5 cycles, and the FLDZ instruction (to load 0.0) took 2 cycles,
when a regular FLD (load from memory) took just 1 cycle!?  Given that
back then memory latencies were much lower (relatively) than they are
today, these instructions were all but useless except when optimizing
for size (impressively FLDZ/FLDPI require only two bytes).

Such was the world back in 2006 when Uros Bizjak first added support for
fldz https://gcc.gnu.org/pipermail/gcc-patches/2006-November/202589.html
and then shortly after sensibly disabled them for !optimize_size with
https://gcc.gnu.org/pipermail/gcc-patches/2006-November/204405.html
[which was very expertly reviewed and approved here:
https://gcc.gnu.org/pipermail/gcc-patches/2006-November/204487.html ]

"And some things that should not have been forgotten were lost.
History became legend.  Legend became myth." -- Lord of the Rings

Alas this vestigial logic still persists in the compiler today,
so for example on x86_64 for the following function:

double foo(double x) { return x + 0.0; }

generates with -O2

foo:addsd   .LC0(%rip), %xmm0
ret
.LC0:   .long   0
.long   0

preferring to read the constant 0.0 from memory [the constant pool],
except when optimizing for size.  With -Os we get:

foo:xorps   %xmm1, %xmm1
addsd   %xmm1, %xmm0
ret

Which is not only smaller (the two instructions require seven bytes vs.
eight for the original addsd from mem, even without considering the
constant pool) but is also faster on modern hardware.  The latter code
sequence is generated by both clang and msvc with -O2.  Indeed Agner
Fogg documents the set of floating point/SSE constants that it's
cheaper to materialize than to load from memory.

This patch shuffles the conditions on the i386 backend's *movtf_internal,
*movdf_internal and *movsf_internal define_insns to untangle the newer
TARGET_SSE_MATH clauses from the historical standard_80387_constant_p
conditions.  Amongst the benefits of this are that it improves the code
generated for PR tree-optimization/90356 and resolves PR target/86722.
Many thanks to Hongtao whose approval of my PR 94680 "movq" patch
unblocked this one.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -check with no new failures.  Ok for mainline?


2022-03-17  Roger Sayle  

gcc/ChangeLog
PR target/86722
PR tree-optimization/90356
* config/i386/i386.md (*movtf_internal): Don't guard
standard_sse_constant_p clause by optimize_function_for_size_p.
(*movdf_internal): Likewise.
(*movsf_internal): Likewise.

gcc/testsuite/ChangeLog
PR target/86722
PR tree-optimization/90356
* gcc.target/i386/pr86722.c: New test case.
* gcc.target/i386/pr90356.c: New test case.

Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 46a2663..5b44c65 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3455,9 +3455,7 @@
&& !(MEM_P (operands[0]) && MEM_P (operands[1]))
&& (lra_in_progress || reload_completed
|| !CONST_DOUBLE_P (operands[1])
-   || ((optimize_function_for_size_p (cfun)
-   || (ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC))
-  && standard_sse_constant_p (operands[1], TFmode) == 1
+   || (standard_sse_constant_p (operands[1], TFmode) == 1
   && !memory_operand (operands[0], TFmode))
|| (!TARGET_MEMORY_MISMATCH_STALL
   && memory_operand (operands[0], TFmode)))"
@@ -3590,10 +3588,11 @@
|| !CONST_DOUBLE_P (operands[1])
|| ((optimize_function_for_size_p (cfun)
|| (ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC))
-  && ((IS_STACK_MODE (DFmode)
-   && standard_80387_constant_p (operands[1]) > 0)
-  || (TARGET_SSE2 && TARGET_SSE_MATH
-  && standard_sse_constant_p (operands[1], DFmode) == 1))
+  && IS_STACK_MODE (DFmode)
+  && standard_80387_constant_p (operands[1]) > 0
+  && !memory_operand (operands[0], DFmode))
+   || (TARGET_SSE2 && TARGET_SSE_MATH
+  && standard_sse_constant_p (operands[1], DFmode) == 1
   && !memory_operand (operands[0], DFmode))
|| ((TARGET_64BIT || !TARGET_MEMORY_MISMATCH_STALL)
   && memory_operand (operands[0], DFmode))
@@ -3762,10 +3761,10 @@
|| !CONST_DOUBLE_P (operands[1])
|| ((optimize_function_for_size_p (cfun)
|| (ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC))
-  && ((IS_STACK_MODE (SFmode)
-   && standard_80387_constant_p (operands[1]) > 0)
-  || (TARGET_SSE && TARGET_SSE_MATH
-  && standard_sse_constant_p (operands[1], SFmode) == 

[PATCH] libstdc++: Reduce header dependencies from PSTL headers [PR92546]

2022-03-17 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Tom, any objection?

-- >8 --

This avoids including the whole of  in , as the
 header only actually needs std::pair.

This also avoids including  in , which only
needs , std::bad_alloc, and std::terminate (which can be
repalced with std::__terminate). This matters less, because
 is only included by the  headers and they
all use  anyway, and are only included by .

libstdc++-v3/ChangeLog:

PR libstdc++/92546
* include/pstl/glue_algorithm_defs.h: Replace  with
.
* include/pstl/utils.h: Replace  with .
(__pstl::__internal::__except_handler): Use std::__terminate
instead of std::terminate.
* src/c++17/fs_path.cc: Include .
* testsuite/25_algorithms/adjacent_find/constexpr.cc: Include
.
* testsuite/25_algorithms/binary_search/constexpr.cc: Likewise.
* testsuite/25_algorithms/clamp/constrained.cc: Likewise.
* testsuite/25_algorithms/equal/constrained.cc: Likewise.
* testsuite/25_algorithms/for_each/constrained.cc: Likewise.
* testsuite/25_algorithms/includes/constrained.cc: Likewise.
* testsuite/25_algorithms/is_heap/constexpr.cc: Likewise.
* testsuite/25_algorithms/is_heap_until/constexpr.cc: Likewise.
* testsuite/25_algorithms/is_permutation/constrained.cc: Include
.
* testsuite/25_algorithms/is_sorted/constexpr.cc: Include
.
* testsuite/25_algorithms/is_sorted_until/constexpr.cc:
Likewise.
* testsuite/25_algorithms/lexicographical_compare/constexpr.cc:
Likewise.
* testsuite/25_algorithms/lexicographical_compare/constrained.cc:
Likewise.
* testsuite/25_algorithms/lexicographical_compare_three_way/1.cc:
Include .
* testsuite/25_algorithms/lower_bound/constexpr.cc: Include
.
* testsuite/25_algorithms/max/constrained.cc: Likewise.
* testsuite/25_algorithms/max_element/constrained.cc: Likewise.
* testsuite/25_algorithms/min/constrained.cc: Likewise.
* testsuite/25_algorithms/min_element/constrained.cc: Likewise.
* testsuite/25_algorithms/minmax_element/constrained.cc:
Likewise.
* testsuite/25_algorithms/mismatch/constexpr.cc: Likewise.
* testsuite/25_algorithms/move/93872.cc: Likewise.
* testsuite/25_algorithms/move_backward/93872.cc: Include
.
* testsuite/25_algorithms/nth_element/constexpr.cc: Include
.
* testsuite/25_algorithms/partial_sort/constexpr.cc: Likewise.
* testsuite/25_algorithms/partial_sort_copy/constexpr.cc:
Likewise.
* testsuite/25_algorithms/search/constexpr.cc: Likewise.
* testsuite/25_algorithms/search_n/constrained.cc: Likewise.
* testsuite/25_algorithms/set_difference/constexpr.cc: Likewise.
* testsuite/25_algorithms/set_difference/constrained.cc:
Likewise.
* testsuite/25_algorithms/set_intersection/constexpr.cc:
Likewise.
* testsuite/25_algorithms/set_intersection/constrained.cc:
Likewise.
* testsuite/25_algorithms/set_symmetric_difference/constexpr.cc:
Likewise.
* testsuite/25_algorithms/set_union/constexpr.cc: Likewise.
* testsuite/25_algorithms/set_union/constrained.cc: Likewise.
* testsuite/25_algorithms/sort/constexpr.cc: Likewise.
* testsuite/25_algorithms/sort_heap/constexpr.cc: Likewise.
* testsuite/25_algorithms/transform/constrained.cc: Likewise.
* testsuite/25_algorithms/unique/constexpr.cc: Likewise.
* testsuite/25_algorithms/unique/constrained.cc: Likewise.
* testsuite/25_algorithms/unique_copy/constexpr.cc: Likewise.
* testsuite/25_algorithms/upper_bound/constexpr.cc: Likewise.
* testsuite/std/ranges/adaptors/elements.cc: Include .
* testsuite/std/ranges/adaptors/lazy_split.cc: Likewise.
* testsuite/std/ranges/adaptors/split.cc: Likewise.
---
 libstdc++-v3/include/pstl/glue_algorithm_defs.h   | 2 +-
 libstdc++-v3/include/pstl/utils.h | 4 ++--
 libstdc++-v3/src/c++17/fs_path.cc | 1 +
 .../testsuite/25_algorithms/adjacent_find/constexpr.cc| 1 +
 .../testsuite/25_algorithms/binary_search/constexpr.cc| 1 +
 libstdc++-v3/testsuite/25_algorithms/clamp/constrained.cc | 1 +
 libstdc++-v3/testsuite/25_algorithms/equal/constrained.cc | 1 +
 libstdc++-v3/testsuite/25_algorithms/for_each/constrained.cc  | 1 +
 libstdc++-v3/testsuite/25_algorithms/includes/constrained.cc  | 1 +
 libstdc++-v3/testsuite/25_algorithms/is_heap/constexpr.cc | 1 +
 .../testsuite/25_algorithms/is_heap_until/constexpr.cc| 1 +
 .../testsuite/25_algorithms/is_permutation/constrained.cc | 1 +
 libstdc++-v3/testsuite/25_algorithms/is_sorted/constexpr.cc   | 1 +
 .../testsuite/25_algorithms/is_sorted_until/constexpr.cc  | 1 +
 

Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-17 Thread Martin Sebor via Gcc-patches

On 3/17/22 12:02, Siddhesh Poyarekar wrote:

On 17/03/2022 23:21, Martin Sebor wrote:

On 3/17/22 11:22, Siddhesh Poyarekar wrote:

On 17/03/2022 22:16, Jeff Law wrote:

    #include 
    char a[] = "abc";
    char b[] = "abcd";

    int f (void)
    {
   return strncmp (a, b, 8);
    }

    where I get

    t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source 
size 5

    [-Wstringop-overread]
     7 |   return strncmp (a, b, 8);   // -Wstringop-overread
       |          ^

    even without -Wall.  strncmp sees that a[3] is '\0' so it stops
    comparing
    and there's no UB.

This one is a clear case where warning is bad.   Both arguments are 
constant and we can determine they are NUL terminated and an 
overread will never occur.  No deep analysis really needed here.


THe far more interesting case in my mind is when one or both 
arguments have an unknown NUL termination state.  I could argue 
either side of that case.  I lean towards warning but I understand 
that opinions differ and my priorities have moved away from 
distro-level issues, so identifying code that needs a careful review 
for correctness, particularly old or security sensitive code, has 
become a much lower priority for me.   Combine that with the fact 
that we're really just dealing with over-reads here, I can support 
whatever the broadest consensus is.


Actually in the above reproducer a and b are not const, so this is in 
fact the case where the NUL termination state of the strings is in 
theory unknown.  From the distro level (and in general for 
applications) the question is how common this is and I gathered from 
a Red Hat internal conversation that it's not uncommon.  However 
David pointed out that I need to share more specific examples to 
quantify this, so I need to work on that.  I'll share an update once 
I have it.


One case I am aware of is the pmix package in Fedora/RHEL, which has 
the following warning:


pmix-3.2.3/examples/alloc.c: scope_hint: In function 'main'
pmix-3.2.3/examples/alloc.c:179:31: warning[-Wstringop-overread]: 
'PMIx_Get' reading 512 bytes from a region of size 15
   179 | if (PMIX_SUCCESS != (rc = PMIx_Get(, 
PMIX_UNIV_SIZE, NULL, 0, ))) {

   | ^~
pmix-3.2.3/examples/alloc.c:179:31: note: referencing argument 2 of 
type 'const char *'

pmix-3.2.3/examples/alloc.c:33: included_from: Included from here.
pmix-3.2.3/include/pmix.h:203:27: note: in a call to function 'PMIx_Get'
   203 | PMIX_EXPORT pmix_status_t PMIx_Get(const pmix_proc_t *proc, 
const pmix_key_t key,

   |   ^~~~
   177|   PMIX_PROC_CONSTRUCT();
   178|   PMIX_LOAD_PROCID(, myproc.nspace, 
PMIX_RANK_WILDCARD);
   179|-> if (PMIX_SUCCESS != (rc = PMIx_Get(, 
PMIX_UNIV_SIZE, NULL, 0, ))) {
   180|   fprintf(stderr, "Client ns %s rank %d: PMIx_Get 
universe size failed: %d\n", myproc.nspace, myproc.rank, rc);

   181|   goto done;

which is due to PMIx_Get calling strncmp a few levels within with 
non-const strings and a max size of 512 (the maximum size that a key 
could be; AFAICT it's the size of the buffer into which the key gets 
written out), where the strings are always NULL terminated.


This warning has nothing to do with strncmp.

It's issued for the call to PMIx_Get(), where the caller passes as
the second argument PMIX_UNIV_SIZE, a macro that expands to
the string "pmix.univ.size".

The function is declared like so:

   PMIX_EXPORT pmix_status_t
   PMIx_Get(const pmix_proc_t *proc, const pmix_key_t key,
    const pmix_info_t info[], size_t ninfo,
    pmix_value_t **val);

The type of the second function argument, pmix_key_t, defined as

   typedef char pmix_key_t[PMIX_MAX_KEYLEN+1];

an array of 512 elements (PMIX_MAX_KEYLEN is defined to 511), but
PMIX_UNIV_SIZE is much smaller (just 15 bytes).

The warning detects passing smaller arrays to parameters of larger
types declared using the array syntax.  It's controlled by
-Warray-parameter.


That's odd, shouldn't it show up as -Warray-parameter then and not 
-Wstringop-overread?


I should have said: the array parameter feature is controlled by
-Warray-parameter.  (The warning above is obviously
-Wstringop-overread).

But since -Warray-parameter controls the array parameter feature,
it might be appropriate to also make the -Wstringop-overread and
-Wstringop-overflow instances for calls to such functions conditional
on the former option being enabled.  (Otherwise, -Warray-parameter
has a separate function of its own.)

I do see a minor issue with this warning in GCC 12: it's issued three
times for the same call, rather than once.  That's probably because
the new warning pass that issues it runs multiple times and doesn't
suppress the warning after issuing it the first time.

Martin



Siddhesh





[r12-7687 Regression] FAIL: gcc.target/i386/bt-5.c scan-assembler-times bt[lq][ \t] 7 on Linux/x86_64

2022-03-17 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

3a7ba8fd0cda387809e4902328af2473662b6a4a is the first bad commit
commit 3a7ba8fd0cda387809e4902328af2473662b6a4a
Author: Richard Biener 
Date:   Thu Mar 17 08:10:59 2022 +0100

tree-optimization/104960 - unsplit edges after late sinking

caused

FAIL: gcc.target/i386/bt-5.c scan-assembler-not sar[lq][ \t]
FAIL: gcc.target/i386/bt-5.c scan-assembler-times bt[lq][ \t] 7

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-7687/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/bt-5.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/bt-5.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-17 Thread Siddhesh Poyarekar

On 17/03/2022 23:21, Martin Sebor wrote:

On 3/17/22 11:22, Siddhesh Poyarekar wrote:

On 17/03/2022 22:16, Jeff Law wrote:

    #include 
    char a[] = "abc";
    char b[] = "abcd";

    int f (void)
    {
   return strncmp (a, b, 8);
    }

    where I get

    t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source size 5
    [-Wstringop-overread]
     7 |   return strncmp (a, b, 8);   // -Wstringop-overread
       |          ^

    even without -Wall.  strncmp sees that a[3] is '\0' so it stops
    comparing
    and there's no UB.

This one is a clear case where warning is bad.   Both arguments are 
constant and we can determine they are NUL terminated and an overread 
will never occur.  No deep analysis really needed here.


THe far more interesting case in my mind is when one or both 
arguments have an unknown NUL termination state.  I could argue 
either side of that case.  I lean towards warning but I understand 
that opinions differ and my priorities have moved away from 
distro-level issues, so identifying code that needs a careful review 
for correctness, particularly old or security sensitive code, has 
become a much lower priority for me.   Combine that with the fact 
that we're really just dealing with over-reads here, I can support 
whatever the broadest consensus is.


Actually in the above reproducer a and b are not const, so this is in 
fact the case where the NUL termination state of the strings is in 
theory unknown.  From the distro level (and in general for 
applications) the question is how common this is and I gathered from a 
Red Hat internal conversation that it's not uncommon.  However David 
pointed out that I need to share more specific examples to quantify 
this, so I need to work on that.  I'll share an update once I have it.


One case I am aware of is the pmix package in Fedora/RHEL, which has 
the following warning:


pmix-3.2.3/examples/alloc.c: scope_hint: In function 'main'
pmix-3.2.3/examples/alloc.c:179:31: warning[-Wstringop-overread]: 
'PMIx_Get' reading 512 bytes from a region of size 15
   179 | if (PMIX_SUCCESS != (rc = PMIx_Get(, PMIX_UNIV_SIZE, 
NULL, 0, ))) {

   | ^~
pmix-3.2.3/examples/alloc.c:179:31: note: referencing argument 2 of 
type 'const char *'

pmix-3.2.3/examples/alloc.c:33: included_from: Included from here.
pmix-3.2.3/include/pmix.h:203:27: note: in a call to function 'PMIx_Get'
   203 | PMIX_EXPORT pmix_status_t PMIx_Get(const pmix_proc_t *proc, 
const pmix_key_t key,

   |   ^~~~
   177|   PMIX_PROC_CONSTRUCT();
   178|   PMIX_LOAD_PROCID(, myproc.nspace, PMIX_RANK_WILDCARD);
   179|-> if (PMIX_SUCCESS != (rc = PMIx_Get(, 
PMIX_UNIV_SIZE, NULL, 0, ))) {
   180|   fprintf(stderr, "Client ns %s rank %d: PMIx_Get 
universe size failed: %d\n", myproc.nspace, myproc.rank, rc);

   181|   goto done;

which is due to PMIx_Get calling strncmp a few levels within with 
non-const strings and a max size of 512 (the maximum size that a key 
could be; AFAICT it's the size of the buffer into which the key gets 
written out), where the strings are always NULL terminated.


This warning has nothing to do with strncmp.

It's issued for the call to PMIx_Get(), where the caller passes as
the second argument PMIX_UNIV_SIZE, a macro that expands to
the string "pmix.univ.size".

The function is declared like so:

   PMIX_EXPORT pmix_status_t
   PMIx_Get(const pmix_proc_t *proc, const pmix_key_t key,
    const pmix_info_t info[], size_t ninfo,
    pmix_value_t **val);

The type of the second function argument, pmix_key_t, defined as

   typedef char pmix_key_t[PMIX_MAX_KEYLEN+1];

an array of 512 elements (PMIX_MAX_KEYLEN is defined to 511), but
PMIX_UNIV_SIZE is much smaller (just 15 bytes).

The warning detects passing smaller arrays to parameters of larger
types declared using the array syntax.  It's controlled by
-Warray-parameter.


That's odd, shouldn't it show up as -Warray-parameter then and not 
-Wstringop-overread?


Siddhesh


[committed] libstdc++: Avoid including in [PR92546]

2022-03-17 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, x86_64-mingw, pushed to trunk.

-- >8 --

This only affects Windows, but reduces the preprocessed size of
 significantly.

libstdc++-v3/ChangeLog:

PR libstdc++/92546
* include/bits/fs_path.h (path::make_preferred): Use
handwritten loop instead of std::replace.
---
 libstdc++-v3/include/bits/fs_path.h | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index b16a878f24b..9e06fa679d8 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -52,7 +52,6 @@
 
 #if defined(_WIN32) && !defined(__CYGWIN__)
 # define _GLIBCXX_FILESYSTEM_IS_WINDOWS 1
-# include 
 #endif
 
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -1060,8 +1059,12 @@ namespace __detail
   path::make_preferred()
   {
 #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
-std::replace(_M_pathname.begin(), _M_pathname.end(), L'/',
-preferred_separator);
+auto __pos = _M_pathname.find(L'/');
+while (__pos != _M_pathname.npos)
+  {
+   _M_pathname[__pos] = preferred_separator;
+   __pos = _M_pathname.find(L'/', __pos);
+  }
 #endif
 return *this;
   }
-- 
2.34.1



[committed] libstdc++: Rewrite __moneypunct_cache::_M_cache [PR104966]

2022-03-17 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.

-- >8 --

GCC thinks the following can lead to a buffer overflow when __ns.size()
equals zero:

  const basic_string<_CharT>& __ns = __mp.negative_sign();
  _M_negative_sign_size = __ns.size();
  __negative_sign = new _CharT[_M_negative_sign_size];
  __ns.copy(__negative_sign, _M_negative_sign_size);

This happens because operator new might be replaced with something that
writes to this->_M_negative_sign_size and so the basic_string::copy call
could use a non-zero size to write to a zero-length buffer.

The solution suggested by Richi is to cache the size in a local variable
so that the compiler knows it won't be changed between the allocation
and the copy.

This commit goes further and rewrites the whole function to use RAII and
delay all modifications of *this until after all allocations have
succeeded. The RAII helper type caches the size and copies the string
and owns the memory until told to release it.

libstdc++-v3/ChangeLog:

PR middle-end/104966
* include/bits/locale_facets_nonio.tcc
(__moneypunct_cache::_M_cache): Replace try-catch with RAII and
make all string copies before any stores to *this.
---
 .../include/bits/locale_facets_nonio.tcc  | 100 +-
 1 file changed, 50 insertions(+), 50 deletions(-)

diff --git a/libstdc++-v3/include/bits/locale_facets_nonio.tcc 
b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
index 98442418f51..8c37a706db8 100644
--- a/libstdc++-v3/include/bits/locale_facets_nonio.tcc
+++ b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
@@ -71,61 +71,61 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   const moneypunct<_CharT, _Intl>& __mp =
use_facet >(__loc);
 
+  struct _Scoped_str
+  {
+   size_t _M_len;
+   _CharT* _M_str;
+
+   explicit
+   _Scoped_str(const basic_string<_CharT>& __str)
+   : _M_len(__str.size()), _M_str(new _CharT[_M_len])
+   { __str.copy(_M_str, _M_len); }
+
+   ~_Scoped_str() { delete[] _M_str; }
+
+   void
+   _M_release(const _CharT*& __p, size_t& __n)
+   {
+ __p = _M_str;
+ __n = _M_len;
+ _M_str = 0;
+   }
+  };
+
+  _Scoped_str __curr_symbol(__mp.curr_symbol());
+  _Scoped_str __positive_sign(__mp.positive_sign());
+  _Scoped_str __negative_sign(__mp.negative_sign());
+
+  const string& __g = __mp.grouping();
+  const size_t __g_size = __g.size();
+  char* const __grouping = new char[__g_size];
+  __g.copy(__grouping, __g_size);
+
+  // All allocations succeeded without throwing, OK to modify *this now.
+
+  _M_grouping = __grouping;
+  _M_grouping_size = __g_size;
+  _M_use_grouping = (__g_size
+&& static_cast(__grouping[0]) > 0
+&& (__grouping[0]
+!= __gnu_cxx::__numeric_traits::__max));
+
   _M_decimal_point = __mp.decimal_point();
   _M_thousands_sep = __mp.thousands_sep();
+
+  __curr_symbol._M_release(_M_curr_symbol, _M_curr_symbol_size);
+  __positive_sign._M_release(_M_positive_sign, _M_positive_sign_size);
+  __negative_sign._M_release(_M_negative_sign, _M_negative_sign_size);
+
   _M_frac_digits = __mp.frac_digits();
+  _M_pos_format = __mp.pos_format();
+  _M_neg_format = __mp.neg_format();
 
-  char* __grouping = 0;
-  _CharT* __curr_symbol = 0;
-  _CharT* __positive_sign = 0;
-  _CharT* __negative_sign = 0; 
-  __try
-   {
- const string& __g = __mp.grouping();
- _M_grouping_size = __g.size();
- __grouping = new char[_M_grouping_size];
- __g.copy(__grouping, _M_grouping_size);
- _M_use_grouping = (_M_grouping_size
-&& static_cast(__grouping[0]) > 0
-&& (__grouping[0]
-!= __gnu_cxx::__numeric_traits::__max));
+  const ctype<_CharT>& __ct = use_facet >(__loc);
+  __ct.widen(money_base::_S_atoms,
+money_base::_S_atoms + money_base::_S_end, _M_atoms);
 
- const basic_string<_CharT>& __cs = __mp.curr_symbol();
- _M_curr_symbol_size = __cs.size();
- __curr_symbol = new _CharT[_M_curr_symbol_size];
- __cs.copy(__curr_symbol, _M_curr_symbol_size);
-
- const basic_string<_CharT>& __ps = __mp.positive_sign();
- _M_positive_sign_size = __ps.size();
- __positive_sign = new _CharT[_M_positive_sign_size];
- __ps.copy(__positive_sign, _M_positive_sign_size);
-
- const basic_string<_CharT>& __ns = __mp.negative_sign();
- _M_negative_sign_size = __ns.size();
- __negative_sign = new _CharT[_M_negative_sign_size];
- __ns.copy(__negative_sign, _M_negative_sign_size);
-
- _M_pos_format = __mp.pos_format();
- _M_neg_format = __mp.neg_format();
-
- const ctype<_CharT>& __ct = use_facet >(__loc);
-   

Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-17 Thread Martin Sebor via Gcc-patches

On 3/17/22 11:22, Siddhesh Poyarekar wrote:

On 17/03/2022 22:16, Jeff Law wrote:

    #include 
    char a[] = "abc";
    char b[] = "abcd";

    int f (void)
    {
   return strncmp (a, b, 8);
    }

    where I get

    t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source size 5
    [-Wstringop-overread]
     7 |   return strncmp (a, b, 8);   // -Wstringop-overread
       |          ^

    even without -Wall.  strncmp sees that a[3] is '\0' so it stops
    comparing
    and there's no UB.

This one is a clear case where warning is bad.   Both arguments are 
constant and we can determine they are NUL terminated and an overread 
will never occur.  No deep analysis really needed here.


THe far more interesting case in my mind is when one or both arguments 
have an unknown NUL termination state.  I could argue either side of 
that case.  I lean towards warning but I understand that opinions 
differ and my priorities have moved away from distro-level issues, so 
identifying code that needs a careful review for correctness, 
particularly old or security sensitive code, has become a much lower 
priority for me.   Combine that with the fact that we're really just 
dealing with over-reads here, I can support whatever the broadest 
consensus is.


Actually in the above reproducer a and b are not const, so this is in 
fact the case where the NUL termination state of the strings is in 
theory unknown.  From the distro level (and in general for applications) 
the question is how common this is and I gathered from a Red Hat 
internal conversation that it's not uncommon.  However David pointed out 
that I need to share more specific examples to quantify this, so I need 
to work on that.  I'll share an update once I have it.


One case I am aware of is the pmix package in Fedora/RHEL, which has the 
following warning:


pmix-3.2.3/examples/alloc.c: scope_hint: In function 'main'
pmix-3.2.3/examples/alloc.c:179:31: warning[-Wstringop-overread]: 
'PMIx_Get' reading 512 bytes from a region of size 15
   179 | if (PMIX_SUCCESS != (rc = PMIx_Get(, PMIX_UNIV_SIZE, 
NULL, 0, ))) {

   | ^~
pmix-3.2.3/examples/alloc.c:179:31: note: referencing argument 2 of type 
'const char *'

pmix-3.2.3/examples/alloc.c:33: included_from: Included from here.
pmix-3.2.3/include/pmix.h:203:27: note: in a call to function 'PMIx_Get'
   203 | PMIX_EXPORT pmix_status_t PMIx_Get(const pmix_proc_t *proc, 
const pmix_key_t key,

   |   ^~~~
   177|   PMIX_PROC_CONSTRUCT();
   178|   PMIX_LOAD_PROCID(, myproc.nspace, PMIX_RANK_WILDCARD);
   179|-> if (PMIX_SUCCESS != (rc = PMIx_Get(, PMIX_UNIV_SIZE, 
NULL, 0, ))) {
   180|   fprintf(stderr, "Client ns %s rank %d: PMIx_Get 
universe size failed: %d\n", myproc.nspace, myproc.rank, rc);

   181|   goto done;

which is due to PMIx_Get calling strncmp a few levels within with 
non-const strings and a max size of 512 (the maximum size that a key 
could be; AFAICT it's the size of the buffer into which the key gets 
written out), where the strings are always NULL terminated.


This warning has nothing to do with strncmp.

It's issued for the call to PMIx_Get(), where the caller passes as
the second argument PMIX_UNIV_SIZE, a macro that expands to
the string "pmix.univ.size".

The function is declared like so:

  PMIX_EXPORT pmix_status_t
  PMIx_Get(const pmix_proc_t *proc, const pmix_key_t key,
   const pmix_info_t info[], size_t ninfo,
   pmix_value_t **val);

The type of the second function argument, pmix_key_t, defined as

  typedef char pmix_key_t[PMIX_MAX_KEYLEN+1];

an array of 512 elements (PMIX_MAX_KEYLEN is defined to 511), but
PMIX_UNIV_SIZE is much smaller (just 15 bytes).

The warning detects passing smaller arrays to parameters of larger
types declared using the array syntax.  It's controlled by
-Warray-parameter.

Martin



Re: Patch ping (Re: [PATCH] libatomic: Improve 16-byte atomics on Intel AVX [PR104688])

2022-03-17 Thread Uros Bizjak via Gcc-patches
On Wed, Mar 16, 2022 at 6:50 PM Jakub Jelinek  wrote:
>
> Hi!
>
> I'd like to ping this patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590960.html
>
> Thanks.
>
> On Mon, Feb 28, 2022 at 07:06:30AM +0100, Jakub Jelinek wrote:
> > As mentioned in the PR, the latest Intel SDM has added:
> > "Processors that enumerate support for Intel® AVX (by setting the feature 
> > flag CPUID.01H:ECX.AVX[bit 28])
> > guarantee that the 16-byte memory operations performed by the following 
> > instructions will always be
> > carried out atomically:
> > • MOVAPD, MOVAPS, and MOVDQA.
> > • VMOVAPD, VMOVAPS, and VMOVDQA when encoded with VEX.128.
> > • VMOVAPD, VMOVAPS, VMOVDQA32, and VMOVDQA64 when encoded with EVEX.128 and 
> > k0 (masking disabled).
> > (Note that these instructions require the linear addresses of their memory 
> > operands to be 16-byte
> > aligned.)"
> >
> > The following patch deals with it just on the libatomic library side so far,
> > currently (since ~ 2017) we emit all the __atomic_* 16-byte builtins as
> > library calls since and this is something that we can hopefully backport.
> >
> > The patch simply introduces yet another ifunc variant that takes priority
> > over the pure CMPXCHG16B one, one that checks AVX and CMPXCHG16B bits and
> > on non-Intel clears the AVX bit during detection for now (if AMD comes
> > with the same guarantee, we could revert the config/x86/init.c hunk),
> > which implements 16-byte atomic load as vmovdqa and 16-byte atomic store
> > as vmovdqa followed by mfence.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk so far?
> >
> > 2022-02-28  Jakub Jelinek  
> >
> >   PR target/104688
> >   * Makefile.am (IFUNC_OPTIONS): Change on x86_64 to -mcx16 -mcx16.
> >   (libatomic_la_LIBADD): Add $(addsuffix _16_2_.lo,$(SIZEOBJS)) for
> >   x86_64.
> >   * Makefile.in: Regenerated.
> >   * config/x86/host-config.h (IFUNC_COND_1): For x86_64 define to
> >   both AVX and CMPXCHG16B bits.
> >   (IFUNC_COND_2): Define.
> >   (IFUNC_NCOND): For x86_64 define to 2 * (N == 16).
> >   (MAYBE_HAVE_ATOMIC_CAS_16, MAYBE_HAVE_ATOMIC_EXCHANGE_16,
> >   MAYBE_HAVE_ATOMIC_LDST_16): Define to IFUNC_COND_2 rather than
> >   IFUNC_COND_1.
> >   (HAVE_ATOMIC_CAS_16): Redefine to 1 whenever IFUNC_ALT != 0.
> >   (HAVE_ATOMIC_LDST_16): Redefine to 1 whenever IFUNC_ALT == 1.
> >   (atomic_compare_exchange_n): Define whenever IFUNC_ALT != 0
> >   on x86_64 for N == 16.
> >   (__atomic_load_n, __atomic_store_n): Redefine whenever IFUNC_ALT == 1
> >   on x86_64 for N == 16.
> >   (atomic_load_n, atomic_store_n): New functions.
> >   * config/x86/init.c (__libat_feat1_init): On x86_64 clear bit_AVX
> >   if CPU vendor is not Intel.

LGTM.

Thanks,
Uros.


[PATCH] PR tree-optimization/102943 - Always use dominators in the cache when available.

2022-03-17 Thread Andrew MacLeod via Gcc-patches
As discussed in the PR, this patch adjusts rangers cache query to 
*always* use dominators to lookup ranges in the cache.


It use to give up if it encountered a block with outgoing edge ranges. 
This patch changes that to continue looking back until a value is found, 
then applies any outgoing ranges encountered along the way.


This prevents us from filling the on-entry cache in blocks which don't 
really affect the outcome, and causes significant memory reduction is 
large functions at a nominal calculation cost.


I have tweaked the debug output as well as a couple of other little 
things from the original patch attached in the PR. Confirmed that the 
performance results are similar, and generated reams of debug output to 
ensure there are no issues there.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk?

Andrew
commit bcb3f3534412e1774ac9791f0de7ceb000bf2184
Author: Andrew MacLeod 
Date:   Thu Mar 17 10:52:10 2022 -0400

Always use dominators in the cache when available.

This patch adjusts range_from_dom to follow the dominator tree through the
cache until value is found, then apply any outgoing ranges encountered
along the way.  This reduces the amount of cache storage required.

PR tree-optimization/102943
* gimple-range-cache.cc (ranger_cache::range_from_dom): Find range via
dominators and apply intermediary outgoing edge ranges.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 583ba29eb63..421ea1a20ef 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "gimple-iterator.h"
 #include "gimple-walk.h"
+#include "cfganal.h"
 
 #define DEBUG_RANGE_CACHE (dump_file	\
 			   && (param_ranger_debug & RANGER_DEBUG_CACHE))
@@ -1398,62 +1399,108 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
 }
 
 
-// Check to see if we can simply get the range from the dominator.
+// Get the range of NAME from dominators of BB and return it in R.
 
 bool
-ranger_cache::range_from_dom (irange , tree name, basic_block bb)
+ranger_cache::range_from_dom (irange , tree name, basic_block start_bb)
 {
-  gcc_checking_assert (dom_info_available_p (CDI_DOMINATORS));
+  if (!dom_info_available_p (CDI_DOMINATORS))
+return false;
 
   // Search back to the definition block or entry block.
   basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (name));
   if (def_bb == NULL)
 def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
 
+  basic_block bb;
+  basic_block prev_bb = start_bb;
   // Flag if we encounter a block with non-null set.
   bool non_null = false;
-  for (bb = get_immediate_dominator (CDI_DOMINATORS, bb);
-   bb && bb != def_bb;
-   bb = get_immediate_dominator (CDI_DOMINATORS, bb))
+
+  // Range on entry to the DEF block should not be queried.
+  gcc_checking_assert (start_bb != def_bb);
+  m_workback.truncate (0);
+
+  // Default value is global range.
+  get_global_range (r, name);
+
+  // Search until a value is found, pushing outgoing edges encountered.
+  for (bb = get_immediate_dominator (CDI_DOMINATORS, start_bb);
+   bb;
+   prev_bb = bb, bb = get_immediate_dominator (CDI_DOMINATORS, bb))
 {
-  // If there is an outgoing range, the on-entry value won't work.
+  if (!non_null)
+	non_null |= m_non_null.non_null_deref_p (name, bb, false);
+
+  // This block has an outgoing range.
   if (m_gori.has_edge_range_p (name, bb))
 	{
-	  // Check if we can seed this block with a dominator value. THis will
-	  // prevent the ache from being filled back further than this.
-	  if (bb != def_bb && range_from_dom (r, name, bb))
-	m_on_entry.set_bb_range (name, bb, r);
-	  return false;
+	  // Only outgoing ranges to single_pred blocks are dominated by
+	  // outgoing edge ranges, so only those need to be considered.
+	  edge e = find_edge (bb, prev_bb);
+	  if (e && single_pred_p (prev_bb))
+	m_workback.quick_push (prev_bb);
 	}
 
-  // Flag if we see a non-null reference during this walk.
-  if (m_non_null.non_null_deref_p (name, bb, false))
-	non_null = true;
+  if (def_bb == bb)
+	break;
 
-  // If range-on-entry is set in this block, it can be used.
   if (m_on_entry.get_bb_range (r, name, bb))
+	break;
+}
+
+  if (DEBUG_RANGE_CACHE)
+{
+  fprintf (dump_file, "CACHE: BB %d DOM query, found ", start_bb->index);
+  r.dump (dump_file);
+  if (bb)
+	fprintf (dump_file, " at BB%d\n", bb->index);
+  else
+	fprintf (dump_file, " at function top\n");
+}
+
+  // Now process any outgoing edges that we seen along the way.
+  while (m_workback.length () > 0)
+{
+  int_range_max edge_range;
+  prev_bb = m_workback.pop ();
+  edge e = single_pred_edge (prev_bb);
+  bb = e->src;
+
+  if (m_gori.outgoing_edge_range_p (edge_range, e, name, *this))
 	{
-	  // Apply 

Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-17 Thread Siddhesh Poyarekar

On 17/03/2022 22:16, Jeff Law wrote:

#include 
char a[] = "abc";
char b[] = "abcd";

int f (void)
{
   return strncmp (a, b, 8);
}

where I get

t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source size 5
[-Wstringop-overread]
     7 |   return strncmp (a, b, 8);   // -Wstringop-overread
       |          ^

even without -Wall.  strncmp sees that a[3] is '\0' so it stops
comparing
and there's no UB.

This one is a clear case where warning is bad.   Both arguments are 
constant and we can determine they are NUL terminated and an overread 
will never occur.  No deep analysis really needed here.


THe far more interesting case in my mind is when one or both arguments 
have an unknown NUL termination state.  I could argue either side of 
that case.  I lean towards warning but I understand that opinions differ 
and my priorities have moved away from distro-level issues, so 
identifying code that needs a careful review for correctness, 
particularly old or security sensitive code, has become a much lower 
priority for me.   Combine that with the fact that we're really just 
dealing with over-reads here, I can support whatever the broadest 
consensus is.


Actually in the above reproducer a and b are not const, so this is in 
fact the case where the NUL termination state of the strings is in 
theory unknown.  From the distro level (and in general for applications) 
the question is how common this is and I gathered from a Red Hat 
internal conversation that it's not uncommon.  However David pointed out 
that I need to share more specific examples to quantify this, so I need 
to work on that.  I'll share an update once I have it.


One case I am aware of is the pmix package in Fedora/RHEL, which has the 
following warning:


pmix-3.2.3/examples/alloc.c: scope_hint: In function 'main'
pmix-3.2.3/examples/alloc.c:179:31: warning[-Wstringop-overread]: 
'PMIx_Get' reading 512 bytes from a region of size 15
  179 | if (PMIX_SUCCESS != (rc = PMIx_Get(, PMIX_UNIV_SIZE, 
NULL, 0, ))) {
  | 
^~
pmix-3.2.3/examples/alloc.c:179:31: note: referencing argument 2 of type 
'const char *'

pmix-3.2.3/examples/alloc.c:33: included_from: Included from here.
pmix-3.2.3/include/pmix.h:203:27: note: in a call to function 'PMIx_Get'
  203 | PMIX_EXPORT pmix_status_t PMIx_Get(const pmix_proc_t *proc, 
const pmix_key_t key,

  |   ^~~~
  177|   PMIX_PROC_CONSTRUCT();
  178|   PMIX_LOAD_PROCID(, myproc.nspace, PMIX_RANK_WILDCARD);
  179|-> if (PMIX_SUCCESS != (rc = PMIx_Get(, PMIX_UNIV_SIZE, 
NULL, 0, ))) {
  180|   fprintf(stderr, "Client ns %s rank %d: PMIx_Get 
universe size failed: %d\n", myproc.nspace, myproc.rank, rc);

  181|   goto done;

which is due to PMIx_Get calling strncmp a few levels within with 
non-const strings and a max size of 512 (the maximum size that a key 
could be; AFAICT it's the size of the buffer into which the key gets 
written out), where the strings are always NULL terminated.


Thanks,
Siddhesh


Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-17 Thread Andreas Schwab
On Mär 17 2022, Jeff Law via Gcc-patches wrote:

> On Thu, Mar 17, 2022 at 9:32 AM Marek Polacek via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>> I think I agree, I've tried
>>
>> #include 
>> char a[] = "abc";
>> char b[] = "abcd";
>>
>> int f (void)
>> {
>>   return strncmp (a, b, 8);
>> }
>>
>> where I get
>>
>> t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source size 5
>> [-Wstringop-overread]
>> 7 |   return strncmp (a, b, 8);   // -Wstringop-overread
>>   |  ^
>>
>> even without -Wall.  strncmp sees that a[3] is '\0' so it stops comparing
>> and there's no UB.
>>
> This one is a clear case where warning is bad.   Both arguments are
> constant and we can determine they are NUL terminated and an overread will
> never occur.  No deep analysis really needed here.

Both a and b are modifiable, thus the compiler cannot assume anything.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-17 Thread Jeff Law via Gcc-patches
On Thu, Mar 17, 2022 at 9:32 AM Marek Polacek via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> On Wed, Mar 16, 2022 at 06:54:51AM +0530, Siddhesh Poyarekar wrote:
> > On 16/03/2022 02:06, Martin Sebor wrote:
> > > The intended use of the strncmp bound is to limit the comparison to
> > > at most the size of the arrays or (in a subset of cases) the length
> > > of an initial substring. Providing an arbitrary bound that's not
> > > related to the sizes as you describe sounds very much like a misuse.
> >
> > Nothing in the standard says that the bound is related to the sizes of
> input
> > buffers.  I don't think deducing that intent makes sense either, nor
> > concluding that any other use case is misuse.
> >
> > > As a historical note, strncmp was first introduced in UNIX v7 where
> > > its purpose, alongside strncpy, was to manipulate (potentially)
> > > unterminated character arrays like file names stored in fixed size
> > > arrays (typically 14 bytes).  Strncpy would fill the buffers with
> > > ASCII data up to their size and pad the rest with nuls only if there
> > > was room.
> > >
> > > Strncmp was then used to compare these potentially unterminated
> > > character arrays (e.g., archive headers in ld and ranlib).  The bound
> > > was the size of the fixed size array.  Its other use case was to
> compare
> > > leading portions of strings (e.g, when looking for an environment
> > > variable or when stripping "./" from path names).
> >
> > Thanks for sharing the historical perspective.
> >
> > > Since the early UNIX days, both strncpy and to a lesser extent strncmp
> > > have been widely misused and, along with many other functions in
> > > , a frequent source of bugs due to common misunderstanding
> > > of their intended purpose.  The aim of these warnings is to detect
> > > the common (and sometimes less common) misuses and bugs.
> >
> > They're all valid uses however since they do not violate the standard.
> If we
> > find at compile time that the strings don't terminate at the bounds,
> > emitting the warning is OK but the more pessimistic check seems like
> > overkill.
>
> I think I agree, I've tried
>
> #include 
> char a[] = "abc";
> char b[] = "abcd";
>
> int f (void)
> {
>   return strncmp (a, b, 8);
> }
>
> where I get
>
> t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source size 5
> [-Wstringop-overread]
> 7 |   return strncmp (a, b, 8);   // -Wstringop-overread
>   |  ^
>
> even without -Wall.  strncmp sees that a[3] is '\0' so it stops comparing
> and there's no UB.
>
This one is a clear case where warning is bad.   Both arguments are
constant and we can determine they are NUL terminated and an overread will
never occur.  No deep analysis really needed here.

THe far more interesting case in my mind is when one or both arguments have
an unknown NUL termination state.  I could argue either side of that case.
I lean towards warning but I understand that opinions differ and my
priorities have moved away from distro-level issues, so identifying code
that needs a careful review for correctness, particularly old or security
sensitive code, has become a much lower priority for me.   Combine that
with the fact that we're really just dealing with over-reads here, I can
support whatever the broadest consensus is.

jeff


[PATCH] openmp: Fix up gomp_affinity_init_numa_domains

2022-03-17 Thread Jakub Jelinek via Gcc-patches
Hi!

On Thu, Nov 11, 2021 at 02:14:05PM +0100, Thomas Schwinge wrote:
> There appears to be yet another issue: there still are quite a number of
> 'FAIL: libgomp.c/places-10.c execution test' reports on
> .  Also in my testing testing, on a system
> where '/sys/devices/system/node/online' contains '0-1', I get a FAIL:
> 
> [...]
> OPENMP DISPLAY ENVIRONMENT BEGIN
>   _OPENMP = '201511'
>   OMP_DYNAMIC = 'FALSE'
>   OMP_NESTED = 'FALSE'
>   OMP_NUM_THREADS = '8'
>   OMP_SCHEDULE = 'DYNAMIC'
>   OMP_PROC_BIND = 'TRUE'
>   OMP_PLACES = '{0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30},{FAIL: 
> libgomp.c/places-10.c execution test

I've finally managed to debug this (by dumping used /sys/ files from
an affected system in Fedora build system, replacing /sys/ with /tmp/
in gcc sources and populating there those files), I think following patch
ought to fix it.
I'll test it tonight in my bootstrap/regtest (but that's a non-numa box),
can somebody with a box where places-10.c fails reliably test this too?

Thanks.

2022-03-17  Jakub Jelinek  

* config/linux/affinity.c (gomp_affinity_init_numa_domains): Move seen
variable next to pl variable.

--- libgomp/config/linux/affinity.c.jj  2022-01-11 23:11:23.887269117 +0100
+++ libgomp/config/linux/affinity.c 2022-03-17 17:05:38.129008653 +0100
@@ -411,11 +411,11 @@ gomp_affinity_init_numa_domains (unsigne
{
  char *p = line;
  void *pl = NULL;
+ bool seen = false;
 
  while (*p && *p != '\n')
{
  unsigned long first, last;
- bool seen = false;
 
  errno = 0;
  first = strtoul (p, , 10);


Jakub



Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-17 Thread Marek Polacek via Gcc-patches
On Wed, Mar 16, 2022 at 06:54:51AM +0530, Siddhesh Poyarekar wrote:
> On 16/03/2022 02:06, Martin Sebor wrote:
> > The intended use of the strncmp bound is to limit the comparison to
> > at most the size of the arrays or (in a subset of cases) the length
> > of an initial substring. Providing an arbitrary bound that's not
> > related to the sizes as you describe sounds very much like a misuse.
> 
> Nothing in the standard says that the bound is related to the sizes of input
> buffers.  I don't think deducing that intent makes sense either, nor
> concluding that any other use case is misuse.
> 
> > As a historical note, strncmp was first introduced in UNIX v7 where
> > its purpose, alongside strncpy, was to manipulate (potentially)
> > unterminated character arrays like file names stored in fixed size
> > arrays (typically 14 bytes).  Strncpy would fill the buffers with
> > ASCII data up to their size and pad the rest with nuls only if there
> > was room.
> > 
> > Strncmp was then used to compare these potentially unterminated
> > character arrays (e.g., archive headers in ld and ranlib).  The bound
> > was the size of the fixed size array.  Its other use case was to compare
> > leading portions of strings (e.g, when looking for an environment
> > variable or when stripping "./" from path names).
> 
> Thanks for sharing the historical perspective.
> 
> > Since the early UNIX days, both strncpy and to a lesser extent strncmp
> > have been widely misused and, along with many other functions in
> > , a frequent source of bugs due to common misunderstanding
> > of their intended purpose.  The aim of these warnings is to detect
> > the common (and sometimes less common) misuses and bugs.
> 
> They're all valid uses however since they do not violate the standard. If we
> find at compile time that the strings don't terminate at the bounds,
> emitting the warning is OK but the more pessimistic check seems like
> overkill.

I think I agree, I've tried

#include 
char a[] = "abc";
char b[] = "abcd";

int f (void)
{
  return strncmp (a, b, 8);
}
 
where I get

t.c:7:10: warning: ‘strncmp’ specified bound 8 exceeds source size 5 
[-Wstringop-overread]
7 |   return strncmp (a, b, 8);   // -Wstringop-overread
  |  ^

even without -Wall.  strncmp sees that a[3] is '\0' so it stops comparing
and there's no UB.

GCC 11 didn't emit that, so +1 for dialing this warning down.

> > I haven't seen these so I can't very well comment on them.  But I can
> > assure you that warning for the code above is intentional.  Whether
> > or not the arrays are nul-terminated, the expected way to call
> > the function is with a bound no greater than their size (some coding
> > guidelines are explicit about this; see for example the CERT C Secure
> > Coding standard rule ARR38-C).
> > 
> > (Granted, the manual makes it sound like -Wstringop-overread only
> > detects provable past-the-end reads.  That's a mistake in
> > the documentation that should be fixed.  The warning was never quite
> > so limited, nor was it intended to be.)
> 
> The contention is not that it's not provable, it's more that it's doesn't
> even pass the "based on available information this is definitely buggy"
> assertion, making it more a strong suggestion than a warning that something
> is definitely amiss.  Which is why IMO it is more suitable as an analyzer
> check than a warning.
> 
> Thanks,
> Siddhesh
> 

Marek



Re: [PATCH] c++: memory corruption during name lookup w/ modules [PR99479]

2022-03-17 Thread Patrick Palka via Gcc-patches
On Tue, Mar 1, 2022 at 8:13 AM Patrick Palka  wrote:
>
> On Thu, Feb 17, 2022 at 3:24 PM Patrick Palka  wrote:
> >
> > name_lookup::search_unqualified uses a statically allocated vector
> > in order to avoid repeated reallocation, under the assumption that
> > the function can't be called recursively.  With modules however,
> > this assumption turns out to be false, and search_unqualified can
> > be called recursively as demonstrated by testcase in comment #19
> > of PR99479[1] where the recursive call causes the vector to get
> > reallocated which invalidates the reference held by the parent call.
> >
> > This patch makes search_unqualified instead use an auto_vec with 16
> > elements of internal storage (since with the various libraries I tested,
> > the size of the vector never exceeded 12).  In turn we can simplify the
> > API of subroutines to take the vector by reference and return void.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk?
>
> Ping.

Ping.

>
> >
> > [1]: https://gcc.gnu.org/PR99479#c19
> >
> > PR c++/99479
> >
> > gcc/cp/ChangeLog:
> >
> > * name-lookup.cc (name_lookup::using_queue): Change to an
> > auto_vec (with 16 elements of internal storage).
> > (name_lookup::queue_namespace): Change return type to void,
> > take queue parameter by reference and adjust function body
> > accordingly.
> > (name_lookup::do_queue_usings): Inline into ...
> > (name_lookup::queue_usings): ... here.  As in queue_namespace.
> > (name_lookup::search_unqualified): Don't make queue static,
> > assume its incoming length is 0, and adjust function body
> > accordingly.
> > ---
> >  gcc/cp/name-lookup.cc | 62 +++
> >  1 file changed, 22 insertions(+), 40 deletions(-)
> >
> > diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
> > index 93c4eb7193b..5c965d6fba1 100644
> > --- a/gcc/cp/name-lookup.cc
> > +++ b/gcc/cp/name-lookup.cc
> > @@ -429,7 +429,7 @@ class name_lookup
> >  {
> >  public:
> >typedef std::pair using_pair;
> > -  typedef vec using_queue;
> > +  typedef auto_vec using_queue;
> >
> >  public:
> >tree name;   /* The identifier being looked for.  */
> > @@ -528,16 +528,8 @@ private:
> >bool search_usings (tree scope);
> >
> >  private:
> > -  using_queue *queue_namespace (using_queue *queue, int depth, tree scope);
> > -  using_queue *do_queue_usings (using_queue *queue, int depth,
> > -   vec *usings);
> > -  using_queue *queue_usings (using_queue *queue, int depth,
> > -vec *usings)
> > -  {
> > -if (usings)
> > -  queue = do_queue_usings (queue, depth, usings);
> > -return queue;
> > -  }
> > +  void queue_namespace (using_queue& queue, int depth, tree scope);
> > +  void queue_usings (using_queue& queue, int depth, vec 
> > *usings);
> >
> >  private:
> >void add_fns (tree);
> > @@ -1084,39 +1076,35 @@ name_lookup::search_qualified (tree scope, bool 
> > usings)
> >  /* Add SCOPE to the unqualified search queue, recursively add its
> > inlines and those via using directives.  */
> >
> > -name_lookup::using_queue *
> > -name_lookup::queue_namespace (using_queue *queue, int depth, tree scope)
> > +void
> > +name_lookup::queue_namespace (using_queue& queue, int depth, tree scope)
> >  {
> >if (see_and_mark (scope))
> > -return queue;
> > +return;
> >
> >/* Record it.  */
> >tree common = scope;
> >while (SCOPE_DEPTH (common) > depth)
> >  common = CP_DECL_CONTEXT (common);
> > -  vec_safe_push (queue, using_pair (common, scope));
> > +  queue.safe_push (using_pair (common, scope));
> >
> >/* Queue its inline children.  */
> >if (vec *inlinees = DECL_NAMESPACE_INLINEES (scope))
> >  for (unsigned ix = inlinees->length (); ix--;)
> > -  queue = queue_namespace (queue, depth, (*inlinees)[ix]);
> > +  queue_namespace (queue, depth, (*inlinees)[ix]);
> >
> >/* Queue its using targets.  */
> > -  queue = queue_usings (queue, depth, NAMESPACE_LEVEL 
> > (scope)->using_directives);
> > -
> > -  return queue;
> > +  queue_usings (queue, depth, NAMESPACE_LEVEL (scope)->using_directives);
> >  }
> >
> >  /* Add the namespaces in USINGS to the unqualified search queue.  */
> >
> > -name_lookup::using_queue *
> > -name_lookup::do_queue_usings (using_queue *queue, int depth,
> > - vec *usings)
> > +void
> > +name_lookup::queue_usings (using_queue& queue, int depth, vec 
> > *usings)
> >  {
> > -  for (unsigned ix = usings->length (); ix--;)
> > -queue = queue_namespace (queue, depth, (*usings)[ix]);
> > -
> > -  return queue;
> > +  if (usings)
> > +for (unsigned ix = usings->length (); ix--;)
> > +  queue_namespace (queue, depth, (*usings)[ix]);
> >  }
> >
> >  /* Unqualified namespace lookup in SCOPE.
> > @@ -1128,15 +1116,12 @@ 

[PING^2][PATCH][final] Handle compiler-generated asm insn

2022-03-17 Thread Tom de Vries via Gcc-patches

On 3/9/22 13:50, Tom de Vries wrote:

On 2/22/22 14:55, Tom de Vries wrote:

Hi,

For the nvptx port, with -mptx-comment we have in pr53465.s:
...
 // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
 // Start: Added by -minit-regs=3:
 // #NO_APP
 mov.u32 %r26, 0;
 // #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
 // End: Added by -minit-regs=3:
 // #NO_APP
...

The comments where generated using the compiler-generated equivalent of:
...
   asm ("// Comment");
...
but both the printed location and the NO_APP/APP are unnecessary for a
compiler-generated asm insn.

Fix this by handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION in
final_scan_insn_1, such what we simply get:
...
 // Start: Added by -minit-regs=3:
 mov.u32 %r26, 0;
 // End: Added by -minit-regs=3:
...

Tested on nvptx.

OK for trunk?





Ping^2.

Tobias just reported an ICE in PR104968, and this patch fixes it.

I'd like to known whether this patch is acceptable for stage 4 or not.

If not, I need to fix PR104968 in a different way.  Say, disable 
-mcomment by default, or trying harder to propagate source info on 
outlined functions.


Thanks,
- Tom


[final] Handle compiler-generated asm insn

gcc/ChangeLog:

2022-02-21  Tom de Vries  

PR rtl-optimization/104596
* config/nvptx/nvptx.cc (gen_comment): Use gen_rtx_ASM_INPUT instead
of gen_rtx_ASM_INPUT_loc.
* final.cc (final_scan_insn_1): Handle
ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION.

---
  gcc/config/nvptx/nvptx.cc |  3 +--
  gcc/final.cc  | 17 +++--
  2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 858789e6df7..4124c597f24 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5381,8 +5381,7 @@ gen_comment (const char *s)
    size_t len = strlen (ASM_COMMENT_START) + strlen (sep) + strlen 
(s) + 1;

    char *comment = (char *) alloca (len);
    snprintf (comment, len, "%s%s%s", ASM_COMMENT_START, sep, s);
-  return gen_rtx_ASM_INPUT_loc (VOIDmode, ggc_strdup (comment),
-    cfun->function_start_locus);
+  return gen_rtx_ASM_INPUT (VOIDmode, ggc_strdup (comment));
  }
  /* Initialize all declared regs at function entry.
diff --git a/gcc/final.cc b/gcc/final.cc
index a9868861bd2..e6443ef7a4f 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -2642,15 +2642,20 @@ final_scan_insn_1 (rtx_insn *insn, FILE *file, 
int optimize_p ATTRIBUTE_UNUSED,

  if (string[0])
    {
  expanded_location loc;
+    bool unknown_loc_p
+  = ASM_INPUT_SOURCE_LOCATION (body) == UNKNOWN_LOCATION;
-    app_enable ();
-    loc = expand_location (ASM_INPUT_SOURCE_LOCATION (body));
-    if (*loc.file && loc.line)
-  fprintf (asm_out_file, "%s %i \"%s\" 1\n",
-   ASM_COMMENT_START, loc.line, loc.file);
+    if (!unknown_loc_p)
+  {
+    app_enable ();
+    loc = expand_location (ASM_INPUT_SOURCE_LOCATION (body));
+    if (*loc.file && loc.line)
+  fprintf (asm_out_file, "%s %i \"%s\" 1\n",
+   ASM_COMMENT_START, loc.line, loc.file);
+  }
  fprintf (asm_out_file, "\t%s\n", string);
  #if HAVE_AS_LINE_ZERO
-    if (*loc.file && loc.line)
+    if (!unknown_loc_p && loc.file && *loc.file && loc.line)
    fprintf (asm_out_file, "%s 0 \"\" 2\n", ASM_COMMENT_START);
  #endif
    }


Re: [PATCH v2] x86: Also check _SOFT_FLOAT in

2022-03-17 Thread H.J. Lu via Gcc-patches
On Mon, Mar 14, 2022 at 7:31 AM H.J. Lu  wrote:
>
> Push target("general-regs-only") in  if x87 is enabled.
>
> gcc/
>
> PR target/104890
> * config/i386/x86gprintrin.h: Also check _SOFT_FLOAT before
> pushing target("general-regs-only").
>
> gcc/testsuite/
>
> PR target/104890
> * gcc.target/i386/pr104890.c: New test.
> ---
>  gcc/config/i386/x86gprintrin.h   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr104890.c | 11 +++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104890.c
>
> diff --git a/gcc/config/i386/x86gprintrin.h b/gcc/config/i386/x86gprintrin.h
> index 017ec299793..e0be01d5e78 100644
> --- a/gcc/config/i386/x86gprintrin.h
> +++ b/gcc/config/i386/x86gprintrin.h
> @@ -24,7 +24,7 @@
>  #ifndef _X86GPRINTRIN_H_INCLUDED
>  #define _X86GPRINTRIN_H_INCLUDED
>
> -#if defined __MMX__ || defined __SSE__
> +#if !defined _SOFT_FLOAT || defined __MMX__ || defined __SSE__
>  #pragma GCC push_options
>  #pragma GCC target("general-regs-only")
>  #define __DISABLE_GENERAL_REGS_ONLY__
> diff --git a/gcc/testsuite/gcc.target/i386/pr104890.c 
> b/gcc/testsuite/gcc.target/i386/pr104890.c
> new file mode 100644
> index 000..cb430eef688
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr104890.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target ia32 } } */
> +/* { dg-options "-O2 -mshstk -march=i686" } */
> +
> +#include 
> +
> +__attribute__((target ("general-regs-only")))
> +int
> +foo ()
> +{
> +  return _get_ssp ();
> +}
> --
> 2.35.1
>

OK to backport to GCC 11?

Thanks.

-- 
H.J.


Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-17 Thread Aldy Hernandez via Gcc-patches
I agree with Siddhesh (and Jonathan).

Is there a release manager that can either review his patch, or
provide feedback as to what the course of action should be here?

Thanks.
Aldy

On Thu, Mar 17, 2022 at 2:44 AM Siddhesh Poyarekar  wrote:
>
> On 17/03/2022 05:11, Martin Sebor wrote:
> > As the GCC manual prominently states (and as I already pointed out)
> > warnings are:
>
> We are indeed going around in circles.  Hopefully someone else will
> pitch in and break the deadlock.
>
> Siddhesh
>



Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-17 Thread David Malcolm via Gcc-patches
On Tue, 2022-03-15 at 22:10 +0530, Siddhesh Poyarekar wrote:
> On 15/03/2022 21:09, Martin Sebor wrote:
> > The strncmp function takes arrays as arguments (not necessarily
> > strings).  The main purpose of the -Wstringop-overread warning
> > for calls to it is to detect calls where one of the arrays is
> > not a nul-terminated string and the bound is larger than the size
> > of the array.  For example:
> > 
> >    char a[4], b[4];
> > 
> >    int f (void)
> >    {
> >  return strncmp (a, b, 8);   // -Wstringop-overread
> >    }
> > 
> > Such a call is suspect: if one of the arrays isn't nul-terminated
> > the call is undefined.  Otherwise, if both are nul-terminated there
> 
> Isn't "suspect" too harsh a description though?  The bound does not 
> specify the size of a or b, it specifies the maximum extent to which to
> compare a and b, the extent being any application-specific limit.  In
> fact the limit could be the size of some arbitrary third buffer that
> the 
> contents of a or b must be copied to, truncating to the bound.
> 
> I agree the call is undefined if one of the arrays is not nul-
> terminated 
> and that's the thing; nothing about the bound is undefined in this 
> context, it's the NUL termination that is key.
> 
> > is no point in calling strncmp with a bound greater than their sizes.
> 
> There is, when the bound describes something else, e.g. the size of a
> third destination buffer into which one of the input buffers may get 
> copied into.  Or when the bound describes the maximum length of a set
> of 
> strings where only a subset of the strings are reachable in the current
> function and ranger sees it, allowing us to reduce our input string
> size 
> estimate.  The bounds being the maximum of the lengths of two input 
> strings is just one of many possibilities.
> 
> > With no evidence that this warning is ever harmful I'd consider
> 
> There is, the false positives were seen in Fedora/RHEL builds.
> 
> > suppressing it a regression.  Since the warning is a deliberate
> > feature in a released compiler and GCC is now in a regression
> > fixing stage, this patch is out of scope even if a case where
> > the warning wasn't helpful did turn up (none has been reported
> > so far).
> 
> Wait, I just reported an issue and it's across multiple packages in 
> Fedora/RHEL :)

I think having more examples of where this is happening on real-world
code might help this discussion.

Do you have some URLs? (e.g. bug reports, build logs, etc)

Reading through this thread, all of the examples I've seen look like
they've been constructed; I'm interested in seeing what this looks like
"in the wild" as it were.

Quoting myself in the UX Guidelines:
  https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html

"28.1.3 Try the diagnostic on real-world code

It’s worth testing a new warning on many instances of real-world code,
written by different people, and seeing what it complains about, and
what it doesn’t complain about.

This may suggest heuristics that silence common false positives.

It may also suggest ways to improve the precision of the message."

...and various other parts of those guidelines may apply here.

Hope this is constructive
Dave





> 
> I think this is a regression since gcc 11 due to misunderstanding the
> specification and assuming too strong a relationship between the size
> argument of strncmp (and indeed strnlen and strndup) and the size of 
> objects being passed to it.  Compliant code relies on the compiler to
> do 
> the right thing here, i.e. optimize the strncmp call to strcmp and not 
> panic about the size argument being larger than the input buffer size. 
> If at all such a diagnostic needs to stay, it ought to go into the 
> analyzer, where such looser heuristic suggestions are more acceptable
> and sometimes even appreciated.
> 
> FWIW, I'm open to splitting the warning levels as you suggested if 
> that's the consensus since it at least provides a way to make these 
> warnings saner. However I still haven't found the rationale presented
> so 
> far compelling enough to justify these false positives; I just don't
> see 
> a proportional enough reward.  Hopefully more people can chime in with 
> their perspective on this.
> 
> Thanks,
> Siddhesh
> 




[committed] libstdc++: Fix comment in testsuite utility

2022-03-17 Thread Jonathan Wakely via Gcc-patches
Pushed as obvious.

-- >8 --

libstdc++-v3/ChangeLog:

* testsuite/util/testsuite_character.h: Fix comment.
---
 libstdc++-v3/testsuite/util/testsuite_character.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/testsuite/util/testsuite_character.h 
b/libstdc++-v3/testsuite/util/testsuite_character.h
index 1547029786a..050159b1271 100644
--- a/libstdc++-v3/testsuite/util/testsuite_character.h
+++ b/libstdc++-v3/testsuite/util/testsuite_character.h
@@ -138,7 +138,7 @@ namespace __gnu_cxx
   inline V2
   __gnu_test::pod_uchar::char_type::to(const char_type& c)
   { return static_cast(c.value << 5); }
-} // namespace __gnu_test
+} // namespace __gnu_cxx
 
 namespace std
 {
@@ -562,8 +562,7 @@ namespace std
   pattern
   do_neg_format() const
   { return pattern(); }
- };
+};
 } // namespace std
 
 #endif // _GLIBCXX_TESTSUITE_CHARACTER_H
-
-- 
2.34.1



[PATCH RFC] Fix ICE due to shared BLOCK node in coroutine generation [PR103328]

2022-03-17 Thread Benno Evers via Gcc-patches
The coroutine transformation moves the original function body into a
newly created actor function, but the block of the
`current_binding_level` still points into the original function,
causing the block to be shared between the two functions if it is
subsequently used. This may cause havoc later on, as subsequent
compiler passes for one function will also implicitly modify the
other. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103328#c19 for
a more detailed writeup.

This patch fixes the issue locally, but I'm not familiar with the GCC
code base so I'm not sure if this is the right way to go about it or
if there's a cleaner way to reset the current binding level. If this
is the way to go I'm happy to extend the patch with a testcase and
changelog entry.

---
 gcc/cp/coroutines.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index b1bfdc767a4..eb5f80f499b 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -4541,6 +4541,8 @@ morph_fn_to_coro (tree orig, tree *resumer, tree
*destroyer)
   BLOCK_VARS (top_block) = BIND_EXPR_VARS (ramp_bind);
   BLOCK_SUBBLOCKS (top_block) = NULL_TREE;

+  current_binding_level->blocks = top_block;
+
   /* The decl_expr for the coro frame pointer, initialize to zero so that we
  can pass it to the IFN_CO_FRAME (since there's no way to pass a type,
  directly apparently).  This avoids a "used uninitialized" warning.  */
-- 
2.32.0


[PATCH] [GIMPLE FE] allow to unit test loop passes

2022-03-17 Thread Richard Biener via Gcc-patches
The following arranges for the GIMPLE frontend to parse an
additional loops(...) specification, currently consisting of
'normal' and 'lcssa'.  The intent is to allow unit testing
of passes inside the GIMPLE loop optimization pipeline where
we keep the IL in loop-closed SSA and have SCEV initialized.

The GIMPLE parser side is only half of the story - the rest
of the patch makes sure we do not destroy loop state when
moving the IL through the skipped portion of the pass pipeline
which shows we are not careful with IPA passes here and those
tend to call loop_optimizer_init which destroys state.  The
patch arranges for IPA passes to honor fn->pass_startwith and
if set, refrain from considering the function.

Since the SCEV state is global we cannot initialize it during
GIMPLE parsing but we have to arrange for that to happen before
the pass we want to start with.  The patch heuristically
enables the loop-init pass based on the fact whether the IL
is in loop-closed SSA state and makes that aware of GIMPLE
testcases already arriving with properly initialized loops.

That's all quite some hacks but I think it's worth the ability
to unit test loop passes.

I've tried to do this before but PR104931 now triggered me to
try again (I have still to see whether that's enough to make   
a GIMPLE testcase trigger that bug).  I've skipped existing
GIMPLE testcases for -flto since when starting after IPA
using LTO doesn't make much sense and my IPA mangling ends up
crashing in the LTO writing.  There's possibly some better
way to "hide" the late to be started functions from IPA
(but we would still need to stream the body).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Any comments?

Thanks,
Richard.

2022-03-17  Richard Biener  

gcc/c/
* c-tree.h (c_declspecs::loops_state): Add.
* c-parser.cc (c_parser_declaration_or_fndef): Pass down
loops_state to c_parser_parse_gimple_body.
* gimple-parser.h (c_parser_parse_gimple_body): Adjust.
* gimple-parser.cc (c_parser_parse_gimple_body): Get
and initialize loops state according to specification.
(c_parser_gimple_or_rtl_pass_list): Parse loops[(spec...)]
with supported spec 'normal' and 'lcssa'.

gcc/
* passes.cc (should_skip_pass_p): When in LC SSA do not
skip loopinit.
* tree-ssa-loop.cc (pass_tree_loop_init::execute): Handle
functions already in LC SSA.
* ipa-cp.cc (ipcp_cloning_candidate_p): Skip functions
with pending startwith pass.
* ipa-fnsummary.c (ipa_fn_summary_generate): Likewise.
* ipa-inline.cc (inline_small_functions): Likewise.
(ipa_inline): Likewise.
* ipa-modref.cc (modref_generate): Likewise.

gcc/testsuite/
* gcc.dg/gimplefe-50.c: New testcase.
* gcc.dg/torture/pr89595.c: Skip -flto.
* gcc.dg/vect/bb-slp-48.c: Likewise.
* gcc.dg/vect/slp-reduc-10a.c: Likewise.
* gcc.dg/vect/slp-reduc-10b.c: Likewise.
* gcc.dg/vect/slp-reduc-10c.c: Likewise.
* gcc.dg/vect/slp-reduc-10d.c: Likewise.
* gcc.dg/vect/slp-reduc-10e.c: Likewise.
* gcc.dg/vect/vect-cond-arith-2.c: Likewise.
---
 gcc/c/c-parser.cc |  1 +
 gcc/c/c-tree.h|  3 ++
 gcc/c/gimple-parser.cc| 37 +-
 gcc/c/gimple-parser.h |  1 +
 gcc/ipa-cp.cc |  4 +-
 gcc/ipa-fnsummary.cc  |  4 +-
 gcc/ipa-inline.cc |  8 +++-
 gcc/ipa-modref.cc |  2 +-
 gcc/passes.cc |  7 +++
 gcc/testsuite/gcc.dg/gimplefe-50.c| 48 +++
 gcc/testsuite/gcc.dg/torture/pr89595.c|  1 +
 gcc/testsuite/gcc.dg/vect/bb-slp-48.c |  1 +
 gcc/testsuite/gcc.dg/vect/slp-reduc-10a.c |  1 +
 gcc/testsuite/gcc.dg/vect/slp-reduc-10b.c |  1 +
 gcc/testsuite/gcc.dg/vect/slp-reduc-10c.c |  1 +
 gcc/testsuite/gcc.dg/vect/slp-reduc-10d.c |  1 +
 gcc/testsuite/gcc.dg/vect/slp-reduc-10e.c |  1 +
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c |  1 +
 gcc/tree-ssa-loop.cc  | 11 +++--
 19 files changed, 125 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gimplefe-50.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 129dd727ef3..80091d81bb4 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -2537,6 +2537,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool 
fndef_ok,
  in_late_binary_op = true;
  c_parser_parse_gimple_body (parser, specs->gimple_or_rtl_pass,
  specs->declspec_il,
+ specs->loops_state,
  specs->entry_bb_count);
  in_late_binary_op = saved;
}
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 

[PATCH]rs6000: optimize li+rldicr+cmpd==>rotldi+cmpldi for 16bits

2022-03-17 Thread Jiufu Guo via Gcc-patches
When checking eq/neq with a constant which has only 16bits, then it can
be optimized to check the rotated data.  By this, the constant building
is optimized.

As the example in PR103743:
For "in == 0x8000LL", this patch generates:
rotldi %r3,%r3,16
cmpldi %cr0,%r3,32768
instead:
li %r9,-1
rldicr %r9,%r9,0,0
cmpd %cr0,%r3,%r9

This patch pass bootstrap and regtest on ppc64 and ppc64le.
Ok for trunk?  Thanks!

BR,
Jiufu


PR target/103743

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rot_to_16bits): New function.
(rs6000_generate_compare): Compare rotated const for eq.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr103743.c: New test.
* gcc.target/powerpc/pr103743_1.c: New test.

---
 gcc/config/rs6000/rs6000.cc   | 62 +
 gcc/testsuite/gcc.target/powerpc/pr103743.c   | 48 ++
 gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 87 +++
 3 files changed, 197 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 3afe78f5d04..4aa1d0ca4ab 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -14860,6 +14860,42 @@ rs6000_reverse_condition (machine_mode mode, enum 
rtx_code code)
 return reverse_condition (code);
 }
 
+/* If SRC is a constant with only 16bits, return the number by which
+   the constant can be rotated to lowest 16bits.
+   Return 0, if SRC does not meet requirements.
+   Set *sgn to 1 if the rotated cst is negative, otherwise set it to 0.
+   Set *res_cst to the rotated constant.  */
+
+static int
+rot_to_16bits (rtx src, machine_mode mode, bool *sgn, rtx *res_cst)
+{
+  if (!(src && CONST_INT_P (src)))
+return 0;
+
+  unsigned HOST_WIDE_INT C = INTVAL (src);
+
+  /* If all 0 except low 16bits.  */
+  int leadz = clz_hwi (C) + 16;
+  rtx cst = simplify_gen_binary (ROTATE, mode, src, GEN_INT (leadz));
+  if (satisfies_constraint_K (cst))
+{
+  *res_cst = cst;
+  return leadz;
+}
+
+  /* If all 1 except low 15bits.  */
+  leadz = clz_hwi (~C) + 15;
+  cst = simplify_gen_binary (ROTATE, mode, src, GEN_INT (leadz));
+  if (satisfies_constraint_I (cst))
+{
+  *sgn = true;
+  *res_cst = cst;
+  return leadz;
+}
+
+  return 0;
+}
+
 /* Generate a compare for CODE.  Return a brand-new rtx that
represents the result of the compare.  */
 
@@ -14889,6 +14925,32 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
   else
 comp_mode = CCmode;
 
+  /* "i == C" ==> "rotl(i,N) == rotl(C,N)" if rotl(C,N) only low 16bits.  */
+  if ((code == NE || code == EQ) && mode == DImode)
+{
+  /* The constant would already been set to a reg in the last insn.  */
+  rtx_insn *last = get_last_insn_anywhere ();
+  rtx set = last ? single_set (last) : NULL_RTX;
+  rtx src = (set && SET_DEST (set) == op1) ? SET_SRC (set) : NULL_RTX;
+
+  /* It constant may be in constant pool. */
+  if (src && MEM_P (src))
+   src = avoid_constant_pool_reference (src);
+
+  rtx cst = NULL_RTX;
+  bool sgn = false;
+  int n = rot_to_16bits (src, mode, , );
+  if (n >= 16 && n <= 48)
+   {
+ rtx dest = gen_reg_rtx (mode);
+ emit_insn (
+   gen_rtx_SET (dest, gen_rtx_ROTATE (mode, op0, GEN_INT (n;
+ op0 = dest;
+ op1 = cst;
+ comp_mode = sgn ? CCmode : CCUNSmode;
+   }
+}
+
   /* If we have an unsigned compare, make sure we don't have a signed value as
  an immediate.  */
   if (comp_mode == CCUNSmode && CONST_INT_P (op1)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743.c 
b/gcc/testsuite/gcc.target/powerpc/pr103743.c
new file mode 100644
index 000..606e5e51c4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103743.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "cmpldi" 8  } } */
+/* { dg-final { scan-assembler-times "cmpdi" 4  } } */
+/* { dg-final { scan-assembler-times "rotldi" 8  } } */
+
+int foo (int a);
+
+int __attribute__ ((noinline)) udi_fun (unsigned long long in)
+{
+  if (in == (0x8642ULL))
+return foo (1);
+  if (in == (0x7642ULL))
+return foo (12);
+  if (in == (0x8000ULL))
+return foo (32);
+  if (in == (0x8642ULL))
+return foo (46);
+  if (in == (0x7642ULL))
+return foo (51);
+  if (in == (0x756700ULL))
+return foo (9);
+  if (in == (0xFFF8567FULL))
+return foo (19);
+
+  return 0;
+}
+
+int __attribute__ ((noinline)) di_fun (long long in)
+{
+  if (in == (0x8642LL))
+return foo (1);
+  if (in == (0x7642LL))
+return foo (12);
+  if (in == (0x8000LL))
+return foo (32);

Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings

2022-03-17 Thread Jonathan Wakely via Gcc-patches

On 15/03/22 14:36 -0600, Martin Sebor wrote:

On 3/15/22 10:40, Siddhesh Poyarekar wrote:

On 15/03/2022 21:09, Martin Sebor wrote:

The strncmp function takes arrays as arguments (not necessarily
strings).? The main purpose of the -Wstringop-overread warning
for calls to it is to detect calls where one of the arrays is
not a nul-terminated string and the bound is larger than the size
of the array.? For example:

?? char a[4], b[4];

?? int f (void)
?? {
 return strncmp (a, b, 8);?? // -Wstringop-overread
?? }

Such a call is suspect: if one of the arrays isn't nul-terminated
the call is undefined.? Otherwise, if both are nul-terminated there


Isn't "suspect" too harsh a description though?? The bound does not
specify the size of a or b, it specifies the maximum extent to which to
compare a and b, the extent being any application-specific limit.? In
fact the limit could be the size of some arbitrary third buffer that the
contents of a or b must be copied to, truncating to the bound.


The intended use of the strncmp bound is to limit the comparison to
at most the size of the arrays or (in a subset of cases) the length
of an initial substring. Providing an arbitrary bound that's not
related to the sizes as you describe sounds very much like a misuse.

As a historical note, strncmp was first introduced in UNIX v7 where
its purpose, alongside strncpy, was to manipulate (potentially)
unterminated character arrays like file names stored in fixed size
arrays (typically 14 bytes).  Strncpy would fill the buffers with
ASCII data up to their size and pad the rest with nuls only if there
was room.

Strncmp was then used to compare these potentially unterminated
character arrays (e.g., archive headers in ld and ranlib).  The bound
was the size of the fixed size array.  Its other use case was to compare
leading portions of strings (e.g, when looking for an environment
variable or when stripping "./" from path names).

Since the early UNIX days, both strncpy and to a lesser extent strncmp
have been widely misused and, along with many other functions in
, a frequent source of bugs due to common misunderstanding
of their intended purpose.  The aim of these warnings is to detect
the common (and sometimes less common) misuses and bugs.


I agree the call is undefined if one of the arrays is not nul-terminated
and that's the thing; nothing about the bound is undefined in this
context, it's the NUL termination that is key.


is no point in calling strncmp with a bound greater than their sizes.


There is, when the bound describes something else, e.g. the size of a
third destination buffer into which one of the input buffers may get
copied into.? Or when the bound describes the maximum length of a set of
strings where only a subset of the strings are reachable in the current
function and ranger sees it, allowing us to reduce our input string size
estimate.? The bounds being the maximum of the lengths of two input
strings is just one of many possibilities.


With no evidence that this warning is ever harmful I'd consider


There is, the false positives were seen in Fedora/RHEL builds.


I haven't seen these so I can't very well comment on them.  But I can
assure you that warning for the code above is intentional.  Whether


I don't think anybody is saying it wasn't intentional, the point is
that we can change our minds and do it differently based on feedback
and usage experience.

If users no longer have faith in these warnings and just disable them
or ignore them, then the warnings do not find real bugs and are not
fit for purpose.


or not the arrays are nul-terminated, the expected way to call
the function is with a bound no greater than their size (some coding
guidelines are explicit about this; see for example the CERT C Secure
Coding standard rule ARR38-C).


That's fine. It shouldn't be in -Wall though. Please don't quote the
docs about suspicious constructs again, we're all smart enough to
consider things on balance and make trade offs instead of just taking
a dogmatic reading of the manual (especially when other parts of the
docs for these warnings are provably wrong).

I disagree this shouldn't be changed in stage 4, because the
ever-increasing false positive rate from the -Wstringop-xxx warnings
is a major regression in user experience with GCC.

You can keep insisting it's not a problem, but that doesn't match
other people's experience.




Re: [PATCH v2] x86: Also check _SOFT_FLOAT in

2022-03-17 Thread Marc Poulhiès via Gcc-patches
"H.J. Lu"  writes:

Hello,

> I am checking it in.

Are you planning on also applying this patch in the gcc-11 branch?

Thanks,
Marc


[PATCH] testsuite: fixup pr97521.c and pr96713.c on i686-*

2022-03-17 Thread Marc Poulhiès via Gcc-patches
On targets that do not have MXX/SSE enabled by default, pr97521
and pr96713 fail because they emit warnings:

pr97521.c:12:1: warning: MMX vector return without MMX enabled
changes the ABI [-Wpsabi]
pr97521.c:11:1: note: the ABI for passing parameters with
16-byte alignment has changed in GCC 4.6
pr97521.c:11:1: warning: SSE vector argument without SSE enabled
changes the ABI [-Wpsabi]

Add -Wno-abi to dg-options.

Ok for master?

gcc/testsuite/ChangeLog:
* gcc.target/i386/pr97521.c: Add -Wno-psabi to dg-options.
* gcc.dg/analyzer/pr96713.c: Likewise.

---
 gcc/testsuite/gcc.dg/analyzer/pr96713.c | 1 +
 gcc/testsuite/gcc.target/i386/pr97521.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96713.c 
b/gcc/testsuite/gcc.dg/analyzer/pr96713.c
index fe9cafd73f1..12170bd4c64 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr96713.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96713.c
@@ -1,3 +1,4 @@
+/* { dg-options "-Wno-psabi" } */
 typedef int __attribute__ ((vector_size (8))) V;
 
 void
diff --git a/gcc/testsuite/gcc.target/i386/pr97521.c 
b/gcc/testsuite/gcc.target/i386/pr97521.c
index 804ffd61e13..5970bcff383 100644
--- a/gcc/testsuite/gcc.target/i386/pr97521.c
+++ b/gcc/testsuite/gcc.target/i386/pr97521.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O -mno-sse2" } */
+/* { dg-options "-O -mno-sse2 -Wno-psabi" } */
 
 typedef unsigned char __attribute__ ((__vector_size__ (8))) V;
 typedef unsigned long long __attribute__ ((__vector_size__ (16))) W;
-- 
2.25.1



Re: [PATCH] gcc-12: Re-enable split-stack support for GNU/Hurd.

2022-03-17 Thread Svante Signell via Gcc-patches
ping

On Wed, 2022-02-23 at 11:13 +0100, Svante Signell wrote:
> Hello,
> 
> In line of porting the latest build of libgo/go with gcc-12 to GNU/Hurd,
> support
> of split-stack was found to be removed.
>  
> After patching the files in libgo the build of gotools fails:
> go1: error: '-fsplit-stack' currently only supported on GNU/Linux
> go1: error: '-fsplit-stack' is not supported by this compiler configuration
> 
> The attached patch defines OPTION_GLIBC_P(opts) and OPTION_GLIBC that was lost
> in config/gnu.h, needed to enable split-stack support for GNU/Hurd. 
> 
> This problem happened with the latest commit as discussed in the mail thread
> starting with 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588973.html
> .
> 
> The file first doing this check is: (first error: ..)
> src/gcc/common/config/i386/i386-common.cc
> in function:
> static bool ix86_supports_split_stack (bool report,
> struct gcc_options *opts ATTRIBUTE_UNUSED)
> 
> and secondly in:src/gcc/opts.cc: (second error: ...)
> in function:
> void
> finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
> location_t loc)
> 
> The checking logic is in function ix86_supports_split_stack():
> #if defined(TARGET_THREAD_SPLIT_STACK_OFFSET) && defined(OPTION_GLIBC_P)
>   if (!OPTION_GLIBC_P (opts))
> #endif
> {
>   if (report)
> error ("%<-fsplit-stack%> currently only supported on GNU/Linux");
>   return false;
> }
> 
>   bool ret = true;
> 
> In case of GNU/Hurd TARGET_THREAD_SPLIT_STACK_OFFSET is defined as well as
> OPTION_GLIBC_P but OPTION_GLIBC_P(opts) is needed to. The attached patch to
> src/gcc/config/gnu.h creates that definition. For GNU/Hurd, gnu.h is included
> in
> the configure stage:
> Configuring stage 1 in ./gcc
> ...
> Using the following target machine macro files:
> ...
> ../../src/gcc/config/gnu.h
> 
> For a longer history about this bug see:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104290
> 
> Additionally, I would propose the text in gcc/common/config/i386/i386-
> common.cc
> to change from:
> error ("%<-fsplit-stack%> currently only supported on GNU/Linux");
> to:
> error ("%<-fsplit-stack%> currently only supported on GLIBC-based systems");
> 
> Thanks!



Re: [PATCH] doc: Document Solaris D bootstrap requirements [PR 103528]

2022-03-17 Thread Rainer Orth
Hi Iain,

>> I suspected that the 32-bit issue might be due to several stdint types
>> being wrong, which was just fixed on master.  At least the issue seemed
>> similar to PR d/104738.  I'm building a 64-bit trunk gdc as we speak in
>> order to try that as a bootstrap compiler for a 32-bit build.
>
> The issue on 11.x is related to code that tries to synthesize section-start
> symbols, it causes a link fail on m32 - which is a bootstrap breaker.
>
> ..  if that has been fixed for m32 codegenon master then, yes - presumably we
> could build an x86_64 compiler and use that “-m32” to make an i686 bootstrap.

I tried just that: 64-bit-default gdc 12.0.1 as of 20220311 as bootstrap
compiler with -m32 for i386-apple-darwin11.4.2 target: same link
failures, unfortunately.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] tree-optimization/104960 - unsplit edges after late sinking

2022-03-17 Thread Richard Biener via Gcc-patches
Something went wrong when testing the earlier patch to move the
late sinking to before the late phiopt for PR102008.  The following
makes sure to unsplit edges after the late sinking since the split
edges confuse the following phiopt leading to missed optimizations.

I've went for a new pass parameter for this to avoid changing the
CFG after the early sinking pass at this point.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-03-17  Richard Biener  

PR tree-optimization/104960
* passes.def: Add pass parameter to pass_sink_code, mark
last one to unsplit edges.
* tree-ssa-sink.cc (pass_sink_code::set_pass_param): New.
(pass_sink_code::execute): Always execute TODO_cleanup_cfg
when we need to unsplit edges.

* gcc.dg/gimplefe-37.c: Adjust to allow either the true
or false edge to have a forwarder.
---
 gcc/passes.def |  4 ++--
 gcc/testsuite/gcc.dg/gimplefe-37.c |  2 +-
 gcc/tree-ssa-sink.cc   | 13 +++--
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index c8903b4ec16..3e44797b10f 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -259,7 +259,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_lim);
   NEXT_PASS (pass_walloca, false);
   NEXT_PASS (pass_pre);
-  NEXT_PASS (pass_sink_code);
+  NEXT_PASS (pass_sink_code, false /* unsplit edges */);
   NEXT_PASS (pass_sancov);
   NEXT_PASS (pass_asan);
   NEXT_PASS (pass_tsan);
@@ -349,7 +349,7 @@ along with GCC; see the file COPYING3.  If not see
   /* After late CD DCE we rewrite no longer addressed locals into SSA
 form if possible.  */
   NEXT_PASS (pass_forwprop);
-  NEXT_PASS (pass_sink_code);
+  NEXT_PASS (pass_sink_code, true /* unsplit edges */);
   NEXT_PASS (pass_phiopt, false /* early_p */);
   NEXT_PASS (pass_fold_builtins);
   NEXT_PASS (pass_optimize_widening_mul);
diff --git a/gcc/testsuite/gcc.dg/gimplefe-37.c 
b/gcc/testsuite/gcc.dg/gimplefe-37.c
index d3ea186d7f9..12f6f64fc2c 100644
--- a/gcc/testsuite/gcc.dg/gimplefe-37.c
+++ b/gcc/testsuite/gcc.dg/gimplefe-37.c
@@ -22,6 +22,6 @@ main (int argc)
 
 
 /* { dg-final { scan-tree-dump-times " \\\[count: 3" 2 "optimized" 
} } */
-/* { dg-final { scan-tree-dump-times " \\\[count: 2" 1 "optimized" 
} } */
+/* { dg-final { scan-tree-dump-times " \\\[count: \[12\]" 1 
"optimized" } } */
 /* { dg-final { scan-tree-dump-times "goto ; \\\[33\\\.33%\\\]" 1 
"optimized" } } */
 /* { dg-final { scan-tree-dump-times "goto ; \\\[66\\\.67%\\\]" 1 
"optimized" } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 66d7ae89e90..1c226406feb 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -822,14 +822,21 @@ class pass_sink_code : public gimple_opt_pass
 {
 public:
   pass_sink_code (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_sink_code, ctxt)
+: gimple_opt_pass (pass_data_sink_code, ctxt), unsplit_edges (false)
   {}
 
   /* opt_pass methods: */
   virtual bool gate (function *) { return flag_tree_sink != 0; }
   virtual unsigned int execute (function *);
   opt_pass *clone (void) { return new pass_sink_code (m_ctxt); }
+  void set_pass_param (unsigned n, bool param)
+{
+  gcc_assert (n == 0);
+  unsplit_edges = param;
+}
 
+private:
+  bool unsplit_edges;
 }; // class pass_sink_code
 
 unsigned int
@@ -837,11 +844,13 @@ pass_sink_code::execute (function *fun)
 {
   loop_optimizer_init (LOOPS_NORMAL);
   split_edges_for_insertion ();
+  /* Arrange for the critical edge splitting to be undone if requested.  */
+  unsigned todo = unsplit_edges ? TODO_cleanup_cfg : 0;
   connect_infinite_loops_to_exit ();
   memset (_stats, 0, sizeof (sink_stats));
   calculate_dominance_info (CDI_DOMINATORS);
   calculate_dominance_info (CDI_POST_DOMINATORS);
-  unsigned todo = sink_code_in_bb (EXIT_BLOCK_PTR_FOR_FN (fun));
+  todo |= sink_code_in_bb (EXIT_BLOCK_PTR_FOR_FN (fun));
   statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
   statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
   free_dominance_info (CDI_POST_DOMINATORS);
-- 
2.34.1


Enhance further testcases to verify Openacc 'kernels' decomposition

2022-03-17 Thread Thomas Schwinge
Hi!

Pushed to master branch commit c43cb355f25dd22133d15819bd6ec03d3d3939fd
"Enhance further testcases to verify Openacc 'kernels' decomposition",
see attached.


Kwok, this ought to resolve some testsuite noise from your upcoming og12
branch.  Please let me know if there are any more such issues.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From c43cb355f25dd22133d15819bd6ec03d3d3939fd Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 16 Mar 2022 14:19:41 +0100
Subject: [PATCH] Enhance further testcases to verify Openacc 'kernels'
 decomposition

	gcc/testsuite/
	* c-c++-common/goacc-gomp/nesting-1.c: Enhance.
	* c-c++-common/goacc/kernels-loop-g.c: Likewise.
	* c-c++-common/goacc/nesting-1.c: Likewise.
	* gcc.dg/goacc/nested-function-1.c: Likewise.
	* gfortran.dg/goacc/common-block-3.f90: Likewise.
	* gfortran.dg/goacc/nested-function-1.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
	Enhance.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise.
	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
---
 .../c-c++-common/goacc-gomp/nesting-1.c   |  7 ++-
 .../c-c++-common/goacc/kernels-loop-g.c   |  3 ++
 gcc/testsuite/c-c++-common/goacc/nesting-1.c  | 18 +--
 .../gcc.dg/goacc/nested-function-1.c  | 22 
 .../gfortran.dg/goacc/common-block-3.f90  | 18 +--
 .../gfortran.dg/goacc/nested-function-1.f90   | 10 
 .../acc_prof-kernels-1.c  | 19 +--
 .../kernels-loop-g.c  |  3 ++
 .../testsuite/libgomp.oacc-fortran/if-1.f90   | 51 ++-
 9 files changed, 134 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
index 39b92712b31..51c5e359f29 100644
--- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
+++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "--param=openacc-kernels=decompose" }
+
 /* { dg-additional-options "-fopt-info-omp-note" } */
 
 /* { dg-additional-options "--param=openacc-privatization=noisy" }
@@ -21,10 +23,11 @@ void
 f_acc_kernels (void)
 {
 #pragma acc kernels
-  /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } .-1 }
- { dg-note {variable 'i' ought to be adjusted for OpenACC privatization level: 'gang'} "" { target *-*-* } .-2 } */
+  /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } .-1 } */
   {
 int i;
+/* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} "" { target c } .+3 }
+   { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} "" { target c++ } .+1 } */
 #pragma omp atomic write
 i = 0;
   }
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
index 73b469d7061..5bdaa40b02c 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
@@ -1,5 +1,8 @@
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
+
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-g" } */
+/*TODO PR100400 { dg-additional-options -fcompare-debug } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/nesting-1.c b/gcc/testsuite/c-c++-common/goacc/nesting-1.c
index 83cbff767a4..8c3d1adc785 100644
--- a/gcc/testsuite/c-c++-common/goacc/nesting-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/nesting-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
+
 /* { dg-additional-options "-fopt-info-all-omp" } */
 
 /* { dg-additional-options "--param=openacc-privatization=noisy" } */
@@ -32,11 +34,13 @@ void
 f_acc_kernels (void)
 {
 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
-  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {variable 'i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
   {
 #pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+/* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
 /* { dg-note {variable 'i\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
 /* { dg-note {variable 'i' in 'private' 

Enhance further testcases to verify handling of OpenACC privatization level [PR90115]

2022-03-17 Thread Thomas Schwinge
Hi!

On 2021-05-21T21:29:19+0200, I wrote:
> I've pushed "[OpenACC privatization] Largely extend diagnostics and
> corresponding testsuite coverage [PR90115]" to master branch in commit
> 11b8286a83289f5b54e813f14ff56d730c3f3185

Pushed to master branch commit 004fc4f2fc686d3366c9e1a2d8b9183796073866
"Enhance further testcases to verify handling of OpenACC privatization
level [PR90115]", see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 004fc4f2fc686d3366c9e1a2d8b9183796073866 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 16 Mar 2022 12:15:01 +0100
Subject: [PATCH] Enhance further testcases to verify handling of OpenACC
 privatization level [PR90115]

As originally introduced in commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and corresponding testsuite
coverage [PR90115]".

	PR middle-end/90115
	gcc/testsuite/
	* c-c++-common/goacc-gomp/nesting-1.c: Enhance.
	* gfortran.dg/goacc/common-block-3.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Enhance.
	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
---
 .../c-c++-common/goacc-gomp/nesting-1.c   |  9 +-
 .../gfortran.dg/goacc/common-block-3.f90  | 13 ++-
 .../acc_prof-kernels-1.c  | 35 ++--
 .../testsuite/libgomp.oacc-fortran/if-1.f90   | 82 +--
 4 files changed, 68 insertions(+), 71 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
index b0b78374016..39b92712b31 100644
--- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
+++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
@@ -1,14 +1,15 @@
 /* { dg-additional-options "-fopt-info-omp-note" } */
-/* { dg-additional-options "--param=openacc-privatization=noisy" } for
-   testing/documenting aspects of that functionality.  */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
 
 
 void
 f_acc_data (void)
 {
 #pragma acc data
-  /* { dg-note {variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 } */
-  /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } .-2 } */
+  /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } .-1 } */
   {
 int i;
 #pragma omp atomic write
diff --git a/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90 b/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90
index 5defe2ea85d..9dbfa4cd2f0 100644
--- a/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90
@@ -1,5 +1,11 @@
 ! { dg-options "-fopenacc -fdump-tree-omplower" }
 
+! { dg-additional-options "-fopt-info-omp-all" }
+
+! { dg-additional-options "--param=openacc-privatization=noisy" }
+! Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+! { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} }
+
 module consts
   integer, parameter :: n = 100
 end module consts
@@ -15,11 +21,14 @@ program main
   common /KERNELS_BLOCK/ x, y, z
 
   c = 1.0
-  !$acc parallel loop copy(/BLOCK/)
+  !$acc parallel loop copy(/BLOCK/) ! { dg-line l1 }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l1 }
+  ! { dg-optimized {assigned OpenACC gang vector loop parallelism} {} { target *-*-* } l1 }
   do i = 1, n
  a(i) = b(i) + c
   end do
-  !$acc kernels
+  !$acc kernels ! { dg-line l2 }
+  ! { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l2 }
   do i = 1, n
  x(i) = y(i) + c
   end do
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
index d7f47627438..c82a7edbfa0 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
@@ -1,5 +1,21 @@
 /* Test dispatch of events to callbacks.  */
 
+/* { dg-additional-options "-fopt-info-omp-all" }
+   { dg-additional-options "-foffload=-fopt-info-omp-all" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   { 

Re: [PATCH] gimplify: Emit clobbers for TARGET_EXPR_SLOT vars later [PR103984]

2022-03-17 Thread Richard Biener via Gcc-patches
On Wed, 16 Mar 2022, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned in the PR, we emit a bogus uninitialized warning but
> easily could emit wrong-code for it or similar testcases too.
> The bug is that we emit clobber for a TARGET_EXPR_SLOT too early:
>   D.2499.e = B::qux (); [return slot optimization]
>   D.2516 = 1;
>   try
> {
>   B::B (, );
>   try
> {
>   _2 = baz ();
>   D.2499.f = _2;
>   D.2516 = 0;
>   try
> {
>   try
> {
>   bar ();
> }
>   finally
> {
>   C::~C ();
> }
> }
>   finally
> {
>   D.2499 = {CLOBBER(eol)};
> }
> }
>   finally
> {
>   D.2498 = {CLOBBER(eol)};
> }
> }
>   catch
> {
>   if (D.2516 != 0) goto ; else goto ;
>   :
>   A::~A ();
>   goto ;
>   :
>   :
> }
> The CLOBBER for D.2499 is essentially only emitted on the non-exceptional
> path, if B::B or baz throws, then there is no CLOBBER for it but there
> is a conditional destructor A::~A ().  Now, ehcleanup1
> sink_clobbers optimization assumes that clobbers in the EH cases are
> emitted after last use and so sinks the D.2499 = {CLOBBER(eol)}; later,
> so we then have
>   # _3 = PHI <1(3), 0(9)>
> :
>   D.2499 ={v} {CLOBBER(eol)};
>   D.2498 ={v} {CLOBBER(eol)};
>   if (_3 != 0)
> goto ; [INV]
>   else
> goto ; [INV]
> 
>:
>   _35 = D.2499.a;
>   if ( != _35)
> where that _35 = D.2499.a comes from inline expansion of the A::~A dtor,
> and that is a load from a clobbered memory.
> 
> Now, what the gimplifier sees in this case is a CLEANUP_POINT_EXPR with
> somewhere inside of it a TARGET_EXPR for D.2499 (with the C::~C ()
> cleanup) which in its TARGET_EXPR_INITIAL has another TARGET_EXPR for
> D.2516 bool flag which has CLEANUP_EH_ONLY which performs that conditional
> A::~A () call.
> The following patch ensures that CLOBBERs (and asan poisoning) are emitted
> after even those gimple_push_cleanup pushed cleanups from within the
> TARGET_EXPR_INITIAL gimplification (i.e. the last point where the slot could
> be in theory used).  In my first version of the patch I've done it by just
> moving the
>   /* Add a clobber for the temporary going out of scope, like
>  gimplify_bind_expr.  */
>   if (gimplify_ctxp->in_cleanup_point_expr
>   && needs_to_live_in_memory (temp))
> {
> ...
> }
> block earlier in gimplify_target_expr, but that regressed a couple of tests
> where temp is marked TREE_ADDRESSABLE only during (well, very early during
> that) the gimplification of TARGET_EXPR_INITIAL, so we didn't emit e.g. on
> pr80032.C or stack2.C tests any clobbers for the slots and thus stack slot
> reuse wasn't performed.
> So that we don't regress those tests, this patch gimplifies
> TARGET_EXPR_INITIAL as before, but doesn't emit it directly into pre_p,
> emits it into a temporary sequence.  Then emits the CLOBBER cleanup
> into pre_p, then asan poisoning if needed, then appends the
> TARGET_EXPR_INITIAL temporary sequence and finally adds TARGET_EXPR_CLEANUP
> gimple_push_cleanup.  The earlier a GIMPLE_WCE appears in the sequence, the
> outer try/finally or try/catch it is.
> So, with this patch the part of the testcase in gimple dump cited above
> looks instead like:
>   try
> {
>   D.2499.e = B::qux (); [return slot optimization]
>   D.2516 = 1;
>   try
> {
>   try
> {
>   B::B (, );
>   _2 = baz ();
>   D.2499.f = _2;
>   D.2516 = 0;
>   try
> {
>   bar ();
> }
>   finally
> {
>   C::~C ();
> }
> }
>   finally
> {
>   D.2498 = {CLOBBER(eol)};
> }
> }
>   catch
> {
>   if (D.2516 != 0) goto ; else goto ;
>   :
>   A::~A ();
>   goto ;
>   :
>   :
> }
> }
>   finally
> {
>   D.2499 = {CLOBBER(eol)};
> }
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

LGTM.

Thanks,

Re: [PATCH] [i386] Add extra cost for unsigned_load which may have stall forward issue.

2022-03-17 Thread Hongtao Liu via Gcc-patches
On Wed, Mar 16, 2022 at 5:54 PM Richard Biener via Gcc-patches
 wrote:
>
> On Wed, Mar 16, 2022 at 3:19 AM liuhongt  wrote:
> >
> > This patch only handle pure-slp for by-value passed parameter which
> > has nothing to do with IPA but psABI. For by-reference passed
> > parameter IPA is required.
> >
> > The patch is aggressive in determining STLF failure, any
> > unaligned_load for parm_decl passed by stack is thought to have STLF
> > stall issue. It could lose some perf where there's no such issue(1
> > vector_load vs n scalar_load + CTOR).
> >
> > According to microbenchmark in PR, cost of STLF failure is generally
> > between 8 scalar_loads and 16 scalar loads on most latest Intel/AMD
> > processors.
> >
> > gcc/ChangeLog:
> >
> > PR target/101908
> > * config/i386/i386.cc (ix86_load_maybe_stfs_p): New.
> > (ix86_vector_costs::add_stmt_cost): Add extra cost for
> > unsigned_load which may have store forwarding stall issue.
> > * config/i386/i386.h (processor_costs): Add new member
> > stfs.
> > * config/i386/x86-tune-costs.h (i386_size_cost): Initialize
> > stfs.
> > (i386_cost, i486_cost, pentium_cost, lakemont_cost,
> > pentiumpro_cost, geode_cost, k6_cost, athlon_cost, k8_cost,
> > amdfam10_cost, bdver_cost, znver1_cost, znver2_cost,
> > znver3_cost, skylake_cost, icelake_cost, alderlake_cost,
> > btver1_cost, btver2_cost, pentium4_cost, nocano_cost,
> > atom_cost, slm_cost, tremont_cost, intel_cost, generic_cost,
> > core_cost): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr101908-1.c: New test.
> > * gcc.target/i386/pr101908-2.c: New test.
> > * gcc.target/i386/pr101908-3.c: New test.
> > * gcc.target/i386/pr101908-v16hi.c: New test.
> > * gcc.target/i386/pr101908-v16qi.c: New test.
> > * gcc.target/i386/pr101908-v16sf.c: New test.
> > * gcc.target/i386/pr101908-v16si.c: New test.
> > * gcc.target/i386/pr101908-v2df.c: New test.
> > * gcc.target/i386/pr101908-v2di.c: New test.
> > * gcc.target/i386/pr101908-v2hi.c: New test.
> > * gcc.target/i386/pr101908-v2qi.c: New test.
> > * gcc.target/i386/pr101908-v2sf.c: New test.
> > * gcc.target/i386/pr101908-v2si.c: New test.
> > * gcc.target/i386/pr101908-v4df.c: New test.
> > * gcc.target/i386/pr101908-v4di.c: New test.
> > * gcc.target/i386/pr101908-v4hi.c: New test.
> > * gcc.target/i386/pr101908-v4qi.c: New test.
> > * gcc.target/i386/pr101908-v4sf.c: New test.
> > * gcc.target/i386/pr101908-v4si.c: New test.
> > * gcc.target/i386/pr101908-v8df-adl.c: New test.
> > * gcc.target/i386/pr101908-v8df.c: New test.
> > * gcc.target/i386/pr101908-v8di-adl.c: New test.
> > * gcc.target/i386/pr101908-v8di.c: New test.
> > * gcc.target/i386/pr101908-v8hi-adl.c: New test.
> > * gcc.target/i386/pr101908-v8hi.c: New test.
> > * gcc.target/i386/pr101908-v8qi-adl.c: New test.
> > * gcc.target/i386/pr101908-v8qi.c: New test.
> > * gcc.target/i386/pr101908-v8sf-adl.c: New test.
> > * gcc.target/i386/pr101908-v8sf.c: New test.
> > * gcc.target/i386/pr101908-v8si-adl.c: New test.
> > * gcc.target/i386/pr101908-v8si.c: New test.
> > ---
> >  gcc/config/i386/i386.cc   | 51 +++
> >  gcc/config/i386/i386.h|  1 +
> >  gcc/config/i386/x86-tune-costs.h  | 28 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-1.c| 12 +++
> >  gcc/testsuite/gcc.target/i386/pr101908-2.c| 12 +++
> >  gcc/testsuite/gcc.target/i386/pr101908-3.c| 90 +++
> >  .../gcc.target/i386/pr101908-v16hi.c  |  6 ++
> >  .../gcc.target/i386/pr101908-v16qi.c  | 30 +++
> >  .../gcc.target/i386/pr101908-v16sf.c  |  6 ++
> >  .../gcc.target/i386/pr101908-v16si.c  |  6 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-v2df.c |  6 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-v2di.c |  7 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-v2hi.c |  6 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-v2qi.c | 16 
> >  gcc/testsuite/gcc.target/i386/pr101908-v2sf.c |  6 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-v2si.c |  6 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-v4df.c |  6 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-v4di.c |  7 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-v4hi.c |  6 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-v4qi.c | 18 
> >  gcc/testsuite/gcc.target/i386/pr101908-v4sf.c |  6 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-v4si.c |  6 ++
> >  .../gcc.target/i386/pr101908-v8df-adl.c   |  6 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-v8df.c |  6 ++
> >  .../gcc.target/i386/pr101908-v8di-adl.c   |  7 ++
> >  gcc/testsuite/gcc.target/i386/pr101908-v8di.c |  7 ++
> >