[PATCH] tree-optimization/106293 - fix testcase

2023-01-10 Thread Richard Biener via Gcc-patches
The following removes a problematic initializer which causes
excess diagnostics with -m32 and isn't actually required.
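
As a rough illustration of why -m32 complains (a sketch, assuming the
excess diagnostic is about the constant not fitting in a 32-bit long):
the removed initializer needs more than 32 bits.  The helper name
fits_in_signed is hypothetical and is not part of the testcase.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `value` is representable in a signed
// two's-complement integer that is `bits` bits wide.
bool fits_in_signed(std::int64_t value, int bits) {
  assert(bits >= 2 && bits <= 64);
  if (bits == 64)
    return true;
  const std::int64_t max = (std::int64_t{1} << (bits - 1)) - 1;
  const std::int64_t min = -max - 1;
  return value >= min && value <= max;
}
```

With this helper, fits_in_signed(4073709551612LL, 32) is false while
fits_in_signed(4073709551612LL, 64) is true, which matches the test being
clean on LP64 targets but noisy with -m32.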

Tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/106293
* gcc.dg/tree-ssa/ssa-dse-46.c: Remove long initializer.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-46.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-46.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-46.c
index 68b36433ffc..c98038a4d2d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-46.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-46.c
@@ -2,7 +2,7 @@
 /* { dg-options "-O2 -fdump-tree-dse1" } */
 
 int a;
-static long b = 4073709551612, d;
+static long b, d;
 short c;
 void foo();
 char e(int **f) {
-- 
2.35.3


Re: [committed] libstdc++: Fix deadlock in debug iterator increment [PR108288]

2023-01-10 Thread François Dumont via Gcc-patches

Thanks for fixing this.

Here is the extension of the fix to all post-increment/decrement 
operators we have on the _GLIBCXX_DEBUG iterators.

I would prefer to restore something like the previous implementation, so 
that the _GLIBCXX_DEBUG post operators are still implemented in terms of 
the normal post operators.

I also plan to remove the debug check in the _Safe_iterator constructor 
from base iterator to avoid the redundant check we have now. But I need 
to make sure first that we are never calling it with an unchecked base 
iterator. And it might not be the right moment to make such a change.


    libstdc++: Fix deadlock in debug local_iterator increment [PR108288]

    Complete fix on all _Safe_iterator post-increment and post-decrement
    implementations and on _Safe_local_iterator.

    libstdc++-v3/ChangeLog:

    * include/debug/safe_iterator.h
    (_Safe_iterator<>::operator++(int)): Extend deadlock fix to
    other iterator category.
    (_Safe_iterator<>::operator--(int)): Likewise.
    * include/debug/safe_local_iterator.h
    (_Safe_local_iterator<>::operator++(int)): Fix deadlock.
    * testsuite/util/debug/unordered_checks.h
    (invalid_local_iterator_pre_increment): New.
    (invalid_local_iterator_post_increment): New.
    * testsuite/23_containers/unordered_map/debug/invalid_local_iterator_post_increment_neg.cc:
    New test.
    * testsuite/23_containers/unordered_map/debug/invalid_local_iterator_pre_increment_neg.cc:
    New test.

Tested under Linux x86_64.

Ok to commit ?

François

On 06/01/23 12:54, Jonathan Wakely via Libstdc++ wrote:

Tested x86_64-linux. Pushed to trunk.

I think we should backport this too, after some soak time on trunk.

-- >8 --

With -fno-elide-constructors the debug iterator post-increment and
post-decrement operators are susceptible to deadlock. They take a mutex
lock and then return a temporary, which also attempts to take a lock to
attach itself to the sequence. If the return value and *this happen to
collide and use the same mutex from the pool, then you get a deadlock
trying to lock a mutex that is already held by the current thread.

Note that the chosen mutex depends on the sequence so there is no need 
for a conditional sentence here, it will necessarily be the same mutex.
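
The locking pattern used by the fix can be sketched outside libstdc++ as
follows.  This is only a minimal model: Registry and Cursor are
hypothetical stand-ins for the sequence and the debug iterator, and
std::mutex stands in for the pooled mutex.  The point is that the lock is
released at the end of the inner scope, before the returned object locks
the same mutex again in its constructor.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a debug-mode sequence tracking its iterators.
struct Registry {
  std::mutex mtx;    // non-recursive: locking it twice on one thread hangs
  int attached = 0;  // number of attach operations performed so far
};

// Hypothetical stand-in for a debug iterator.
class Cursor {
  Registry* reg_;
  int pos_;

public:
  // Attaching to the registry takes the registry mutex.
  Cursor(Registry* reg, int pos) : reg_(reg), pos_(pos) {
    std::lock_guard<std::mutex> lock(reg_->mtx);
    ++reg_->attached;
  }

  int pos() const { return pos_; }

  // Post-increment: modify *this under the lock in an inner scope, drop
  // the lock, and only then construct the return value (which locks again).
  Cursor operator++(int) {
    int old;
    {
      std::lock_guard<std::mutex> lock(reg_->mtx);
      old = pos_++;
    }
    return Cursor(reg_, old);
  }
};
```

Holding the lock_guard across the return statement instead would lock the
same non-recursive mutex twice on one thread, which is the deadlock
described above.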
diff --git a/libstdc++-v3/include/debug/safe_iterator.h b/libstdc++-v3/include/debug/safe_iterator.h
index f9068eaf8d6..e7c96d1af27 100644
--- a/libstdc++-v3/include/debug/safe_iterator.h
+++ b/libstdc++-v3/include/debug/safe_iterator.h
@@ -129,14 +129,6 @@ namespace __gnu_debug
 	typename _Sequence::_Base::iterator,
 	typename _Sequence::_Base::const_iterator>::__type _OtherIterator;
 
-  struct _Attach_single
-  { };
-
-  _Safe_iterator(_Iterator __i, _Safe_sequence_base* __seq, _Attach_single)
-  _GLIBCXX_NOEXCEPT
-  : _Iter_base(__i)
-  { _M_attach_single(__seq); }
-
 public:
   typedef _Iterator	iterator_type;
   typedef typename _Traits::iterator_category	iterator_category;
@@ -347,8 +339,13 @@ namespace __gnu_debug
 	_GLIBCXX_DEBUG_VERIFY(this->_M_incrementable(),
 			  _M_message(__msg_bad_inc)
 			  ._M_iterator(*this, "this"));
-	__gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
-	return _Safe_iterator(base()++, this->_M_sequence, _Attach_single());
+	_Iter_base __cur;
+	{
+	  __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
+	  __cur = base()++;
+	}
+
+	return _Safe_iterator(__cur, this->_M_sequence);
   }
 
   // -- Utilities --
@@ -520,12 +517,6 @@ namespace __gnu_debug
 
 protected:
   typedef typename _Safe_base::_OtherIterator _OtherIterator;
-  typedef typename _Safe_base::_Attach_single _Attach_single;
-
-  _Safe_iterator(_Iterator __i, _Safe_sequence_base* __seq, _Attach_single)
-  _GLIBCXX_NOEXCEPT
-  : _Safe_base(__i, __seq, _Attach_single())
-  { }
 
 public:
   /// @post the iterator is singular and unattached
@@ -609,9 +600,13 @@ namespace __gnu_debug
 	_GLIBCXX_DEBUG_VERIFY(this->_M_incrementable(),
 			  _M_message(__msg_bad_inc)
 			  ._M_iterator(*this, "this"));
-	__gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
-	return _Safe_iterator(this->base()++, this->_M_sequence,
-			  _Attach_single());
+	_Iter_base __cur;
+	{
+	  __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
+	  __cur = this->base()++;
+	}
+
+	return _Safe_iterator(__cur, this->_M_sequence);
   }
 
   // -- Bidirectional iterator requirements --
@@ -640,9 +635,13 @@ namespace __gnu_debug
 	_GLIBCXX_DEBUG_VERIFY(this->_M_decrementable(),
 			  _M_message(__msg_bad_dec)
 			  ._M_iterator(*this, "this"));
-	__gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
-	return _Safe_iterator(this->base()--, this->_M_sequence,
-			  _Attach_single());
+	_Iter_base __cur;
+	{
+	  __gnu_cxx::__scoped_lock __l(this->_M_get_mutex());
+	  __cur = this->base()--;
+	}
+
+	return _Safe_iterator(__cur, 

Re: [PATCH] xtensa: Make instruction cost estimation for size more accurate

2023-01-10 Thread Max Filippov via Gcc-patches
On Mon, Jan 9, 2023 at 7:34 PM Takayuki 'January June' Suwa
 wrote:
>
> Until now, we applied COSTS_N_INSNS() (multiplying by 4) after dividing
> the instruction length by 3, so differences in length of less than 3
> bytes could not be expressed in the insn cost for size (e.g. 11 bytes and
> 12 bytes cost the same).
>
> This patch fixes that.
>
> ;; 2 bytes
> addi.n  a2, a2, -1      ; cost 3
>
> ;; 3 bytes
> addmi   a2, a2, 1024    ; cost 4
>
> ;; 4 bytes
> movi.n  a3, 80          ; cost 5
> bnez.n  a2, a3, .L4
>
> ;; 5 bytes
> srli    a2, a3, 1       ; cost 7
> add.n   a2, a2, a2
>
> ;; 6 bytes
> ssai    8               ; cost 8
> src     a4, a2, a3
>
> ;; 3 + 4 bytes
> l32r    a2, .L5         ; cost 9
>
> ;; 11 bytes             ; cost 15
> ;; 12 bytes             ; cost 16
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.cc (xtensa_insn_cost):
> Let insn cost for size be obtained by applying COSTS_N_INSNS()
> to instruction length and then dividing by 3.
> ---
>  gcc/config/xtensa/xtensa.cc | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max
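
For reference, the arithmetic in the quoted description can be sketched
as below.  This is not the code from xtensa.cc: the rounding in both
helpers is an assumption (round up for the old scheme, round to nearest
for the new one), chosen because it reproduces the costs listed in the
quoted message.  The function names are hypothetical.

```cpp
#include <cassert>

// Mirrors GCC's COSTS_N_INSNS macro, which multiplies by 4.
int costs_n_insns(int n) { return n * 4; }

// Assumed old scheme: divide the length by 3 (rounding up), then scale.
int old_size_cost(int len_bytes) {
  assert(len_bytes > 0);
  return costs_n_insns((len_bytes + 2) / 3);
}

// Assumed new scheme: scale first, then divide by 3 (rounding to nearest).
int new_size_cost(int len_bytes) {
  assert(len_bytes > 0);
  return (costs_n_insns(len_bytes) + 1) / 3;
}
```

Under these assumptions old_size_cost gives 16 for both 11 and 12 bytes,
while new_size_cost gives 15 and 16, so the one-byte difference becomes
visible in the cost.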


[PATCH] ifcvt.cc: Prevent excessive if-conversion for conditional moves

2023-01-10 Thread Takayuki 'January June' Suwa via Gcc-patches
Currently, cond_move_process_if_block() does the conversion without
weighing the cost of the converted sequence against the original one;
this should be checked by calling targetm.noce_conversion_profitable_p().

Doing so lets the target-specific cost estimate prevent unwanted size
growth from excessive conditional moves when optimizing for size.

When optimizing for speed, default_noce_conversion_profitable_p() allows
plenty of headroom, so this patch has little impact.

Also, if the target-specific cost estimate is accurate or allows for
margins, the impact should be similarly small.
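
The kind of check described above can be modelled roughly as follows.
This is a simplified sketch, not GCC's implementation: the struct, its
fields and both function names are hypothetical, and the real hook
receives an insn sequence and a noce_if_info rather than precomputed
numbers.

```cpp
#include <cassert>

// Hypothetical, simplified view of what a profitability hook compares.
struct IfConvCosts {
  unsigned original_cost;   // cost of the original branchy sequence
  unsigned converted_cost;  // cost of the conditional-move sequence
  unsigned speed_headroom;  // extra cost tolerated when optimizing for speed
  bool optimize_for_speed;
};

// Accept a conversion that is no more expensive than the original; when
// optimizing for speed, additionally tolerate some headroom.
bool conversion_profitable(const IfConvCosts& c) {
  if (c.converted_cost <= c.original_cost)
    return true;
  if (!c.optimize_for_speed)
    return false;
  return c.converted_cost - c.original_cost <= c.speed_headroom;
}

// Same shape as the patched condition: give up if there is no sequence or
// if the sequence is judged unprofitable.
bool should_abandon(bool have_seq, const IfConvCosts& c) {
  return !have_seq || !conversion_profitable(c);
}
```

In this model a conversion that grows the code is always rejected when
optimizing for size, while the speed case only rejects it once the growth
exceeds the headroom.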

gcc/ChangeLog:

* ifcvt.cc (cond_move_process_if_block):
Consider the result of targetm.noce_conversion_profitable_p()
when replacing the original sequence with the converted one.
---
 gcc/ifcvt.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 008796838f7..a896e14bb3c 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -4350,7 +4350,7 @@ cond_move_process_if_block (struct noce_if_info *if_info)
   goto done;
 }
   seq = end_ifcvt_sequence (if_info);
-  if (!seq)
+  if (!seq || !targetm.noce_conversion_profitable_p (seq, if_info))
 goto done;
 
   loc_insn = first_active_insn (then_bb);
-- 
2.30.2


Re: [PATCH] rs6000: Enhance lowpart/highpart DI->SF by mtvsrws/mtvsrd

2023-01-10 Thread Jiufu Guo via Gcc-patches
Hi Segher,

Thanks for your help with the review!

Segher Boessenkool  writes:

> Hi!
>
> On Tue, Jan 10, 2023 at 09:45:27PM +0800, Jiufu Guo wrote:
>> As mentioned in PR108338, on p9, we could use mtvsrws to implement
>> the conversion from SI#0 to SF (or lowpart DI to SF).  And we find
>> we can also enhance the conversion from highpart DI to SF (as the
>> case in this patch).
>> 
>> This patch enhances these conversions accordingly.
>
> Those aren't conversions, they are just bitcasts, reinterpreting the
> same datum as something else, but keeping all bits the same.
Yep, bitcast is accurate. 
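
To make the distinction concrete, here is a small host-side sketch of a
bitcast as opposed to a value conversion.  It assumes a 32-bit IEEE
float on the host; the function names are hypothetical and this is
unrelated to the RTL the patch generates.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret 32 bits as a float without changing any bit.
float bitcast_u32_to_float(std::uint32_t bits) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Low 32 bits of a 64-bit value, reinterpreted as float.
float lowpart_as_float(std::uint64_t v) {
  return bitcast_u32_to_float(static_cast<std::uint32_t>(v));
}

// High 32 bits of a 64-bit value, reinterpreted as float.
float highpart_as_float(std::uint64_t v) {
  return bitcast_u32_to_float(static_cast<std::uint32_t>(v >> 32));
}
```

For example, the bit pattern 0x3f800000 reinterprets to 1.0f, whereas a
value conversion of the integer 0x3f800000 would give roughly 1.07e9.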

>
> The mtvsrws insn moves a SImode value from a GPR to a VSR, splatting it
> in all four lanes.  You'll typically want a xscvspdpn or similar after
> that -- but with the value splat in all lanes it will surely be in the
> lane that later instruction needs the data to be in :-)
Right.  xscvspdpn is needed typically.

>
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -158,6 +158,7 @@ (define_c_enum "unspec"
>> UNSPEC_HASHCHK
>> UNSPEC_XXSPLTIDP_CONST
>> UNSPEC_XXSPLTIW_CONST
>> +   UNSPEC_P9V_MTVSRWS
>>])
>
> Is it hard to describe this without unspec?  Unspecs prevent the compiler
> from optimising things (unless you very carefully implement all of that
> manually -- but if you just write it as plain RTL most things fall into
> place automatically).
>
> There are many existing patterns that needlessly and detrimentally use
> unspecs, but we should improve on that, not make it worse :-)
Thanks for pointing this out!
I noticed this too.  Some patterns for bitcast int->float also use
unspecs. 
For example, in the expand pass:
(set (reg:SF 117 [  ])
(unspec:SF [
(reg:SI 121)
] UNSPEC_SF_FROM_SI)) 
it is "movsf_from_si" which is generated for "BIT_FIELD_REF ".  "movsf_from_si2" also uses unspec, and clobbers are used in
"movsf_from_si".
They may not be needed in some cases, though.

I'm wondering if "TARGET_NO_SF_SUBREG" is accurate for those patterns.
/* Whether we should avoid (SUBREG:SI (REG:SF) and (SUBREG:SF (REG:SI).  */
#define TARGET_NO_SF_SUBREG TARGET_DIRECT_MOVE_64BIT
#define TARGET_ALLOW_SF_SUBREG  (!TARGET_DIRECT_MOVE_64BIT)

We may allow (SUBREG:SF (REG:SI) at early passes, and keep it until
later passes (RA/reload, or splitter) at least.


In this patch, to avoid risk and make it straightforward, I define a new
insn 'mtvsrws' with unspec.

I will try other ways to avoid using unspec. Maybe keep using the subreg
pattern: "(set (reg:SF 117) (subreg:SF (reg/v:SI 118) 0))"?

>
>> @@ -8203,10 +8204,19 @@ (define_insn_and_split "movsf_from_si"
>>rtx op2 = operands[2];
>>rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
>>  
>> -  /* Move SF value to upper 32-bits for xscvspdpn.  */
>> -  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
>> -  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
>> -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
>> +  if (TARGET_P9_VECTOR && TARGET_POWERPC64 && TARGET_DIRECT_MOVE)
>> +{
>> +   emit_insn (gen_p9_mtvsrws (op0, op1_di));
>> +   emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
>> +}
>
> This does not require TARGET_POWERPC64?
Oh, strictly speaking, 'mtvsrws' uses only 32 bits. :)
>
> P9 implies we have direct moves (those are implied by P8 already).  We
> also do not need to test for vector I think?

Adding TARGET_P9_VECTOR, because I think 'mtvsrws' operates on vector
registers. (Or just treat them as FP registers? But treating them as
vector registers seems more accurate to me.)

We have TARGET_DIRECT_MOVE_128 defined for P9:
"(TARGET_P9_VECTOR && TARGET_DIRECT_MOVE && TARGET_POWERPC64)"
like "TARGET_DIRECT_MOVE_64BIT" for P8.
So, we still need TARGET_DIRECT_MOVE, right?

>
>> +(define_code_iterator any_rshift [ashiftrt lshiftrt])
>> +
>>  ;; For extracting high part element from DImode register like:
>>  ;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
>>  ;; split it before reload with "and mask" to avoid generating shift right
>>  ;; 32 bit then shift left 32 bit.
>> -(define_insn_and_split "movsf_from_si2"
>> +(define_insn_and_split "movsf_from_si2_"
>>[(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>>  (unspec:SF
>>   [(subreg:SI
>> -   (ashiftrt:DI
>> +   (any_rshift:DI
>>  (match_operand:DI 1 "input_operand" "r")
>>  (const_int 32))
>> 0)]
>
> Hrm.  You can write this with just a subreg, no shift is needed at all.
> Can you please try that instead?  That is nastiness for endianness, but
> that is still preferable over introducing new complexities like this.
Currently, this define_insn_and_split would be used to combine
"shift;subreg"; because "highpart subreg DI->SF" is expanded to
"rightshift:DI 32 ; lowpart subreg:DI->SI; SI#0->SF". It seems, we are
not in favor of generating highpart subreg. :)

While I agree with you: this is just a subreg 

[r13-5092 Regression] FAIL: gcc.dg/tree-ssa/ssa-dse-46.c (test for excess errors) on Linux/x86_64

2023-01-10 Thread Jiang, Haochen via Gcc-patches
Hi all,

This is the bisect result for the latest regression, which failed to be
sent to the mailing list.

It seems that the mail command in s-nail went down after my machine got 
upgraded; I am still investigating why.

On Linux/x86_64,

4e0b504f26f78ff02e80ad98ebbf8ded3aa6ffa1 is the first bad commit
commit 4e0b504f26f78ff02e80ad98ebbf8ded3aa6ffa1
Author: Richard Biener <rguent...@suse.de>
Date:   Tue Jan 10 13:48:51 2023 +0100

    tree-optimization/106293 - missed DSE with virtual LC PHI

caused

FAIL: gcc.dg/tree-ssa/ssa-dse-46.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-5092/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-dse-46.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/ssa-dse-46.c --target_board='unix{-m32\ -march=cascadelake}'"

BRs,
Haochen


Re: [PATCH] libsanitizer/mips: always build with largefile support

2023-01-10 Thread Hans-Peter Nilsson
On Fri, 6 Jan 2023, YunQiang Su wrote:

> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 is always used for mips
> when building libsanitizer in LLVM. Thus
>FIRST_32_SECOND_64((_MIPS_SIM == _ABIN32) ? 176 : 160, 216);
> instead of
>FIRST_32_SECOND_64((_MIPS_SIM == _ABIN32) ? 160 : 144, 216);
> in sanitizer_platform_limits_posix.h.
> 
> To stay in sync with LLVM and to keep the code simple, we always use
> the largefile options.
> 
> libsanitizer/
>   * configure.ac: set -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
> always for mips*.
>   * configure: Regenerate.

Hm, yes, that might be the most pragmatic way to solve the mips
stat-size issue...  But shouldn't largefile-options then also be 
forced when libsanitizer is *used*?  IOW, shouldn't the mips*-linux 
gcc-options be tweaked to include -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BITS=64 conditional on sanitizer-options?

brgds, H-P


Re: [PATCH v2] bpf: correct bpf_print_operand for floats [PR108293]

2023-01-10 Thread Jose E. Marchesi via Gcc-patches


> Hi Jose,
>
> As we discussed on IRC, since we don't currently define
> TARGET_SUPPORTS_WIDE_INT it is safer to keep the handling for VOIDmode
> CONST_DOUBLEs. My current understanding is that it may be needed if the
> host is a 32-bit platform.
>
> I also added a gcc_unreachable () as you pointed out. V2 below.
> Tested with bpf-unknown-none on x86_64 host, no known regressions.
>
> WDYT?

OK for master.
Thanks!

>
> Thanks,
> David
>
> ---
>
> [Changes from v1:
>  - Keep handling for VOIDmode CONST_DOUBLE, just in case.
>  - Add a gcc_unreachable () if `op` is none of VOIDmode, SFmode,
>nor DFmode. ]
>
> The existing logic in bpf_print_operand was only correct for integral
> CONST_DOUBLEs, and emitted garbage for floating point modes. Fix it so
> floating point mode operands are correctly handled.
>
>   PR target/108293
>
> gcc/
>
>   * config/bpf/bpf.cc (bpf_print_operand): Correct handling for
>   floating point modes.
>
> gcc/testsuite/
>
>   * gcc.target/bpf/double-1.c: New test.
>   * gcc.target/bpf/double-2.c: New test.
>   * gcc.target/bpf/float-1.c: New test.
> ---
>  gcc/config/bpf/bpf.cc   | 34 -
>  gcc/testsuite/gcc.target/bpf/double-1.c | 12 +
>  gcc/testsuite/gcc.target/bpf/double-2.c | 12 +
>  gcc/testsuite/gcc.target/bpf/float-1.c  | 12 +
>  4 files changed, 64 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/bpf/double-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/double-2.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/float-1.c
>
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index 2aeaeaf309b..576a1fe8eab 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -880,13 +880,35 @@ bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
>output_address (GET_MODE (op), XEXP (op, 0));
>break;
>  case CONST_DOUBLE:
> -  if (CONST_DOUBLE_HIGH (op))
> - fprintf (file, HOST_WIDE_INT_PRINT_DOUBLE_HEX,
> -  CONST_DOUBLE_HIGH (op), CONST_DOUBLE_LOW (op));
> -  else if (CONST_DOUBLE_LOW (op) < 0)
> - fprintf (file, HOST_WIDE_INT_PRINT_HEX, CONST_DOUBLE_LOW (op));
> +  if (GET_MODE (op) == VOIDmode)
> + {
> +   if (CONST_DOUBLE_HIGH (op))
> + fprintf (file, HOST_WIDE_INT_PRINT_DOUBLE_HEX,
> +  CONST_DOUBLE_HIGH (op), CONST_DOUBLE_LOW (op));
> +   else if (CONST_DOUBLE_LOW (op) < 0)
> + fprintf (file, HOST_WIDE_INT_PRINT_HEX, CONST_DOUBLE_LOW (op));
> +   else
> + fprintf (file, HOST_WIDE_INT_PRINT_DEC, CONST_DOUBLE_LOW (op));
> + }
>else
> - fprintf (file, HOST_WIDE_INT_PRINT_DEC, CONST_DOUBLE_LOW (op));
> + {
> +   long vals[2];
> +   real_to_target (vals, CONST_DOUBLE_REAL_VALUE (op), GET_MODE (op));
> +   vals[0] &= 0xffffffff;
> +   vals[1] &= 0xffffffff;
> +   if (GET_MODE (op) == SFmode)
> + fprintf (file, "0x%08lx", vals[0]);
> +   else if (GET_MODE (op) == DFmode)
> + {
> +   /* Note: real_to_target puts vals in target word order.  */
> +   if (WORDS_BIG_ENDIAN)
> + fprintf (file, "0x%08lx%08lx", vals[0], vals[1]);
> +   else
> + fprintf (file, "0x%08lx%08lx", vals[1], vals[0]);
> + }
> +   else
> + gcc_unreachable ();
> + }
>break;
>  default:
>output_addr_const (file, op);
> diff --git a/gcc/testsuite/gcc.target/bpf/double-1.c b/gcc/testsuite/gcc.target/bpf/double-1.c
> new file mode 100644
> index 000..200f1bd18f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/double-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mlittle-endian" } */
> +
> +double f;
> +double a() { f = 1.0; return 1.0; }
> +double b() { f = 2.0; return 2.0; }
> +double c() { f = 2.0; return 3.0; }
> +double d() { f = 3.0; return 3.0; }
> +
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x3ff0" 2 } } */
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x4000" 3 } } */
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x4008" 3 } } */
> diff --git a/gcc/testsuite/gcc.target/bpf/double-2.c b/gcc/testsuite/gcc.target/bpf/double-2.c
> new file mode 100644
> index 000..d04ddd0c575
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/double-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mbig-endian" } */
> +
> +double f;
> +double a() { f = 1.0; return 1.0; }
> +double b() { f = 2.0; return 2.0; }
> +double c() { f = 2.0; return 3.0; }
> +double d() { f = 3.0; return 3.0; }
> +
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x3ff0" 2 } } */
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x4000" 3 } } */
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x4008" 3 } } */
> diff --git a/gcc/testsuite/gcc.target/bpf/float-1.c 
> 

Re: [PATCH v3 17/19] modula2 front end: dejagnu expect library scripts

2023-01-10 Thread Gaius Mulley via Gcc-patches
Jason Merrill  writes:

> On 12/6/22 09:47, Gaius Mulley via Gcc-patches wrote:
>> Here are the dejagnu expect library scripts for the gm2
>> testsuite.
>
> A couple of weeks ago I noticed on a testrun that the modula tests
> didn't seem to be timing out properly, so I made this change.  It
> looks like they didn't run at all in the bootstrap/test I did just
> now, so I don't know if this change is actually helpful, but here it
> is if you think it makes sense:

awesome, many thanks for the patch - it certainly looks as if the
timeout library was absent from lib/gm2.exp (and also gm2-torture.exp).
I've also applied this fix to gm2-torture.exp and will git commit/git
push your patch

regards,
Gaius


[PATCH] Fortran: frontend passes do_subscript leaks gmp memory [PR97345]

2023-01-10 Thread Harald Anlauf via Gcc-patches
Dear all,

the attached obvious patch fixes a memory leak with gmp variables
that are set but apparently (=valgrind) never cleared.  No new
testcase as the current testsuite gives enough coverage
(confirmed by trying a faulty version...) and as per discussion
with Steve (see PR).

Committed after regtesting on x86_64-pc-linux-gnu as:
r13-5095-gfec9fc1a17ec44461cee841513f1b6b8ad680fe4

Thanks,
Harald

From fec9fc1a17ec44461cee841513f1b6b8ad680fe4 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 10 Jan 2023 22:41:17 +0100
Subject: [PATCH] Fortran: frontend passes do_subscript leaks gmp memory
 [PR97345]

gcc/fortran/ChangeLog:

	PR fortran/97345
	* frontend-passes.cc (do_subscript): Clear used gmp variables.
---
 gcc/fortran/frontend-passes.cc | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/frontend-passes.cc b/gcc/fortran/frontend-passes.cc
index 612c12d233d..db2b98290d6 100644
--- a/gcc/fortran/frontend-passes.cc
+++ b/gcc/fortran/frontend-passes.cc
@@ -2892,7 +2892,12 @@ do_subscript (gfc_expr **e)

 		  cmp = mpz_cmp (do_end, do_start);
 		  if ((sgn > 0 && cmp < 0) || (sgn < 0 && cmp > 0))
-		break;
+		{
+		  mpz_clear (do_start);
+		  mpz_clear (do_end);
+		  mpz_clear (do_step);
+		  break;
+		}
 		}

 	  /* May have to correct the end value if the step does not equal
@@ -2965,6 +2970,12 @@ do_subscript (gfc_expr **e)
 		  mpz_clear (val);
 		}
 		}
+
+	  if (have_do_start)
+		mpz_clear (do_start);
+	  if (have_do_end)
+		mpz_clear (do_end);
+	  mpz_clear (do_step);
 	}
 	}
 }
--
2.35.3



Re: [PATCH] longlong.h: Do no use asm input cast for clang

2023-01-10 Thread Joseph Myers
On Tue, 10 Jan 2023, Adhemerval Zanella Netto via Gcc-patches wrote:

> That's my original intention [1], but Joseph stated that GCC is the upstream
> source of this file.  Joseph, would you be ok for a similar patch to glibc
> since gcc is reluctant to accept it?

I don't think it's a good idea for the copies to diverge.  I also think 
the file is more heavily used in GCC (as part of the libgcc sources, 
effectively) than in glibc and so it's best to use GCC as the upstream for 
this shared file.

Ideally maybe most of the macros in this file would be replaced by 
built-in functions (that are guaranteed to expand inline rather than 
possibly circularly calling a libgcc function defined using the same 
macro), so that the inline asm could be avoided (when building libgcc, or 
when building glibc with a new-enough compiler).  But that would be a 
substantial project.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH v2] bpf: correct bpf_print_operand for floats [PR108293]

2023-01-10 Thread David Faust via Gcc-patches
Hi Jose,

As we discussed on IRC, since we don't currently define
TARGET_SUPPORTS_WIDE_INT it is safer to keep the handling for VOIDmode
CONST_DOUBLEs. My current understanding is that it may be needed if the
host is a 32-bit platform.

I also added a gcc_unreachable () as you pointed out. V2 below.
Tested with bpf-unknown-none on x86_64 host, no known regressions.

WDYT?

Thanks,
David

---

[Changes from v1:
 - Keep handling for VOIDmode CONST_DOUBLE, just in case.
 - Add a gcc_unreachable () if `op` is none of VOIDmode, SFmode,
   nor DFmode. ]

The existing logic in bpf_print_operand was only correct for integral
CONST_DOUBLEs, and emitted garbage for floating point modes. Fix it so
floating point mode operands are correctly handled.
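
As a host-side illustration of the intended output format (a sketch
only, not the bpf.cc code): the float or double is reinterpreted as its
bit pattern and printed as 32-bit hex words, most significant word
first.  It assumes the host uses IEEE binary32/binary64; hex_of_float
and hex_of_double are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Print the bit pattern of a float as "0x" followed by 8 hex digits.
std::string hex_of_float(float value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  char buf[16];
  int n = std::snprintf(buf, sizeof buf, "0x%08lx",
                        static_cast<unsigned long>(bits));
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
  return buf;
}

// Print the bit pattern of a double as "0x" followed by 16 hex digits,
// emitted as two 32-bit words with the high word first.
std::string hex_of_double(double value) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "sketch assumes a 64-bit double");
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  unsigned long hi = static_cast<unsigned long>(bits >> 32);
  unsigned long lo = static_cast<unsigned long>(bits & 0xffffffffu);
  char buf[24];
  int n = std::snprintf(buf, sizeof buf, "0x%08lx%08lx", hi, lo);
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
  return buf;
}
```

Because the words are extracted by shifting a 64-bit integer, this sketch
does not need the word-order check that the patch performs on the
real_to_target output.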

PR target/108293

gcc/

* config/bpf/bpf.cc (bpf_print_operand): Correct handling for
floating point modes.

gcc/testsuite/

* gcc.target/bpf/double-1.c: New test.
* gcc.target/bpf/double-2.c: New test.
* gcc.target/bpf/float-1.c: New test.
---
 gcc/config/bpf/bpf.cc   | 34 -
 gcc/testsuite/gcc.target/bpf/double-1.c | 12 +
 gcc/testsuite/gcc.target/bpf/double-2.c | 12 +
 gcc/testsuite/gcc.target/bpf/float-1.c  | 12 +
 4 files changed, 64 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/double-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/double-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/float-1.c

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 2aeaeaf309b..576a1fe8eab 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -880,13 +880,35 @@ bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
   output_address (GET_MODE (op), XEXP (op, 0));
   break;
 case CONST_DOUBLE:
-  if (CONST_DOUBLE_HIGH (op))
-   fprintf (file, HOST_WIDE_INT_PRINT_DOUBLE_HEX,
-CONST_DOUBLE_HIGH (op), CONST_DOUBLE_LOW (op));
-  else if (CONST_DOUBLE_LOW (op) < 0)
-   fprintf (file, HOST_WIDE_INT_PRINT_HEX, CONST_DOUBLE_LOW (op));
+  if (GET_MODE (op) == VOIDmode)
+   {
+ if (CONST_DOUBLE_HIGH (op))
+   fprintf (file, HOST_WIDE_INT_PRINT_DOUBLE_HEX,
+CONST_DOUBLE_HIGH (op), CONST_DOUBLE_LOW (op));
+ else if (CONST_DOUBLE_LOW (op) < 0)
+   fprintf (file, HOST_WIDE_INT_PRINT_HEX, CONST_DOUBLE_LOW (op));
+ else
+   fprintf (file, HOST_WIDE_INT_PRINT_DEC, CONST_DOUBLE_LOW (op));
+   }
   else
-   fprintf (file, HOST_WIDE_INT_PRINT_DEC, CONST_DOUBLE_LOW (op));
+   {
+ long vals[2];
+ real_to_target (vals, CONST_DOUBLE_REAL_VALUE (op), GET_MODE (op));
+ vals[0] &= 0xffffffff;
+ vals[1] &= 0xffffffff;
+ if (GET_MODE (op) == SFmode)
+   fprintf (file, "0x%08lx", vals[0]);
+ else if (GET_MODE (op) == DFmode)
+   {
+ /* Note: real_to_target puts vals in target word order.  */
+ if (WORDS_BIG_ENDIAN)
+   fprintf (file, "0x%08lx%08lx", vals[0], vals[1]);
+ else
+   fprintf (file, "0x%08lx%08lx", vals[1], vals[0]);
+   }
+ else
+   gcc_unreachable ();
+   }
   break;
 default:
   output_addr_const (file, op);
diff --git a/gcc/testsuite/gcc.target/bpf/double-1.c b/gcc/testsuite/gcc.target/bpf/double-1.c
new file mode 100644
index 000..200f1bd18f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/double-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mlittle-endian" } */
+
+double f;
+double a() { f = 1.0; return 1.0; }
+double b() { f = 2.0; return 2.0; }
+double c() { f = 2.0; return 3.0; }
+double d() { f = 3.0; return 3.0; }
+
+/* { dg-final { scan-assembler-times "lddw\t%r.,0x3ff0" 2 } } */
+/* { dg-final { scan-assembler-times "lddw\t%r.,0x4000" 3 } } */
+/* { dg-final { scan-assembler-times "lddw\t%r.,0x4008" 3 } } */
diff --git a/gcc/testsuite/gcc.target/bpf/double-2.c b/gcc/testsuite/gcc.target/bpf/double-2.c
new file mode 100644
index 000..d04ddd0c575
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/double-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mbig-endian" } */
+
+double f;
+double a() { f = 1.0; return 1.0; }
+double b() { f = 2.0; return 2.0; }
+double c() { f = 2.0; return 3.0; }
+double d() { f = 3.0; return 3.0; }
+
+/* { dg-final { scan-assembler-times "lddw\t%r.,0x3ff0" 2 } } */
+/* { dg-final { scan-assembler-times "lddw\t%r.,0x4000" 3 } } */
+/* { dg-final { scan-assembler-times "lddw\t%r.,0x4008" 3 } } */
diff --git a/gcc/testsuite/gcc.target/bpf/float-1.c b/gcc/testsuite/gcc.target/bpf/float-1.c
new file mode 100644
index 000..05ed7bb651d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/float-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options 

Re: [RFC/PATCH] Remove the workaround for _Float128 precision [PR107299]

2023-01-10 Thread Jakub Jelinek via Gcc-patches
On Mon, Jan 09, 2023 at 10:21:52PM -0500, Michael Meissner wrote:
> I had the patches to change the precision to 128, and I just ran them.  C and
> C++ do not seem to be bothered by changing the precision to 128 (once I got it
> to build, etc.).  But Fortran on the other hand does actually use the
> precision to differentiate between IBM extended double and IEEE 128-bit.
> In particular, the following 3 tests fail when long double is IBM extended
> double:
> 
>   gfortran.dg/PR100914.f90
>   gfortran.dg/c-interop/typecodes-array-float128.f90
>   gfortran.dg/c-interop/typecodes-scalar-float128.f90
> 
> I tried adding code to use the old precisions for Fortran, but not for C/C++,
> but it didn't seem to work.
> 
> So while it might be possible to use a single 128 for the precision, it needs
> more work and attention, particularly on the Fortran side.

Can't be more than a few lines changed in the fortran FE.
Yes, the FE needs to know if it is IBM extended double or IEEE 128-bit so
that it can decide on the mangling - where to use the artificial kind 17 and
where to use 16.  But as long as it can figure that out, it doesn't need to
rely on a particular precision.

Jakub



Re: [PATCH] longlong.h: Do no use asm input cast for clang

2023-01-10 Thread Segher Boessenkool
On Tue, Jan 10, 2023 at 03:35:37PM +0100, Andreas Schwab wrote:
> On Jan 10 2023, Segher Boessenkool wrote:
> 
> > The file starts with
> >
> > /* longlong.h -- definitions for mixed size 32/64 bit arithmetic.
> >Copyright (C) 1991-2022 Free Software Foundation, Inc.
> >
> >This file is part of the GNU C Library.
> >
> > Please change that first then?
> 
> GCC is the source of the original version of longlong.h (from 1991).  It
> has then been imported into GMP, from where it found its way into GLIBC.
> After that, the file has been synchronized back and forth between GCC
> and GLIBC.

Then change the header to make that clear?  The current state suggests
that Glibc is the master copy.

I don't care what way this is resolved, but it would be good if it was
resolved *some* way :-)  We have rules and policies only to make clear
to everyone what to expect and what to do.  To make life easier for
everyone!


Segher


Re: [PATCH] libgcc: Fix uninitialized RA signing on AArch64 [PR107678]

2023-01-10 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 10, 2023 at 04:33:59PM +, Wilco Dijkstra via Gcc-patches wrote:
> @@ -1204,10 +1203,15 @@ execute_cfa_program (const unsigned char *insn_ptr,
>   case DW_CFA_GNU_window_save:
>  #if defined (__aarch64__) && !defined (__ILP32__)
> /* This CFA is multiplexed with Sparc.  On AArch64 it's used to toggle
> -  return address signing status.  */
> +  return address signing status.  The REG_UNDEFINED/UNSAVED states
> +  mean RA signing is enabled/disabled.  */
> reg = DWARF_REGNUM_AARCH64_RA_STATE;
> -   gcc_assert (fs->regs.how[reg] == REG_UNSAVED);
> -   fs->regs.reg[reg].loc.offset ^= 1;
> +   gcc_assert (fs->regs.how[reg] == REG_UNSAVED
> +   || fs->regs.how[reg] == REG_UNDEFINED);
> +   if (fs->regs.how[reg] == REG_UNSAVED)
> + fs->regs.how[reg] = REG_UNDEFINED;
> +   else
> + fs->regs.how[reg] = REG_UNSAVED;

Wouldn't the assert be better written just as:
  if (fs->regs.how[reg] == REG_UNSAVED)
    fs->regs.how[reg] = REG_UNDEFINED;
  else
    {
      gcc_assert (fs->regs.how[reg] == REG_UNDEFINED);
      fs->regs.how[reg] = REG_UNSAVED;
    }
?
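
The suggested shape can be modelled in isolation like this.  It is a
sketch with hypothetical names (RegHow, toggle_ra_state), not libgcc
code: the state flips between two values, and the assertion only has to
guard the else branch, since the if branch has already established the
other valid state.

```cpp
#include <cassert>

// Hypothetical stand-in for the unwinder's register-rule states.
enum class RegHow { Unsaved, Undefined, SavedOffset };

// Toggle the return-address signing state.  Following the quoted patch
// comment, Undefined models "signing enabled" and Unsaved "disabled".
RegHow toggle_ra_state(RegHow how) {
  if (how == RegHow::Unsaved)
    return RegHow::Undefined;
  // Any state other than the two toggle states is a bug in the caller.
  assert(how == RegHow::Undefined);
  return RegHow::Unsaved;
}
```

Toggling twice returns the original state, and the assertion is reached
only on the path where the state was not already known to be valid.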

Anyway, the sooner this makes it into gcc trunk, the better, it breaks quite
a lot of stuff.
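
For reference, the toggle discussed above can be sketched in isolation.  This is only a minimal model: the enum below is a hypothetical stand-in for the unwinder's register-rule values, and toggle_ra_state is an illustrative name, not a function in libgcc.

```c
#include <assert.h>

/* Hypothetical stand-ins for the unwinder's register-rule values.  */
enum reg_how { REG_UNSAVED, REG_UNDEFINED, REG_SAVED_OFFSET };

/* Toggle the return-address signing state.  REG_UNSAVED models "signing
   disabled" and REG_UNDEFINED models "signing enabled".  Any other rule
   means the toggle was mixed with an incompatible CFI instruction, so the
   assertion fires.  */
enum reg_how
toggle_ra_state (enum reg_how how)
{
  if (how == REG_UNSAVED)
    return REG_UNDEFINED;
  assert (how == REG_UNDEFINED);
  return REG_UNSAVED;
}
```

Placing the assertion in the else branch checks the same invariant as asserting both states up front, with one comparison fewer on the common path.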

Jakub



Re: Missed lowering to ld1rq from svld1rq for memory operand

2023-01-10 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 5 Aug 2022 at 17:49, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > Hi Richard,
> > Following from off-list discussion, in the attached patch, I wrote pattern
> > similar to vec_duplicate_reg, which seems to work for the svld1rq 
> > tests.
> > Does it look OK ?
> >
> > Sorry, I didn't fully understand your suggestion on integrating with
> > vec_duplicate_reg
> > pattern. For vec_duplicate_reg, the operand to vec_duplicate expects
> > mode to be , while the pattern in patch expects operand of
> > vec_duplicate to have mode .
> > How do we write a pattern so an operand can accept either of the 2 modes ?
>
> I quoted the wrong one, sorry, should have been
> aarch64_vec_duplicate_vq_le.
>
> > Also it seems  cannot be used with SVE_ALL ?
>
> Yeah, these would be SVE_FULL only.
Hi Richard,
Sorry for the very late reply. I have attached a patch to integrate
with vec_duplicate_vq_le.
Bootstrapped+tested on aarch64-linux-gnu.
OK to commit ?

Thanks,
Prathamesh
>
> Richard
>
gcc/
* config/aarch64/aarch64-sve.md (aarch64_vec_duplicate_vq_le):
Change to define_insn_and_split to fold ldr+dup to ld1rq.
* config/aarch64/predicates.md (aarch64_sve_dup_ld1rq_operand): New.

testsuite/
* gcc.target/aarch64/sve/acle/general/pr96463-2.c: Adjust.

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index b8cc47ef5fc..4548375b8d6 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2533,14 +2533,34 @@
 )
 
 ;; Duplicate an Advanced SIMD vector to fill an SVE vector (LE version).
-(define_insn "@aarch64_vec_duplicate_vq_le"
-  [(set (match_operand:SVE_FULL 0 "register_operand" "=w")
+
+(define_insn_and_split "@aarch64_vec_duplicate_vq_le"
+  [(set (match_operand:SVE_FULL 0 "register_operand" "=w, w")
(vec_duplicate:SVE_FULL
- (match_operand: 1 "register_operand" "w")))]
+ (match_operand: 1 "aarch64_sve_dup_ld1rq_operand" "w, UtQ")))
+   (clobber (match_scratch:VNx16BI 2 "=X, Upl"))]
   "TARGET_SVE && !BYTES_BIG_ENDIAN"
   {
-operands[1] = gen_rtx_REG (mode, REGNO (operands[1]));
-return "dup\t%0.q, %1.q[0]";
+switch (which_alternative)
+  {
+   case 0:
+ operands[1] = gen_rtx_REG (mode, REGNO (operands[1]));
+ return "dup\t%0.q, %1.q[0]";
+   case 1:
+ return "#";
+   default:
+ gcc_unreachable ();
+  }
+  }
+  "&& MEM_P (operands[1])"
+  [(const_int 0)]
+  {
+if (GET_CODE (operands[2]) == SCRATCH)
+  operands[2] = gen_reg_rtx (VNx16BImode);
+emit_move_insn (operands[2], CONSTM1_RTX (VNx16BImode));
+rtx gp = gen_lowpart (mode, operands[2]);
+emit_insn (gen_aarch64_sve_ld1rq (operands[0], operands[1], gp));
+DONE;
   }
 )
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index ff7f73d3f30..6062f37025e 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -676,6 +676,10 @@
   (ior (match_operand 0 "register_operand")
(match_operand 0 "aarch64_sve_ld1r_operand")))
 
+(define_predicate "aarch64_sve_dup_ld1rq_operand"
+  (ior (match_operand 0 "register_operand")
+   (match_operand 0 "aarch64_sve_ld1rq_operand")))
+
 (define_predicate "aarch64_sve_ptrue_svpattern_immediate"
   (and (match_code "const")
(match_test "aarch64_sve_ptrue_svpattern_p (op, NULL)")))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
index 196de3f5e0a..c38204e6874 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr96463-2.c
@@ -26,4 +26,4 @@ TEST(svfloat64_t, float64_t, f64)
 
 TEST(svbfloat16_t, bfloat16_t, bf16)
 
-/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]} 12 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-not {\tdup\t} } } */


Re: [PATCH] libgcc: Fix uninitialized RA signing on AArch64 [PR107678]

2023-01-10 Thread Wilco Dijkstra via Gcc-patches
Hi Szabolcs,

> i would keep the assert: how[reg] must be either UNSAVED or UNDEFINED
> here, other how[reg] means the toggle cfi instruction is mixed with
> incompatible instructions for the pseudo reg.
>
> and i would add a comment about this e.g. saying that UNSAVED/UNDEFINED
> how[reg] is used for tracking the return address signing status and
> other how[reg] is not allowed here.

I've added the assert back and updated the comment.

Cheers,
Wilco

v3: Improve comments, add assert.

A recent change initializes only regs.how[] during DWARF unwinding,
which left the offset used for return address signing uninitialized
and caused random failures during unwinding.  The fix is to encode the
return address signing state in REG_UNSAVED and REG_UNDEFINED.
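
As a rough illustration of that encoding, the decision "is the return address signed?" can be modelled as below.  The enum and the function name are assumptions made for this sketch; reg_value stands for whatever the unwinder would read for the RA-state pseudo register when it is described by an expression.

```c
/* Hypothetical stand-ins for the register rules relevant here.  */
enum ra_how { RA_UNSAVED, RA_UNDEFINED, RA_OTHER };

/* Return nonzero if the return address should be treated as signed.
   RA_UNSAVED   -> signing disabled (the default state).
   RA_UNDEFINED -> signing enabled via the toggle instruction.
   RA_OTHER     -> the state comes from the register value, where bit 0
                   set means signed.  */
int
ra_signing_enabled (enum ra_how how, unsigned long reg_value)
{
  if (how == RA_UNSAVED)
    return 0;
  if (how == RA_UNDEFINED)
    return 1;
  return (reg_value & 0x1) != 0;
}
```

The point of the encoding is that no separate offset field has to be initialized: the rule itself carries the one bit of state.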

Passes bootstrap & regress, OK for commit?

libgcc/
PR target/107678
* unwind-dw2.c (execute_cfa_program): Use REG_UNSAVED/UNDEFINED
to encode return address signing state.
* config/aarch64/aarch64-unwind.h (aarch64_demangle_return_addr):
Check current return address signing state.
(aarch64_frob_update_context): Remove.

---

diff --git a/libgcc/config/aarch64/aarch64-unwind.h 
b/libgcc/config/aarch64/aarch64-unwind.h
index 26db9cbd9e5c526e0c410a4fc6be2bedb7d261cf..1afc3f9d308b95bc787398263e629bab226ff1ba 100644
--- a/libgcc/config/aarch64/aarch64-unwind.h
+++ b/libgcc/config/aarch64/aarch64-unwind.h
@@ -29,8 +29,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #define MD_DEMANGLE_RETURN_ADDR(context, fs, addr) \
   aarch64_demangle_return_addr (context, fs, addr)
-#define MD_FROB_UPDATE_CONTEXT(context, fs) \
-  aarch64_frob_update_context (context, fs)
 
 static inline int
 aarch64_cie_signed_with_b_key (struct _Unwind_Context *context)
@@ -55,42 +53,27 @@ aarch64_cie_signed_with_b_key (struct _Unwind_Context *context)
 
 static inline void *
 aarch64_demangle_return_addr (struct _Unwind_Context *context,
- _Unwind_FrameState *fs ATTRIBUTE_UNUSED,
+ _Unwind_FrameState *fs,
  _Unwind_Word addr_word)
 {
   void *addr = (void *)addr_word;
-  if (context->flags & RA_SIGNED_BIT)
+  const int reg = DWARF_REGNUM_AARCH64_RA_STATE;
+
+  if (fs->regs.how[reg] == REG_UNSAVED)
+return addr;
+
+  /* Return-address signing state is toggled by DW_CFA_GNU_window_save (where
+ REG_UNDEFINED means enabled), or set by a DW_CFA_expression.  */
+  if (fs->regs.how[reg] == REG_UNDEFINED
+  || (_Unwind_GetGR (context, reg) & 0x1) != 0)
 {
   _Unwind_Word salt = (_Unwind_Word) context->cfa;
   if (aarch64_cie_signed_with_b_key (context) != 0)
return __builtin_aarch64_autib1716 (addr, salt);
   return __builtin_aarch64_autia1716 (addr, salt);
 }
-  else
-return addr;
-}
-
-/* Do AArch64 private initialization on CONTEXT based on frame info FS.  Mark
-   CONTEXT as return address signed if bit 0 of DWARF_REGNUM_AARCH64_RA_STATE is
-   set.  */
-
-static inline void
-aarch64_frob_update_context (struct _Unwind_Context *context,
-_Unwind_FrameState *fs)
-{
-  const int reg = DWARF_REGNUM_AARCH64_RA_STATE;
-  int ra_signed;
-  if (fs->regs.how[reg] == REG_UNSAVED)
-ra_signed = fs->regs.reg[reg].loc.offset & 0x1;
-  else
-ra_signed = _Unwind_GetGR (context, reg) & 0x1;
-  if (ra_signed)
-/* The flag is used for re-authenticating EH handler's address.  */
-context->flags |= RA_SIGNED_BIT;
-  else
-context->flags &= ~RA_SIGNED_BIT;
 
-  return;
+  return addr;
 }
 
 #endif /* defined AARCH64_UNWIND_H && defined __ILP32__ */
diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
index eaceace20298b9b13344aff9d1fe9ee5f9c7bd73..55fe35520106e848c5d4aea4e7104bf4a0c14891 100644
--- a/libgcc/unwind-dw2.c
+++ b/libgcc/unwind-dw2.c
@@ -139,7 +139,6 @@ struct _Unwind_Context
 #define EXTENDED_CONTEXT_BIT ((~(_Unwind_Word) 0 >> 2) + 1)
   /* Bit reserved on AArch64, return address has been signed with A or B
  key.  */
-#define RA_SIGNED_BIT ((~(_Unwind_Word) 0 >> 3) + 1)
   _Unwind_Word flags;
   /* 0 for now, can be increased when further fields are added to
  struct _Unwind_Context.  */
@@ -1204,10 +1203,15 @@ execute_cfa_program (const unsigned char *insn_ptr,
case DW_CFA_GNU_window_save:
 #if defined (__aarch64__) && !defined (__ILP32__)
  /* This CFA is multiplexed with Sparc.  On AArch64 it's used to toggle
-return address signing status.  */
+return address signing status.  The REG_UNDEFINED/UNSAVED states
+mean RA signing is enabled/disabled.  */
  reg = DWARF_REGNUM_AARCH64_RA_STATE;
- gcc_assert (fs->regs.how[reg] == REG_UNSAVED);
- fs->regs.reg[reg].loc.offset ^= 1;
+ gcc_assert (fs->regs.how[reg] == REG_UNSAVED
+ || fs->regs.how[reg] == REG_UNDEFINED);
+ if (fs->regs.how[reg] == REG_UNSAVED)
+   

Re: [PATCH 1/2] libstdc++: Enable string_view in freestanding

2023-01-10 Thread Arsen Arsenović via Gcc-patches
Hi Jonathan,

Jonathan Wakely  writes:

> Sorry for the top post.
>
> -#define __cpp_lib_string_contains 202011L
> +#if _GLIBCXX_HOSTED
> +  // This FTM is not hosted as it also implies matching 
> support,
> +  // and  is omitted from the freestanding subset.
> +# define __cpp_lib_string_contains 202011L
> +#endif // HOSTED
>
> That should say "not freestanding", right?

Whoops, yes.  Here's the fixed-up patch.

From 07cac07fc88994ced9f3ea97c4e03f8c719c4ee4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Arsen=20Arsenovi=C4=87?= 
Date: Thu, 15 Dec 2022 00:53:37 +0100
Subject: [PATCH 1/2] libstdc++: Enable string_view in freestanding

This enables the default contract handler in freestanding environments,
and, of course, provides freestanding users with string_view.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Install bits/char_traits.h,
std/string_view
* include/Makefile.in: Regenerate.
* include/bits/char_traits.h: Gate hosted-only, wchar-only and
mbstate-only bits behind appropriate #ifs.
* include/std/string_view: Gate  functionality behind
HOSTED.
* include/std/version: Enable __cpp_lib_constexpr_string_view
and __cpp_lib_starts_ends_with in !HOSTED.
* include/std/ranges: Re-enable __is_basic_string_view on
freestanding, include  directly.
* include/precompiled/stdc++.h: Include  when
!HOSTED too.
* testsuite/20_util/function_objects/searchers.cc: Skip testing
boyer_moore searchers on freestanding
* testsuite/21_strings/basic_string_view/capacity/1.cc: Guard
-related tests behind __STDC_HOSTED__.
* testsuite/21_strings/basic_string_view/cons/char/1.cc: Ditto.
* testsuite/21_strings/basic_string_view/cons/char/2.cc: Remove
unused  include.
* testsuite/21_strings/basic_string_view/cons/char/3.cc: Remove
unused  include.
* testsuite/21_strings/basic_string_view/cons/char/range.cc:
Guard  related testing behind __STDC_HOSTED__.
* testsuite/21_strings/basic_string_view/cons/wchar_t/1.cc:
Guard  related tests behind __STDC_HOSTED__.
* testsuite/21_strings/basic_string_view/element_access/char/1.cc:
Ditto.
* testsuite/21_strings/basic_string_view/element_access/wchar_t/1.cc:
Guard  tests behind __STDC_HOSTED__.
* testsuite/21_strings/basic_string_view/operations/contains/char/2.cc:
Enable test on freestanding, guard  bits behind
__STDC_HOSTED__.
* testsuite/21_strings/basic_string_view/operations/substr/char.cc:
Guard  bits behind __STDC_HOSTED__.
* testsuite/21_strings/basic_string_view/operations/substr/wchar_t.cc:
Ditto.
---
 libstdc++-v3/include/Makefile.am  |  6 +--
 libstdc++-v3/include/Makefile.in  |  6 +--
 libstdc++-v3/include/bits/char_traits.h   | 50 ---
 libstdc++-v3/include/precompiled/stdc++.h |  3 +-
 libstdc++-v3/include/std/ranges   |  3 +-
 libstdc++-v3/include/std/string_view  | 19 +--
 libstdc++-v3/include/std/version  |  4 +-
 .../20_util/function_objects/searchers.cc | 27 --
 .../basic_string_view/capacity/1.cc   |  2 +
 .../basic_string_view/cons/char/1.cc  |  7 ++-
 .../basic_string_view/cons/char/2.cc  |  1 -
 .../basic_string_view/cons/char/3.cc  |  1 -
 .../basic_string_view/cons/char/range.cc  |  7 ++-
 .../basic_string_view/cons/wchar_t/1.cc   |  6 ++-
 .../element_access/char/1.cc  |  7 ++-
 .../element_access/wchar_t/1.cc   |  6 ++-
 .../operations/contains/char/2.cc |  1 -
 .../operations/substr/char.cc |  7 ++-
 .../operations/substr/wchar_t.cc  |  7 ++-
 19 files changed, 133 insertions(+), 37 deletions(-)

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index e91f4ddd4de..bf566082a8c 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -46,6 +46,7 @@ std_freestanding = \
${std_srcdir}/scoped_allocator \
${std_srcdir}/source_location \
${std_srcdir}/span \
+   ${std_srcdir}/string_view \
${std_srcdir}/tuple \
${std_srcdir}/type_traits \
${std_srcdir}/typeindex \
@@ -100,7 +101,6 @@ std_headers = \
${std_srcdir}/stop_token \
${std_srcdir}/streambuf \
${std_srcdir}/string \
-   ${std_srcdir}/string_view \
${std_srcdir}/system_error \
${std_srcdir}/thread \
${std_srcdir}/unordered_map \
@@ -120,6 +120,7 @@ bits_freestanding = \
${bits_srcdir}/c++0x_warning.h \
${bits_srcdir}/boost_concept_check.h \
${bits_srcdir}/concept_check.h \
+   ${bits_srcdir}/char_traits.h \
${bits_srcdir}/cpp_type_traits.h \
${bits_srcdir}/enable_special_members.h \

Ping^3: [PATCH] d: Update __FreeBSD_version values [PR107469]

2023-01-10 Thread Lorenzo Salvadore via Gcc-patches
Hello,

Ping https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605685.html

I would like to remind everyone that Gerald Pfeifer has already volunteered
to commit this patch once it is approved. However, the patch has not been
approved yet.

Thanks,

Lorenzo Salvadore

> --- Original Message ---
> On Friday, November 11th, 2022 at 12:07 AM, Lorenzo Salvadore 
> develo...@lorenzosalvadore.it wrote:
> 
> > Update __FreeBSD_version values for the latest FreeBSD supported
> > versions. In particular, add __FreeBSD_version for FreeBSD 14, which is
> > necessary to compile libphobos successfully on FreeBSD 14.
> > 
> > The patch has already been applied successfully in the official FreeBSD
> > ports tree for the ports lang/gcc11 and lang/gcc11-devel. Please see the
> > following commits:
> > 
> > https://cgit.freebsd.org/ports/commit/?id=f61fb49b2e76fd4f7a5b7a11510b5109206c19f2
> > https://cgit.freebsd.org/ports/commit/?id=57936dba89ea208e5dbc1bd2d7fda3d29a1838b3
> > 
> > libphobos/ChangeLog:
> > 
> > 2022-11-10 Lorenzo Salvadore develo...@lorenzosalvadore.it
> > 
> > PR d/107469.
> > * libdruntime/core/sys/freebsd/config.d: Update __FreeBSD_version.
> > 
> > ---
> > libphobos/libdruntime/core/sys/freebsd/config.d | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libphobos/libdruntime/core/sys/freebsd/config.d 
> > b/libphobos/libdruntime/core/sys/freebsd/config.d
> > index 5e3129e2422..9d502e52e32 100644
> > --- a/libphobos/libdruntime/core/sys/freebsd/config.d
> > +++ b/libphobos/libdruntime/core/sys/freebsd/config.d
> > @@ -14,8 +14,9 @@ public import core.sys.posix.config;
> > // NOTE: When adding newer versions of FreeBSD, verify all current versioned
> > // bindings are still compatible with the release.
> > 
> > - version (FreeBSD_13) enum __FreeBSD_version = 1300000;
> > -else version (FreeBSD_12) enum __FreeBSD_version = 1202000;
> > + version (FreeBSD_14) enum __FreeBSD_version = 1400000;
> > +else version (FreeBSD_13) enum __FreeBSD_version = 1301000;
> > +else version (FreeBSD_12) enum __FreeBSD_version = 1203000;
> > else version (FreeBSD_11) enum __FreeBSD_version = 1104000;
> > else version (FreeBSD_10) enum __FreeBSD_version = 1004000;
> > else version (FreeBSD_9) enum __FreeBSD_version = 903000;
> > --
> > 2.38.0


Re: [PATCH, Modula2] PR-108142 Many empty directories created in the build directory

2023-01-10 Thread Gaius Mulley via Gcc-patches
Jakub Jelinek  writes:

> On Tue, Jan 10, 2023 at 11:16:28AM +0100, Richard Biener via Gcc-patches 
> wrote:
>> > @@ -424,7 +388,7 @@ override PLUGINCFLAGS := $(filter-out 
>> > -mdynamic-no-pic,$(PLUGINCFLAGS))
>> >
>> >  plugin/m2rte$(soext): $(srcdir)/m2/plugin/m2rte.cc 
>> > $(GCC_HEADER_DEPENDENCIES_FOR_M2) \
>> >  insn-attr-common.h insn-flags.h $(generated_files)
>> > -   test -d plugin || mkdir plugin
>> > +   -test -d plugin || $(mkinstalldirs) plugin
>> 
>> I wonder if that's possibly racy (that's why you use mkinstalldirs?)?
>
> Using $(mkinstalldirs) in the patch is what I've suggested because
> previously the patch was using mkdir -p which we almost never use
> (I think only some Ada Makefiles).  Above when it is a single directory
> mkdir is fine.
>   -test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
> etc. is what is used in gcc/Makefile.in in some spots.
> If 2 shells do that test -d plugin || mkdir plugin at the same time,
> then yes, both might do mkdir, but that is why we have the - at the start,
> the error of doing mkdir twice will be ignored then.
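
The same "losing the race is harmless" idea can be written down in C.  This is only an analogue of the Makefile idiom, not something the build uses: ensure_dir is a hypothetical helper that treats "already exists" as success, much as the leading "-" lets make ignore the second mkdir failing.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create PATH unless it already exists.  Returns 0 if PATH is a directory
   afterwards (including the case where another process created it first),
   and -1 on any other failure.  */
int
ensure_dir (const char *path)
{
  if (mkdir (path, 0777) == 0)
    return 0;
  if (errno == EEXIST)
    {
      /* Someone else won the race; accept that if it is a directory.  */
      struct stat st;
      if (stat (path, &st) == 0 && S_ISDIR (st.st_mode))
        return 0;
    }
  return -1;
}
```

Calling it twice on the same path succeeds both times, which is the property the parallel make rules rely on.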

thanks both - will apply the patch and close the PR

regards,
Gaius


[PATCH] tree-optimization/106293 - missed DSE with virtual LC PHI

2023-01-10 Thread Richard Biener via Gcc-patches
Degenerate virtual PHIs can break DSE's fragile heuristic as to what
defs it can handle for further processing.  The following enhances
it to look through degenerate PHIs by means of a worklist, adding
the uses of the degenerate PHI defs to the defs array.  The rewrite of
virtuals into loop-closed SSA caused this issue to appear more often.
The patch itself is mostly re-indenting the new loop body.
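
The worklist idea can be sketched on a toy data structure.  Everything below (struct stmt, collect_defs, the fixed-size arrays) is hypothetical and far simpler than GIMPLE; it assumes an acyclic chain of degenerate PHIs and omits the visited bitmap and walk limit that the real pass uses.

```c
#include <stddef.h>

#define MAX_USES 4
#define MAX_WORK 16

/* Toy statement: it uses one virtual def and produces a new one.  A
   degenerate PHI merely forwards its single argument as a new def.  */
struct stmt
{
  int is_degenerate_phi;
  struct stmt *uses[MAX_USES];  /* Statements using this stmt's def.  */
  size_t n_uses;
};

/* Collect into DEFS every non-PHI statement reachable from START, looking
   through degenerate PHIs by means of a worklist.  Returns the number of
   statements collected, or -1 if a fixed-size buffer would overflow.  */
int
collect_defs (struct stmt *start, struct stmt **defs, size_t max_defs)
{
  struct stmt *worklist[MAX_WORK];
  size_t n_work = 0, n_defs = 0;

  worklist[n_work++] = start;
  do
    {
      struct stmt *def = worklist[--n_work];
      for (size_t i = 0; i < def->n_uses; i++)
        {
          struct stmt *use = def->uses[i];
          if (use->is_degenerate_phi)
            {
              /* Do not record the PHI itself; process its uses later.  */
              if (n_work == MAX_WORK)
                return -1;
              worklist[n_work++] = use;
            }
          else
            {
              if (n_defs == max_defs)
                return -1;
              defs[n_defs++] = use;
            }
        }
    }
  while (n_work > 0);

  return (int) n_defs;
}
```

With a store reached directly and another store reached only through a degenerate PHI, both end up in defs, which is the behaviour the description above asks for.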

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/106293
* tree-ssa-dse.cc (dse_classify_store): Use a worklist to
process degenerate PHI defs.

* gcc.dg/tree-ssa/ssa-dse-46.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-46.c |  23 +++
 gcc/tree-ssa-dse.cc| 181 +++--
 2 files changed, 121 insertions(+), 83 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-46.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-46.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-46.c
new file mode 100644
index 00000000000..68b36433ffc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-46.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1" } */
+
+int a;
+static long b = 4073709551612, d;
+short c;
+void foo();
+char e(int **f) {
+  **f = 0;
+  if (a) {
+unsigned long *g = 
+unsigned long **h = 
+for (; d;) {
+  foo();
+  for (; c;) {
+unsigned long ***i = 
+  }
+}
+  }
+  return 1;
+}
+
+/* { dg-final { scan-tree-dump-not "" "dse1" } } */
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 89e2fa2c3f5..46ab57d5754 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -984,108 +984,123 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
   else
defvar = gimple_vdef (temp);
 
-  /* If we're instructed to stop walking at region boundary, do so.  */
-  if (defvar == stop_at_vuse)
-   return DSE_STORE_LIVE;
-
   auto_vec defs;
   gphi *first_phi_def = NULL;
   gphi *last_phi_def = NULL;
-  FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar)
+
+  auto_vec worklist;
+  worklist.quick_push (defvar);
+
+  do
{
- /* Limit stmt walking.  */
- if (++cnt > param_dse_max_alias_queries_per_store)
-   {
- fail = true;
- break;
-   }
+ defvar = worklist.pop ();
+ /* If we're instructed to stop walking at region boundary, do so.  */
+ if (defvar == stop_at_vuse)
+   return DSE_STORE_LIVE;
 
- /* In simple cases we can look through PHI nodes, but we
-have to be careful with loops and with memory references
-containing operands that are also operands of PHI nodes.
-See gcc.c-torture/execute/20051110-*.c.  */
- if (gimple_code (use_stmt) == GIMPLE_PHI)
+ FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar)
{
- /* If we already visited this PHI ignore it for further
-processing.  */
- if (!bitmap_bit_p (visited,
-SSA_NAME_VERSION (PHI_RESULT (use_stmt
+ /* Limit stmt walking.  */
+ if (++cnt > param_dse_max_alias_queries_per_store)
{
- /* If we visit this PHI by following a backedge then we have
-to make sure ref->ref only refers to SSA names that are
-invariant with respect to the loop represented by this
-PHI node.  */
- if (dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
- gimple_bb (use_stmt))
- && !for_each_index (ref->ref ? >ref : >base,
- check_name, gimple_bb (use_stmt)))
-   return DSE_STORE_LIVE;
- defs.safe_push (use_stmt);
- if (!first_phi_def)
-   first_phi_def = as_a  (use_stmt);
- last_phi_def = as_a  (use_stmt);
+ fail = true;
+ break;
}
-   }
- /* If the statement is a use the store is not dead.  */
- else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
-   {
- if (dse_stmt_to_dr_map
- && ref->ref
- && is_gimple_assign (use_stmt))
+
+ /* In simple cases we can look through PHI nodes, but we
+have to be careful with loops and with memory references
+containing operands that are also operands of PHI nodes.
+See gcc.c-torture/execute/20051110-*.c.  */
+ if (gimple_code (use_stmt) == GIMPLE_PHI)
{
- if (!dra)
-   dra.reset (create_data_ref (NULL, NULL, ref->ref, stmt,
-   false, false));
- bool existed_p;
- data_reference_p 
-

Re: [PATCH] libatomic: Provide gthr.h default implementation

2023-01-10 Thread Sebastian Huber

On 19/12/2022 17:02, Sebastian Huber wrote:

Build libatomic for all targets.  Use gthr.h to provide a default
implementation.  If the thread model is "single", then this implementation will
not work if for example atomic operations are used for thread/interrupt
synchronization.


Is this and the related -fprofile-update=atomic patch something for GCC 14?

--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


[PATCH] gcc: emit DW_AT_name for DW_TAG_GNU_formal_parameter_pack [PR70536]

2023-01-10 Thread Ed Catmur
Per http://wiki.dwarfstd.org/index.php?title=C%2B%2B0x:_Variadic_templates 
DW_TAG_GNU_formal_parameter_pack should have a DW_AT_name:

17$:  DW_TAG_formal_parameter_pack
  DW_AT_name("args")
18$:  DW_TAG_formal_parameter
  ! no DW_AT_name attribute
  DW_AT_type(reference to 13$)
(...)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70536

From 2c50fbbfdd42c9ecb6d6b8e4c53bb3029ef1ee25 Mon Sep 17 00:00:00 2001
From: Ed Catmur 
Date: Sat, 12 Mar 2022 17:38:33 +0000
Subject: [PATCH] emit DW_AT_name for DW_TAG_GNU_formal_parameter_pack

Per http://wiki.dwarfstd.org/index.php?title=C%2B%2B0x:_Variadic_templates 
DW_TAG_GNU_formal_parameter_pack should have a DW_AT_name:

17$:  DW_TAG_formal_parameter_pack
  DW_AT_name("args")
18$:  DW_TAG_formal_parameter
  ! no DW_AT_name attribute
  DW_AT_type(reference to 13$)
(...)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70536
---
 gcc/dwarf2out.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 5681b01749ad..ef3bc6f88e07 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -23006,7 +23006,7 @@ gen_formal_parameter_pack_die  (tree parm_pack,
  && subr_die);
 
   parm_pack_die = new_die (DW_TAG_GNU_formal_parameter_pack, subr_die, 
parm_pack);
-  add_src_coords_attributes (parm_pack_die, parm_pack);
+  add_name_and_src_coords_attributes (parm_pack_die, parm_pack);
 
   for (arg = pack_arg; arg; arg = DECL_CHAIN (arg))
 {


Re: [PATCH] diagnostics: fix crash with -fdiagnostics-format=json-file

2023-01-10 Thread Martin Liška
On 1/6/23 14:21, David Malcolm wrote:
> On Fri, 2023-01-06 at 12:33 +0100, Martin Liška wrote:
>> Patch can bootstrap on x86_64-linux-gnu and survives regression
>> tests.
> 
> Thanks for the patch.
> 
> I noticed that you marked PR 108307 as a dup of this, which covers
> -fdiagnostics-format=sarif-file (and a .S file as input).
> 
> The patch doesn't add any test coverage (for either of the diagnostic
> formats).
> 
> If we try to emit a diagnostic and base_file_name is NULL, and the user
> requested one of -fdiagnostics-format={json,sarif}-file, where do the
> diagnostics go?  Where should they go?

Hey.

I've done a new version of the patch where I utilize x_main_input_basename.
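
In isolation, the fallback amounts to the following.  The function name is made up for this sketch; the two parameters stand for the x_dump_base_name and x_main_input_basename option fields, and the second is assumed to be non-NULL.

```c
#include <stddef.h>

/* Pick the base name used for the diagnostics output file: prefer the
   dump base name, and fall back to the main input's base name when the
   former has not been set.  */
const char *
pick_diagnostics_basename (const char *dump_base_name,
                           const char *main_input_basename)
{
  return dump_base_name ? dump_base_name : main_input_basename;
}
```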

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

From 078233b4f84ae6d81a7327589723b2be518d29ca Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 10 Jan 2023 15:14:05 +0100
Subject: [PATCH] middle-end: always find a basename for -fdiagnostics-format=*

In some situations, x_dump_base_name is NULL and thus we can
and should use x_main_input_basename which should never be NULL.

	PR middle-end/106133

gcc/ChangeLog:

	* gcc.cc (driver_handle_option): Use x_main_input_basename
	if x_dump_base_name is null.
	* opts.cc (common_handle_option): Likewise.

gcc/testsuite/ChangeLog:

	* c-c++-common/pr106133.c: New test.
---
 gcc/gcc.cc| 10 +++---
 gcc/opts.cc   | 10 +++---
 gcc/testsuite/c-c++-common/pr106133.c |  3 +++
 3 files changed, 17 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr106133.c

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index d629ca5e424..382ca817a09 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -4290,9 +4290,13 @@ driver_handle_option (struct gcc_options *opts,
   break;
 
 case OPT_fdiagnostics_format_:
-  diagnostic_output_format_init (dc, opts->x_dump_base_name,
- (enum diagnostics_output_format)value);
-  break;
+	{
+	  const char *basename = (opts->x_dump_base_name ? opts->x_dump_base_name
+  : opts->x_main_input_basename);
+	  diagnostic_output_format_init (dc, basename,
+	 (enum diagnostics_output_format)value);
+	  break;
+	}
 
 case OPT_Wa_:
   {
diff --git a/gcc/opts.cc b/gcc/opts.cc
index 9ba47d7deaa..4809c18a529 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -2863,9 +2863,13 @@ common_handle_option (struct gcc_options *opts,
   break;
 
 case OPT_fdiagnostics_format_:
-  diagnostic_output_format_init (dc, opts->x_dump_base_name,
- (enum diagnostics_output_format)value);
-  break;
+	{
+	  const char *basename = (opts->x_dump_base_name ? opts->x_dump_base_name
+  : opts->x_main_input_basename);
+	  diagnostic_output_format_init (dc, basename,
+	 (enum diagnostics_output_format)value);
+	  break;
+	}
 
 case OPT_fdiagnostics_parseable_fixits:
   dc->extra_output_kind = (value
diff --git a/gcc/testsuite/c-c++-common/pr106133.c b/gcc/testsuite/c-c++-common/pr106133.c
new file mode 100644
index 00000000000..7d2c5afe417
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr106133.c
@@ -0,0 +1,3 @@
+/* PR middle-end/106133 */
+/* { dg-do compile } */
+/* { dg-options "-fdiagnostics-format=json-file -E" } */
-- 
2.39.0



RE: [x86 PATCH] PR rtl-optimization/107991: peephole2 to tweak register allocation.

2023-01-10 Thread Roger Sayle


Hi Richard and Uros,
I believe I've managed to reduce a minimal test case that exhibits the
underlying problem with reload.  The following snippet, when compiled on
x86-64 with -O2:

void ext(int x);
void foo(int x, int y) { ext(y - x); }

produces the following 5 instructions prior to reload:
insn 13: r86:SI=di:SI   // REG_DEAD di:SI
insn 14: r87:SI=si:SI   // REG_DEAD si:SI
insn 7: {r85:SI=r87:SI-r86:SI;clobber flags:CC;}   // REG_DEAD r86:SI, r87:SI
insn 8: di:SI=r85:SI   // REG_DEAD r85:SI
insn 9: call [`ext'] argc:0

Hence there are three pseudos (allocnos) to be register allocated: r85,
r86 and r87.

Currently, reload produces the following assignments/colouring using 3
hard regs.
r85 in di
r86 in ax
r87 in si

A better (optimal) register allocation requires only 2 hard regs.
r85 in di
r86 in si
r87 in di

Fortunately, this over-allocation is cleaned up later (during
cprop_hardreg), but as pointed out by Uros, there's little benefit in
reducing register pressure this late (after peephole2).

As far as I understand it, Richard's patch to handle fully-tied destinations
looks very reasonable (and is impressively tested/benchmarked):
https://gcc.gnu.org/pipermail/gcc-patches/2019-September/530743.html
but in the prototypical 0:"=r", 1:"0", 2:"r" constraint case, as used in the
problematic subsi3_1 pattern (of insn 7), I'm trying to figure out why r85
and r87 don't get allocated to the same register [given the local spilling
of non-eliminable hard regs in insn 7, temporarily introducing a new pseudo
r89].

In closing, reload is a complex piece of code that's shared between a large
number of backends; if Richard's patch is a win "statistically", then it's
not unreasonable to use a peephole2 to clean-up/catch the corner cases
on class_likely_spilled_p targets [indeed many of the peephole2s in i386.md
tidy up register allocation issues], and such a "specialized" fix is more
suitable for stage 3 than a potentially disruptive tweak to reload.  At
worst, the peephole2 becomes dead if/when the problem is fixed upstream.
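
To make the trade-off concrete, the two instruction sequences from the original patch description (quoted further down) can be written as plain C and compared.  The function names are illustrative only, and unsigned arithmetic is used so that wrap-around is well defined.

```c
/* "x = y - x" on a two-address machine: a scratch and two moves.  */
void
sub_with_scratch (unsigned *x, unsigned *y)
{
  unsigned tmp = *x;  /* tmp = x  */
  *x = *y;            /* x = y    */
  *x -= tmp;          /* x -= tmp */
}

/* Shorter form: subtract in place and copy.  It clobbers y, so it is
   only valid when y is dead after the subtraction.  */
void
sub_clobbering_y (unsigned *x, unsigned *y)
{
  *y -= *x;           /* y -= x */
  *x = *y;            /* x = y  */
}
```

Both leave the same value in x; the second saves a move and a register at the price of destroying y, which is why the liveness of y (or of a value equivalent to it) decides whether the rewrite is legal.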

Or put another way, if reload worked perfectly, i386.md wouldn't need
many of the peephole2s that it currently has.  Oh, for such an ideal world.

I hope this helps.
Cheers,
Roger
--

> -Original Message-
> From: Richard Sandiford 
> Sent: 10 January 2023 10:48
> To: Uros Bizjak 
> Cc: GCC Patches ; Roger Sayle
> 
> Subject: Re: [x86 PATCH] PR rtl-optimization/107991: peephole2 to tweak
> register allocation.
> 
> Uros Bizjak  writes:
> > On Mon, Jan 9, 2023 at 4:01 PM Roger Sayle
>  wrote:
> >>
> >>
> >> This patch addresses PR rtl-optimization/107991, which is a P2
> >> regression where GCC currently requires more "mov" instructions than
> >> GCC 7.
> >>
> >> The x86's two address ISA creates some interesting challenges for
> >> reload.
> >> For example, the tricky "x = y - x" usually needs to be implemented
> >> on x86 as
> >>
> >> tmp = x
> >> x = y
> >> x -= tmp
> >>
> >> where a scratch register and two mov's are required to work around
> >> the lack of a subf (subtract from) or rsub (reverse subtract) insn.
> >>
> >> Not uncommonly, if y is dead after this subtraction, register
> >> allocation can be improved by clobbering y.
> >>
> >> y -= x
> >> x = y
> >>
> >> For the testcase in PR 107991, things are slightly more complicated,
> >> where y is not itself dead, but is assigned from (i.e. equivalent to)
> >> a value that is dead.  Hence we have something like:
> >>
> >> y = z
> >> x = y - x
> >>
> >> so, GCC's reload currently generates the expected shuffle (as y is
> >> live):
> >>
> >> y = z
> >> tmp = x
> >> x = y
> >> x -= tmp
> >>
> >> but we can use a peephole2 that understands that y and z are
> >> equivalent, and that z is dead, to produce the shorter sequence:
> >>
> >> y = z
> >> z -= x
> >> x = z
> >>
> >> In practice, for the new testcase from PR 107991, which before
> >> produced:
> >>
> >> foo:movl%edx, %ecx
> >> movl%esi, %edx
> >> movl%esi, %eax
> >> subl%ecx, %edx
> >> testb   %dil, %dil
> >> cmovne  %edx, %eax
> >> ret
> >>
> >> with this patch/peephole2 we now produce the much improved:
> >>
> >> foo:movl%esi, %eax
> >> subl%edx, %esi
> >> testb   %dil, %dil
> >> cmovne  %esi, %eax
> >> ret
> >>
> >>
> >> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> >> and make -k check, both with and without --target_board=unix{-m32},
> >> with no new failures.  Ok for mainline?
> >
> > Looking at the PR, it looks to me that Richard S (CC'd) wants to solve
> > this issue in the register allocator. This would be preferred
> > (compared to a very specialized peephole2), since peephole2 pass comes
> > very late in the game, so one freed register does not contribute to
> > 

Re: [PATCH] longlong.h: Do no use asm input cast for clang

2023-01-10 Thread Andreas Schwab
On Jan 10 2023, Segher Boessenkool wrote:

> The file starts with
>
> /* longlong.h -- definitions for mixed size 32/64 bit arithmetic.
>Copyright (C) 1991-2022 Free Software Foundation, Inc.
>
>This file is part of the GNU C Library.
>
> Please change that first then?

GCC is the source of the original version of longlong.h (from 1991).  It
has then been imported into GMP, from where it found its way into GLIBC.
After that, the file has been synchronized back and forth between GCC
and GLIBC.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH 1/2] libstdc++: Enable string_view in freestanding

2023-01-10 Thread Jonathan Wakely via Gcc-patches
Sorry for the top post.

-#define __cpp_lib_string_contains 202011L
+#if _GLIBCXX_HOSTED
+  // This FTM is not hosted as it also implies matching 
support,
+  // and  is omitted from the freestanding subset.
+# define __cpp_lib_string_contains 202011L
+#endif // HOSTED

That should say "not freestanding", right?


On Tue, 10 Jan 2023, 10:04 Arsen Arsenović via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> This enables the default contract handler in freestanding environments,
> and, of course, provides freestanding users with string_view.
>
> libstdc++-v3/ChangeLog:
>
> * include/Makefile.am: Install bits/char_traits.h,
> std/string_view
> * include/Makefile.in: Regenerate.
> * include/bits/char_traits.h: Gate hosted-only, wchar-only and
> mbstate-only bits behind appropriate #ifs.
> * include/std/string_view: Gate  functionality behind
> HOSTED.
> * include/std/version: Enable __cpp_lib_constexpr_string_view
> and __cpp_lib_starts_ends_with in !HOSTED.
> * include/std/ranges: Re-enable __is_basic_string_view on
> freestanding, include  directly.
> * include/precompiled/stdc++.h: Include  when
> !HOSTED too.
> * testsuite/20_util/function_objects/searchers.cc: Skip testing
> boyer_moore searchers on freestanding
> * testsuite/21_strings/basic_string_view/capacity/1.cc: Guard
> -related tests behind __STDC_HOSTED__.
> * testsuite/21_strings/basic_string_view/cons/char/1.cc: Ditto.
> * testsuite/21_strings/basic_string_view/cons/char/2.cc: Remove
> unused  include.
> * testsuite/21_strings/basic_string_view/cons/char/3.cc: Remove
> unused  include.
> * testsuite/21_strings/basic_string_view/cons/char/range.cc:
> Guard  related testing behind __STDC_HOSTED__.
> * testsuite/21_strings/basic_string_view/cons/wchar_t/1.cc:
> Guard  related tests behind __STDC_HOSTED__.
> * testsuite/21_strings/basic_string_view/element_access/char/1.cc:
> Ditto.
> *
> testsuite/21_strings/basic_string_view/element_access/wchar_t/1.cc:
> Guard  tests behind __STDC_HOSTED__.
> *
> testsuite/21_strings/basic_string_view/operations/contains/char/2.cc:
> Enable test on freestanding, guard  bits behind
> __STDC_HOSTED__.
> * testsuite/21_strings/basic_string_view/operations/substr/char.cc:
> Guard  bits behind __STDC_HOSTED__.
> *
> testsuite/21_strings/basic_string_view/operations/substr/wchar_t.cc:
> Ditto.
> ---
> Morning (so much for submitting it last night eh? :D),
>
> This patchset enables the use of std::string_view in freestanding
> environments.  This permits freestanding programs to use contracts, and
> fixes building libstdc++.* on freestanding with one of the patches I
> sent previously.
>
> I also included fixes for some new test failures on unix/-ffreestanding.
> I hope to get some time to set up a dedicated runner for re-spinning
> -ffreestanding libstdc++ every so often in the near future..
>
> I haven't built Managarm with frg::string_view made into an alias for
> std::string_view yet, I can also do that before the merge, if so
> desired, as a little use-case test, but that might take a few days.
>
> Before NYE, I tested a full x86_64-pc-linux-gnu bootstrap, but I haven't
> had a chance to do that today after a rebase, though I did verify that
> --target_board='unix/{,-ffreestanding}' passes fine.  I can do that
> tonight and update this thread if need be.
>
> Thanks in advance, have a great day.
>
>  libstdc++-v3/include/Makefile.am  |  6 +--
>  libstdc++-v3/include/Makefile.in  |  6 +--
>  libstdc++-v3/include/bits/char_traits.h   | 50 ---
>  libstdc++-v3/include/precompiled/stdc++.h |  3 +-
>  libstdc++-v3/include/std/ranges   |  3 +-
>  libstdc++-v3/include/std/string_view  | 19 +--
>  libstdc++-v3/include/std/version  |  4 +-
>  .../20_util/function_objects/searchers.cc | 27 --
>  .../basic_string_view/capacity/1.cc   |  2 +
>  .../basic_string_view/cons/char/1.cc  |  7 ++-
>  .../basic_string_view/cons/char/2.cc  |  1 -
>  .../basic_string_view/cons/char/3.cc  |  1 -
>  .../basic_string_view/cons/char/range.cc  |  7 ++-
>  .../basic_string_view/cons/wchar_t/1.cc   |  6 ++-
>  .../element_access/char/1.cc  |  7 ++-
>  .../element_access/wchar_t/1.cc   |  6 ++-
>  .../operations/contains/char/2.cc |  1 -
>  .../operations/substr/char.cc |  7 ++-
>  .../operations/substr/wchar_t.cc  |  7 ++-
>  19 files changed, 133 insertions(+), 37 deletions(-)
>
> diff --git a/libstdc++-v3/include/Makefile.am
> b/libstdc++-v3/include/Makefile.am
> index e91f4ddd4de..bf566082a8c 100644
> --- a/libstdc++-v3/include/Makefile.am
> 

Re: [PATCH] rs6000: Enhance lowpart/highpart DI->SF by mtvsrws/mtvsrd

2023-01-10 Thread Segher Boessenkool
Hi!

On Tue, Jan 10, 2023 at 09:45:27PM +0800, Jiufu Guo wrote:
> As mentioned in PR108338, on p9, we could use mtvsrws to implement
> the conversion from SI#0 to SF (or lowpart DI to SF).  And we find
> we can also enhance the conversion from highpart DI to SF (as the
> case in this patch).
> 
> This patch enhances these conversions accordingly.

Those aren't conversions, they are just bitcasts, reinterpreting the
same datum as something else, but keeping all bits the same.

The mtvsrws insn moves a SImode value from a GPR to a VSR, splatting it
in all four lanes.  You'll typically want a xscvspdpn or similar after
that -- but with the value splat in all lanes it will surely be in the
lane that later instruction needs the data to be in :-)
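
(To illustrate the distinction in C terms: the following is only a sketch,
not part of the patch.  The function names are made up for this example,
and it assumes a 32-bit IEEE 754 float.  A conversion preserves the numeric
value and changes the bits; a bitcast preserves the bits and changes the
type.)

```c
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (float) == sizeof (int32_t),
		"this sketch assumes a 32-bit float");

/* Conversion: the numeric value is preserved, the bit pattern changes.  */
float
convert_si_to_sf (int32_t i)
{
  return (float) i;
}

/* Bitcast: all 32 bits are preserved, only the type changes.  memcpy is
   the portable way to spell this in C; no arithmetic is involved.  */
float
bitcast_si_to_sf (int32_t i)
{
  float f;
  memcpy (&f, &i, sizeof f);
  return f;
}
```

With IEEE 754 single precision, the integer 1 converts to 1.0f, whereas it
is the bit pattern 0x3f800000 that bitcasts to 1.0f.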

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -158,6 +158,7 @@ (define_c_enum "unspec"
> UNSPEC_HASHCHK
> UNSPEC_XXSPLTIDP_CONST
> UNSPEC_XXSPLTIW_CONST
> +   UNSPEC_P9V_MTVSRWS
>])

Is it hard to describe this without unspec?  Unspecs prevent the compiler
from optimising things (unless you very carefully implement all of that
manually -- but if you just write it as plain RTL most things fall into
place automatically).

There are many existing patterns that needlessly and detrimentally use
unspecs, but we should improve on that, not make it worse :-)

> @@ -8203,10 +8204,19 @@ (define_insn_and_split "movsf_from_si"
>rtx op2 = operands[2];
>rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
>  
> -  /* Move SF value to upper 32-bits for xscvspdpn.  */
> -  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
> -  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
> -  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +  if (TARGET_P9_VECTOR && TARGET_POWERPC64 && TARGET_DIRECT_MOVE)
> +{
> +   emit_insn (gen_p9_mtvsrws (op0, op1_di));
> +   emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
> +}

This does not require TARGET_POWERPC64?

P9 implies we have direct moves (those are implied by P8 already).  We
also do not need to test for vector I think?

> +(define_code_iterator any_rshift [ashiftrt lshiftrt])
> +
>  ;; For extracting high part element from DImode register like:
>  ;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
>  ;; split it before reload with "and mask" to avoid generating shift right
>  ;; 32 bit then shift left 32 bit.
> -(define_insn_and_split "movsf_from_si2"
> +(define_insn_and_split "movsf_from_si2_"
>[(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
>   (unspec:SF
>[(subreg:SI
> -(ashiftrt:DI
> +(any_rshift:DI
>   (match_operand:DI 1 "input_operand" "r")
>   (const_int 32))
>  0)]

Hrm.  You can write this with just a subreg, no shift is needed at all.
Can you please try that instead?  That is nastiness for endianness, but
that is still preferable over introducing new complexities like this.

> +(define_insn "p9_mtvsrws"
> +  [(set (match_operand:SF 0 "register_operand" "=wa")
> + (unspec:SF [(match_operand:DI 1 "register_operand" "r")]
> +UNSPEC_P9V_MTVSRWS))]
> +  "TARGET_P9_VECTOR && TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
> +  "mtvsrws %x0,%1"
> +  [(set_attr "type" "mtvsr")])

(Same issues here as above).

> +/* { dg-final { scan-assembler-times {\mmtvsrws\M} 1 { target { 
> has_arch_ppc64 && has_arch_pwr9 } } } } */

mtvsrws does not need ppc64.

> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { 
> has_arch_ppc64 && has_arch_pwr9 } } } } */
> +/* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { 
> has_arch_ppc64 && has_arch_pwr9 } } } } */

These two do of course.

> +/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { 
> has_arch_pwr8 && has_arch_ppc64 } } } } */

But this doesn't.


Segher


[PING][PATCH] arm: Split up MVE _Generic associations to prevent type clashes [PR107515]

2023-01-10 Thread Stam Markianos-Wright via Gcc-patches

Hi all,

With these previous patches:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606586.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606587.html
we enabled the MVE overloaded _Generic associations to handle more
scalar types, however at PR 107515 we found a new regression that
wasn't detected in our testing:

With glibc's `posix/types.h`:
```
typedef signed int __int32_t;
...
typedef __int32_t int32_t;
```
We would get an `error: '_Generic' specifies two compatible types`
from `__ARM_mve_coerce3`, because when `type` is `int`, the
`type: param` and `int32_t: param` associations name the same type
under the hood.

The same did not happen with Newlib's header `sys/_stdint.h`:
```
typedef long int __int32_t;
...
typedef __int32_t int32_t ;
```
which worked fine, because it uses `long int`.

The same could feasibly happen in `__ARM_mve_coerce2` between
`__fp16` and `float16_t`.

The solution here is to break the _Generic down, so that the similar
types don't appear at the same level, as is done in `__ARM_mve_typeid`.
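
A reduced illustration of the clash and of the nesting workaround, outside
of arm_mve.h.  This is only a sketch: the `COERCE` macro, the wrapper
functions and `demo_undef` (standing in for `__ARM_undef`) are names
invented for this example, and only two of the fixed-width types are listed.

```c
#include <stdint.h>

/* Hypothetical stand-in for __ARM_undef.  It is only dereferenced when no
   association matches, which none of the uses below trigger.  */
void *demo_undef;

/* The flat form is rejected whenever 'type' and int32_t are the same
   type, e.g. for type == int with glibc's int32_t:

     _Generic (param, type: param, int32_t: param,
	       default: *(type *) demo_undef)

   In the nested form 'type' and the fixed-width types sit in different
   generic selections, so they can no longer be reported as duplicates.  */
#define COERCE(param, type) \
  _Generic (param, type: param, \
	    default: _Generic (param, int32_t: param, int64_t: param, \
			       default: *(type *) demo_undef))

/* Matches the outer 'type' association.  */
int
coerce_from_int (int x)
{
  return COERCE (x, int);
}

/* Also fine when int32_t is a typedef for int.  */
int
coerce_from_int32 (int32_t x)
{
  return COERCE (x, int);
}

/* Falls through to the inner selection and matches int64_t.  */
long long
coerce_from_int64 (int64_t x)
{
  return COERCE (x, int);
}
```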

Ok for trunk?

Thanks,
Stam Markianos-Wright

gcc/ChangeLog:
 PR target/96795
 PR target/107515
 * config/arm/arm_mve.h (__ARM_mve_coerce2): Split types.
 (__ARM_mve_coerce3): Likewise.

gcc/testsuite/ChangeLog:
 PR target/96795
 PR target/107515
 *
gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c: New test.
 *
gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c: New test.


=== Inline Ctrl+C, Ctrl+V or patch ===

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index
09167ec118ed3310c5077145e119196f29d83cac..70003653db65736fcfd019e83d9f18153be650dc
100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35659,9 +35659,9 @@ extern void *__ARM_undef;
  #define __ARM_mve_coerce1(param, type) \
  _Generic(param, type: param, const type: param, default: *(type
*)__ARM_undef)
  #define __ARM_mve_coerce2(param, type) \
-_Generic(param, type: param, float16_t: param, float32_t: param,
default: *(type *)__ARM_undef)
+_Generic(param, type: param, __fp16: param, default: _Generic
(param, _Float16: param, float16_t: param, float32_t: param, default:
*(type *)__ARM_undef))
  #define __ARM_mve_coerce3(param, type) \
-_Generic(param, type: param, int8_t: param, int16_t: param,
int32_t: param, int64_t: param, uint8_t: param, uint16_t: param,
uint32_t: param, uint64_t: param, default: *(type *)__ARM_undef)
+_Generic(param, type: param, default: _Generic (param, int8_t:
param, int16_t: param, int32_t: param, int64_t: param, uint8_t: param,
uint16_t: param, uint32_t: param, uint64_t: param, default: *(type
*)__ARM_undef))

  #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */

diff --git
a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c
new file mode 100644
index
..427dcacb5ff59b53d5eab1f1582ef6460da3f2f3
--- /dev/null
+++
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c
@@ -0,0 +1,65 @@
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O2 -Wno-pedantic -Wno-long-long" } */
+#include "arm_mve.h"
+
+float f1;
+double f2;
+float16_t f3;
+float32_t f4;
+__fp16 f5;
+_Float16 f6;
+
+int i1;
+short i2;
+long i3;
+long long i4;
+int8_t i5;
+int16_t i6;
+int32_t i7;
+int64_t i8;
+
+const int ci1;
+const short ci2;
+const long ci3;
+const long long ci4;
+const int8_t ci5;
+const int16_t ci6;
+const int32_t ci7;
+const int64_t ci8;
+
+float16x8_t floatvec;
+int16x8_t intvec;
+
+void test(void)
+{
+/* Test a few different supported ways of passing an int value.  The
+intrinsic vmulq was chosen arbitrarily, but it is representative of
+all intrinsics that take a non-const scalar value.  */
+intvec = vmulq(intvec, 2);
+intvec = vmulq(intvec, (int32_t) 2);
+intvec = vmulq(intvec, (short) 2);
+intvec = vmulq(intvec, i1);
+intvec = vmulq(intvec, i2);
+intvec = vmulq(intvec, i3);
+intvec = vmulq(intvec, i4);
+intvec = vmulq(intvec, i5);
+intvec = vmulq(intvec, i6);
+intvec = vmulq(intvec, i7);
+intvec = vmulq(intvec, i8);
+
+/* Test a few different supported ways of passing a float value.  */
+floatvec = vmulq(floatvec, 0.5);
+floatvec = vmulq(floatvec, 0.5f);
+floatvec = vmulq(floatvec, (__fp16) 0.5);
+floatvec = vmulq(floatvec, f1);
+floatvec = vmulq(floatvec, f2);
+floatvec = vmulq(floatvec, f3);
+floatvec = vmulq(floatvec, f4);
+floatvec = vmulq(floatvec, f5);
+floatvec = vmulq(floatvec, f6);
+floatvec = vmulq(floatvec, 0.15f16);
+floatvec = vmulq(floatvec, (_Float16) 0.15);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git

Re: [PATCH] Fix memory constraint on MVE v[ld/st][2/4] instructions [PR107714]

2023-01-10 Thread Stam Markianos-Wright via Gcc-patches



On 12/12/2022 13:42, Kyrylo Tkachov wrote:

Hi Stam,


-Original Message-
From: Stam Markianos-Wright 
Sent: Friday, December 9, 2022 1:32 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; Richard Earnshaw
; Ramana Radhakrishnan
; ni...@redhat.com
Subject: [PATCH] Fix memory constraint on MVE v[ld/st][2/4] instructions
[PR107714]

Hi all,

In the M-Class Arm-ARM:

https://developer.arm.com/documentation/ddi0553/bu/?lang=en

these MVE instructions only have a '!' writeback variant and at:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

we found that the Um constraint would also allow through a
register offset writeback, resulting in an assembler error.

Here I have added a new constraint and predicate for these
instructions, which (uniquely, AFAICT), only support a `!` writeback
increment by the data size (inside the compiler this is a POST_INC).

No regressions in arm-none-eabi with MVE and MVE.FP.

Ok for trunk, and backport to GCC11 and GCC12 (testing pending)?

Thanks,
Stam

gcc/ChangeLog:
      PR target/107714
      * config/arm/arm-protos.h (mve_struct_mem_operand): New
prototype.
      * config/arm/arm.cc (mve_struct_mem_operand): New function.
      * config/arm/constraints.md (Ug): New constraint.
      * config/arm/mve.md (mve_vst4q): Change constraint.
      (mve_vst2q): Likewise.
      (mve_vld4q): Likewise.
      (mve_vld2q): Likewise.
      * config/arm/predicates.md (mve_struct_operand): New predicate.

gcc/testsuite/ChangeLog:
      PR target/107714
      * gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c: New test.


diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 
e5a36d29c7135943b9bb5ea396f70e2e4beb1e4a..8908b7f5b15ce150685868e78e75280bf32053f1
 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -474,6 +474,12 @@
   (and (match_code "mem")
(match_test "TARGET_32BIT && arm_coproc_mem_operand (op, FALSE)")))
  
+(define_memory_constraint "Ug"

+ "@internal
+  In Thumb-2 state a valid MVE struct load/store address."
+ (and (match_code "mem")
+  (match_test "TARGET_HAVE_MVE && mve_struct_mem_operand (op)")))
+

I think you can define the constraints in terms of the new mve_struct_operand predicate 
directly (see how we define the "Ua" constraint, for example).
Ok if that works (and testing passes of course).


Done as discussed and re-tested on all branches. Pushed as:

4269a6567eb991e6838f40bda5be9e3a7972530c to trunk

25edc76f2afba0b4eaf22174d42de042a6969dbe to gcc-12

08842ad274f5e2630994f7c6e70b2d31768107ea to gcc-11

Thank you!
Stam



Thanks,
Kyrill



Re: [PATCH v2] libstdc++: Fix Unicode codecvt and add tests [PR86419]

2023-01-10 Thread Jonathan Wakely via Gcc-patches
On Tue, 10 Jan 2023, 13:43 Dimitrij Mijoski,  wrote:

> On Tue, 2023-01-10 at 13:28 +, Jonathan Wakely wrote:
>
> Thanks for the patch. Do you have a copyright assignment for gcc filed
> with the FSF?
>
>
> Yes, I have already signed the copyright assignment.
>

Great, thanks for confirming.


[PATCH] rs6000: Enhance lowpart/highpart DI->SF by mtvsrws/mtvsrd

2023-01-10 Thread Jiufu Guo via Gcc-patches
Hi,

As mentioned in PR108338, on p9, we could use mtvsrws to implement
the conversion from SI#0 to SF (or lowpart DI to SF).  And we find
we can also enhance the conversion from highpart DI to SF (as the
case in this patch).

This patch enhances these conversions accordingly.

Bootstrap and regtests pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

PR target/108338

gcc/ChangeLog:

* config/rs6000/rs6000.md (any_rshift): New code_iterator.
(movsf_from_si2): Rename to...
(movsf_from_si2_): ... this.
(p9_mtvsrws): New define_insn.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr108338.c: New test.
---
 gcc/config/rs6000/rs6000.md | 32 +---
 gcc/testsuite/gcc.target/powerpc/pr108338.c | 41 +
 2 files changed, 67 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108338.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 3cae64a264a..9025a912141 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -158,6 +158,7 @@ (define_c_enum "unspec"
UNSPEC_HASHCHK
UNSPEC_XXSPLTIDP_CONST
UNSPEC_XXSPLTIW_CONST
+   UNSPEC_P9V_MTVSRWS
   ])
 
 ;;
@@ -8203,10 +8204,19 @@ (define_insn_and_split "movsf_from_si"
   rtx op2 = operands[2];
   rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
 
-  /* Move SF value to upper 32-bits for xscvspdpn.  */
-  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
-  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
-  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+  if (TARGET_P9_VECTOR && TARGET_POWERPC64 && TARGET_DIRECT_MOVE)
+{
+   emit_insn (gen_p9_mtvsrws (op0, op1_di));
+   emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+}
+  else
+{
+  /* Move SF value to upper 32-bits for xscvspdpn.  */
+  emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));
+  emit_insn (gen_p8_mtvsrd_sf (op0, op2));
+  emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0));
+}
+
   DONE;
 }
   [(set_attr "length"
@@ -8219,15 +8229,17 @@ (define_insn_and_split "movsf_from_si"
"*,  *, p9v,   p8v,   *, *,
 p8v,p8v,   p8v,   *")])
 
+(define_code_iterator any_rshift [ashiftrt lshiftrt])
+
 ;; For extracting high part element from DImode register like:
 ;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;}
 ;; split it before reload with "and mask" to avoid generating shift right
 ;; 32 bit then shift left 32 bit.
-(define_insn_and_split "movsf_from_si2"
+(define_insn_and_split "movsf_from_si2_"
   [(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
(unspec:SF
 [(subreg:SI
-  (ashiftrt:DI
+  (any_rshift:DI
(match_operand:DI 1 "input_operand" "r")
(const_int 32))
   0)]
@@ -9475,6 +9487,14 @@ (define_insn "p8_mtvsrd_sf"
   "mtvsrd %x0,%1"
   [(set_attr "type" "mtvsr")])
 
+(define_insn "p9_mtvsrws"
+  [(set (match_operand:SF 0 "register_operand" "=wa")
+   (unspec:SF [(match_operand:DI 1 "register_operand" "r")]
+  UNSPEC_P9V_MTVSRWS))]
+  "TARGET_P9_VECTOR && TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "mtvsrws %x0,%1"
+  [(set_attr "type" "mtvsr")])
+
 (define_insn_and_split "reload_vsx_from_gprsf"
   [(set (match_operand:SF 0 "register_operand" "=wa")
(unspec:SF [(match_operand:SF 1 "register_operand" "r")]
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108338.c 
b/gcc/testsuite/gcc.target/powerpc/pr108338.c
new file mode 100644
index 000..2afac79ea4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108338.c
@@ -0,0 +1,41 @@
+// { dg-do run }
+// { dg-options "-O2 -save-temps" }
+
+float __attribute__ ((noipa)) sf_from_di_off0 (long long l)
+{
+  char buff[16];
+  *(long long*)buff = l;
+  float f = *(float*)(buff);
+  return f;
+}
+
+float  __attribute__ ((noipa)) sf_from_di_off4 (long long l)
+{
+  char buff[16];
+  *(long long*)buff = l;
+  float f = *(float*)(buff + 4);
+  return f; 
+}
+
+/* { dg-final { scan-assembler-times {\mmtvsrws\M} 1 { target { has_arch_ppc64 
&& has_arch_pwr9 } } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrd\M} 1 { target { has_arch_ppc64 
&& has_arch_pwr9 } } } } */
+/* { dg-final { scan-assembler-times {\mrldicr\M} 1 { target { has_arch_ppc64 
&& has_arch_pwr9 } } } } */
+/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 { target { 
has_arch_pwr8 && has_arch_ppc64 } } } } */
+
+/* { dg-final { scan-assembler-times {\mmtvsrd\M} 2 { target { has_arch_pwr8 
&& { has_arch_ppc64 && { ! has_arch_pwr9 } } } } } } */
+
+union di_sf_sf
+{
+  struct {float f1; float f2;};
+  long long l;
+};
+
+int main()
+{
+  union di_sf_sf v;
+  v.f1 = 1.0f;
+  v.f2 = 2.0f;
+  if (sf_from_di_off0 (v.l) != 1.0f || sf_from_di_off4 (v.l) != 2.0f )
+__builtin_abort ();
+  return 0;
+}
-- 
2.17.1



Re: [PATCH v2] libstdc++: Fix Unicode codecvt and add tests [PR86419]

2023-01-10 Thread Dimitrij Mijoski via Gcc-patches
On Tue, 2023-01-10 at 13:28 +, Jonathan Wakely wrote:
> Thanks for the patch. Do you have a copyright assignment for gcc
> filed with the FSF? 

Yes, I have already signed the copyright assignment.


Re: [PATCH v2] libstdc++: Fix Unicode codecvt and add tests [PR86419]

2023-01-10 Thread Jonathan Wakely via Gcc-patches
On Tue, 10 Jan 2023, 13:00 Dimitrij Mijoski wrote:

> Fixes the conversion from UTF-8 to UTF-16 to properly return partial
> instead of ok.
> Fixes the conversion from UTF-16 to UTF-8 to properly return partial
> instead of ok.
> Fixes the conversion from UTF-8 to UCS-2 to properly return partial
> instead of error.
> Fixes the conversion from UTF-8 to UCS-2 to treat 4-byte UTF-8 sequences
> as error just by seeing the leading byte.
> Fixes UTF-8 decoding for all codecvts so they detect error at the end of
> the input range when the last code point is also incomplete.
>

Thanks for the patch. Do you have a copyright assignment for gcc filed with
the FSF? If not, we require that, or a DCO sign-off. See
https://gcc.gnu.org/contribute.html#legal
and
https://gcc.gnu.org/dco.html
for more details.


Re: gcc-13/changes.html: Mention -fstrict-flex-arrays and its impact

2023-01-10 Thread Qing Zhao via Gcc-patches



> On Jan 10, 2023, at 3:06 AM, Richard Biener  wrote:
> 
> On Mon, 9 Jan 2023, Qing Zhao wrote:
> 
>> 
>> 
>>> On Jan 9, 2023, at 2:11 AM, Richard Biener  wrote:
>>> 
>>> On Thu, 22 Dec 2022, Qing Zhao wrote:
>>> 
 
 
> On Dec 22, 2022, at 2:09 AM, Richard Biener  wrote:
> 
> On Wed, 21 Dec 2022, Qing Zhao wrote:
> 
>> Hi, Richard,
>> 
>> Thanks a lot for your comments.
>> 
>>> On Dec 21, 2022, at 2:12 AM, Richard Biener  wrote:
>>> 
>>> On Tue, 20 Dec 2022, Qing Zhao wrote:
>>> 
 Hi,
 
 This is the patch for mentioning -fstrict-flex-arrays and 
 -Warray-bounds=2 changes in gcc-13/changes.html.
 
 Let me know if you have any comment or suggestions.
>>> 
>>> Some copy editing below
>>> 
 Thanks.
 
 Qing.
 
 ===
 From c022076169b4f1990b91f7daf4cc52c6c5535228 Mon Sep 17 00:00:00 2001
 From: Qing Zhao 
 Date: Tue, 20 Dec 2022 16:13:04 +
 Subject: [PATCH] gcc-13/changes: Mention -fstrict-flex-arrays and its 
 impact.
 
 ---
 htdocs/gcc-13/changes.html | 15 +++
 1 file changed, 15 insertions(+)
 
 diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
 index 689178f9..47b3d40f 100644
 --- a/htdocs/gcc-13/changes.html
 +++ b/htdocs/gcc-13/changes.html
 @@ -39,6 +39,10 @@ a work-in-progress.
  Legacy debug info compression option -gz=zlib-gnu 
 was removed
and the option is ignored right now.
  New debug info compression option value -gz=zstd has 
 been added.
 +-Warray-bounds=2 will no longer issue warnings 
 for out of bounds
 +  accesses to trailing struct members of one-element array type 
 anymore. Please
 +  add -fstrict-flex-arrays=level to control how the 
 compiler treat
 +  trailing arrays of structures as flexible array members. 
>>> 
>>> "Instead it diagnoses accesses to trailing arrays according to 
>>> -fstrict-flex-arrays."
>> 
>> Okay.
>>> 
 
 
 
 @@ -409,6 +413,17 @@ a work-in-progress.
 Other significant improvements
 
 
 +Treating trailing arrays as flexible array 
 members
 +
 +
 + GCC can now control when to treat the trailing array of a 
 structure as a 
 + flexible array member for the purpose of accessing the elements 
 of such
 + an array. By default, all trailing arrays of structures are 
 treated as
>>> 
>>> all trailing arrays in aggregates are treated
>> Okay.
>>> 
 + flexible array members. Use the new command-line option
 + -fstrict-flex-array=level to control how GCC treats 
 the trailing
 + array of a structure as a flexible array member at different 
 levels.
>>> 
>>> -fstrict-flex-arrays to control which trailing array
>>> members are treated as flexible arrays.
>> 
>> Okay.
>> 
>>> 
>>> I've also just now noticed that there's now a flag_strict_flex_arrays
>>> check in the middle-end (in array bound diagnostics) but this option
>>> isn't streamed or handled with LTO.  I think you want to replace that
>>> with the appropriate DECL_NOT_FLEXARRAY check.
>> 
>> We need to know the level value of the strict_flex_arrays on the struct 
>> field to issue proper warnings at different levels. DECL_NOT_FLEXARRAY 
>> does not include such info. So, what should I do? Streaming the 
>> flag_strict_flex_arrays with LTO?
> 
> But you do
> 
> if (compref)
>  {
>/* Try to determine special array member type for this 
> COMPONENT_REF.  */
>sam = component_ref_sam_type (arg);
>/* Get the level of strict_flex_array for this array field.  */
>tree afield_decl = TREE_OPERAND (arg, 1);
>strict_flex_array_level = strict_flex_array_level_of (afield_decl);
> 
> I see that function doesn't look at DECL_NOT_FLEXARRAY but just
> checks attributes (those are streamed in LTO).
 
 Yes, checked both flag_strict_flex_arrays and attributes. 
 
 There are two places in middle end calling 'strict_flex_array_level_of' 
 function, 
 one inside 'array_bounds_checker::check_array_ref', another one inside 
 'component_ref_size'.
 Shall we check DECL_NOT_FLEXARRAY field instead of calling 
 'strict_flex_array_level_of' in both places?
>>> 
>>> I wonder if that function should check DECL_NOT_FLEXARRAY?
>> 
>> The function 'strict_flex_array_level_of' is intended to query the LEVEL of 
>> strict_flex_array, only check DECL_NOT_FLEXARRAY is not enough. 
>> 
>> So, I 

Re: [PATCH] longlong.h: Do no use asm input cast for clang

2023-01-10 Thread Segher Boessenkool
Hi!

On Tue, Jan 10, 2023 at 09:26:13AM -0300, Adhemerval Zanella Netto wrote:
> On 12/12/22 20:52, Segher Boessenkool wrote:
> > On Mon, Dec 12, 2022 at 02:10:16PM -0300, Adhemerval Zanella Netto wrote:
> > How do you intend to modify all the existing copies of the header that
> > haven't been updated for over a decade already?
> > 
> > If you think changing all user code that uses longlong.h is a good idea,
> > please change it to not use inline asm, use builtins in some cases but
> > mostly just rewrite things in plain C.  But GCC cannot rewrite user code
> > (not preemptively anyway ;-) ) -- and longlong.h as encountered in the
> > wild (not the one in our libgcc source code) is user code.
> > 
> > If you think changing the copy in libgcc is a good idea, please change
> > the original in glibc first?
> 
> That's my original intention [1], but Joseph stated that GCC is the upstream
> source of this file.  Joseph, would you be ok for a similar patch to glibc
> since gcc is reluctant to accept it?
> 
> [1] https://sourceware.org/pipermail/libc-alpha/2022-October/143050.html

The file starts with

/* longlong.h -- definitions for mixed size 32/64 bit arithmetic.
   Copyright (C) 1991-2022 Free Software Foundation, Inc.

   This file is part of the GNU C Library.

Please change that first then?


Segher


Re: [committed 1/3] libstdc++: Fix std::span constraint for sizeof(size_t) < sizeof(int) [PR108221]

2023-01-10 Thread Jonathan Wakely via Gcc-patches
On Tue, 10 Jan 2023, 12:26 Jakub Jelinek via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> On Tue, Jan 10, 2023 at 11:46:55AM +, Jonathan Wakely via Gcc-patches
> wrote:
> > Tested x86_64-linux. Pushed to trunk.
> >
> > -- >8 --
> >
> > The default constructor has a constraint that is always false if
> > arithmetic on size_t values promotes to int. Rewrite the constraint
> > exactly as written in the standard, which works correctly.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   PR libstdc++/108221
> >   * include/std/span (span::span()): Un-simplify constraint to
> >   work for size_t of lesser rank than int.
> > ---
> >  libstdc++-v3/include/std/span | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/libstdc++-v3/include/std/span
> b/libstdc++-v3/include/std/span
> > index 251fed91abf..b336332b190 100644
> > --- a/libstdc++-v3/include/std/span
> > +++ b/libstdc++-v3/include/std/span
> > @@ -145,7 +145,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >
> >constexpr
> >span() noexcept
> > -  requires ((_Extent + 1u) <= 1u)
> > +  requires (_Extent == dynamic_extent || _Extent == 0)
> >: _M_ptr(nullptr), _M_extent(0)
> >{ }
>
> If it would be C++23 only, you could use ((_Extent + 1uz) <= 1uz).
>

That would still promote _Extent and 1uz to int before the addition, and so
would not wrap to zero when _Extent == -1uz

size_t(_Extent + 1) <= 1 would work, but I'm just going to KISS and do what
the standard says.
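
The promotion issue can be reproduced in plain C by using `unsigned short`
as a stand-in for a size_t that is narrower than int.  This is a sketch
under that assumption; `small_size` and the three function names are
invented for the illustration and are not libstdc++ code.

```c
#include <limits.h>

_Static_assert (USHRT_MAX < INT_MAX,
		"this sketch assumes short is narrower than int");

typedef unsigned short small_size;	/* hypothetical narrow size_t */
#define SMALL_DYNAMIC_EXTENT ((small_size) -1)

/* Old form: extent is converted to a wider type before the addition, so
   the sum does not wrap to 0 for the dynamic extent.  */
int
old_constraint (small_size extent)
{
  return (extent + 1u) <= 1u;
}

/* Casting the sum back to the narrow type restores the wrap-around.  */
int
cast_constraint (small_size extent)
{
  return (small_size) (extent + 1) <= 1;
}

/* The form written in the standard needs no arithmetic at all.  */
int
standard_constraint (small_size extent)
{
  return extent == SMALL_DYNAMIC_EXTENT || extent == 0;
}
```

With a 16-bit `small_size`, only `old_constraint` rejects the dynamic
extent; all three accept 0 and reject any other value.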


> As this is evaluated at compile time only, it is unfortunate it is
> 3 operations compared to former 2, but not a big deal.  If this was
> in code that would be emitted at runtime, GCC already optimizes
> (x == -1uz || x == 0)
> or
> (x == 0 || x == -1uz)
> to
> ((x + 1uz) <= 1uz)
>
> Jakub
>
>


[PATCH v2] libstdc++: Fix Unicode codecvt and add tests [PR86419]

2023-01-10 Thread Dimitrij Mijoski via Gcc-patches
Fixes the conversion from UTF-8 to UTF-16 to properly return partial
instead of ok.
Fixes the conversion from UTF-16 to UTF-8 to properly return partial
instead of ok.
Fixes the conversion from UTF-8 to UCS-2 to properly return partial
instead of error.
Fixes the conversion from UTF-8 to UCS-2 to treat 4-byte UTF-8 sequences
as error just by seeing the leading byte.
Fixes UTF-8 decoding for all codecvts so they detect error at the end of
the input range when the last code point is also incomplete.
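
The partial-versus-error rule described above can be sketched in C as
follows.  This is a simplified illustration, not the code in codecvt.cc:
the `utf8_status` enum and `classify_utf8` are names invented here, and the
sketch only classifies the first sequence without computing the code point
or comparing its value against maxcode.

```c
#include <stddef.h>

enum utf8_status { UTF8_OK, UTF8_PARTIAL, UTF8_ERROR };

/* Classify the first UTF-8 sequence in from[0..avail).  Every byte is
   validated as soon as it is available, so a prefix that is already
   invalid is an error even if the sequence is truncated as well; only a
   prefix that is valid so far is reported as partial.  */
enum utf8_status
classify_utf8 (const unsigned char *from, size_t avail, unsigned long maxcode)
{
  if (avail == 0)
    return UTF8_PARTIAL;
  unsigned char c1 = from[0];
  if (c1 < 0x80)
    return UTF8_OK;		/* ASCII */
  if (c1 < 0xC2)
    return UTF8_ERROR;		/* continuation byte or overlong lead */

  size_t len;
  if (c1 < 0xE0)
    len = 2;
  else if (c1 < 0xF0)
    len = 3;
  else if (c1 < 0xF5 && maxcode > 0xFFFF)
    len = 4;
  else
    return UTF8_ERROR;		/* 4-byte lead not representable, or invalid */

  for (size_t i = 1; i < len; i++)
    {
      if (i >= avail)
	return UTF8_PARTIAL;	/* everything seen so far was valid */
      unsigned char c = from[i];
      if ((c & 0xC0) != 0x80)
	return UTF8_ERROR;	/* not a continuation byte */
      if (i == 1)
	{
	  if (c1 == 0xE0 && c < 0xA0)
	    return UTF8_ERROR;	/* overlong 3-byte form */
	  if (c1 == 0xF0 && c < 0x90)
	    return UTF8_ERROR;	/* overlong 4-byte form */
	  if (c1 == 0xF4 && c >= 0x90)
	    return UTF8_ERROR;	/* above U+10FFFF */
	}
    }
  return UTF8_OK;
}
```

For example, the two bytes E2 82 are a valid prefix and classify as
partial, while E0 80 is already overlong and classifies as an error even
though a third byte is missing.  With maxcode 0xFFFF, a lead byte F0 is an
error on sight.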

libstdc++-v3/ChangeLog:
PR libstdc++/86419
* src/c++11/codecvt.cc: Fix bugs.
* testsuite/22_locale/codecvt/codecvt_unicode.cc: New tests.
* testsuite/22_locale/codecvt/codecvt_unicode.h: New tests.
* testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc: New
  tests.
---
 libstdc++-v3/src/c++11/codecvt.cc |   38 +-
 .../22_locale/codecvt/codecvt_unicode.cc  |   68 +
 .../22_locale/codecvt/codecvt_unicode.h   | 1268 +
 .../codecvt/codecvt_unicode_wchar_t.cc|   59 +
 4 files changed, 1414 insertions(+), 19 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc
 create mode 100644 libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h
 create mode 100644 
libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc

diff --git a/libstdc++-v3/src/c++11/codecvt.cc 
b/libstdc++-v3/src/c++11/codecvt.cc
index 9f8cb7677..49282a510 100644
--- a/libstdc++-v3/src/c++11/codecvt.cc
+++ b/libstdc++-v3/src/c++11/codecvt.cc
@@ -277,13 +277,15 @@ namespace
 }
 else if (c1 < 0xF0) // 3-byte sequence
 {
-  if (avail < 3)
+  if (avail < 2)
return incomplete_mb_character;
   char32_t c2 = (unsigned char) from[1];
   if ((c2 & 0xC0) != 0x80)
return invalid_mb_sequence;
   if (c1 == 0xE0 && c2 < 0xA0) // overlong
return invalid_mb_sequence;
+  if (avail < 3)
+   return incomplete_mb_character;
   char32_t c3 = (unsigned char) from[2];
   if ((c3 & 0xC0) != 0x80)
return invalid_mb_sequence;
@@ -292,9 +294,9 @@ namespace
from += 3;
   return c;
 }
-else if (c1 < 0xF5) // 4-byte sequence
+else if (c1 < 0xF5 && maxcode > 0x) // 4-byte sequence
 {
-  if (avail < 4)
+  if (avail < 2)
return incomplete_mb_character;
   char32_t c2 = (unsigned char) from[1];
   if ((c2 & 0xC0) != 0x80)
@@ -302,10 +304,14 @@ namespace
   if (c1 == 0xF0 && c2 < 0x90) // overlong
return invalid_mb_sequence;
   if (c1 == 0xF4 && c2 >= 0x90) // > U+10
-  return invalid_mb_sequence;
+   return invalid_mb_sequence;
+  if (avail < 3)
+   return incomplete_mb_character;
   char32_t c3 = (unsigned char) from[2];
   if ((c3 & 0xC0) != 0x80)
return invalid_mb_sequence;
+  if (avail < 4)
+   return incomplete_mb_character;
   char32_t c4 = (unsigned char) from[3];
   if ((c4 & 0xC0) != 0x80)
return invalid_mb_sequence;
@@ -527,12 +533,11 @@ namespace
   // Flag indicating whether to process UTF-16 or UCS2
   enum class surrogates { allowed, disallowed };
 
-  // utf8 -> utf16 (or utf8 -> ucs2 if s == surrogates::disallowed)
-  template
-  codecvt_base::result
-  utf16_in(range& from, range& to,
-  unsigned long maxcode = max_code_point, codecvt_mode mode = {},
-  surrogates s = surrogates::allowed)
+  // utf8 -> utf16 (or utf8 -> ucs2 if maxcode <= 0x)
+  template 
+  codecvt_base::result utf16_in (range , range ,
+unsigned long maxcode = max_code_point,
+codecvt_mode mode = {})
   {
 read_utf8_bom(from, mode);
 while (from.size() && to.size())
@@ -540,12 +545,7 @@ namespace
auto orig = from;
const char32_t codepoint = read_utf8_code_point(from, maxcode);
if (codepoint == incomplete_mb_character)
- {
-   if (s == surrogates::allowed)
- return codecvt_base::partial;
-   else
- return codecvt_base::error; // No surrogates in UCS2
- }
+ return codecvt_base::partial;
if (codepoint > maxcode)
  return codecvt_base::error;
if (!write_utf16_code_point(to, codepoint, mode))
@@ -554,7 +554,7 @@ namespace
return codecvt_base::partial;
  }
   }
-return codecvt_base::ok;
+return from.size () ? codecvt_base::partial : codecvt_base::ok;
   }
 
   // utf16 -> utf8 (or ucs2 -> utf8 if s == surrogates::disallowed)
@@ -576,7 +576,7 @@ namespace
  return codecvt_base::error; // No surrogates in UCS-2
 
if (from.size() < 2)
- return codecvt_base::ok; // stop converting at this point
+ return codecvt_base::partial; // stop converting at this point
 
const char32_t c2 = from[1];
if (is_low_surrogate(c2))
@@ -629,7 +629,7 @@ namespace
   {
 // UCS-2 
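
For readers skimming the (truncated) diff above, here is a minimal sketch of
the check ordering that the 3-byte hunk moves to: each byte is validated as
soon as it is available, so a bad continuation byte is reported as invalid even
when the sequence is also too short. The names Status, Decoded and
decode_3byte are hypothetical and not the libstdc++ internals, and surrogate
range checks are left out.

```cpp
#include <cassert>
#include <cstddef>

enum class Status { ok, incomplete, invalid };

struct Decoded {
  Status status;
  char32_t code_point;  // meaningful only when status == Status::ok
};

// Hypothetical helper (not the libstdc++ function): decode one 3-byte
// UTF-8 sequence starting at p, with avail bytes readable.
inline Decoded decode_3byte(const unsigned char* p, std::size_t avail)
{
  if (avail < 1)
    return {Status::incomplete, 0};
  const char32_t c1 = p[0];
  if (c1 < 0xE0 || c1 >= 0xF0)
    return {Status::invalid, 0};      // not a 3-byte leading byte
  if (avail < 2)
    return {Status::incomplete, 0};   // nothing more to validate yet
  const char32_t c2 = p[1];
  if ((c2 & 0xC0) != 0x80)
    return {Status::invalid, 0};      // bad continuation byte
  if (c1 == 0xE0 && c2 < 0xA0)
    return {Status::invalid, 0};      // overlong encoding
  if (avail < 3)
    return {Status::incomplete, 0};   // first two bytes were fine
  const char32_t c3 = p[2];
  if ((c3 & 0xC0) != 0x80)
    return {Status::invalid, 0};
  const char32_t cp = static_cast<char32_t>(
      ((c1 & 0x0F) << 12) | ((c2 & 0x3F) << 6) | (c3 & 0x3F));
  return {Status::ok, cp};
}
```

With a single up-front "avail < 3" test, the two-byte input E2 41 would be
reported as incomplete although no third byte could ever make it valid.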

Re: [PATCH] longlong.h: Do no use asm input cast for clang

2023-01-10 Thread Adhemerval Zanella Netto via Gcc-patches



On 12/12/22 20:52, Segher Boessenkool wrote:
> On Mon, Dec 12, 2022 at 02:10:16PM -0300, Adhemerval Zanella Netto wrote:
>> On 30/11/22 20:24, Segher Boessenkool wrote:
>>> I understand that the casts should be no-ops on the asm side (maybe they
>>> change the sign) and they are present as type-checking.  Can we implement
>>> this type-checking in a different (portable) way?  I think the macro you use
>>> should be named like __asm_output_check_type (..) or so to indicate the
>>> intended purpose.
> 
> I didn't write that.  Please quote correctly.  Thanks!
> 
>> I do not think trying to leverage it on clang side would yield much, it
>> seems that it really does not want to support this extension.  I am not
>> sure we can really make it portable, best option I can think of would be to
>> add a mix of __builtin_classify_type and typeof prior asm call (we do
>> something similar to powerpc64 syscall code on glibc), although it would
>> still require some gcc specific builtins.
>>
>> I am open for ideas, since to get this header to be clang-compatible on
>> glibc it requires to get it first on gcc.
> 
> How do you intend to modify all the existing copies of the header that
> haven't been updated for over a decade already?
> 
> If you think changing all user code that uses longlong.h is a good idea,
> please change it to not use inline asm, use builtins in some cases but
> mostly just rewrite things in plain C.  But GCC cannot rewrite user code
> (not preemptively anyway ;-) ) -- and longlong.h as encountered in the
> wild (not the one in our libgcc source code) is user code.
> 
> If you think changing the copy in libgcc is a good idea, please change
> the original in glibc first?

That's my original intention [1], but Joseph stated that GCC is the upstream
source of this file.  Joseph, would you be ok for a similar patch to glibc
since gcc is reluctant to accept it?

[1] https://sourceware.org/pipermail/libc-alpha/2022-October/143050.html


Re: [committed 1/3] libstdc++: Fix std::span constraint for sizeof(size_t) < sizeof(int) [PR108221]

2023-01-10 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 10, 2023 at 11:46:55AM +, Jonathan Wakely via Gcc-patches wrote:
> Tested x86_64-linux. Pushed to trunk.
> 
> -- >8 --
> 
> The default constructor has a constraint that is always false if
> arithmetic on size_t values promotes to int. Rewrite the constraint
> exactly as written in the standard, which works correctly.
> 
> libstdc++-v3/ChangeLog:
> 
>   PR libstdc++/108221
>   * include/std/span (span::span()): Un-simplify constraint to
>   work for size_t of lesser rank than int.
> ---
>  libstdc++-v3/include/std/span | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span
> index 251fed91abf..b336332b190 100644
> --- a/libstdc++-v3/include/std/span
> +++ b/libstdc++-v3/include/std/span
> @@ -145,7 +145,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  
>constexpr
>span() noexcept
> -  requires ((_Extent + 1u) <= 1u)
> +  requires (_Extent == dynamic_extent || _Extent == 0)
>: _M_ptr(nullptr), _M_extent(0)
>{ }

If it would be C++23 only, you could use ((_Extent + 1uz) <= 1uz).
As this is evaluated at compile time only, it is unfortunate it is
3 operations compared to former 2, but not a big deal.  If this was
in code that would be emitted at runtime, GCC already optimizes
(x == -1uz || x == 0)
or
(x == 0 || x == -1uz)
to
((x + 1uz) <= 1uz)
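
A small sketch of why the two forms only agree when the arithmetic stays in
the operand's own unsigned type. It uses std::uint16_t as a stand-in for a
size_t that is narrower than int, and assumes int is wider than 16 bits on the
build host; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Form as written in the standard: true for 0 and for the maximum value.
template<typename T>
constexpr bool is_zero_or_max(T x)
{ return x == 0 || x == std::numeric_limits<T>::max(); }

// "Simplified" form: relies on x + 1 wrapping around in the type of x.
// If T promotes to a wider type first, the wrap-around never happens.
template<typename T>
constexpr bool plus_one_le_one(T x)
{ return (x + 1u) <= 1u; }

// For unsigned int the two forms agree, including at the maximum value.
static_assert(plus_one_le_one(0u), "0 + 1 <= 1");
static_assert(plus_one_le_one(std::numeric_limits<unsigned>::max()),
              "max + 1 wraps to 0");
static_assert(!plus_one_le_one(2u), "2 + 1 > 1");
```

For std::uint16_t(0xFFFF) the addition is done in a wider type and yields
65536, so plus_one_le_one is false while is_zero_or_max is true.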

Jakub



[committed 1/3] libstdc++: Fix std::span constraint for sizeof(size_t) < sizeof(int) [PR108221]

2023-01-10 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

The default constructor has a constraint that is always false if
arithmetic on size_t values promotes to int. Rewrite the constraint
exactly as written in the standard, which works correctly.

libstdc++-v3/ChangeLog:

PR libstdc++/108221
* include/std/span (span::span()): Un-simplify constraint to
work for size_t of lesser rank than int.
---
 libstdc++-v3/include/std/span | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span
index 251fed91abf..b336332b190 100644
--- a/libstdc++-v3/include/std/span
+++ b/libstdc++-v3/include/std/span
@@ -145,7 +145,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   constexpr
   span() noexcept
-  requires ((_Extent + 1u) <= 1u)
+  requires (_Extent == dynamic_extent || _Extent == 0)
   : _M_ptr(nullptr), _M_extent(0)
   { }
 
-- 
2.39.0



[committed 2/3] libstdc++: Fix some algos for 16-bit size_t [PR108221]

2023-01-10 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

Some standard algorithms fail to compile when size_t or ptrdiff_t is
narrower than int. The __lg helper function is ambiguous if ptrdiff_t is
short or __int20, so replace it with a function template that works for
those types as well as signed/unsigned int/long/long long. The helpers
for stable_sort perform arithmetic on size values and assume the types
won't change, which isn't true if the type promotes to int.

libstdc++-v3/ChangeLog:

PR libstdc++/108221
* include/bits/stl_algobase.h (__lg): Replace six overloads with
a single function template for all integer types.
* include/bits/stl_algo.h (__merge_adaptive_resize): Cast
arithmetic results back to _Distance.
---
 libstdc++-v3/include/bits/stl_algo.h |  5 ++-
 libstdc++-v3/include/bits/stl_algobase.h | 47 
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index 6386918fc8b..eed04b3c1b8 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -2458,13 +2458,14 @@ _GLIBCXX_END_INLINE_ABI_NAMESPACE(_V2)
 
  _BidirectionalIterator __new_middle
= std::__rotate_adaptive(__first_cut, __middle, __second_cut,
-__len1 - __len11, __len22,
+_Distance(__len1 - __len11), __len22,
 __buffer, __buffer_size);
  std::__merge_adaptive_resize(__first, __first_cut, __new_middle,
   __len11, __len22,
   __buffer, __buffer_size, __comp);
  std::__merge_adaptive_resize(__new_middle, __second_cut, __last,
-  __len1 - __len11, __len2 - __len22,
+  _Distance(__len1 - __len11),
+  _Distance(__len2 - __len22),
   __buffer, __buffer_size, __comp);
}
 }
diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index ae898ed3706..566b6d9c4bc 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -72,7 +72,10 @@
 #if __cplusplus >= 201103L
 # include 
 #endif
-#if __cplusplus > 201703L
+#if __cplusplus >= 201402L
+# include  // std::__bit_width
+#endif
+#if __cplusplus >= 202002L
 # include 
 #endif
 
@@ -1505,29 +1508,25 @@ _GLIBCXX_END_NAMESPACE_CONTAINER
 
   /// This is a helper function for the sort routines and for random.tcc.
   //  Precondition: __n > 0.
-  inline _GLIBCXX_CONSTEXPR int
-  __lg(int __n)
-  { return (int)sizeof(int) * __CHAR_BIT__  - 1 - __builtin_clz(__n); }
-
-  inline _GLIBCXX_CONSTEXPR unsigned
-  __lg(unsigned __n)
-  { return (int)sizeof(int) * __CHAR_BIT__  - 1 - __builtin_clz(__n); }
-
-  inline _GLIBCXX_CONSTEXPR long
-  __lg(long __n)
-  { return (int)sizeof(long) * __CHAR_BIT__ - 1 - __builtin_clzl(__n); }
-
-  inline _GLIBCXX_CONSTEXPR unsigned long
-  __lg(unsigned long __n)
-  { return (int)sizeof(long) * __CHAR_BIT__ - 1 - __builtin_clzl(__n); }
-
-  inline _GLIBCXX_CONSTEXPR long long
-  __lg(long long __n)
-  { return (int)sizeof(long long) * __CHAR_BIT__ - 1 - __builtin_clzll(__n); }
-
-  inline _GLIBCXX_CONSTEXPR unsigned long long
-  __lg(unsigned long long __n)
-  { return (int)sizeof(long long) * __CHAR_BIT__ - 1 - __builtin_clzll(__n); }
+  template
+inline _GLIBCXX_CONSTEXPR _Tp
+__lg(_Tp __n)
+{
+#if __cplusplus >= 201402L
+  return std::__bit_width(make_unsigned_t<_Tp>(__n)) - 1;
+#else
+  // Use +__n so it promotes to at least int.
+  const int __sz = sizeof(+__n);
+  int __w = __sz * __CHAR_BIT__ - 1;
+  if (__sz == sizeof(long long))
+   __w -= __builtin_clzll(+__n);
+  else if (__sz == sizeof(long))
+   __w -= __builtin_clzl(+__n);
+  else if (__sz == sizeof(int))
+   __w -= __builtin_clz(+__n);
+  return __w;
+#endif
+}
 
 _GLIBCXX_BEGIN_NAMESPACE_ALGO
 
-- 
2.39.0
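
As an illustration of the idea in the commit message (one template instead of
six overloads, with the result cast back to the argument type), here is a
portable sketch. It deliberately uses a shift loop rather than
std::__bit_width or the clz builtins, and lg_sketch is a hypothetical name,
not the libstdc++ function.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a single integer-generic floor(log2(n)).
// Precondition: n > 0.
template<typename T>
T lg_sketch(T n)
{
  static_assert(std::is_integral<T>::value, "integer types only");
  typedef typename std::make_unsigned<T>::type U;
  U u = static_cast<U>(n);
  T result = 0;
  while (u >>= 1)   // count how often n can be halved before reaching 0
    ++result;
  return result;    // same type as the argument, even for short
}

// Arithmetic on narrow types promotes: short - short has type int.  This is
// why a call such as f(len1 - len11, len22) can stop deducing one common
// type when the length type is narrower than int.
static_assert(std::is_same<decltype(short(1) - short(1)), int>::value,
              "short - short yields int");
```

Because there is only one template, a call with short (or any other integer
type) is never ambiguous between overloads.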



[committed 3/3] libstdc++: Fix tzdb.cc to compile with -fno-exceptions

2023-01-10 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* src/c++20/tzdb.cc (tzdb_list::_S_init_tzdb): Use __try and
__catch macros for exception handling.
---
 libstdc++-v3/src/c++20/tzdb.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/src/c++20/tzdb.cc b/libstdc++-v3/src/c++20/tzdb.cc
index 7227fe7cfe6..e335ea61c4d 100644
--- a/libstdc++-v3/src/c++20/tzdb.cc
+++ b/libstdc++-v3/src/c++20/tzdb.cc
@@ -1197,11 +1197,11 @@ namespace std::chrono
   const tzdb&
   tzdb_list::_Node::_S_init_tzdb()
   {
-try
+__try
   {
return reload_tzdb();
   }
-catch (const std::exception&)
+__catch (const std::exception&)
   {
auto [leaps, ok] = _S_read_leap_seconds();
 
-- 
2.39.0
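
For context, a sketch of how try/catch wrapper macros of this kind can work.
To my knowledge libstdc++'s __try/__catch expand to plain try/catch when
exceptions are enabled and to if (true)/if (false) otherwise; the SKETCH_*
macros and element_or below are hypothetical names used only to illustrate
that pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

#if defined(__cpp_exceptions)
# define SKETCH_TRY try
# define SKETCH_CATCH(X) catch (X)
#else
// Without exception support the "try" block always runs and the
// "catch" block becomes dead code that still has to parse.
# define SKETCH_TRY if (true)
# define SKETCH_CATCH(X) if (false)
#endif

// Returns v[i], or fallback if the index is out of range and exceptions
// are enabled.  (With -fno-exceptions the out-of-range case would
// terminate inside the library instead of reaching the handler.)
int element_or(const std::vector<int>& v, std::size_t i, int fallback)
{
  SKETCH_TRY
    {
      return v.at(i);
    }
  SKETCH_CATCH (const std::out_of_range&)
    {
      return fallback;
    }
}
```

Note that the handler cannot name the caught object, since in the
-fno-exceptions expansion no such variable would exist.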



OpenMP Patch Ping

2023-01-10 Thread Tobias Burnus

Hi all, hello Jakub,

Below is the updated list to last ping,
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607178.html

NOTE to the list below: I have stopped checking older patches. I know
some more are pending review, others need to be revised. I will re-check,
once the below listed patches have been reviewed. Cf. old list.

Thanks for the reviews done in between the last ping and now!

 * * *

Small patches
=============

* [Patch] Fortran: Extend align-clause checks of OpenMP's allocate clause
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608401.html
  Tue Dec 13 16:38:22 GMT 2022

* [Patch] OpenMP: Parse align clause in allocate directive in C/C++
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608404.html
  Tue Dec 13 17:44:27 GMT 2022

* Re: [Patch] libgomp.texi: Reverse-offload updates (was: [Patch] libgomp: Handle OpenMP's reverse offloads)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608245.html
  Thu Nov 24 12:01:04 GMT 2022

(Side note: wwwdocs also needs to be updated for the latter patch and
some other patches done in the meanwhile.)


Fortran allocat(e,ors) prep patch
=================================

* [Patch] Fortran/OpenMP: Add parsing support for allocators/allocate directive (was: [Patch] Fortran/OpenMP: Add parsing support for allocators directive)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608904.html
  Wed Dec 21 15:51:25 GMT 2022

(Remark: While written from scratch, it is kind of a follow-up to Abid's patch
   [PATCH 1/5] [gfortran] Add parsing support for allocate directive (OpenMP 5.0)
you/Jakub reviewed on Tue Oct 11 12:13:14 GMT 2022, i.e.
 https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603258.html
- For the actual implementation of 'allocators', we still have to solve the issues
  raised in the review for '[PATCH 2/5] [gfortran] Translate allocate directive (OpenMP 5.0).'
  at https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603279.html (and earlier in the thread);
  implementing 'omp allocate' (Fortran/C/C++) seems to be easier but no one has started implementing
  it so far - only parsing support exists.
- The USM patches on semi-USM system run into a similar issue as 'allocators' and for it, some
  ME omp_allocate is added.)


Mapping related patches
=======================
(Complex, but GCC badly needs this revision as it fixes several bugs and adds
missing functionality.)

* Complete patch set was just re-submitted by Julian, overview patch is
  [PATCH v6 00/11] OpenMP: C/C++ lvalue parsing, C/C++/Fortran "declare mapper" support
  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/thread.html#609031
  Fri Dec 23 12:12:53 GMT 2022
* Note: For 10/11 of the set, there was a follow up this Monday
  [PATCH v6 10/11] OpenMP: Support OpenMP 5.0 "declare mapper" directives for C
  https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609566.html

[As it relates to one patch in the series:
  '[Patch] Fortran/OpenMP: Fix DT struct-component with 'alloc' and array descr'
That's mine, needs to be updated (WIP) and fixes array descriptor/alloc-string-length var
issues, where descriptor/string length may need to be handled explicitly on data entering map,
i.e. string lengths/allocator may require 'to:' instead of 'alloc:' - and on data exit mapping,
the current code might add a bogus 'alloc:'. - Idea is to handle this explicitly
in fortran/trans-openmp.cc instead of auto-adding it in the ME.
Status: WIP - removed in ME but not all cases are handled yet in FE.)


Fortran deep mapping (allocatable components)
=============================================
(Old patch of March 2022, but the first part has now been properly, if
belatedly, submitted today):
[Patch][1/2] OpenMP: Add lang hooks + run-time filled map arrays for Fortran deep mapping of DT
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609637.html

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[Patch][1/2] OpenMP: Add lang hooks + run-time filled map arrays for Fortran deep mapping of DT

2023-01-10 Thread Tobias Burnus

This patch is the ME part to support OpenMP 5.0's deep-mapping
feature, i.e. mapping allocatable components of Fortran's derived types
automatically. [Not the lang hooks but the allocate-array part will probably
also be useful when later adding 'iterator'-modifier support to the
'map'/'to'/'from' clauses.]

This is a belated real submission of the patch sent in March 2022,
  https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591144.html
(with FE fixes at https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593562.html
  (note to self: Bernhard did send some comment fixes off list)
+ https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593704.html )
+ ME fix for OpenACC at https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603906.html
[which is in the attached patch]

As written, attached is the ME part. Below is a description of how
it is supposed to be used; the patch links above show how it looks
in the real-code FE.


==========
BACKGROUND
==========

Fortran permits

type t
  integer, allocatable :: x, y(:)
end type t
type t2
  type(t2), allocatable :: previous_stack  ! Not valid in OMP 5.0
  integer, allocatable :: a
  type(t) :: b, c(:)
end type t2
type(t2) :: var1, var2(:)

!$omp target enter data(var1, var2)

Here, all allocatable components need to be mapped alongside. The number of
mappings is only known at runtime: for 'var2', for example, the array size is
only known at runtime, and then each allocatable component of each element of
'var2' needs to be mapped. Those components can themselves contain allocatable
components, which also have to be mapped, but of course only if the parent
component is actually allocated.

 * * *

The current code puts 'kinds' with const values into an array, 'sizes' in
a fixed-size stack array (either with const or dynamic values) and 'addrs'
is a struct.

To support deep mapping, those all have to be dynamic; hence, the arrays
'sizes' and 'kinds' are turned into pointers - and the 'struct' gets a
trailing variable-size array, which is then filled with the dynamic content.

For this purpose, three lang hooks are added - all are called rather late,
i.e. during omp-low.cc, such that all previous operations (implicit mapping,
explicit mapping, OpenMP mapper) are already done.

* First one to check whether there is any allocatable component for a map-clause
  element (explicitly or implicitly added). If not, the current code is used.
  Otherwise, it uses dynamically allocated arrays
(Side note: As the size is now only known at runtime, TREE_VEC has now another
 element - the array size - hence the change to expand_omp_target, before it
 was known statically from the type.)

* Second hook to actually count how many allocations are done, required for
  the allocation.

* Third hook to actually fill the arrays.
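
To make the predicate/count/fill split concrete, here is a purely
illustrative model outside of GCC. Component, MapArrays and the three
functions are hypothetical and only mirror the roles of the hooks described
above; they are not the lang-hook signatures from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one mapped object; a null addr means
// "allocatable component that is currently not allocated".
struct Component {
  const void* addr;
  std::size_t size;
  std::vector<Component> allocatable;  // nested allocatable components
};

// Stand-in for the run-time sized addrs/sizes/kinds arrays.
struct MapArrays {
  std::vector<const void*> addrs;
  std::vector<std::size_t> sizes;
  std::vector<unsigned char> kinds;
};

// Role of hook 1: does this map element need deep mapping at all?
bool deep_mapping_p(const Component& c)
{ return !c.allocatable.empty(); }

// Role of hook 2: count the extra entries, descending only into
// components that are actually allocated.
std::size_t deep_mapping_cnt(const Component& c)
{
  std::size_t n = 0;
  for (const Component& child : c.allocatable)
    if (child.addr != nullptr)
      n += 1 + deep_mapping_cnt(child);
  return n;
}

// Role of hook 3: fill the arrays in the same order used for counting.
void deep_mapping_fill(const Component& c, unsigned char kind, MapArrays& out)
{
  for (const Component& child : c.allocatable)
    if (child.addr != nullptr)
      {
        out.addrs.push_back(child.addr);
        out.sizes.push_back(child.size);
        out.kinds.push_back(kind);
        deep_mapping_fill(child, kind, out);
      }
}
```

The count and the fill walk the same tree in the same order, which is what
allows the arrays to be allocated once between the two calls.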


Comments? Remarks?

Tobias

PS: There are two things to watch out in the future:
- 'mapper': I think it should work when the mapper is present as it comes rather
  late in the flow, but I have not checked with Julian's patches (pending review).
- Order: the dynamic items are added last to 'addrs' to permit keeping the 'struct'
  type. I think that's fine for allocatable components as they are added rather late
  and accessing them via 'is_device_ptr' is not possible.
  But there might be some issues with 'iterator' in future; something to watch out for.
  If so, we may need to partially or fully give up on still putting all other
  mappings into the struct.
OpenMP: Add lang hooks + run-time filled map arrays for Fortran deep mapping of DT

This patch adds middle end support for mapping Fortran derived-types with
allocatable components. If those are present, the kinds/sizes arrays will be
allocated at run time and the addrs struct gets a variable-sized array at
the end. The newly added hooks are:
  * lhd_omp_deep_mapping_p: If true, use the new code.
  * lhd_omp_deep_mapping_cnt: Count the elements, needed for allocation.
  * lhd_omp_deep_mapping: Fill the allocated arrays.

gcc/ChangeLog:

	* langhooks-def.h (lhd_omp_deep_mapping_p,
	lhd_omp_deep_mapping_cnt, lhd_omp_deep_mapping): New.
	(LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT,
	LANG_HOOKS_OMP_DEEP_MAPPING): Define.
	(LANG_HOOKS_DECLS): Use it.
	* langhooks.cc (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt,
	lhd_omp_deep_mapping): New stubs.
	* langhooks.h (struct lang_hooks_for_decls): Add new hooks
	* omp-expand.cc (expand_omp_target): Handle dynamic-size
	addr/sizes/kinds arrays.
	* omp-low.cc (build_sender_ref, fixup_child_record_type,
	scan_sharing_clauses, lower_omp_target): Update to handle
	new hooks and dynamic-size addr/sizes/kinds arrays.

 gcc/langhooks-def.h |  10 +++
 gcc/langhooks.cc|  24 ++
 

Re: [PATCH] tree-optimization/108314 - avoid BIT_NOT optimization for extract-last

2023-01-10 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, 10 Jan 2023, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > The extract-last reduction internal function expects the then and
>> > else clause as vector and scalar and thus we cannot perform optimization
>> > of the inversion of the condition by swapping the then/else clauses.
>> >
>> > Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
>> 
>> Sorry for not having found the time to look at the PR yet.
>> Like you say in the trail, it seems kind-of familiar.
>> 
>> I think we should instead prevent the else in:
>> 
>>scalar_cond_masked_key cond (cond_expr, ncopies);
>>if (loop_vinfo->scalar_cond_masked_set.contains (cond))
>>  masks = _VINFO_MASKS (loop_vinfo);
>>else
>>  {
>> 
>> for EXTRACT_LAST.  We've lost as soon as swap_cond_operands gets
>> set to true.
>
> But we're not getting there - the above is guarded with
>
>   if (reduction_type == EXTRACT_LAST_REDUCTION) 
> masks = _VINFO_MASKS (loop_vinfo); 
>   else
> {
>
> instead we run into
>
>   if (masked)
> vec_compare = vec_cond_lhs;
>   else
> {
>   vec_cond_rhs = vec_oprnds1[i];
>   if (bitop1 == NOP_EXPR)
> {
> ...
>   else
> {
> ...
>   else if (bitop2 == BIT_NOT_EXPR
> {
>   /* Instead of doing ~x ? y : z do x ? z : y.  */
>   vec_compare = new_temp;
>   std::swap (vec_then_clause, vec_else_clause);
>
> so we could instead reject vectorizing for EQ_EXPR but then
> applying the negation to the condition allows this to be
> vectorized just fine (which is what the patch does)?

Ah, OK.  I wasn't sure which of the paths we were going down to get here.

So yeah, I agree the patch is OK.  Sorry for the noise.

Richard

> Richard.
>
>> Thanks,
>> Richard
>> 
>> > Thanks,
>> > Richard.
>> >
>> >PR tree-optimization/108314
>> >* tree-vect-stmts.cc (vectorizable_condition): Do not
>> >perform BIT_NOT_EXPR optimization for EXTRACT_LAST_REDUCTION.
>> >
>> >* gcc.dg/vect/pr108314.c: New testcase.
>> > ---
>> >  gcc/testsuite/gcc.dg/vect/pr108314.c | 16 
>> >  gcc/tree-vect-stmts.cc   | 13 +
>> >  2 files changed, 25 insertions(+), 4 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr108314.c
>> >
>> > diff --git a/gcc/testsuite/gcc.dg/vect/pr108314.c 
>> > b/gcc/testsuite/gcc.dg/vect/pr108314.c
>> > new file mode 100644
>> > index 000..07260e06915
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.dg/vect/pr108314.c
>> > @@ -0,0 +1,16 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
>> > +
>> > +int x, y, z;
>> > +
>> > +void f(void)
>> > +{
>> > +  int t = 4;
>> > +  for (; x; x++)
>> > +{
>> > +  if (y)
>> > +  continue;
>> > +  t = 0;
>> > +}
>> > +  z = t;
>> > +}
>> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> > index 6ddd41fb473..eb4ca1f184e 100644
>> > --- a/gcc/tree-vect-stmts.cc
>> > +++ b/gcc/tree-vect-stmts.cc
>> > @@ -10677,7 +10677,8 @@ vectorizable_condition (vec_info *vinfo,
>> >  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
>> >  if (bitop2 == NOP_EXPR)
>> >vec_compare = new_temp;
>> > -else if (bitop2 == BIT_NOT_EXPR)
>> > +else if (bitop2 == BIT_NOT_EXPR
>> > + && reduction_type != EXTRACT_LAST_REDUCTION)
>> >{
>> >  /* Instead of doing ~x ? y : z do x ? z : y.  */
>> >  vec_compare = new_temp;
>> > @@ -10686,9 +10687,13 @@ vectorizable_condition (vec_info *vinfo,
>> >  else
>> >{
>> >  vec_compare = make_ssa_name (vec_cmp_type);
>> > -new_stmt
>> > -  = gimple_build_assign (vec_compare, bitop2,
>> > - vec_cond_lhs, new_temp);
>> > +if (bitop2 == BIT_NOT_EXPR)
>> > +  new_stmt
>> > += gimple_build_assign (vec_compare, bitop2, new_temp);
>> > +else
>> > +  new_stmt
>> > += gimple_build_assign (vec_compare, bitop2,
>> > +   vec_cond_lhs, new_temp);
>> >  vect_finish_stmt_generation (vinfo, stmt_info,
>> >   new_stmt, gsi);
>> >}
>> 


Re: [PATCH] tree-optimization/108314 - avoid BIT_NOT optimization for extract-last

2023-01-10 Thread Richard Biener via Gcc-patches
On Tue, 10 Jan 2023, Richard Sandiford wrote:

> Richard Biener  writes:
> > The extract-last reduction internal function expects the then and
> > else clause as vector and scalar and thus we cannot perform optimization
> > of the inversion of the condition by swapping the then/else clauses.
> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
> 
> Sorry for not having found the time to look at the PR yet.
> Like you say in the trail, it seems kind-of familiar.
> 
> I think we should instead prevent the else in:
> 
> scalar_cond_masked_key cond (cond_expr, ncopies);
> if (loop_vinfo->scalar_cond_masked_set.contains (cond))
>   masks = _VINFO_MASKS (loop_vinfo);
> else
>   {
> 
> for EXTRACT_LAST.  We've lost as soon as swap_cond_operands gets
> set to true.

But we're not getting there - the above is guarded with

  if (reduction_type == EXTRACT_LAST_REDUCTION) 
masks = _VINFO_MASKS (loop_vinfo); 
  else
{

instead we run into

  if (masked)
vec_compare = vec_cond_lhs;
  else
{
  vec_cond_rhs = vec_oprnds1[i];
  if (bitop1 == NOP_EXPR)
{
...
  else
{
...
  else if (bitop2 == BIT_NOT_EXPR
{
  /* Instead of doing ~x ? y : z do x ? z : y.  */
  vec_compare = new_temp;
  std::swap (vec_then_clause, vec_else_clause);

so we could instead reject vectorizing for EQ_EXPR but then
applying the negation to the condition allows this to be
vectorized just fine (which is what the patch does)?
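
A simplified model of the point under discussion, for readers not familiar
with the vectorizer: for a plain element-wise select both operands are
vectors, so "~x ? y : z" can be rewritten as "x ? z : y". For an extract-last
style reduction one operand is a scalar fallback and the other a vector, so
the operands cannot be swapped and the inverted mask has to be computed
instead. The types and functions below are hypothetical and are not GCC's
internal representation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;
typedef std::array<bool, kLanes> Mask;
typedef std::array<int, kLanes> Vec;

// Element-wise select: m[i] ? t[i] : e[i].
Vec select(const Mask& m, const Vec& t, const Vec& e)
{
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = m[i] ? t[i] : e[i];
  return r;
}

// Explicit negation of the mask (the BIT_NOT in the discussion).
Mask invert(const Mask& m)
{
  Mask r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = !m[i];
  return r;
}

// Model of an extract-last reduction: the last element of vals whose
// mask bit is set, or the scalar fallback if no bit is set.
int fold_extract_last(int fallback, const Mask& m, const Vec& vals)
{
  int result = fallback;
  for (std::size_t i = 0; i < kLanes; ++i)
    if (m[i])
      result = vals[i];
  return result;
}
```

Here select(invert(m), y, z) equals select(m, z, y), but there is no
operand swap for fold_extract_last(fallback, invert(m), vals): the scalar
and the vector are not interchangeable, so the mask itself is negated.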

Richard.

> Thanks,
> Richard
> 
> > Thanks,
> > Richard.
> >
> > PR tree-optimization/108314
> > * tree-vect-stmts.cc (vectorizable_condition): Do not
> > perform BIT_NOT_EXPR optimization for EXTRACT_LAST_REDUCTION.
> >
> > * gcc.dg/vect/pr108314.c: New testcase.
> > ---
> >  gcc/testsuite/gcc.dg/vect/pr108314.c | 16 
> >  gcc/tree-vect-stmts.cc   | 13 +
> >  2 files changed, 25 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/pr108314.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/pr108314.c 
> > b/gcc/testsuite/gcc.dg/vect/pr108314.c
> > new file mode 100644
> > index 000..07260e06915
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/pr108314.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> > +
> > +int x, y, z;
> > +
> > +void f(void)
> > +{
> > +  int t = 4;
> > +  for (; x; x++)
> > +{
> > +  if (y)
> > +   continue;
> > +  t = 0;
> > +}
> > +  z = t;
> > +}
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 6ddd41fb473..eb4ca1f184e 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -10677,7 +10677,8 @@ vectorizable_condition (vec_info *vinfo,
> >   vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> >   if (bitop2 == NOP_EXPR)
> > vec_compare = new_temp;
> > - else if (bitop2 == BIT_NOT_EXPR)
> > + else if (bitop2 == BIT_NOT_EXPR
> > +  && reduction_type != EXTRACT_LAST_REDUCTION)
> > {
> >   /* Instead of doing ~x ? y : z do x ? z : y.  */
> >   vec_compare = new_temp;
> > @@ -10686,9 +10687,13 @@ vectorizable_condition (vec_info *vinfo,
> >   else
> > {
> >   vec_compare = make_ssa_name (vec_cmp_type);
> > - new_stmt
> > -   = gimple_build_assign (vec_compare, bitop2,
> > -  vec_cond_lhs, new_temp);
> > + if (bitop2 == BIT_NOT_EXPR)
> > +   new_stmt
> > + = gimple_build_assign (vec_compare, bitop2, new_temp);
> > + else
> > +   new_stmt
> > + = gimple_build_assign (vec_compare, bitop2,
> > +vec_cond_lhs, new_temp);
> >   vect_finish_stmt_generation (vinfo, stmt_info,
> >new_stmt, gsi);
> > }
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [x86 PATCH] PR rtl-optimization/107991: peephole2 to tweak register allocation.

2023-01-10 Thread Richard Sandiford via Gcc-patches
Uros Bizjak  writes:
> On Mon, Jan 9, 2023 at 4:01 PM Roger Sayle  wrote:
>>
>>
>> This patch addresses PR rtl-optimization/107991, which is a P2 regression
>> where GCC currently requires more "mov" instructions than GCC 7.
>>
>> The x86's two address ISA creates some interesting challenges for reload.
>> For example, the tricky "x = y - x" usually needs to be implemented on x86
>> as
>>
>> tmp = x
>> x = y
>> x -= tmp
>>
>> where a scratch register and two mov's are required to work around
>> the lack of a subf (subtract from) or rsub (reverse subtract) insn.
>>
>> Not uncommonly, if y is dead after this subtraction, register allocation
>> can be improved by clobbering y.
>>
>> y -= x
>> x = y
>>
>> For the testcase in PR 107991, things are slightly more complicated,
>> where y is not itself dead, but is assigned from (i.e. equivalent to)
>> a value that is dead.  Hence we have something like:
>>
>> y = z
>> x = y - x
>>
>> so, GCC's reload currently generates the expected shuffle (as y is live):
>>
>> y = z
>> tmp = x
>> x = y
>> x -= tmp
>>
>> but we can use a peephole2 that understands that y and z are equivalent,
>> and that z is dead, to produce the shorter sequence:
>>
>> y = z
>> z -= x
>> x = z
>>
>> In practice, for the new testcase from PR 107991, which before produced:
>>
>> foo:movl%edx, %ecx
>> movl%esi, %edx
>> movl%esi, %eax
>> subl%ecx, %edx
>> testb   %dil, %dil
>> cmovne  %edx, %eax
>> ret
>>
>> with this patch/peephole2 we now produce the much improved:
>>
>> foo:movl%esi, %eax
>> subl%edx, %esi
>> testb   %dil, %dil
>> cmovne  %esi, %eax
>> ret
>>
>>
>> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>> and make -k check, both with and without --target_board=unix{-m32},
>> with no new failures.  Ok for mainline?
>
> Looking at the PR, it looks to me that Richard S (CC'd) wants to solve
> this issue in the register allocator. This would be preferred
> (compared to a very specialized peephole2), since peephole2 pass comes
> very late in the game, so one freed register does not contribute to
> lower the register pressure at all.

Yeah, I think there are three issues that would all be good to fix:

(1) the fwprop regression
(2) the unhelpful pre-RA move
(3) the RA handling of what it sees now

I had a look last year to see where (1) was coming from, and it turned
out to be bad bookkeeping during a recursive walk.  I've got a couple of
competing ideas for how to fix it, but I've not had time to work on it
since then, sorry.

> Peephole2 should be used to clean after reload only in rare cases when
> target ISA prevents generic solution. From your description, a generic
> solution would benefit all targets with destructive subtraction (or
> perhaps also for other noncommutative operations).

Yeah, agree that peepholes aren't the best fit here.  The problem could
occur with instructions that are too far apart to be peepholed.
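
For reference, the two instruction sequences from the quoted patch
description can be compared with a tiny register model. This is only a sketch
of the data flow (the Regs struct and function names are made up); it says
nothing about how the RA or a peephole2 would find the shorter form.

```cpp
#include <cassert>

// Three "registers" as in the quoted description.
struct Regs { int x, y, z; };

// Reference semantics: y = z; x = y - x.
Regs reference(Regs r)
{
  r.y = r.z;
  r.x = r.y - r.x;
  return r;
}

// Two-address sequence with a scratch register (y stays live).
Regs with_scratch(Regs r)
{
  r.y = r.z;
  int tmp = r.x;   // tmp = x
  r.x = r.y;       // x = y
  r.x -= tmp;      // x -= tmp
  return r;
}

// Shorter sequence that clobbers z; only valid if z is dead afterwards.
Regs clobber_dead_z(Regs r)
{
  r.y = r.z;
  r.z -= r.x;      // z -= x
  r.x = r.z;       // x = z
  return r;
}
```

Both sequences leave the same values in x and y; they differ only in z, which
is why the shorter one needs z to be dead.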

Thanks,
Richard

> So, please coordinate with Richard S regarding this issue.
>
> Thanks,
> Uros.
>
>>
>>
>> 2023-01-09  Roger Sayle  
>>
>> gcc/ChangeLog
>> PR rtl-optimization/107991
>> * config/i386/i386.md (peephole2): New peephole2 to avoid register
>> shuffling before a subtraction, after a register-to-register move.
>>
>> gcc/testsuite/ChangeLog
>> PR rtl-optimization/107991
>> * gcc.target/i386/pr107991.c: New test case.
>>
>>
>> Thanks in advance,
>> Roger
>> --
>>
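
For readers following the assembly quoted above, here is a minimal sketch of a source function with the same shape. It is reconstructed from the register usage in the listings (condition in %dil, operands in %esi and %edx under the x86-64 SysV calling convention); the actual pr107991.c test case is not shown in this thread, so the name and parameter types are assumptions.

```cpp
// Hypothetical reconstruction; not the actual test case from the PR.
// With the SysV x86-64 ABI: cond arrives in %dil, x in %esi, y in %edx,
// and the result is returned in %eax.
int foo(bool cond, int x, int y)
{
  // The subtraction is destructive on x86 (sub overwrites its destination),
  // which is why the order of the register moves around it matters.
  return cond ? x - y : x;
}
```

The shorter sequence computes x - y in place in the register that held x, after copying x to the return register, and then selects between the two with cmovne.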


Re: [PATCH, Modula2] PR-108142 Many empty directories created in the build directory

2023-01-10 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 10, 2023 at 11:16:28AM +0100, Richard Biener via Gcc-patches wrote:
> > @@ -424,7 +388,7 @@ override PLUGINCFLAGS := $(filter-out 
> > -mdynamic-no-pic,$(PLUGINCFLAGS))
> >
> >  plugin/m2rte$(soext): $(srcdir)/m2/plugin/m2rte.cc 
> > $(GCC_HEADER_DEPENDENCIES_FOR_M2) \
> >  insn-attr-common.h insn-flags.h $(generated_files)
> > -   test -d plugin || mkdir plugin
> > +   -test -d plugin || $(mkinstalldirs) plugin
> 
> I wonder if that's possibly racy (that's why you use mkinstalldirs?)?

Using $(mkinstalldirs) in the patch is what I've suggested because
previously the patch was using mkdir -p which we almost never use
(I think only some Ada Makefiles).  Above when it is a single directory
mkdir is fine.
-test -d $(TESTSUITEDIR) || mkdir $(TESTSUITEDIR)
etc. is what is used in gcc/Makefile.in in some spots.
If 2 shells do that test -d plugin || mkdir plugin at the same time,
then yes, both might do mkdir, but that is why we have the - at the start,
the error of doing mkdir twice will be ignored then.

Jakub



Re: [PATCH] tree-optimization/108314 - avoid BIT_NOT optimization for extract-last

2023-01-10 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> The extract-last reduction internal function expects the then and
> else clause as vector and scalar and thus we cannot perform optimization
> of the inversion of the condition by swapping the then/else clauses.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?

Sorry for not having found the time to look at the PR yet.
Like you say in the trail, it seems kind-of familiar.

I think we should instead prevent the else in:

  scalar_cond_masked_key cond (cond_expr, ncopies);
  if (loop_vinfo->scalar_cond_masked_set.contains (cond))
    masks = &LOOP_VINFO_MASKS (loop_vinfo);
  else
    {

for EXTRACT_LAST.  We've lost as soon as swap_cond_operands gets
set to true.

Thanks,
Richard

> Thanks,
> Richard.
>
>   PR tree-optimization/108314
>   * tree-vect-stmts.cc (vectorizable_condition): Do not
>   perform BIT_NOT_EXPR optimization for EXTRACT_LAST_REDUCTION.
>
>   * gcc.dg/vect/pr108314.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/pr108314.c | 16 
>  gcc/tree-vect-stmts.cc   | 13 +
>  2 files changed, 25 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr108314.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr108314.c 
> b/gcc/testsuite/gcc.dg/vect/pr108314.c
> new file mode 100644
> index 00000000000..07260e06915
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr108314.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
> +
> +int x, y, z;
> +
> +void f(void)
> +{
> +  int t = 4;
> +  for (; x; x++)
> +{
> +  if (y)
> + continue;
> +  t = 0;
> +}
> +  z = t;
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 6ddd41fb473..eb4ca1f184e 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -10677,7 +10677,8 @@ vectorizable_condition (vec_info *vinfo,
> vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
> if (bitop2 == NOP_EXPR)
>   vec_compare = new_temp;
> -   else if (bitop2 == BIT_NOT_EXPR)
> +   else if (bitop2 == BIT_NOT_EXPR
> +&& reduction_type != EXTRACT_LAST_REDUCTION)
>   {
> /* Instead of doing ~x ? y : z do x ? z : y.  */
> vec_compare = new_temp;
> @@ -10686,9 +10687,13 @@ vectorizable_condition (vec_info *vinfo,
> else
>   {
> vec_compare = make_ssa_name (vec_cmp_type);
> -   new_stmt
> - = gimple_build_assign (vec_compare, bitop2,
> -vec_cond_lhs, new_temp);
> +   if (bitop2 == BIT_NOT_EXPR)
> + new_stmt
> +   = gimple_build_assign (vec_compare, bitop2, new_temp);
> +   else
> + new_stmt
> +   = gimple_build_assign (vec_compare, bitop2,
> +  vec_cond_lhs, new_temp);
> vect_finish_stmt_generation (vinfo, stmt_info,
>  new_stmt, gsi);
>   }


Re: [PATCH] c++: Define built-in for std::tuple_element [PR100157]

2023-01-10 Thread Jonathan Wakely via Gcc-patches
On Mon, 9 Jan 2023 at 19:25, Patrick Palka wrote:
>
> On Mon, 9 Jan 2023, Patrick Palka wrote:
>
> > On Wed, 5 Oct 2022, Patrick Palka wrote:
> >
> > > On Thu, 7 Jul 2022, Jonathan Wakely via Gcc-patches wrote:
> > >
> > > > This adds a new built-in to replace the recursive class template
> > > > instantiations done by traits such as std::tuple_element and
> > > > std::variant_alternative. The purpose is to select the Nth type from a
> > > > list of types, e.g. __builtin_type_pack_element(1, char, int, float) is
> > > > int.
> > > >
> > > > For a pathological example tuple_element_t<1000, tuple<2000 types...>>
> > > > the compilation time is reduced by more than 90% and the memory  used by
> > > > the compiler is reduced by 97%. In realistic examples the gains will be
> > > > much smaller, but still relevant.
> > > >
> > > > Clang has a similar built-in, __type_pack_element, but that's a
> > > > "magic template" built-in using <> syntax, which GCC doesn't support. So
> > > > this provides an equivalent feature, but as a built-in function using
> > > > parens instead of <>. I don't really like the name "type pack element"
> > > > (it gives you an element from a pack of types) but the semi-consistency
> > > > with Clang seems like a reasonable argument in favour of keeping the
> > > > name. I'd be open to alternative names though, e.g. __builtin_nth_type
> > > > or __builtin_type_at_index.
> > >
> > > Rather than giving the trait a different name from __type_pack_element,
> > > I wonder if we could just special case cp_parser_trait to expect <>
> > > instead of parens for this trait?
> > >
> > > Btw the frontend recently got a generic TRAIT_TYPE tree code, which gets
> > > rid of much of the boilerplate of adding a new type-yielding built-in
> > > trait, see e.g. cp-trait.def.
> >
> > Here's a tested patch based on Jonathan's original patch that implements
> > the built-in in terms of TRAIT_TYPE, names it __type_pack_element
> > instead of __builtin_type_pack_element, and treats invocations of it
> > like a template-id instead of a call (to match Clang).

The library change is very much OK, thanks for taking this over.
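
For context, the recursive class template instantiation that the built-in is meant to replace can be sketched roughly as follows. This is a simplified illustration, not the libstdc++ implementation; the names nth_type and nth_type_t are made up for the example. Selecting element N costs on the order of N instantiations, which is the overhead the quoted description refers to.

```cpp
#include <cstddef>
#include <type_traits>

// Primary template, intentionally left undefined: asking for ::type with
// an out-of-range index fails to compile.
template<std::size_t N, typename... Ts>
struct nth_type;

// Base case: index 0 selects the first type of the pack.
template<typename T, typename... Ts>
struct nth_type<0, T, Ts...>
{ using type = T; };

// Recursive case: drop the first type and decrement the index.
template<std::size_t N, typename T, typename... Ts>
struct nth_type<N, T, Ts...> : nth_type<N - 1, Ts...>
{ };

template<std::size_t N, typename... Ts>
using nth_type_t = typename nth_type<N, Ts...>::type;

// Same selection as the example in the patch description.
static_assert(std::is_same<nth_type_t<1, char, int, float>, int>::value,
              "index 1 selects int");
```

A built-in along the lines discussed in this thread would yield the same type without instantiating the intermediate specializations.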


> >
> > -- >8 --
> >
> > Subject: [PATCH] c++: Define built-in for std::tuple_element [PR100157]
> >
> > This adds a new built-in to replace the recursive class template
> > instantiations done by traits such as std::tuple_element and
> > std::variant_alternative.  The purpose is to select the Nth type from a
> > list of types, e.g. __type_pack_element<1, char, int, float> is int.
> > We implement it as a special kind of TRAIT_TYPE.
> >
> > For a pathological example tuple_element_t<1000, tuple<2000 types...>>
> > the compilation time is reduced by more than 90% and the memory  used by
> > the compiler is reduced by 97%.  In realistic examples the gains will be
> > much smaller, but still relevant.
> >
> > Unlike the other built-in traits, __type_pack_element uses template-id
> > syntax instead of call syntax and is SFINAE-enabled, matching Clang's
> > implementation.  And like the other built-in traits, it's not mangleable
> > so we can't use it directly in function signatures.
> >
> > Some caveats:
> >
> >   * Clang's version of the built-in seems to act like a "magic template"
> > that can e.g. be used as a template template argument.  For simplicity
> > we implement it in a more ad-hoc way.
> >   * Our parsing of the <>'s in __type_pack_element<...> is currently
> > rudimentary and doesn't try to disambiguate a trailing >> vs > >
> > as cp_parser_enclosed_template_argument_list does.
>
> Hmm, this latter caveat turns out to be inconvenient (for code such as
> type_pack_element3.C) and admits an easy workaround inspired by what
> cp_parser_enclosed_template_argument_list does.
>
> v2: Consider the >> in __type_pack_element<0, int, char>> to be two >'s.
> Handle non-type TRAIT_TYPE_TYPE1 in strip_typedefs (for sake of
> CPTK_TYPE_PACK_ELEMENT).
>
> -- >8 --
>
> Subject: [PATCH] c++: Define built-in for std::tuple_element [PR100157]
>
> This adds a new built-in to replace the recursive class template
> instantiations done by traits such as std::tuple_element and
> std::variant_alternative.  The purpose is to select the Nth type from a
> list of types, e.g. __type_pack_element<1, char, int, float> is int.
> We implement it as a special kind of TRAIT_TYPE.
>
> For a pathological example tuple_element_t<1000, tuple<2000 types...>>
> the compilation time is reduced by more than 90% and the memory  used by
> the compiler is reduced by 97%.  In realistic examples the gains will be
> much smaller, but still relevant.
>
> Unlike the other built-in traits, __type_pack_element uses template-id
> syntax instead of call syntax and is SFINAE-enabled, matching Clang's
> implementation.  And like the other built-in traits, it's not mangleable
> so we can't use it directly in function signatures.
>
> N.B. Clang seems to implement __type_pack_element as a first-class
> template that 

Re: [PATCH, Modula2] PR-108142 Many empty directories created in the build directory

2023-01-10 Thread Richard Biener via Gcc-patches
On Tue, Jan 10, 2023 at 2:49 AM Gaius Mulley via Gcc-patches
 wrote:
>
>
> PR-108142 Modula-2 configure generates many subdirectories in the top
> build directory.  This patch dynamically creates subdirectories under
> gcc/m2 if and when required.
>
> Bootstrapped on x86_64 gnu/linux, ok for master?
>
> regards,
> Gaius
>
>
> gcc/m2/ChangeLog:
>
> * Make-lang.in (GM2_1): Change -B path to m2/stage1.
> ($(objdir)/m2/images/gnu.eps): Check and create dest dir
> if necessary.
> (gm2-libs.texi-check): Check and create dir m2/gm2-libs-pim,
> m2/gm2-libs-iso and m2/gm2-libs if necessary.
> ($(objdir)/m2/gm2-compiler-boot): Remove.
> ($(objdir)/m2/gm2-libs-boot): Remove.
> ($(objdir)/m2/gm2-libs-libiberty): Remove.
> ($(objdir)/m2/gm2-libiberty): Remove.
> ($(objdir)/m2/gm2-gcc): Remove.
> ($(objdir)/m2/gm2-compiler): Remove.
> ($(objdir)/m2/gm2-libs): Remove.
> ($(objdir)/m2/gm2-libs-iso): Remove.
> ($(objdir)/m2/gm2-libs-min): Remove.
> ($(objdir)/m2/gm2-compiler-paranoid): Remove.
> ($(objdir)/m2/gm2-libs-paranoid): Remove.
> ($(objdir)/m2/gm2-compiler-verify): Remove.
> ($(objdir)/m2/boot-bin): Remove.
> ($(objdir)/m2/gm2-libs-pim): Remove.
> ($(objdir)/m2/gm2-libs-coroutines): Remove.
> (stage1/m2): Remove.
> (stage2/m2): Remove.
> (stage3/m2): Remove.
> (m2.stageprofile): New rule.
> (m2.stagefeedback): New rule.
> (cc1gm2$(exeext)): Change dependent name.
> (m2/stage2/cc1gm2$(exeext)): Change dependent name.
> Check and create dest dir.
> (m2/stage1/cc1gm2$(exeext)): Check and create dest dir
> if necessary.
> (m2/gm2-gcc/%.o): Ditto.
> (m2/gm2-gcc/rtegraph.o): Ditto.
> (m2/gm2-gcc/$(SRC_PREFIX)%.h): Ditto.
> (m2/gm2-gcc/$(SRC_PREFIX)%.h): Ditto.
> (m2/mc-boot): Ditto.
> (m2/mc-boot-ch): Ditto.
> (m2/gm2-libs-boot): Ditto.
> (m2/gm2-compiler-boot): Ditto.
> (m2/gm2-compiler): Ditto.
> (m2/gm2-libiberty): Ditto.
> (m2/gm2-compiler): Ditto.
> (m2/gm2-libs-iso): Ditto.
> (m2/gm2-libs): Ditto.
> (m2/gm2-libs-min): Ditto.
> (m2/gm2-libs-coroutines): Ditto.
> (m2/boot-bin): Ditto.
> (m2/pge-boot): Ditto.
> (m2/pge-boot): Ditto.
> * Make-maintainer.in (m2/gm2-ppg-boot): Check and create
> dest dir if necessary.
> (m2): Ditto.
> (m2/gm2-ppg-boot): Ditto.
> (m2/gm2-pg-boot): Ditto.
> (m2/gm2-auto): Ditto.
> (m2/gm2-pg-boot): Ditto.
> (m2/gm2-pge-boot): Ditto.
> ($(objdir)/plugin): Ditto.
> ($(objdir)/m2/mc-boot-ch): Ditto.
> ($(objdir)/m2/mc-boot-gen): Ditto.
> (m2/boot-bin): Ditto.
> (m2/mc): Ditto.
> (m2/mc-obj): Ditto.
> ($(objdir)/m2/gm2-ppg-boot): Ditto.
> ($(objdir)/m2/gm2-pg-boot): Ditto.
> ($(objdir)/m2/gm2-pge-boot): Ditto.
> (m2/mc-boot-gen): Ditto.
> (m2/m2obj3): Ditto.
> (m2/gm2-libs-paranoid): Ditto.
> (m2/gm2-compiler-paranoid): Ditto.
> (m2/gm2-libs-paranoid): Ditto.
> (m2/gm2-compiler-paranoid): Ditto.
> (m2/gm2-libs-paranoid): Ditto.
> (m2/gm2-compiler-paranoid): Ditto.
> * config-lang.in (m2/gm2-compiler-boot): Remove mkdir.
> (m2/gm2-libs-boot): Ditto.
> (m2/gm2-ici-boot): Ditto.
> (m2/gm2-libiberty): Ditto.
> (m2/gm2-gcc): Ditto.
> (m2/gm2-compiler): Ditto.
> (m2/gm2-libs): Ditto.
> (m2/gm2-libs-iso): Ditto.
> (m2/gm2-compiler-paranoid): Ditto.
> (m2/gm2-libs-paranoid): Ditto.
> (m2/gm2-compiler-verify): Ditto.
> (m2/boot-bin): Ditto.
> (m2/gm2-libs-pim): Ditto.
> (m2/gm2-libs-coroutines): Ditto.
> (m2/gm2-libs-min): Ditto.
> (m2/pge-boot): Ditto.
> (plugin): Ditto.
> (stage1/m2): Ditto.
> (stage2/m2): Ditto.
> (stage3/m2): Ditto.
> (stage4/m2): Ditto.
> (m2/gm2-auto): Ditto.
> (m2/gm2-pg-boot): Ditto.
> (m2/gm2-pge-boot): Ditto.
> (m2/gm2-ppg-boot): Ditto.
> (m2/mc-boot): Ditto.
> (m2/mc-boot-ch): Ditto.
> (m2/mc-boot-gen): Ditto.
>
> -- o< -- o< -- o< -- o< -- o< -- o< -- o<
> diff --git a/gcc/m2/Make-lang.in b/gcc/m2/Make-lang.in
> index 08d0f3b963f..a3751109481 100644
> --- a/gcc/m2/Make-lang.in
> +++ b/gcc/m2/Make-lang.in
> @@ -27,7 +27,7 @@ GM2_CROSS_NAME = `echo gm2|sed 
> '$(program_transform_cross_name)'`
>
>  M2_MAINTAINER = no
>
> -GM2_1 = ./gm2 -B./stage1/m2 -g -fm2-g
> +GM2_1 = ./gm2 -B./m2/stage1 -g -fm2-g
>
>  GM2_FOR_TARGET = $(STAGE_CC_WRAPPER) ./gm2 -B./ -B$(build_tooldir)/bin/ 
> -L$(objdir)/../ld $(TFLAGS)
>
> @@ -71,7 +71,6 @@ m2.srcextra: 

Re: [PATCH] PR rtl-optimization/106421: ICE in bypass_block from non-local goto.

2023-01-10 Thread Richard Biener via Gcc-patches
On Mon, Jan 9, 2023 at 8:50 PM Roger Sayle  wrote:
>
>
> This patch fixes PR rtl-optimization/106421, an ICE-on-valid (but
> undefined) regression.  The fix, as proposed by Richard Biener, is to
> defend against BLOCK_FOR_INSN returning NULL in cprop's bypass_block.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?

OK.

>
> 2023-01-09  Roger Sayle  
>
> gcc/ChangeLog
> PR rtl-optimization/106421
> * cprop.cc (bypass_block): Check that DEST is local to this
> function (non-NULL) before calling find_edge.
>
> gcc/testsuite/ChangeLog
> PR rtl-optimization/106421
> * gcc.dg/pr106421.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>


[PATCH 1/2] libstdc++: Enable string_view in freestanding

2023-01-10 Thread Arsen Arsenović via Gcc-patches
This enables the default contract handler in freestanding environments,
and, of course, provides freestanding users with string_view.

libstdc++-v3/ChangeLog:

* include/Makefile.am: Install bits/char_traits.h,
std/string_view
* include/Makefile.in: Regenerate.
* include/bits/char_traits.h: Gate hosted-only, wchar-only and
mbstate-only bits behind appropriate #ifs.
* include/std/string_view: Gate  functionality behind
HOSTED.
* include/std/version: Enable __cpp_lib_constexpr_string_view
and __cpp_lib_starts_ends_with in !HOSTED.
* include/std/ranges: Re-enable __is_basic_string_view on
freestanding, include  directly.
* include/precompiled/stdc++.h: Include  when
!HOSTED too.
* testsuite/20_util/function_objects/searchers.cc: Skip testing
boyer_moore searchers on freestanding
* testsuite/21_strings/basic_string_view/capacity/1.cc: Guard
-related tests behind __STDC_HOSTED__.
* testsuite/21_strings/basic_string_view/cons/char/1.cc: Ditto.
* testsuite/21_strings/basic_string_view/cons/char/2.cc: Remove
unused  include.
* testsuite/21_strings/basic_string_view/cons/char/3.cc: Remove
unused  include.
* testsuite/21_strings/basic_string_view/cons/char/range.cc:
Guard  related testing behind __STDC_HOSTED__.
* testsuite/21_strings/basic_string_view/cons/wchar_t/1.cc:
Guard  related tests behind __STDC_HOSTED__.
* testsuite/21_strings/basic_string_view/element_access/char/1.cc:
Ditto.
* testsuite/21_strings/basic_string_view/element_access/wchar_t/1.cc:
Guard  tests behind __STDC_HOSTED__.
* testsuite/21_strings/basic_string_view/operations/contains/char/2.cc:
Enable test on freestanding, guard  bits behind
__STDC_HOSTED__.
* testsuite/21_strings/basic_string_view/operations/substr/char.cc:
Guard  bits behind __STDC_HOSTED__.
* testsuite/21_strings/basic_string_view/operations/substr/wchar_t.cc:
Ditto.
---
Morning (so much for submitting it last night eh? :D),

This patchset enables the use of std::string_view in freestanding
environments.  This permits freestanding programs to use contracts, and
fixes building libstdc++.* on freestanding with one of the patches I
sent previously.

I also included fixes for some new test failures on unix/-ffreestanding.
I hope to get some time to set up a dedicated runner for re-spinning
-ffreestanding libstdc++ every so often in the near future..

I haven't built Managarm with frg::string_view made into an alias for
std::string_view yet, I can also do that before the merge, if so
desired, as a little use-case test, but that might take a few days.

Before NYE, I tested a full x86_64-pc-linux-gnu bootstrap, but I haven't
had a chance to do that today after a rebase, though I did verify that
--target_board='unix/{,-ffreestanding}' passes fine.  I can do that
tonight and update this thread if need be.

Thanks in advance, have a great day.
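
As a rough illustration of the guarding pattern the ChangeLog describes, the sketch below keeps the string_view-only part unconditional and puts the std::string-dependent part behind __STDC_HOSTED__. The function names are hypothetical and this is not code from the patch; it assumes C++17 or later.

```cpp
#include <string_view>
#if __STDC_HOSTED__
# include <string>   // only pulled in for hosted builds
#endif

// Usable in both hosted and freestanding builds: needs only string_view.
constexpr bool has_prefix(std::string_view s, std::string_view prefix)
{
  return s.size() >= prefix.size()
	 && s.substr(0, prefix.size()) == prefix;
}

#if __STDC_HOSTED__
// Hosted-only convenience overload that depends on std::string.
inline bool has_prefix_str(const std::string& s, std::string_view prefix)
{ return has_prefix(std::string_view(s), prefix); }
#endif

static_assert(has_prefix("gcc-13", "gcc"), "prefix check is usable at compile time");
```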

 libstdc++-v3/include/Makefile.am  |  6 +--
 libstdc++-v3/include/Makefile.in  |  6 +--
 libstdc++-v3/include/bits/char_traits.h   | 50 ---
 libstdc++-v3/include/precompiled/stdc++.h |  3 +-
 libstdc++-v3/include/std/ranges   |  3 +-
 libstdc++-v3/include/std/string_view  | 19 +--
 libstdc++-v3/include/std/version  |  4 +-
 .../20_util/function_objects/searchers.cc | 27 --
 .../basic_string_view/capacity/1.cc   |  2 +
 .../basic_string_view/cons/char/1.cc  |  7 ++-
 .../basic_string_view/cons/char/2.cc  |  1 -
 .../basic_string_view/cons/char/3.cc  |  1 -
 .../basic_string_view/cons/char/range.cc  |  7 ++-
 .../basic_string_view/cons/wchar_t/1.cc   |  6 ++-
 .../element_access/char/1.cc  |  7 ++-
 .../element_access/wchar_t/1.cc   |  6 ++-
 .../operations/contains/char/2.cc |  1 -
 .../operations/substr/char.cc |  7 ++-
 .../operations/substr/wchar_t.cc  |  7 ++-
 19 files changed, 133 insertions(+), 37 deletions(-)

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index e91f4ddd4de..bf566082a8c 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -46,6 +46,7 @@ std_freestanding = \
${std_srcdir}/scoped_allocator \
${std_srcdir}/source_location \
${std_srcdir}/span \
+   ${std_srcdir}/string_view \
${std_srcdir}/tuple \
${std_srcdir}/type_traits \
${std_srcdir}/typeindex \
@@ -100,7 +101,6 @@ std_headers = \
${std_srcdir}/stop_token \
${std_srcdir}/streambuf \
${std_srcdir}/string \
-   ${std_srcdir}/string_view \
${std_srcdir}/system_error \
${std_srcdir}/thread \

[PATCH 2/2] libstdc++: Fix a few !HOSTED test regressions

2023-01-10 Thread Arsen Arsenović via Gcc-patches
libstdc++-v3/ChangeLog:

* testsuite/20_util/to_chars/version.cc: Mark hosted-only.
* testsuite/20_util/uses_allocator/lwg3677.cc: Ditto.
* testsuite/20_util/weak_ptr/cons/self_move.cc: Ditto.
* testsuite/std/ranges/adaptors/as_rvalue/1.cc: Replace usage of
std::make_unique with a freestanding-compatible wrapper around
unique_ptr.
* testsuite/21_strings/basic_string_view/operations/contains/char.cc:
Don't test for presence of __cpp_lib_string_contains on !HOSTED.
* testsuite/21_strings/basic_string_view/operations/contains/char/2.cc:
Ditto.
* testsuite/std/ranges/version_c++23.cc: Don't test for presence
of __cpp_lib_ranges in !HOSTED.
---
 .../testsuite/20_util/to_chars/version.cc |  1 +
 .../20_util/uses_allocator/lwg3677.cc |  1 +
 .../20_util/weak_ptr/cons/self_move.cc|  1 +
 .../operations/contains/char.cc   | 13 -
 .../operations/contains/char/2.cc | 11 +++
 .../std/ranges/adaptors/as_rvalue/1.cc| 19 +++
 .../testsuite/std/ranges/version_c++23.cc |  6 --
 7 files changed, 37 insertions(+), 15 deletions(-)

diff --git a/libstdc++-v3/testsuite/20_util/to_chars/version.cc 
b/libstdc++-v3/testsuite/20_util/to_chars/version.cc
index 25b1e0036e8..2789afa28ef 100644
--- a/libstdc++-v3/testsuite/20_util/to_chars/version.cc
+++ b/libstdc++-v3/testsuite/20_util/to_chars/version.cc
@@ -1,5 +1,6 @@
 // { dg-options "-std=gnu++23" }
 // { dg-do preprocess { target c++23 } }
+// { dg-require-effective-target hosted }
 
 #include 
 
diff --git a/libstdc++-v3/testsuite/20_util/uses_allocator/lwg3677.cc 
b/libstdc++-v3/testsuite/20_util/uses_allocator/lwg3677.cc
index 649b98d3922..b2595d0eb22 100644
--- a/libstdc++-v3/testsuite/20_util/uses_allocator/lwg3677.cc
+++ b/libstdc++-v3/testsuite/20_util/uses_allocator/lwg3677.cc
@@ -1,5 +1,6 @@
 // { dg-options "-std=gnu++23" }
 // { dg-do run { target c++20 } }
+// { dg-require-effective-target hosted }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/20_util/weak_ptr/cons/self_move.cc 
b/libstdc++-v3/testsuite/20_util/weak_ptr/cons/self_move.cc
index c890d2ba94d..7e38765eb2b 100644
--- a/libstdc++-v3/testsuite/20_util/weak_ptr/cons/self_move.cc
+++ b/libstdc++-v3/testsuite/20_util/weak_ptr/cons/self_move.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++11 } }
+// { dg-require-effective-target hosted }
 
 #include 
 #include 
diff --git 
a/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/contains/char.cc
 
b/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/contains/char.cc
index c71a6dc6c63..8ae56757fe8 100644
--- 
a/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/contains/char.cc
+++ 
b/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/contains/char.cc
@@ -22,11 +22,14 @@
 
 #include 
 
-#ifndef __cpp_lib_string_contains
-# error "Feature-test macro for contains missing in "
-#elif __cpp_lib_string_contains != 202011L
-# error "Feature-test macro for contains has wrong value in "
-#endif
+#if __STDC_HOSTED__
+// This FTM is omitted since  is not freestanding.
+# ifndef __cpp_lib_string_contains
+#  error "Feature-test macro for contains missing in "
+# elif __cpp_lib_string_contains != 202011L
+#  error "Feature-test macro for contains has wrong value in "
+# endif
+#endif // HOSTED
 
 void
 test01()
diff --git 
a/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/contains/char/2.cc
 
b/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/contains/char/2.cc
index c106a553f40..d8c85e23249 100644
--- 
a/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/contains/char/2.cc
+++ 
b/libstdc++-v3/testsuite/21_strings/basic_string_view/operations/contains/char/2.cc
@@ -20,8 +20,11 @@
 
 #include 
 
-#ifndef __cpp_lib_string_contains
-# error "Feature-test macro for contains missing in "
-#elif __cpp_lib_string_contains != 202011L
-# error "Feature-test macro for contains has wrong value in "
+#if __STDC_HOSTED__
+// This FTM is omitted since  is not freestanding.
+# ifndef __cpp_lib_string_contains
+#  error "Feature-test macro for contains missing in "
+# elif __cpp_lib_string_contains != 202011L
+#  error "Feature-test macro for contains has wrong value in "
+# endif
 #endif
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/as_rvalue/1.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/as_rvalue/1.cc
index fbf0d651366..da829606e06 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/as_rvalue/1.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/as_rvalue/1.cc
@@ -14,13 +14,24 @@
 namespace ranges = std::ranges;
 namespace views = std::views;
 
+
+/* Replacement for the standard version, as it's not available in freestanding
+   environments.  */
+template
+requires (!std::is_array_v)
+constexpr auto
+make_unique (Args &&...args)
+{
+  return std::unique_ptr { new T 

Re: [PATCH] bpf: correct bpf_print_operand for floats [PR108293]

2023-01-10 Thread Jose E. Marchesi via Gcc-patches


Hi David.
Thanks for the patch.

> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index 2aeaeaf309b..9dde3944e9c 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -880,13 +880,20 @@ bpf_print_operand (FILE *file, rtx op, int code 
> ATTRIBUTE_UNUSED)
>output_address (GET_MODE (op), XEXP (op, 0));
>break;
>  case CONST_DOUBLE:
> -  if (CONST_DOUBLE_HIGH (op))
> - fprintf (file, HOST_WIDE_INT_PRINT_DOUBLE_HEX,
> -  CONST_DOUBLE_HIGH (op), CONST_DOUBLE_LOW (op));
> -  else if (CONST_DOUBLE_LOW (op) < 0)
> - fprintf (file, HOST_WIDE_INT_PRINT_HEX, CONST_DOUBLE_LOW (op));
> -  else
> - fprintf (file, HOST_WIDE_INT_PRINT_DEC, CONST_DOUBLE_LOW (op));
> +  long vals[2];
> +  real_to_target (vals, CONST_DOUBLE_REAL_VALUE (op), GET_MODE (op));
> +  vals[0] &= 0xffffffff;
> +  vals[1] &= 0xffffffff;
> +  if (GET_MODE (op) == SFmode)
> + fprintf (file, "0x%08lx", vals[0]);
> +  else if (GET_MODE (op) == DFmode)
> + {
> +   /* Note: real_to_target puts vals in target word order.  */
> +   if (WORDS_BIG_ENDIAN)
> + fprintf (file, "0x%08lx%08lx", vals[0], vals[1]);
> +   else
> + fprintf (file, "0x%08lx%08lx", vals[1], vals[0]);
> + }
>break;
>  default:
>output_addr_const (file, op);

Do we want a gcc_unreachable in case the mode of `op' is neither SFmode
nor DFmode?
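
To make the expected operand text concrete, here is a small host-side sketch of the same formatting idea: take the IEEE-754 bit pattern, split it into two 32-bit words and print the most significant word first. It is not the GCC code (it uses memcpy on the host rather than real_to_target), and the helper names are hypothetical; it assumes the host uses IEEE-754 binary32 and binary64.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: "0x" followed by the 16 hex digits of a double.
std::string double_bits_hex(double d)
{
  static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  // Mask each half to 32 bits, as the patch does with its two words.
  unsigned long hi = static_cast<unsigned long>(bits >> 32) & 0xffffffffUL;
  unsigned long lo = static_cast<unsigned long>(bits) & 0xffffffffUL;
  char buf[2 + 16 + 1];
  std::snprintf(buf, sizeof buf, "0x%08lx%08lx", hi, lo);
  return buf;
}

// Hypothetical helper: "0x" followed by the 8 hex digits of a float.
std::string float_bits_hex(float f)
{
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  char buf[2 + 8 + 1];
  std::snprintf(buf, sizeof buf, "0x%08lx",
                static_cast<unsigned long>(bits) & 0xffffffffUL);
  return buf;
}
```

Because the words are always printed high-then-low, the text is the same for little-endian and big-endian targets, which is what the paired -mlittle-endian and -mbig-endian tests below check.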

> diff --git a/gcc/testsuite/gcc.target/bpf/double-1.c 
> b/gcc/testsuite/gcc.target/bpf/double-1.c
> new file mode 100644
> index 00000000000..200f1bd18f8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/double-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mlittle-endian" } */
> +
> +double f;
> +double a() { f = 1.0; return 1.0; }
> +double b() { f = 2.0; return 2.0; }
> +double c() { f = 2.0; return 3.0; }
> +double d() { f = 3.0; return 3.0; }
> +
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x3ff0000000000000" 2 } } */
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x4000000000000000" 3 } } */
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x4008000000000000" 3 } } */
> diff --git a/gcc/testsuite/gcc.target/bpf/double-2.c 
> b/gcc/testsuite/gcc.target/bpf/double-2.c
> new file mode 100644
> index 00000000000..d04ddd0c575
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/double-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mbig-endian" } */
> +
> +double f;
> +double a() { f = 1.0; return 1.0; }
> +double b() { f = 2.0; return 2.0; }
> +double c() { f = 2.0; return 3.0; }
> +double d() { f = 3.0; return 3.0; }
> +
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x3ff0000000000000" 2 } } */
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x4000000000000000" 3 } } */
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x4008000000000000" 3 } } */
> diff --git a/gcc/testsuite/gcc.target/bpf/float-1.c 
> b/gcc/testsuite/gcc.target/bpf/float-1.c
> new file mode 100644
> index 00000000000..05ed7bb651d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/float-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mlittle-endian" } */
> +
> +float f;
> +float a() { f = 1.0; return 1.0; }
> +float b() { f = 2.0; return 2.0; }
> +float c() { f = 2.0; return 3.0; }
> +float d() { f = 3.0; return 3.0; }
> +
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x3f800000" 2 } } */
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x40000000" 3 } } */
> +/* { dg-final { scan-assembler-times "lddw\t%r.,0x40400000" 3 } } */


[PATCH] tree-optimization/108314 - avoid BIT_NOT optimization for extract-last

2023-01-10 Thread Richard Biener via Gcc-patches
The extract-last reduction internal function expects the then and
else clause as vector and scalar and thus we cannot perform optimization
of the inversion of the condition by swapping the then/else clauses.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR tree-optimization/108314
* tree-vect-stmts.cc (vectorizable_condition): Do not
perform BIT_NOT_EXPR optimization for EXTRACT_LAST_REDUCTION.

* gcc.dg/vect/pr108314.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr108314.c | 16 
 gcc/tree-vect-stmts.cc   | 13 +
 2 files changed, 25 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr108314.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr108314.c 
b/gcc/testsuite/gcc.dg/vect/pr108314.c
new file mode 100644
index 00000000000..07260e06915
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr108314.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv9-a" { target aarch64-*-* } } */
+
+int x, y, z;
+
+void f(void)
+{
+  int t = 4;
+  for (; x; x++)
+{
+  if (y)
+   continue;
+  t = 0;
+}
+  z = t;
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 6ddd41fb473..eb4ca1f184e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -10677,7 +10677,8 @@ vectorizable_condition (vec_info *vinfo,
  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
  if (bitop2 == NOP_EXPR)
vec_compare = new_temp;
- else if (bitop2 == BIT_NOT_EXPR)
+ else if (bitop2 == BIT_NOT_EXPR
+  && reduction_type != EXTRACT_LAST_REDUCTION)
{
  /* Instead of doing ~x ? y : z do x ? z : y.  */
  vec_compare = new_temp;
@@ -10686,9 +10687,13 @@ vectorizable_condition (vec_info *vinfo,
  else
{
  vec_compare = make_ssa_name (vec_cmp_type);
- new_stmt
-   = gimple_build_assign (vec_compare, bitop2,
-  vec_cond_lhs, new_temp);
+ if (bitop2 == BIT_NOT_EXPR)
+   new_stmt
+ = gimple_build_assign (vec_compare, bitop2, new_temp);
+ else
+   new_stmt
+ = gimple_build_assign (vec_compare, bitop2,
+vec_cond_lhs, new_temp);
  vect_finish_stmt_generation (vinfo, stmt_info,
   new_stmt, gsi);
}
-- 
2.35.3


[Committed] IBM zSystems: Make -fcall-saved-... work.

2023-01-10 Thread Andreas Krebbel via Gcc-patches
Committed to mainline. Bootstrap and regression tests are clean.

gcc/ChangeLog:

* config/s390/s390.cc (s390_register_info): Check call_used_regs
instead of hard-coding the register numbers for call saved
registers.
(s390_optimize_register_info): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/s390/fcall-saved.c: New test.
---
 gcc/config/s390/s390.cc | 10 --
 gcc/testsuite/gcc.target/s390/fcall-saved.c | 11 +++
 2 files changed, 15 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/fcall-saved.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 42177c204f6..a9bb610385b 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -10075,8 +10075,8 @@ s390_register_info ()
 
   memset (cfun_frame_layout.gpr_save_slots, SAVE_SLOT_NONE, 16);
 
-  for (i = 6; i < 16; i++)
-if (clobbered_regs[i])
+  for (i = 0; i < 16; i++)
+if (clobbered_regs[i] && !call_used_regs[i])
   cfun_gpr_save_slot (i) = SAVE_SLOT_STACK;
 
   s390_register_info_stdarg_fpr ();
@@ -10136,10 +10136,8 @@ s390_optimize_register_info ()
|| cfun_frame_layout.save_return_addr_p
|| crtl->calls_eh_return);
 
-  memset (cfun_frame_layout.gpr_save_slots, SAVE_SLOT_NONE, 6);
-
-  for (i = 6; i < 16; i++)
-if (!clobbered_regs[i])
+  for (i = 0; i < 16; i++)
+if (!clobbered_regs[i] || call_used_regs[i])
   cfun_gpr_save_slot (i) = SAVE_SLOT_NONE;
 
   s390_register_info_set_ranges ();
diff --git a/gcc/testsuite/gcc.target/s390/fcall-saved.c 
b/gcc/testsuite/gcc.target/s390/fcall-saved.c
new file mode 100644
index 00000000000..a08155372f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/fcall-saved.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mzarch -fcall-saved-r4" } */
+
+void test(void) {
+asm volatile("nop" ::: "r4");
+}
+
+/* { dg-final { scan-assembler-times "\tstg\t" 1 { target { lp64 } } } } */
+/* { dg-final { scan-assembler-times "\tlg\t" 1 { target { lp64 } } } } */
+/* { dg-final { scan-assembler-times "\tst\t" 1 { target { ! lp64 } } } } */
+/* { dg-final { scan-assembler-times "\tl\t" 1 { target { ! lp64 } } } } */
-- 
2.39.0
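
The effect of the change can be modelled in a few lines. The sketch below is a deliberately simplified model, not the s390 back end: the register set is reduced to 16 booleans, the function names are hypothetical, and the default assumed here is that r0 to r5 are call-clobbered. It contrasts a loop that hard-codes "r6 and up" with one that consults a call_used table that an option such as -fcall-saved-r4 can alter.

```cpp
#include <array>

constexpr int kNumGprs = 16;
using RegSet = std::array<bool, kNumGprs>;

// Model of the old behaviour: only r6..r15 can ever get a stack save slot,
// regardless of what the user requested.
RegSet slots_hardcoded(const RegSet& clobbered)
{
  RegSet save{};                       // all false
  for (int i = 6; i < kNumGprs; i++)
    if (clobbered[i])
      save[i] = true;
  return save;
}

// Model of the new behaviour: any clobbered register that is not
// call-used (i.e. is call-saved) gets a stack save slot.
RegSet slots_from_call_used(const RegSet& clobbered, const RegSet& call_used)
{
  RegSet save{};                       // all false
  for (int i = 0; i < kNumGprs; i++)
    if (clobbered[i] && !call_used[i])
      save[i] = true;
  return save;
}
```

With r4 clobbered and marked call-saved, the first function never assigns it a slot while the second does, which corresponds to the single store and load the new test scans for.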



Re: [PATCH] Remove legacy pre-C++ 11 definitions

2023-01-10 Thread Martin Liška
On 1/9/23 16:19, Jonathan Wakely wrote:
> On Mon, 9 Jan 2023 at 15:17, Martin Liška  wrote:
>>
>> On 1/6/23 19:23, Jonathan Wakely wrote:
>>> Seems to me that GCC code should just use nullptr directly not redefine NULL.
>>
>> Sure, but that would lead to a huge patch which would rename NULL to nullptr, right?
> 
> 
> Yeah, which can probably be done separately (or not done at all).

That would be a massive patch affecting all targets and FEs.

> I was just commenting on the comment that Andrew showed. That comment
> explains that nullptr is better than 0 as a null pointer constant,
> which is a good reason to prefer nullptr. But not a good reason to
> redefine NULL; in code with a minimum requirement of C++11 you can
> just use nullptr directly.

Ok, so does it mean my patch addresses what can be easily adjusted?

Thanks,
Martin



Re: gcc-13/changes.html: Mention -fstrict-flex-arrays and its impact

2023-01-10 Thread Richard Biener via Gcc-patches
On Mon, 9 Jan 2023, Qing Zhao wrote:

> 
> 
> > On Jan 9, 2023, at 2:11 AM, Richard Biener  wrote:
> > 
> > On Thu, 22 Dec 2022, Qing Zhao wrote:
> > 
> >> 
> >> 
> >>> On Dec 22, 2022, at 2:09 AM, Richard Biener  wrote:
> >>> 
> >>> On Wed, 21 Dec 2022, Qing Zhao wrote:
> >>> 
>  Hi, Richard,
>  
>  Thanks a lot for your comments.
>  
> > On Dec 21, 2022, at 2:12 AM, Richard Biener  wrote:
> > 
> > On Tue, 20 Dec 2022, Qing Zhao wrote:
> > 
> >> Hi,
> >> 
> >> This is the patch for mentioning -fstrict-flex-arrays and 
> >> -Warray-bounds=2 changes in gcc-13/changes.html.
> >> 
> >> Let me know if you have any comment or suggestions.
> > 
> > Some copy editing below
> > 
> >> Thanks.
> >> 
> >> Qing.
> >> 
> >> ===
> >> From c022076169b4f1990b91f7daf4cc52c6c5535228 Mon Sep 17 00:00:00 2001
> >> From: Qing Zhao 
> >> Date: Tue, 20 Dec 2022 16:13:04 +
> >> Subject: [PATCH] gcc-13/changes: Mention -fstrict-flex-arrays and its 
> >> impact.
> >> 
> >> ---
> >> htdocs/gcc-13/changes.html | 15 +++
> >> 1 file changed, 15 insertions(+)
> >> 
> >> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> >> index 689178f9..47b3d40f 100644
> >> --- a/htdocs/gcc-13/changes.html
> >> +++ b/htdocs/gcc-13/changes.html
> >> @@ -39,6 +39,10 @@ a work-in-progress.
> >>   Legacy debug info compression option -gz=zlib-gnu 
> >> was removed
> >> and the option is ignored right now.
> >>   New debug info compression option value -gz=zstd 
> >> has been added.
> >> +-Warray-bounds=2 will no longer issue warnings 
> >> for out of bounds
> >> +  accesses to trailing struct members of one-element array type 
> >> anymore. Please
> >> +  add -fstrict-flex-arrays=level to control how the 
> >> compiler treat
> >> +  trailing arrays of structures as flexible array members. 
> > 
> > "Instead it diagnoses accesses to trailing arrays according to 
> > -fstrict-flex-arrays."
>  
>  Okay.
> > 
> >> 
> >> 
> >> 
> >> @@ -409,6 +413,17 @@ a work-in-progress.
> >> Other significant improvements
> >> 
> >> 
> >> +Treating trailing arrays as flexible array 
> >> members
> >> +
> >> +
> >> + GCC can now control when to treat the trailing array of a 
> >> structure as a 
> >> + flexible array member for the purpose of accessing the elements 
> >> of such
> >> + an array. By default, all trailing arrays of structures are 
> >> treated as
> > 
> > all trailing arrays in aggregates are treated
>  Okay.
> > 
> >> + flexible array members. Use the new command-line option
> >> + -fstrict-flex-array=level to control how GCC treats 
> >> the trailing
> >> + array of a structure as a flexible array member at different 
> >> levels.
> > 
> > -fstrict-flex-arrays to control which trailing array
> > members are treated as flexible arrays.
>  
>  Okay.
>  
> > 
> > I've also just now noticed that there's now a flag_strict_flex_arrays
> > check in the middle-end (in array bound diagnostics) but this option
> > isn't streamed or handled with LTO.  I think you want to replace that
> > with the appropriate DECL_NOT_FLEXARRAY check.
>  
>  We need to know the level value of the strict_flex_arrays on the struct 
>  field to issue proper warnings at different levels. DECL_NOT_FLEXARRAY 
>  does not include such info. So, what should I do? Streaming the 
>  flag_strict_flex_arrays with LTO?
> >>> 
> >>> But you do
> >>> 
> >>> if (compref)
> >>>   {
> >>> /* Try to determine special array member type for this 
> >>> COMPONENT_REF.  */
> >>> sam = component_ref_sam_type (arg);
> >>> /* Get the level of strict_flex_array for this array field.  */
> >>> tree afield_decl = TREE_OPERAND (arg, 1);
> >>> strict_flex_array_level = strict_flex_array_level_of (afield_decl);
> >>> 
> >>> I see that function doesn't look at DECL_NOT_FLEXARRAY but just
> >>> checks attributes (those are streamed in LTO).
> >> 
> >> Yes, it checks both flag_strict_flex_arrays and attributes. 
> >> 
> >> There are two places in middle end calling "strict_flex_array_level_of" 
> >> function, 
> >> one inside "array_bounds_checker::check_array_ref", another one inside 
> >> "component_ref_size".
> >> Shall we check DECL_NOT_FLEXARRAY field instead of calling 
> >> "strict_flex_array_level_of" in both places?
> > 
> > I wonder if that function should check DECL_NOT_FLEXARRAY?
> 
> The function "strict_flex_array_level_of" is intended to query the LEVEL of 
> strict_flex_array; checking only DECL_NOT_FLEXARRAY is not enough. 
> 
> So, I think the major question here is: 
> 
> Do we need  the LEVEL