[PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]

2021-10-27 Thread Xionghu Luo via Gcc-patches



On 2021/10/27 21:24, David Edelsohn wrote:
> On Sun, Oct 24, 2021 at 10:51 PM Xionghu Luo  wrote:
>>
>> If the second operand of __builtin_shuffle is const vector 0, and with
>> specific mask, it can be optimized to vspltisw+xxpermdi instead of lxv.
>>
>> gcc/ChangeLog:
>>
>> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
>> patterns match and emit for VSX xxpermdi.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/powerpc/pr102868.c: New test.
>> ---
>>  gcc/config/rs6000/rs6000.c  | 47 --
>>  gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 +
>>  2 files changed, 97 insertions(+), 3 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index d0730253bcc..5d802c1fa96 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -23046,7 +23046,23 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
>>  {OPTION_MASK_P8_VECTOR,
>>   BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
>>   : CODE_FOR_p8_vmrgew_v4sf_direct,
>> - {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}};
>> + {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}},
>> +{OPTION_MASK_VSX,
>> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
>> +  : CODE_FOR_vsx_xxpermdi_v16qi),
>> + {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}},
>> +{OPTION_MASK_VSX,
>> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
>> +  : CODE_FOR_vsx_xxpermdi_v16qi),
>> + {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}},
>> +{OPTION_MASK_VSX,
>> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
>> +  : CODE_FOR_vsx_xxpermdi_v16qi),
>> + {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}},
>> +{OPTION_MASK_VSX,
>> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
>> +  : CODE_FOR_vsx_xxpermdi_v16qi),
>> + {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}};
> 
> If the insn_code is the same for big endian and little endian, why
> does the new code test BYTES_BIG_ENDIAN to set the same value
> (CODE_FOR_vsx_xxpermdi_v16qi)?
> 

Thanks for the catch, updated the patch as below:


[PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]

If the second operand of __builtin_shuffle is a constant zero vector and
the mask matches one of a few specific patterns, the shuffle can be
expanded to vspltisw+xxpermdi instead of an lxv load.

gcc/ChangeLog:

* config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
pattern matching and emission for VSX xxpermdi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr102868.c: New test.
---
 gcc/config/rs6000/rs6000.c  | 39 +--
 gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 +
 2 files changed, 89 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d0730253bcc..533560bb9ba 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -23046,7 +23046,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
 {OPTION_MASK_P8_VECTOR,
  BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
  : CODE_FOR_p8_vmrgew_v4sf_direct,
- {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}};
+ {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}},
+{OPTION_MASK_VSX, CODE_FOR_vsx_xxpermdi_v16qi,
+ {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}},
+{OPTION_MASK_VSX, CODE_FOR_vsx_xxpermdi_v16qi,
+ {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}},
+{OPTION_MASK_VSX, CODE_FOR_vsx_xxpermdi_v16qi,
+ {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}},
+{OPTION_MASK_VSX, CODE_FOR_vsx_xxpermdi_v16qi,
+ {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}};
 
   unsigned int i, j, elt, which;
   unsigned char perm[16];
@@ -23169,6 +23177,27 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
  machine_mode omode = insn_data[icode].operand[0].mode;
  machine_mode imode = insn_data[icode].operand[1].mode;
 
+ rtx perm_idx = GEN_INT (0);
+ if (icode == CODE_FOR_vsx_xxpermdi_v16qi)
+   {
+ int perm_val = 0;
+ if (one_vec)
+   {
+ if (perm[0] == 8)
+   perm_val |= 2;
+ if (perm[8] == 8)
+   perm_val |= 1;
+   }
+ else
+   {
+ if (perm[0] != 0)
+   perm_val |= 2;
+ if (perm[8] != 16)
+   perm_val |= 1;
+   }
+ perm_idx = 
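
The selector computation visible in the hunk above can be modelled outside
GCC as a minimal sketch.  The function name xxpermdi_selector is
hypothetical, and the sketch covers only the perm_val logic that survives in
this truncated message; how the patch goes on to use perm_idx is not shown
here and is not modelled.

```c
#include <stdbool.h>

/* Hypothetical standalone model of the perm_val computation in the hunk.
   perm[] is the 16-entry byte permutation; one_vec is true when both
   shuffle inputs are the same vector.  */
static int
xxpermdi_selector (const unsigned char perm[16], bool one_vec)
{
  int perm_val = 0;

  if (one_vec)
    {
      /* With a single input, byte index 8 marks the second doubleword.  */
      if (perm[0] == 8)
        perm_val |= 2;
      if (perm[8] == 8)
        perm_val |= 1;
    }
  else
    {
      /* With two inputs, bytes 0..15 index op0 and bytes 16..31 index op1.  */
      if (perm[0] != 0)
        perm_val |= 2;
      if (perm[8] != 16)
        perm_val |= 1;
    }

  return perm_val;
}
```

For the four two-input masks added to the pattern table, this model yields
the selectors 0, 2, 1 and 3, in table order.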

Re: [PATCH] elf: Add __libc_get_static_tls_bounds [BZ #16291]

2021-10-27 Thread Fāng-ruì Sòng via Gcc-patches
On Tue, Oct 19, 2021 at 12:37 PM Fāng-ruì Sòng  wrote:
>
> On Thu, Oct 14, 2021 at 5:13 PM Fangrui Song  wrote:
> >
> > On 2021-10-06, Fangrui Song wrote:
> > >On 2021-09-27, Fangrui Song wrote:
> > >>On 2021-09-27, Florian Weimer wrote:
> > >>>* Fangrui Song:
> > >>>
> > > Sanitizer runtimes need static TLS boundaries for a variety of use cases.
> > > 
> > > * asan/hwasan/msan/tsan need to unpoison static TLS blocks to prevent false
> > >  positives due to reusing the TLS blocks with a previous thread.
> > > * lsan needs TCB for pointers into pthread_setspecific regions.
> > 
> > See https://maskray.me/blog/2021-02-14-all-about-thread-local-storage
> > for details.
> > 
> > compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp GetTls has
> > to infer the static TLS bounds from TP, _dl_get_tls_static_info, and
> > hard-coded TCB sizes. Currently this is somewhat robust for
> > aarch64/powerpc64/x86-64 but is brittle for many other architectures.
> > 
> > This patch implements __libc_get_static_tls_bounds@@GLIBC_PRIVATE which
> > is available in Android bionic since API level 31. This API allows the
> > sanitizer code to be more robust. _dl_get_tls_static_info@@GLIBC_PRIVATE
> > can probably be removed when Clang/GCC sanitizers drop reliance on it.
> > I am unclear whether the version should be GLIBC_2.*.
> > >>>
> > >>>Does this really cover the right memory region?  I assume LSAN needs
> > >>>something that identifies pointers to malloc'ed memory that are stored
> > >>>in non-malloc'ed (mmap'ed) memory.  The static TLS region is certainly a
> > >>>place where such pointers can be stored.  But struct pthread also
> > >>>contains other such pointers: the DTV, the TPP data, and POSIX TLS
> > >>>(pthread_setspecific) data, and struct pthread is not obviously part of
> > >>>the static TLS region.
> > >>
> > >>I know the pthread_setspecific leak detection is brittle but it is
> > >>currently implemented this way ;-)
> > >>
> > >>https://maskray.me/blog/2021-02-14-all-about-thread-local-storage says
> > >>
> > >>"On glibc, GetTls returned range includes
> > >>pthread::{specific_1stblock,specific} for thread-specific data keys.
> > >>There is currently a hack to ignore allocations from ld.so allocated
> > >>dynamic TLS blocks. Note: if the pthread::{specific_1stblock,specific}
> > >>pointers are encrypted, lsan cannot track the allocation."
> > >>
> > >>If pthread::{specific_1stblock,specific} use an XOR technique (like
> > >>__cxa_atexit/setjmp) the pthread_setspecific leak detection will stop
> > >>working :(
> > >>
> > >>---
> > >>
> > >>In any case, the pthread_setspecific leak detection is a relatively
> > >>minor issue. The big issue is asan/msan/tsan false positives due to
> > >>reusing an (exited) thread stack or its TLS blocks.
> > >>
> > >>Around
> > >>https://code.woboq.org/llvm/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp.html#435
> > >>there is very long messy code hard coding the thread descriptor size in
> > >>glibc.
> > >>
> > >>Android `__libc_get_static_tls_bounds(_addr, _addr);` is the
> > >>most robust one.
> > >>
> > >>---
> > >>
> > >>I ported sanitizers to musl (https://reviews.llvm.org/D93848)
> > >>in LLVM 12.0.0 and fixed some TLS block detection aarch64/ppc64 issues
> > >>(https://reviews.llvm.org/D98926 and its follow-up, due to the
> > >>complexity I couldn't get it right in the first place), so I have some
> > >>understanding about sanitizers' TLS usage.
> > >
> > >Adhemerval showed me that the __libc_get_static_tls_bounds behavior is
> > >expected on aarch64 as well (
> > >__libc_get_static_tls_bounds should match sanitizer GetTls)
> > >
> > >From https://gist.github.com/MaskRay/e035b85dce008f0c6d4997b98354d355
> > >```
> > >$ ./testrun.sh ./test-tls-boundary
> > >+++GetTls: 0x7f9c5fd6c000 4416
> > >get_tls=0x7f9c600b4050
> > >_dl_get_tls_static_info: 4416 64
> > >get_static=0x7f9c600b4070
> > >__libc_get_static_tls_bounds: 0x7f9c5fd6c000 4416
> > >```
> > >
> > >
> > >
> > >Is there any concern adding the interface?
> >
> > Gentle ping...
>
>
> CC gcc-patches which ports compiler-rt and may be interested in more
> reliable sanitizers.

PING^3


[Bug tree-optimization/102977] [12 Regression] vectorizer failed to use armv8.3-a complex fma

2021-10-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102977

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||needs-bisection

--- Comment #5 from Andrew Pinski  ---
The testcases added for this case do not actually test that complex fma was
done.
The testcases were added in r11-6697.  The aarch64 patterns were added with
r11-6734.

[Bug tree-optimization/102977] [12 Regression] vectorizer failed to use armv8.3-a complex fma

2021-10-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102977

Andrew Pinski  changed:

   What|Removed |Added

  Component|middle-end  |tree-optimization
   Target Milestone|--- |12.0

[Bug middle-end/102977] [12 Regression] vectorizer failed to use armv8.3-a complex fma

2021-10-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102977

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2021-10-28
Summary|[12 Regression] vectorizer  |[12 Regression] vectorizer
   |failed to use complex fma   |failed to use armv8.3-a
   |with SVE|complex fma
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #4 from Andrew Pinski  ---
It is easier to understand what is going wrong with:
#include

void
foo (_Complex float* a, _Complex float* b, _Complex float *c)
{
for (int i =0 ; i != 4; i++)
  a[i] += b[i] * c[i];
}

Oh you don't need SVE either because it was added for normal SIMD in ARMv8.3-a.

Confirmed.

[Bug middle-end/102977] [GCC12 regression] vectorizer failed to generate complex fma with SVE

2021-10-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102977

Andrew Pinski  changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |---

--- Comment #3 from Andrew Pinski  ---
Oh you mean fcmla.
Never mind.

[Bug middle-end/102977] [GCC12 regression] vectorizer failed to generate complex fma with SVE

2021-10-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102977

--- Comment #2 from Andrew Pinski  ---
Note that st2 does the opposite of ld2 when storing the vectors.

[Bug middle-end/102977] [GCC12 regression] vectorizer failed to generate complex fma with SVE

2021-10-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102977

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski  ---
Huh.
The trunk code is vectorized all the way:
ptrue   p1.h, vl8 ; set p1.h to 8 wide
ptrue   p0.b, all ; set p0.b to all ones
ld2h    {z2.h - z3.h}, p1/z, [x1] ; load the 8x2 vector into z2/z3
ld2h    {z0.h - z1.h}, p1/z, [x2] ; load the 8x2 vector into z0/z1
ld2h    {z16.h - z17.h}, p1/z, [x0] ; load the 8x2 vector into z16/z17
fmul    z6.h, z0.h, z3.h ; z6 = z0 * z3
movprfx z7, z16 ; z7 = z16
fmla    z7.h, p0/m, z0.h, z2.h ; z7 += z0*z2
fmla    z6.h, p0/m, z1.h, z2.h ; z6 += z1*z2
movprfx z4, z7 ; z4 = z7
fmls    z4.h, p0/m, z1.h, z3.h ; z4 -= z1*z3
fadd    z5.h, z6.h, z17.h ; z5 = z6 + z17
st2h    {z4.h - z5.h}, p1, [x0] ; store the 8x2 vector into x0


Note that the way ld2 works is that the first element goes into the first
vector, the second element goes into the second vector, the 3rd element goes
into the first vector, and the 4th element goes into the second vector.
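
The de-interleaving described above can be written as a scalar model.  This
is only a sketch: ld2_model and st2_model are hypothetical names, and plain
float arrays stand in for the SVE registers.

```c
#include <stddef.h>

/* Scalar model of an ld2-style load: even-indexed elements go to the
   first output, odd-indexed elements go to the second.  */
static void
ld2_model (const float *mem, size_t n, float *first, float *second)
{
  for (size_t i = 0; i < n; i++)
    {
      first[i] = mem[2 * i];
      second[i] = mem[2 * i + 1];
    }
}

/* Scalar model of an st2-style store: the inverse interleave.  */
static void
st2_model (const float *first, const float *second, size_t n, float *mem)
{
  for (size_t i = 0; i < n; i++)
    {
      mem[2 * i] = first[i];
      mem[2 * i + 1] = second[i];
    }
}
```

For an array of _Complex float laid out as real/imaginary pairs, the first
output holds the real parts and the second holds the imaginary parts.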

So this is optimized all the way. The lower limit on the size of the
vectors is 128 bits (or 8 half floats), so 8 half floats will always fit
into one vector just fine.
So this is vectorized all the way, such that it is even unrolled.

Re: [PATCH] hardened conditionals

2021-10-27 Thread Alexandre Oliva via Gcc-patches
On Oct 26, 2021, Richard Biener  wrote:

> OK.

Thanks.  I've just fixed the ChangeLog entry and pushed it:

>> * common.opt (fharden-compares): New.
>> (fharden-conditional-branches): New.
>> * doc/invoke.texi: Document new options.
>> * gimple-harden-conditionals.cc: New.

 + * Makefile.in (OBJS): Build it.

>> * passes.def: Add new passes.
>> * tree-pass.h (make_pass_harden_compares): Declare.
>> (make_pass_harden_conditional_branches): Declare.

-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[Bug debug/102979] New: GCC gives wrong error for struct definitions without semicolon, despite G++ doing so

2021-10-27 Thread konstantinua00 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102979

Bug ID: 102979
   Summary: GCC gives wrong error for struct definitions without
semicolon, despite G++ doing so
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: konstantinua00 at gmail dot com
  Target Milestone: ---

```
struct test{int i;}
```
https://godbolt.org/z/GEz56T7bT

Current error: "error: expected identifier or '(' at end of input" (points to
"test")

G++, on the other hand, gives a reasonable one:
"error: expected ';' after struct definition"
https://godbolt.org/z/YGqvY7bsa

Is it possible to get G++'s error in GCC too?

[Bug debug/102978] New: Function/Struct declaration with absent semicolon that is put before including standard header results in wall of errors with no indication of the actual problem

2021-10-27 Thread konstantinua00 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102978

Bug ID: 102978
   Summary: Function/Struct declaration with absent semicolon that
is put before including standard header results in
wall of errors with no indication of the actual
problem
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: konstantinua00 at gmail dot com
  Target Milestone: ---

```
void foo()
#include 
```
or
```
struct bar
#include 
```
as stated, give many useless errors.

example with : https://godbolt.org/z/KbrrszEWr  
less intimidating example with : https://godbolt.org/z/hxnxzqTcW
example with struct declaration: https://godbolt.org/z/nhTKc8cM9
happens in GCC with C std header too: https://godbolt.org/z/c6hMWexcW

Such a situation happens when there are 2 user headers: the first with a
missing semicolon and the second with a std header that is new in the TU.

G++ (but not GCC) already notices a missing semicolon on struct definitions,
and GCC (but not G++) notices globals without one (due to typedef 
https://godbolt.org/z/44oYT1PKn), so maybe something can be done for
declarations too?

[Bug target/101324] powerpc64le: hashst appears before mflr at -O1 or higher

2021-10-27 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101324

Peter Bergner  changed:

   What|Removed |Added

URL||https://gcc.gnu.org/piperma
   ||il/gcc-patches/2021-October
   ||/582755.html

--- Comment #19 from Peter Bergner  ---
I posted Martin's already approved patch to the gcc-patches mailing list along
with a test case which needs approval.

[Bug target/94613] S/390, powerpc: Wrong code generated for vec_sel builtin

2021-10-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94613

--- Comment #16 from CVS Commits  ---
The master branch has been updated by Xiong Hu Luo :

https://gcc.gnu.org/g:5f9ef1339e9d0d709af6a70b60e584bf7decd761

commit r12-4758-g5f9ef1339e9d0d709af6a70b60e584bf7decd761
Author: Xionghu Luo 
Date:   Wed Oct 27 21:22:39 2021 -0500

rs6000: Fold xxsel to vsel since they have same semantics

Fold xxsel to vsel like xxperm/vperm to avoid duplicate code.

gcc/ChangeLog:

2021-10-28  Xionghu Luo  

PR target/94613
* config/rs6000/altivec.md: Add vsx register constraints.
* config/rs6000/vsx.md (vsx_xxsel): Delete.
(vsx_xxsel2): Likewise.
(vsx_xxsel3): Likewise.
(vsx_xxsel4): Likewise.

gcc/testsuite/ChangeLog:

2021-10-28  Xionghu Luo  

* gcc.target/powerpc/builtins-1.c: Adjust.

[Bug target/94613] S/390, powerpc: Wrong code generated for vec_sel builtin

2021-10-27 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94613

--- Comment #15 from CVS Commits  ---
The master branch has been updated by Xiong Hu Luo :

https://gcc.gnu.org/g:9222481ffc69a6c0b73ec81e1bf04289fa3db0ed

commit r12-4757-g9222481ffc69a6c0b73ec81e1bf04289fa3db0ed
Author: Xionghu Luo 
Date:   Wed Oct 27 21:21:20 2021 -0500

rs6000: Fix wrong code generation for vec_sel [PR94613]

The vsel instruction is a bit-wise select instruction.  Using an
IF_THEN_ELSE to express it in RTL is wrong and leads to wrong code
being generated in the combine pass.  Per-element selection is a
subset of bit-wise selection; with the patch the pattern is
written using bit operations.  But there are 8 different patterns
to define "op0 := (op1 & ~op3) | (op2 & op3)":

(~op3&op1) | (op3&op2),
(~op3&op1) | (op2&op3),
(op3&op2) | (~op3&op1),
(op2&op3) | (~op3&op1),
(op1&~op3) | (op3&op2),
(op1&~op3) | (op2&op3),
(op3&op2) | (op1&~op3),
(op2&op3) | (op1&~op3),

The latter 4 cases do not follow canonicalisation rules, and non-canonical
RTL is invalid RTL in the vregs pass.  Secondly, the combine pass will swap
(op1&~op3) to (~op3&op1) by commutative canonicalisation, which reduces
them to the FIRST 4 patterns, but it won't swap (op2&op3) | (~op3&op1) to
(~op3&op1) | (op2&op3), so this patch handles it with 4 patterns with
different NOT op3 positions and checks equality inside them.
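
The difference between bit-wise and per-element selection can be seen in a
small scalar sketch.  The helper names are hypothetical, and modelling
per-element selection as "a nonzero mask picks op2" is an assumption made
for illustration, not a statement about the exact old RTL pattern.

```c
#include <stdint.h>

/* Bit-wise select, op0 := (op1 & ~op3) | (op2 & op3): each result bit
   comes from op2 where the mask bit is 1 and from op1 where it is 0.  */
static uint32_t
bitwise_select (uint32_t op1, uint32_t op2, uint32_t op3)
{
  return (op1 & ~op3) | (op2 & op3);
}

/* Per-element select (assumed model): the whole element comes from op2
   if the mask element is nonzero, otherwise from op1.  */
static uint32_t
element_select (uint32_t op1, uint32_t op2, uint32_t op3)
{
  return op3 != 0 ? op2 : op1;
}
```

The two agree when each mask element is all ones or all zeros.  With a
partial mask such as 0x0000FFFF they differ: for op1 = 0xAAAAAAAA and
op2 = 0x55555555, the bit-wise form gives 0xAAAA5555 while the per-element
model gives 0x55555555.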

Tested pass on P7, P8 and P9.

gcc/ChangeLog:

2021-10-28  Xionghu Luo  

PR target/94613
* config/rs6000/altivec.md (*altivec_vsel): Change to ...
(altivec_vsel): ... this and update define.
(*altivec_vsel_uns): Delete.
(altivec_vsel2): New define_insn.
(altivec_vsel3): Likewise.
(altivec_vsel4): Likewise.
* config/rs6000/rs6000-call.c (altivec_expand_vec_sel_builtin):
New.
(altivec_expand_builtin): Call altivec_expand_vec_sel_builtin to
expand vec_sel.
* config/rs6000/rs6000.c (rs6000_emit_vector_cond_expr): Use
bit-wise selection instead of per element.
* config/rs6000/vector.md:
* config/rs6000/vsx.md (*vsx_xxsel): Change to ...
(vsx_xxsel): ... this and update define.
(*vsx_xxsel_uns): Delete.
(vsx_xxsel2): New define_insn.
(vsx_xxsel3): Likewise.
(vsx_xxsel4): Likewise.

gcc/testsuite/ChangeLog:

2021-10-28  Xionghu Luo  

PR target/94613
* gcc.target/powerpc/pr94613.c: New test.

rs6000: Fix up flag_shrink_wrap handling in presence of -mrop-protect [PR101324]

2021-10-27 Thread Peter Bergner via Gcc-patches
Sorry for reposting, but I forgot to CC the gcc-patches mailing list. :-(


PR101324 shows a problem in disabling shrink-wrapping when using -mrop-protect
when there is an attribute optimize/pragma.  Martin's patch below moves handling
of flag_shrink_wrap so it gets re-disabled when we change or add options.

This passed bootstrap and regtesting with no regressions.  Segher, you
approved Martin's patch in the bugzilla.  Is the test case ok too?

I'll note the test case uses the "new" rop_ok effective-target function which
I submitted as a separate patch.

Peter


2021-10-27  Martin Liska  

gcc/
PR target/101324
* config/rs6000/rs6000.c (rs6000_option_override_internal): Move the
disabling of shrink-wrapping when using -mrop-protect from here...
(rs6000_override_options_after_change): ...to here.

2021-10-27  Peter Bergner  

gcc/testsuite/
PR target/101324
* gcc.target/powerpc/pr101324.c: New test.


diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index bac959f4ef4..95e0d2cffdd 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3484,6 +3484,10 @@ rs6000_override_options_after_change (void)
 }
   else if (!OPTION_SET_P (flag_cunroll_grow_size))
 flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
+
+  /* If we are inserting ROP-protect instructions, disable shrink wrap.  */
+  if (rs6000_rop_protect)
+flag_shrink_wrap = 0;
 }
 
 #ifdef TARGET_USES_LINUX64_OPT
@@ -4048,10 +4052,6 @@ rs6000_option_override_internal (bool global_init_p)
   && ((rs6000_isa_flags_explicit & OPTION_MASK_QUAD_MEMORY_ATOMIC) == 0))
 rs6000_isa_flags |= OPTION_MASK_QUAD_MEMORY_ATOMIC;
 
-  /* If we are inserting ROP-protect instructions, disable shrink wrap.  */
-  if (rs6000_rop_protect)
-flag_shrink_wrap = 0;
-
   /* If we can shrink-wrap the TOC register save separately, then use
  -msave-toc-indirect unless explicitly disabled.  */
   if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
diff --git a/gcc/testsuite/gcc.target/powerpc/pr101324.c b/gcc/testsuite/gcc.target/powerpc/pr101324.c
new file mode 100644
index 000..d27cc2876f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr101324.c
@@ -0,0 +1,17 @@
+/* { dg-require-effective-target rop_ok } */
+/* { dg-options "-O1 -mrop-protect -mdejagnu-cpu=power10" } */
+
+extern void foo (void);
+
+long int
+__attribute__ ((__optimize__ ("no-inline")))
+func (long int cond)
+{
+  if (cond)
+foo ();
+  return cond;
+}
+
+/* Ensure hashst comes after mflr and hashchk comes after ld 0,16(1).  */
+/* { dg-final { scan-assembler "mflr 0.*hashst 0," } } */
+/* { dg-final { scan-assembler "ld 0,16\\\(1\\\).*hashchk 0," } } */


Re: libgfortran.so SONAME and powerpc64le-linux ABI changes

2021-10-27 Thread Michael Meissner via Gcc
I've played with some patches to PowerPC to set the defaults for fortran.  But
without doing a full rebuild like you would do with a new distribution, I think
it will be problematical, unless you build everything with the default long
double set to IEEE 128-bit.

First of all, libquadmath is currently built on Linux 64-bit systems.  I never
removed building libquadmath once we got the official glibc 2.34 support.

So, to go into more detail about what I've tried:

I added an undocumented switch -mfortran that says set the defaults for
Fortran.  This switch would be used to build libgfortran, and also set with
TARGET_F951_OPTIONS for all Fortran invocations.

I tried to switch to float128_type_node instead of long_double_type_node.  I
ran into problems with gimplify in that it could not do a conversion from
_Float128 to float.  I suspect I didn't actually use the right type.

I then went to patches where -mfortran silently switches the long double type
to IEEE 128-bit.  There you get into various compatibility issues where the
linker complains that you are calling between the different long double types.

For instance because we are still building libquadmath, libquadmath is marked
as having long double being IBM 128-bit, but it is called from Fortran modules
that have long double being IEEE 128-bit.  I then did a build suppressing
building libquadmath since I was using LE with glibc 2.34, and I got much
further.  This time instead of a lot of failures, I got 29 failures, due to
libgfortran still being marked as IBM long double and the fortran modules are
marked as IEEE long double.

Right now, the only way to avoid these things is to build the entire toolchain
defaulting to IEEE 128-bit.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


[Bug translation/66928] Typos in translatable strings

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66928

Eric Gallager  changed:

   What|Removed |Added

 Blocks||40883
 CC||egallager at gcc dot gnu.org
   Keywords||easyhack

--- Comment #1 from Eric Gallager  ---
are these still there?


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

[Bug translation/79093] Hard coded plural in builtins.c:3203

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79093

Eric Gallager  changed:

   What|Removed |Added

 CC||egallager at gcc dot gnu.org
   Severity|normal  |trivial
   Keywords||diagnostic, easyhack
 Blocks||40883


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

[Bug translation/80760] Suggested clarification of an error message

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80760

Eric Gallager  changed:

   What|Removed |Added

 Blocks||40883
 CC||egallager at gcc dot gnu.org
   Severity|normal  |trivial
   Keywords||easyhack


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

[Bug translation/90041] Command line option without proper quoting in translation message

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90041

Eric Gallager  changed:

   What|Removed |Added

   Keywords||diagnostic, easyhack
 Blocks||40883
   Severity|normal  |trivial
 CC||egallager at gcc dot gnu.org,
   ||msebor at gcc dot gnu.org

--- Comment #6 from Eric Gallager  ---
(In reply to Jakub Jelinek from comment #1)
> We don't have any linter.  The previous changes were done by grepping stuff
> AFAIK.

Martin Sebor has added -Wformat-diag now; does that catch this?


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

[Bug translation/90160] missing quote in diagnostic

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90160

Eric Gallager  changed:

   What|Removed |Added

 CC||egallager at gcc dot gnu.org
 Blocks||40883
   Severity|normal  |trivial
   Keywords||easyhack


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

[Bug translation/90182] missing space in multiline string literal

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90182

Eric Gallager  changed:

   What|Removed |Added

   Keywords||easyhack
   Severity|normal  |trivial
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=79618
 Blocks||40883
 CC||egallager at gcc dot gnu.org


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

[Bug translation/90164] wrong tense in ABI change diagnostic

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90164

Eric Gallager  changed:

   What|Removed |Added

   Severity|normal  |trivial
   Keywords||easyhack
 Blocks||40883
 CC||egallager at gcc dot gnu.org


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

[Bug translation/90179] typo in diagnostic for unrecognized control register

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90179

Eric Gallager  changed:

   What|Removed |Added

 CC||egallager at gcc dot gnu.org
   Severity|normal  |trivial
 Blocks||40883
   Keywords||diagnostic, easyhack


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

[Bug translation/93836] teach xgettext what HOST_WIDE_INT_PRINT means

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93836

Eric Gallager  changed:

   What|Removed |Added

   Keywords||diagnostic
 CC||egallager at gcc dot gnu.org,
   ||msebor at gcc dot gnu.org

--- Comment #3 from Eric Gallager  ---
(In reply to Roland Illig from comment #2)
> Thanks for the explanation. I think it might make sense to have a static
> analysis tool for cases like this, to prevent this mistake from the
> beginning, or at least be notified quickly, before the translators have to
> write bug reports. It's not the first time I saw this kind of bug. :)

Maybe Martin Sebor can add a check for it in -Wformat-diag?

[Bug translation/93852] typo: def instead of definition

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93852

Eric Gallager  changed:

   What|Removed |Added

   Severity|normal  |trivial
 Blocks||40883
   Keywords||diagnostic, easyhack
 CC||egallager at gcc dot gnu.org


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

[Bug translation/93854] typo: defined here %qD

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93854

Eric Gallager  changed:

   What|Removed |Added

 Blocks||40883
 CC||egallager at gcc dot gnu.org
   Severity|normal  |trivial
   Keywords||easyhack


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

[Bug translation/93855] typo: function argument vs. parameter

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93855

Eric Gallager  changed:

   What|Removed |Added

   Keywords||easyhack
 CC||egallager at gcc dot gnu.org
   Severity|normal  |trivial
 Blocks||40883


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

[Bug translation/94698] Improper French translation for "override"

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94698

Eric Gallager  changed:

   What|Removed |Added

   Severity|normal  |trivial
 CC||egallager at gcc dot gnu.org
   Keywords||diagnostic, easyhack
 Blocks||40883, 81930

--- Comment #5 from Eric Gallager  ---
(In reply to Frederic Marchal from comment #4)
> French translation has been updated and submitted to the Translation Project.
> 
> Thanks for the report.

...so can this be closed now?


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81930
[Bug 81930] [meta-bug] Issues with -Weffc++

[Bug translation/40883] [meta-bug] Translation breakage with trivial fixes

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
Bug 40883 depends on bug 93759, which changed state.

Bug 93759 Summary: Invalid % in param
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93759

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug translation/93759] Invalid % in param

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93759

Eric Gallager  changed:

   What|Removed |Added

   Keywords||easyhack
 Status|NEW |RESOLVED
   Severity|normal  |trivial
 Resolution|--- |FIXED

--- Comment #8 from Eric Gallager  ---
(In reply to Roland Illig from comment #7)
> Is there still something to do for this bug?
> 
> de.po looks good now, having "c-no-format" instead of "c-format".

Well, you were the one to open it originally, so if you don't see anything left
to do, I guess that means it can be closed now...

[Bug translation/90183] ambiguous diagnostics "only available with"

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90183

Eric Gallager  changed:

   What|Removed |Added

   Severity|normal  |trivial
 Blocks||40883
 CC||egallager at gcc dot gnu.org
   Keywords||easyhack


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

[Bug translation/79183] Hard coded plurals in gimple-ssa-sprintf.c:2050

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79183

Eric Gallager  changed:

   What|Removed |Added

 Blocks||40883
 CC||egallager at gcc dot gnu.org
   Keywords||easyhack
   Severity|normal  |trivial


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

Re: [PATCH] Enable vectorization for _Float16 floor/ceil/trunc/nearbyint/rint operations.

2021-10-27 Thread Hongtao Liu via Gcc-patches
On Mon, Oct 25, 2021 at 4:24 PM liuhongt  wrote:
>
>   Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>   Ok for trunk?
>
I'm going to check in this patch if there's no objection.
> gcc/ChangeLog:
>
> PR target/102464
> * config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF): New
> function type.
> (V16HF_FTYPE_V16HF): Ditto.
> (V32HF_FTYPE_V32HF): Ditto.
> (V8HF_FTYPE_V8HF_ROUND): Ditto.
> (V16HF_FTYPE_V16HF_ROUND): Ditto.
> (V32HF_FTYPE_V32HF_ROUND): Ditto.
> * config/i386/i386-builtin.def ( IX86_BUILTIN_FLOORPH,
> IX86_BUILTIN_CEILPH, IX86_BUILTIN_TRUNCPH,
> IX86_BUILTIN_FLOORPH256, IX86_BUILTIN_CEILPH256,
> IX86_BUILTIN_TRUNCPH256, IX86_BUILTIN_FLOORPH512,
> IX86_BUILTIN_CEILPH512, IX86_BUILTIN_TRUNCPH512): New builtin.
> * config/i386/i386-builtins.c
> (ix86_builtin_vectorized_function): Enable vectorization for
> HFmode FLOOR/CEIL/TRUNC operation.
> * config/i386/i386-expand.c (ix86_expand_args_builtin): Handle
> new builtins.
> * config/i386/sse.md (rint2, nearbyint2): Extend
> to vector HFmodes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr102464-vrndscaleph.c: New test.
> ---
>  gcc/config/i386/i386-builtin-types.def|   7 ++
>  gcc/config/i386/i386-builtin.def  |  11 ++
>  gcc/config/i386/i386-builtins.c   |  42 +++
>  gcc/config/i386/i386-expand.c |   3 +
>  gcc/config/i386/sse.md|  12 +-
>  .../gcc.target/i386/pr102464-vrndscaleph.c| 115 ++
>  6 files changed, 184 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr102464-vrndscaleph.c
>
> diff --git a/gcc/config/i386/i386-builtin-types.def 
> b/gcc/config/i386/i386-builtin-types.def
> index 4c355c587b5..e33f06ab30b 100644
> --- a/gcc/config/i386/i386-builtin-types.def
> +++ b/gcc/config/i386/i386-builtin-types.def
> @@ -1380,3 +1380,10 @@ DEF_FUNCTION_TYPE (USI, V32HF, V32HF, INT, USI, INT)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, UHI, INT)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, V32HF, V32HF, USI, INT)
>  DEF_FUNCTION_TYPE (V32HF, V32HF, INT, V32HF, USI, INT)
> +
> +DEF_FUNCTION_TYPE (V8HF, V8HF)
> +DEF_FUNCTION_TYPE (V16HF, V16HF)
> +DEF_FUNCTION_TYPE (V32HF, V32HF)
> +DEF_FUNCTION_TYPE_ALIAS (V8HF_FTYPE_V8HF, ROUND)
> +DEF_FUNCTION_TYPE_ALIAS (V16HF_FTYPE_V16HF, ROUND)
> +DEF_FUNCTION_TYPE_ALIAS (V32HF_FTYPE_V32HF, ROUND)
> diff --git a/gcc/config/i386/i386-builtin.def 
> b/gcc/config/i386/i386-builtin.def
> index 99217d08d37..d9eee3f373c 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -958,6 +958,10 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, 
> CODE_FOR_sse4_1_roundpd_vec_pack_sfix, "__buil
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_roundv2df2, 
> "__builtin_ia32_roundpd_az", IX86_BUILTIN_ROUNDPD_AZ, UNKNOWN, (int) 
> V2DF_FTYPE_V2DF)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_roundv2df2_vec_pack_sfix, 
> "__builtin_ia32_roundpd_az_vec_pack_sfix", 
> IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX, UNKNOWN, (int) V4SI_FTYPE_V2DF_V2DF)
>
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
> CODE_FOR_avx512fp16_rndscalev8hf, "__builtin_ia32_floorph", 
> IX86_BUILTIN_FLOORPH, (enum rtx_code) ROUND_FLOOR, (int) 
> V8HF_FTYPE_V8HF_ROUND)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
> CODE_FOR_avx512fp16_rndscalev8hf, "__builtin_ia32_ceilph", 
> IX86_BUILTIN_CEILPH, (enum rtx_code) ROUND_CEIL, (int) V8HF_FTYPE_V8HF_ROUND)
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
> CODE_FOR_avx512fp16_rndscalev8hf, "__builtin_ia32_truncph", 
> IX86_BUILTIN_TRUNCPH, (enum rtx_code) ROUND_TRUNC, (int) 
> V8HF_FTYPE_V8HF_ROUND)
> +
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, 
> "__builtin_ia32_floorps", IX86_BUILTIN_FLOORPS, (enum rtx_code) ROUND_FLOOR, 
> (int) V4SF_FTYPE_V4SF_ROUND)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, 
> "__builtin_ia32_ceilps", IX86_BUILTIN_CEILPS, (enum rtx_code) ROUND_CEIL, 
> (int) V4SF_FTYPE_V4SF_ROUND)
>  BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_roundps, 
> "__builtin_ia32_truncps", IX86_BUILTIN_TRUNCPS, (enum rtx_code) ROUND_TRUNC, 
> (int) V4SF_FTYPE_V4SF_ROUND)
> @@ -1090,6 +1094,10 @@ BDESC (OPTION_MASK_ISA_AVX, 0, 
> CODE_FOR_roundv4df2_vec_pack_sfix, "__builtin_ia3
>  BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_roundpd_vec_pack_sfix256, 
> "__builtin_ia32_floorpd_vec_pack_sfix256", 
> IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX256, (enum rtx_code) ROUND_FLOOR, (int) 
> V8SI_FTYPE_V4DF_V4DF_ROUND)
>  BDESC (OPTION_MASK_ISA_AVX, 0, CODE_FOR_avx_roundpd_vec_pack_sfix256, 
> "__builtin_ia32_ceilpd_vec_pack_sfix256", 
> IX86_BUILTIN_CEILPD_VEC_PACK_SFIX256, (enum rtx_code) ROUND_CEIL, (int) 
> V8SI_FTYPE_V4DF_V4DF_ROUND)
>
> +BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
> 

[Bug translation/90148] Closing quote in wrong position in plugin.c

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90148

Eric Gallager  changed:

   What|Removed |Added

 CC||egallager at gcc dot gnu.org
   Severity|normal  |trivial
 Blocks||40883
   Keywords||easyhack


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883
[Bug 40883] [meta-bug] Translation breakage with trivial fixes

[Bug translation/40883] [meta-bug] Translation breakage with trivial fixes

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40883

Eric Gallager  changed:

   What|Removed |Added

  Alias||trivial_translation_nits

--- Comment #10 from Eric Gallager  ---
alias trivial_translation_nits

Re: [RFC] Overflow check in simplifying exit cond comparing two IVs.

2021-10-27 Thread guojiufu via Gcc-patches



I just tested this on ppc64le: the patch passes bootstrap and regtest.
Is this patch OK for trunk?

Thanks for any comments.

BR,
Jiufu

On 2021-10-18 21:37, Jiufu Guo wrote:

With reference to the discussions in:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572006.html
https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578672.html

Based on the patches in the above discussions, we may draft a patch to
fix the issue.

In this patch, to make sure it is ok to change '{b0,s0} op {b1,s1}' to
'{b0,s0-s1} op {b1,0}', we also compute the condition under which we can
assume that neither of the two IVs overflows/wraps: the niter of
'{b0,s0-s1} op {b1,0}' < the niter until wrap for iv0 or iv1.

Does this patch make sense?
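
A minimal sketch of why the extra condition matters, assuming 8-bit
unsigned IVs so that wrapping is easy to see.  The function names and
the iteration cap are hypothetical and are not part of the patch; the
code only compares the iteration count of the original two-IV loop with
that of the combined '{b0,s0-s1} op {b1,0}' form.

```c
#include <stdint.h>

/* Iteration count of the original loop: IVs {b0,s0} and {b1,s1}
   compared with '<', using wrapping 8-bit unsigned arithmetic.
   LIMIT caps the count so that the sketch always terminates.  */
unsigned
niter_two_ivs (uint8_t b0, uint8_t s0, uint8_t b1, uint8_t s1,
	       unsigned limit)
{
  unsigned n = 0;
  uint8_t i = b0, j = b1;
  while (i < j && n < limit)
    {
      i = (uint8_t) (i + s0);
      j = (uint8_t) (j + s1);	/* may wrap around to a small value */
      n++;
    }
  return n;
}

/* Iteration count of the combined form: {b0,s0-s1} < {b1,0}.  */
unsigned
niter_combined (uint8_t b0, uint8_t s0, uint8_t b1, uint8_t s1,
		unsigned limit)
{
  unsigned n = 0;
  uint8_t i = b0;
  uint8_t step = (uint8_t) (s0 - s1);
  while (i < b1 && n < limit)
    {
      i = (uint8_t) (i + step);
      n++;
    }
  return n;
}
```

With b0 = 0, s0 = 3, b1 = 10, s1 = 1 neither IV wraps and both forms
agree.  With b1 = 250 the second IV wraps before the combined loop
would finish, so the two counts differ, which is the situation the
added assumption is meant to exclude.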

BR,
Jiufu Guo

gcc/ChangeLog:

PR tree-optimization/100740
* tree-ssa-loop-niter.c (number_of_iterations_cond): Add
assume condition for combining of two IVs

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr100740.c: New test.
---
 gcc/tree-ssa-loop-niter.c | 103 +++---
 .../gcc.c-torture/execute/pr100740.c  |  11 ++
 2 files changed, 99 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr100740.c

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 75109407124..f2987a4448d 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1863,29 +1863,102 @@ number_of_iterations_cond (class loop *loop,

  provided that either below condition is satisfied:

-   a) the test is NE_EXPR;
-   b) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
+   a) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
+   b) assumptions in below table also need to be satisfied.
+
+   | iv0 | iv1 | assum (iv0step > iv1->step;
+   The second three rows: iv0->step < iv1->step.

  This rarely occurs in practice, but it is simple enough to manage.  */

   if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
 {
+  if (TREE_CODE (iv0->step) != INTEGER_CST
+ || TREE_CODE (iv1->step) != INTEGER_CST)
+   return false;
+  if (!iv0->no_overflow || !iv1->no_overflow)
+   return false;
+
   tree step_type = POINTER_TYPE_P (type) ? sizetype : type;
-  tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
-  iv0->step, iv1->step);
-
-  /* No need to check sign of the new step since below code takes care
-of this well.  */
-  if (code != NE_EXPR
- && (TREE_CODE (step) != INTEGER_CST
- || !iv0->no_overflow || !iv1->no_overflow))
+  tree step
+	= fold_binary_to_constant (MINUS_EXPR, step_type, iv0->step, iv1->step);
+
+  if (code != NE_EXPR && tree_int_cst_sign_bit (step))
return false;

-  iv0->step = step;
-  if (!POINTER_TYPE_P (type))
-   iv0->no_overflow = false;
+  bool positive0 = !tree_int_cst_sign_bit (iv0->step);
+  bool positive1 = !tree_int_cst_sign_bit (iv1->step);

-  iv1->step = build_int_cst (step_type, 0);
-  iv1->no_overflow = true;
+  /* Cases in rows 2 and 4 of above table.  */
+  if ((positive0 && !positive1) || (!positive0 && positive1))
+   {
+ iv0->step = step;
+ iv1->step = build_int_cst (step_type, 0);
+ return number_of_iterations_cond (loop, type, iv0, code, iv1,
+   niter, only_exit, every_iteration);
+   }
+
+  affine_iv i_0, i_1;
+  class tree_niter_desc num;
+  i_0 = *iv0;
+  i_1 = *iv1;
+  i_0.step = step;
+  i_1.step = build_int_cst (step_type, 0);
+  if (!number_of_iterations_cond (loop, type, &i_0, code, &i_1, &num,
+ only_exit, every_iteration))
+   return false;
+
+  affine_iv i0, i1;
+  class tree_niter_desc num_wrap;
+  i0 = *iv0;
+  i1 = *iv1;
+
+  /* Reset iv0 and iv1 to calculate the niter which cause overflow.  */
+  if (tree_int_cst_lt (i1.step, i0.step))
+   {
+ if (positive0 && positive1)
+   i0.step = build_int_cst (step_type, 0);
+ else if (!positive0 && !positive1)
+   i1.step = build_int_cst (step_type, 0);
+ if (code == NE_EXPR)
+   code = LT_EXPR;
+   }
+  else
+   {
+ if (positive0 && positive1)
+   i1.step = build_int_cst (step_type, 0);
+ else if (!positive0 && !positive1)
+   i0.step = build_int_cst (step_type, 0);
+ gcc_assert (code == NE_EXPR);
+ code = GT_EXPR;
+   }
+
+  /* Calculate the niter which cause overflow.  */
+  if (!number_of_iterations_cond (loop, type, &i0, code, &i1, &num_wrap,
+ only_exit, every_iteration))
+   return false;
+
+  /* Make assumption there is no overflow. */
+  tree assum
+   = 

[Bug c++/58798] class with a class reference member generates a warning that ought to be disableable with -Wpacked

2021-10-27 Thread egallager at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58798

Eric Gallager  changed:

   What|Removed |Added

 CC||egallager at gcc dot gnu.org,
   ||steven.spark at gmail dot com
Summary|class with a class  |class with a class
   |reference member generates  |reference member generates
   |unjustified warning |a warning that ought to be
   ||disableable with -Wpacked

--- Comment #10 from Eric Gallager  ---
(In reply to Eric Gallager from comment #6)
> (In reply to Szikra from comment #5)
> > (In reply to Jonathan Wakely from comment #4)
> > > Because the warning isn't controlled by the -Wpacked option. If it was, it
> > > would say [-Wpacked] after the warning. I think that's a bug, every 
> > > warning
> > > should be controlled by some -Wxxx option.
> > 
> > Thanks, good to know. So does this require a separate bug report, or can
> > someone change the status and confirm this one?
> 
> Confirmed that the warning should be controlled by -Wpacked.

Updating the title accordingly.

[Bug testsuite/102946] [12 Regression] gcc.dg/vect/pr101145_1.c etc. FAIL

2021-10-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102946

--- Comment #6 from Jiu Fu Guo  ---
Hi Rainer and Richard,
Thanks for working on this PR.

The intention of these test cases (pr101145*) is to test if the number 
of iterations can be calculated for the loop with the 'until wrap' 
condition.
So, I'm thinking we may be able to update the cases like:
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Symbolic number of iterations is" 2 "vect" } } */

[Bug target/102953] Improvements to CET-IBT and ENDBR generation

2021-10-27 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

--- Comment #10 from H.J. Lu  ---
(In reply to Andrew Cooper from comment #8)
> Actually, there is a (possibly pre-existing) diagnostics issue:
> 
> $ cat proto.c
> static void __attribute__((cf_check)) foo(void);
> static void __attribute__((unused)) foo(void)
> {
> }
> void (*ptr)(void) = foo;
> 
> $ gcc -Wall -Os -fcf-protection=branch -mmanual-endbr
> -fcf-check-attribute=no -c proto.c -o proto.o
> proto.c:2:37: error: conflicting types for 'foo'; have 'void(void)'
> 2 | static void __attribute__((unused)) foo(void)
>   | ^~~
> proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)'
> 1 | static void __attribute__((cf_check)) foo(void);
>   |   ^~~
> 
> 
> The diagnostic complaining that the forward declaration doesn't match the
> definition gives 'void(void)' as the type in both cases, leaving out the
> fact that they differ by cf_check-ness.

Please try the v2 patch.

[Bug target/102953] Improvements to CET-IBT and ENDBR generation

2021-10-27 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

H.J. Lu  changed:

   What|Removed |Added

  Attachment #51672|0   |1
is obsolete||

--- Comment #9 from H.J. Lu  ---
Created attachment 51687
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51687&action=edit
The v2 patch to add -fcf-check-attribute=[yes|no]

Re: [PATCH] rs6000: Fix ICE of vect cost related to V1TI [PR102767]

2021-10-27 Thread Kewen.Lin via Gcc-patches
on 2021/10/28 上午9:43, David Edelsohn wrote:
> On Wed, Oct 27, 2021 at 9:30 PM Kewen.Lin  wrote:
>>
>> Hi David,
>>
>> Thanks for the review!
>>
>> on 2021/10/27 下午9:12, David Edelsohn wrote:
>>> On Sun, Oct 24, 2021 at 11:04 PM Kewen.Lin  wrote:

 Hi,

 As PR102767 shows, commit r12-3482 exposed an ICE in function
 rs6000_builtin_vectorization_cost.  We claim V1TI supports movmisalign
 on rs6000 (see define_expand "movmisalign"), so
 rs6000_builtin_support_vector_misalignment returns true for misalign 8.
 Later, when the cost is queried, rs6000_builtin_vectorization_cost has
 no arm to handle the V1TI input under (TARGET_VSX &&
 TARGET_ALLOW_MOVMISALIGN).

 The proposed fix is to handle V1TI there and simply give it the cost
 for doubleword.  That cost is apparently bigger than the cost of
 scalar, so the vectorization won't happen; the change just keeps
 things consistent and avoids the ICE.  Another thought is to not
 support movmisalign for V1TI, but that sounds like a bad idea since it
 doesn't match reality.

 Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
 powerpc64-linux-gnu P8.

 Is it ok for trunk?

 BR,
 Kewen
 -
 gcc/ChangeLog:

 PR target/102767
 * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
 Consider V1TI mode for unaligned load and store.

 gcc/testsuite/ChangeLog:

 PR target/102767
 * gcc.target/powerpc/ppc-fortran/pr102767.f90: New file.

 diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
 index b7ea1483da5..73d3e06c3fc 100644
 --- a/gcc/config/rs6000/rs6000.c
 +++ b/gcc/config/rs6000/rs6000.c
 @@ -5145,7 +5145,8 @@ rs6000_builtin_vectorization_cost (enum 
 vect_cost_for_stmt type_of_cost,
 if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
   {
 elements = TYPE_VECTOR_SUBPARTS (vectype);
 -   if (elements == 2)
 +   /* See PR102767, consider V1TI to keep consistency.  */
 +   if (elements == 2 || elements == 1)
   /* Double word aligned.  */
   return 4;

 @@ -5184,10 +5185,11 @@ rs6000_builtin_vectorization_cost (enum 
 vect_cost_for_stmt type_of_cost,

  if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
{
 -elements = TYPE_VECTOR_SUBPARTS (vectype);
 -if (elements == 2)
 -  /* Double word aligned.  */
 -  return 2;
 +   elements = TYPE_VECTOR_SUBPARTS (vectype);
 +   /* See PR102767, consider V1TI to keep consistency.  */
 +   if (elements == 2 || elements == 1)
 + /* Double word aligned.  */
 + return 2;
>>>
>>> This section of the patch incorrectly changes the indentation.  Please
>>> use the correct indentation.
>>>
>>
>> The indentation change is intentional, since the original indentation
>> is wrong (more than 8 leading spaces on those lines).  There are more
>> wrongly indented lines above the first changed line, but fixing them
>> too seemed like a bad idea since they are unrelated to what this patch
>> wants to fix, so I left them alone.
>>
>> With the above clarification, may I push this patch without any updates
>> for the mentioned indentation issue?
> 
> If you correct the indentation, you should adjust it for the entire
> block, not just the lines that you change.  If you want to fix the
> entire block to TAB+spaces as well, okay.  You didn't mention that you
> were fixing the indentation in the explanation of the patch.
> 

Sorry for not mentioning that.  Got it, I'll reformat the entire block then,
also with additional notes in the commit log.

Thanks again.

BR,
Kewen

> Thanks, David
> 
>>

  if (elements == 4)
{
 diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 
 b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
 new file mode 100644
 index 000..a4122482989
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
 @@ -0,0 +1,21 @@
 +! { dg-require-effective-target powerpc_vsx_ok }
 +! { dg-options "-mvsx -O2 -ftree-vectorize -mno-efficient-unaligned-vsx" }
 +
 +INTERFACE
 +  FUNCTION elemental_mult (a, b, c)
 +type(*), DIMENSION(..) :: a, b, c
 +  END
 +END INTERFACE
 +
 +allocatable  z
 +integer, dimension(2,2) :: a, b
 +call test_CFI_address
 +contains
 +  subroutine test_CFI_address
 +if (elemental_mult (z, x, y) .ne. 0) stop
 +a = reshape ([4,3,2,1], [2,2])
 +b = reshape ([2,3,4,5], [2,2])
 +if (elemental_mult (i, a, b) .ne. 0) stop
 +  end
 +end
 +

>>>
>>> The patch is okay with the 

Re: [RFC] Don't move cold code out of loop by checking bb count

2021-10-27 Thread Xionghu Luo via Gcc-patches



On 2021/10/27 20:54, Jan Hubicka wrote:
>> Hi,
>>
>> On 2021/9/28 20:09, Richard Biener wrote:
>>> On Fri, Sep 24, 2021 at 8:29 AM Xionghu Luo  wrote:

 Update the patch to v3, not sure whether you prefer the paste style
 and continue to link the previous thread as Segher dislikes this...


 [PATCH v3] Don't move cold code out of loop by checking bb count


 Changes:
 1. Handle max_loop in determine_max_movement instead of
 outermost_invariant_loop.
 2. Remove unnecessary changes.
 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in 
 can_sm_ref_p.
 4. "gsi_next ();" in move_computations_worker is kept since it caused
 infinite loop when implementing v1 and the iteration is missed to be
 updated actually.

 v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
 v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html

 There was a patch trying to avoid moving cold blocks out of loops:

 https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html

 Richard suggested to "never hoist anything from a bb with lower execution
 frequency to a bb with higher one in LIM invariantness_dom_walker
 before_dom_children".

 In gimple LIM analysis, add find_coldest_out_loop to move invariants to
 the expected target loop: if the profile count of the loop bb is colder
 than that of the target loop preheader, the invariant won't be hoisted
 out of the loop.  Likewise for store motion: if all locations of the REF
 in the loop are cold, don't do store motion on it.
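
A small sketch of the kind of code this targets, assuming the negative
branch below is rarely taken.  The function names are hypothetical and
the two functions are written by hand to show the effect of hoisting;
they are not output of the patch.

```c
/* Original loop: x * y is loop-invariant, but it is only needed on the
   (assumed rare) negative-element path.  */
int
sum_in_loop (const int *a, int n, int x, int y)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    {
      if (a[i] < 0)
	s += x * y;	/* cold block: computed only when needed */
      else
	s += a[i];	/* hot block */
    }
  return s;
}

/* After hoisting: x * y runs once per loop entry in the preheader,
   even when no element is negative.  */
int
sum_hoisted (const int *a, int n, int x, int y)
{
  int t = x * y;	/* hoisted invariant */
  int s = 0;
  for (int i = 0; i < n; i++)
    s += (a[i] < 0) ? t : a[i];
  return s;
}
```

Both functions return the same value.  The hoisted form only pays off
when the cold block runs at least as often as the preheader, which is
what the profile count comparison described above is meant to decide.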

 SPEC2017 performance evaluation shows 1% performance improvement for
 intrate GEOMEAN and no obvious regression for others.  Especially,
 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
 largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
 on P8LE.

 gcc/ChangeLog:

 * loop-invariant.c (find_invariants_bb): Check profile count
 before motion.
 (find_invariants_body): Add argument.
 * tree-ssa-loop-im.c (find_coldest_out_loop): New function.
 (determine_max_movement): Use find_coldest_out_loop.
 (move_computations_worker): Adjust and fix iteration update.
 (execute_sm_exit): Check pointer validity.
 (class ref_in_loop_hot_body): New functor.
 (ref_in_loop_hot_body::operator): New.
 (can_sm_ref_p): Use for_all_locs_in_loop.

 gcc/testsuite/ChangeLog:

 * gcc.dg/tree-ssa/recip-3.c: Adjust.
 * gcc.dg/tree-ssa/ssa-lim-18.c: New test.
 * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
 * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
 ---
  gcc/loop-invariant.c   | 10 ++--
  gcc/tree-ssa-loop-im.c | 61 --
  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c|  2 +-
  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++
  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 27 ++
  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 +
  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 28 ++
  7 files changed, 165 insertions(+), 8 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c

 diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
 index fca0c2b24be..5c3be7bf0eb 100644
 --- a/gcc/loop-invariant.c
 +++ b/gcc/loop-invariant.c
 @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool 
 always_reached, bool always_executed)
 call.  */

  static void
 -find_invariants_bb (basic_block bb, bool always_reached, bool 
 always_executed)
 +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
 +   bool always_executed)
  {
rtx_insn *insn;
 +  basic_block preheader = loop_preheader_edge (loop)->src;
 +
 +  if (preheader->count > bb->count)
 +return;

FOR_BB_INSNS (bb, insn)
  {
 @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block 
 *body,
unsigned i;

for (i = 0; i < loop->num_nodes; i++)
 -find_invariants_bb (body[i],
 -   bitmap_bit_p (always_reached, i),
 +find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
 bitmap_bit_p (always_executed, i));
  }

 diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
 index 4b187c2cdaf..655fab03442 100644
 --- a/gcc/tree-ssa-loop-im.c
 +++ b/gcc/tree-ssa-loop-im.c
 @@ -417,6 +417,28 @@ movement_possibility (gimple 

[Bug target/102976] MMA test case emits wrong code when building a vector pair

2021-10-27 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102976

Peter Bergner  changed:

   What|Removed |Added

URL||https://gcc.gnu.org/piperma
   ||il/gcc-patches/2021-October
   ||/582749.html
   Target Milestone|--- |12.0

Re: [PATCH] rs6000: Fix ICE of vect cost related to V1TI [PR102767]

2021-10-27 Thread David Edelsohn via Gcc-patches
On Wed, Oct 27, 2021 at 9:30 PM Kewen.Lin  wrote:
>
> Hi David,
>
> Thanks for the review!
>
> on 2021/10/27 下午9:12, David Edelsohn wrote:
> > On Sun, Oct 24, 2021 at 11:04 PM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> As PR102767 shows, commit r12-3482 exposed an ICE in function
> >> rs6000_builtin_vectorization_cost.  We claim V1TI supports movmisalign
> >> on rs6000 (see define_expand "movmisalign"), so
> >> rs6000_builtin_support_vector_misalignment returns true for misalign 8.
> >> Later, when the cost is queried, rs6000_builtin_vectorization_cost has
> >> no arm to handle the V1TI input under (TARGET_VSX &&
> >> TARGET_ALLOW_MOVMISALIGN).
> >>
> >> The proposed fix is to handle V1TI there and simply give it the cost
> >> for doubleword.  That cost is apparently bigger than the cost of
> >> scalar, so the vectorization won't happen; the change just keeps
> >> things consistent and avoids the ICE.  Another thought is to not
> >> support movmisalign for V1TI, but that sounds like a bad idea since it
> >> doesn't match reality.
> >>
> >> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> >> powerpc64-linux-gnu P8.
> >>
> >> Is it ok for trunk?
> >>
> >> BR,
> >> Kewen
> >> -
> >> gcc/ChangeLog:
> >>
> >> PR target/102767
> >> * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
> >> Consider V1TI mode for unaligned load and store.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> PR target/102767
> >> * gcc.target/powerpc/ppc-fortran/pr102767.f90: New file.
> >>
> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >> index b7ea1483da5..73d3e06c3fc 100644
> >> --- a/gcc/config/rs6000/rs6000.c
> >> +++ b/gcc/config/rs6000/rs6000.c
> >> @@ -5145,7 +5145,8 @@ rs6000_builtin_vectorization_cost (enum 
> >> vect_cost_for_stmt type_of_cost,
> >> if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
> >>   {
> >> elements = TYPE_VECTOR_SUBPARTS (vectype);
> >> -   if (elements == 2)
> >> +   /* See PR102767, consider V1TI to keep consistency.  */
> >> +   if (elements == 2 || elements == 1)
> >>   /* Double word aligned.  */
> >>   return 4;
> >>
> >> @@ -5184,10 +5185,11 @@ rs6000_builtin_vectorization_cost (enum 
> >> vect_cost_for_stmt type_of_cost,
> >>
> >>  if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
> >>{
> >> -elements = TYPE_VECTOR_SUBPARTS (vectype);
> >> -if (elements == 2)
> >> -  /* Double word aligned.  */
> >> -  return 2;
> >> +   elements = TYPE_VECTOR_SUBPARTS (vectype);
> >> +   /* See PR102767, consider V1TI to keep consistency.  */
> >> +   if (elements == 2 || elements == 1)
> >> + /* Double word aligned.  */
> >> + return 2;
> >
> > This section of the patch incorrectly changes the indentation.  Please
> > use the correct indentation.
> >
>
> The indentation change is intentional, since the original indentation
> is wrong (more than 8 leading spaces on those lines).  There are more
> wrongly indented lines above the first changed line, but fixing them
> too seemed like a bad idea since they are unrelated to what this patch
> wants to fix, so I left them alone.
>
> With the above clarification, may I push this patch without any updates
> for the mentioned indentation issue?

If you correct the indentation, you should adjust it for the entire
block, not just the lines that you change.  If you want to fix the
entire block to TAB+spaces as well, okay.  You didn't mention that you
were fixing the indentation in the explanation of the patch.

Thanks, David

>
> >>
> >>  if (elements == 4)
> >>{
> >> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 
> >> b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
> >> new file mode 100644
> >> index 000..a4122482989
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
> >> @@ -0,0 +1,21 @@
> >> +! { dg-require-effective-target powerpc_vsx_ok }
> >> +! { dg-options "-mvsx -O2 -ftree-vectorize -mno-efficient-unaligned-vsx" }
> >> +
> >> +INTERFACE
> >> +  FUNCTION elemental_mult (a, b, c)
> >> +type(*), DIMENSION(..) :: a, b, c
> >> +  END
> >> +END INTERFACE
> >> +
> >> +allocatable  z
> >> +integer, dimension(2,2) :: a, b
> >> +call test_CFI_address
> >> +contains
> >> +  subroutine test_CFI_address
> >> +if (elemental_mult (z, x, y) .ne. 0) stop
> >> +a = reshape ([4,3,2,1], [2,2])
> >> +b = reshape ([2,3,4,5], [2,2])
> >> +if (elemental_mult (i, a, b) .ne. 0) stop
> >> +  end
> >> +end
> >> +
> >>
> >
> > The patch is okay with the indentation correction.
> >
> > Thanks, David
> >
>
> Thanks!
>
> BR,
> Kewen


[PATCH] rs6000: MMA test case emits wrong code when building a vector pair

2021-10-27 Thread Peter Bergner via Gcc-patches
PR102976 shows a test case where we generate wrong code when building
a vector pair from 2 vector registers.  The bug here is that with unlucky
register assignments, we can clobber one of the input operands before
we write both registers of the output operand.  The solution is to use
early-clobbers in the assemble pair and accumulator patterns.
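
A toy model of the hazard, under stated assumptions: the register file,
the function names and the copy order below are hypothetical and only
illustrate why an output that overlaps an input is unsafe when the
pattern is split into several moves.  It does not reproduce the actual
splitter in mma.md.

```c
#include <stdint.h>

/* Toy register file: regs[n] stands for VSX register vs<n>.  */
typedef struct
{
  uint64_t regs[64];
} regfile;

/* Build a pair in regs[dst] and regs[dst + 1] with two plain copies.
   The second copy reads its source only after the first copy has
   already written part of the output.  */
void
assemble_pair (regfile *rf, int dst, int src_a, int src_b)
{
  rf->regs[dst] = rf->regs[src_a];
  rf->regs[dst + 1] = rf->regs[src_b];
}

/* Nonzero if either output register is also an input register.  An
   early-clobber output tells the register allocator to avoid exactly
   these assignments.  */
int
outputs_overlap_inputs (int dst, int src_a, int src_b)
{
  return src_a == dst || src_a == dst + 1
	 || src_b == dst || src_b == dst + 1;
}
```

With inputs in vs44 and vs32 and the pair placed in vs32/vs33, the
first copy overwrites vs32 before the second copy reads it, so both
halves of the pair end up holding the vs44 value.  Placing the pair in
non-overlapping registers, for example vs34/vs35, gives the expected
result.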

This passed bootstrap and regtesting with no regressions and our
OpenBLAS team has confirmed it fixes the issues they reported.
Ok for mainline?

Ok for GCC 11 too after a few days on trunk?

Peter


gcc/
PR target/102976
* config/rs6000/mma.md (*vsx_assemble_pair): Add early-clobber for
output operand.
(*mma_assemble_acc): Likewise.

gcc/testsuite/
PR target/102976
* gcc.target/powerpc/pr102976.c: New test.

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 1990a2183f6..f0ea99963f7 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -339,7 +339,7 @@ (define_expand "vsx_assemble_pair"
 })
 
 (define_insn_and_split "*vsx_assemble_pair"
-  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+  [(set (match_operand:OO 0 "vsx_register_operand" "=&wa")
(unspec:OO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa")
(match_operand:V16QI 2 "mma_assemble_input_operand" "mwa")]
UNSPEC_MMA_ASSEMBLE))]
@@ -405,7 +405,7 @@ (define_expand "mma_assemble_acc"
 })
 
 (define_insn_and_split "*mma_assemble_acc"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
(unspec:XO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa")
(match_operand:V16QI 2 "mma_assemble_input_operand" "mwa")
(match_operand:V16QI 3 "mma_assemble_input_operand" "mwa")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr102976.c 
b/gcc/testsuite/gcc.target/powerpc/pr102976.c
new file mode 100644
index 000..a8de8f056f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr102976.c
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#include 
+void
+bug (__vector_pair *dst)
+{
+  register vector unsigned char vec0 asm ("vs44");
+  register vector unsigned char vec1 asm ("vs32");
+  __builtin_vsx_build_pair (dst, vec0, vec1);
+}
+
+/* { dg-final { scan-assembler-times {xxlor[^,]*,44,44} 1 } } */
+/* { dg-final { scan-assembler-times {xxlor[^,]*,32,32} 1 } } */


[Bug middle-end/102977] New: [GCC12 regression] vectorizer failed to generate complex fma.

2021-10-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102977

Bug ID: 102977
   Summary: [GCC12 regression] vectorizer failed to generate
complex fma.
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: crazylht at gmail dot com
  Target Milestone: ---
Target: aarch64-linux-gnu

#include

#include

void
foo (_Complex _Float16* __restrict a, _Complex _Float16* b, _Complex _Float16
*c)
{
for (int i =0 ; i != 8; i++)
  a[i] += b[i] * c[i];
}


GCC 11.2 generates:

foo:
mov     x3, 16
ptrue   p1.b, all
whilelo p0.h, xzr, x3
ld1h    z2.h, p0/z, [x1]
ld1h    z1.h, p0/z, [x2]
ld1h    z0.h, p0/z, [x0]
fcmla   z0.h, p1/m, z1.h, z2.h, #0
fcmla   z0.h, p1/m, z1.h, z2.h, #90
st1h    z0.h, p0, [x0]
cntb    x4
cnth    x5
add     x0, x0, x4
add     x1, x1, x4
add     x2, x2, x4
whilelo p0.h, x5, x3
b.none  .L1
ld1h    z2.h, p0/z, [x1]
ld1h    z1.h, p0/z, [x2]
ld1h    z0.h, p0/z, [x0]
fcmla   z0.h, p1/m, z1.h, z2.h, #0
fcmla   z0.h, p1/m, z1.h, z2.h, #90
st1h    z0.h, p0, [x0]
.L1:
ret


current trunk

foo:
ptrue   p1.h, vl8
ptrue   p0.b, all
ld2h    {z2.h - z3.h}, p1/z, [x1]
ld2h    {z0.h - z1.h}, p1/z, [x2]
ld2h    {z16.h - z17.h}, p1/z, [x0]
fmul    z6.h, z0.h, z3.h
movprfx z7, z16
fmla    z7.h, p0/m, z0.h, z2.h
fmla    z6.h, p0/m, z1.h, z2.h
movprfx z4, z7
fmls    z4.h, p0/m, z1.h, z3.h
fadd    z5.h, z6.h, z17.h
st2h    {z4.h - z5.h}, p1, [x0]
ret


options: -Ofast -march=armv8.3-a+sve+fp16
refer to https://godbolt.org/z/4PPKnWvc1
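
For readers comparing the two listings above, here is a minimal C model of what
the pair of fcmla instructions (rotations #0 and #90) computes per element. It
is only a sketch: the struct and function names are hypothetical, plain float
stands in for _Float16, and nothing here is taken from the vectorizer itself.

```c
/* Illustrative model only: a plain struct stands in for one complex
   lane, and float stands in for _Float16.  All names are hypothetical.  */
typedef struct { float re, im; } cplx;

/* Model of FCMLA with rotation #0: accumulate Re(x) times y.  */
static cplx fcmla_rot0 (cplx acc, cplx x, cplx y)
{
  acc.re += x.re * y.re;
  acc.im += x.re * y.im;
  return acc;
}

/* Model of FCMLA with rotation #90: accumulate i * Im(x) times y.  */
static cplx fcmla_rot90 (cplx acc, cplx x, cplx y)
{
  acc.re -= x.im * y.im;
  acc.im += x.im * y.re;
  return acc;
}

/* acc + x * y, built from the two rotations, as in a[i] += b[i] * c[i].  */
static cplx complex_fma (cplx acc, cplx x, cplx y)
{
  return fcmla_rot90 (fcmla_rot0 (acc, x, y), x, y);
}
```

The trunk listing computes the same four products, but on deinterleaved real
and imaginary vectors (ld2h/st2h) with separate fmul/fmla/fmls/fadd
instructions instead of two fcmla instructions.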

Re: [PATCH] rs6000: Fix ICE of vect cost related to V1TI [PR102767]

2021-10-27 Thread Kewen.Lin via Gcc-patches
Hi David,

Thanks for the review!

on 2021/10/27 下午9:12, David Edelsohn wrote:
> On Sun, Oct 24, 2021 at 11:04 PM Kewen.Lin  wrote:
>>
>> Hi,
>>
> >> As PR102767 shows, the commit r12-3482 exposed one ICE in function
> >> rs6000_builtin_vectorization_cost.  We claim V1TI supports movmisalign
> >> on rs6000 (see define_expand "movmisalign"), so
> >> rs6000_builtin_support_vector_misalignment returns true for misalign 8.
> >> Later, in the cost query rs6000_builtin_vectorization_cost, we don't
> >> have an arm to handle the V1TI input under (TARGET_VSX &&
> >> TARGET_ALLOW_MOVMISALIGN).
> >>
> >> The proposed fix is to add handling for V1TI and simply give it the
> >> doubleword cost.  That cost is apparently bigger than the scalar cost,
> >> so vectorization won't happen; the change just keeps things consistent
> >> and avoids the ICE.  Another thought is to not support movmisalign for
> >> V1TI, but that sounds like a bad idea since it doesn't match reality.
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
>> powerpc64-linux-gnu P8.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>> PR target/102767
> >> * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
> >> Consider V1TI mode for unaligned load and store.
>>
>> gcc/testsuite/ChangeLog:
>>
>> PR target/102767
>> * gcc.target/powerpc/ppc-fortran/pr102767.f90: New file.
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index b7ea1483da5..73d3e06c3fc 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -5145,7 +5145,8 @@ rs6000_builtin_vectorization_cost (enum 
>> vect_cost_for_stmt type_of_cost,
>> if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
>>   {
>> elements = TYPE_VECTOR_SUBPARTS (vectype);
>> -   if (elements == 2)
>> +   /* See PR102767, consider V1TI to keep consistency.  */
>> +   if (elements == 2 || elements == 1)
>>   /* Double word aligned.  */
>>   return 4;
>>
>> @@ -5184,10 +5185,11 @@ rs6000_builtin_vectorization_cost (enum 
>> vect_cost_for_stmt type_of_cost,
>>
>>  if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
>>{
>> -elements = TYPE_VECTOR_SUBPARTS (vectype);
>> -if (elements == 2)
>> -  /* Double word aligned.  */
>> -  return 2;
>> +   elements = TYPE_VECTOR_SUBPARTS (vectype);
>> +   /* See PR102767, consider V1TI to keep consistency.  */
>> +   if (elements == 2 || elements == 1)
>> + /* Double word aligned.  */
>> + return 2;
> 
> This section of the patch incorrectly changes the indentation.  Please
> use the correct indentation.
> 

The indentation change is intentional since the original indentation is
wrong (more than 8 spaces leading the lines).  There are more wrongly
indented lines above the first changed line, but fixing them too seemed
like a bad idea when they are unrelated to what this patch fixes, so I
left them alone.

With the above clarification, may I push this patch without any updates
for the mentioned indentation issue?

>>
>>  if (elements == 4)
>>{
> >> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
>> new file mode 100644
>> index 000..a4122482989
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
>> @@ -0,0 +1,21 @@
>> +! { dg-require-effective-target powerpc_vsx_ok }
>> +! { dg-options "-mvsx -O2 -ftree-vectorize -mno-efficient-unaligned-vsx" }
>> +
>> +INTERFACE
>> +  FUNCTION elemental_mult (a, b, c)
>> +type(*), DIMENSION(..) :: a, b, c
>> +  END
>> +END INTERFACE
>> +
>> +allocatable  z
>> +integer, dimension(2,2) :: a, b
>> +call test_CFI_address
>> +contains
>> +  subroutine test_CFI_address
>> +if (elemental_mult (z, x, y) .ne. 0) stop
>> +a = reshape ([4,3,2,1], [2,2])
>> +b = reshape ([2,3,4,5], [2,2])
>> +if (elemental_mult (i, a, b) .ne. 0) stop
>> +  end
>> +end
>> +
>>
> 
> The patch is okay with the indentation correction.
> 
> Thanks, David
> 

Thanks!

BR,
Kewen
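
As a rough illustration of the change discussed in this thread, the two
modified conditions can be modeled as below. This is only a sketch under
stated assumptions: the function names are hypothetical, only the arms visible
in the quoted diff are modeled, and -1 is a stand-in for "continue with the
remaining arms of the real function", which are not shown in the patch.

```c
/* Hypothetical model of the first hunk: vectors with one element (V1TI)
   now share the doubleword cost used for two-element vectors.  */
static int first_hunk_cost (int elements)
{
  if (elements == 2 || elements == 1)
    return 4;   /* cost returned in the first hunk */
  return -1;    /* other element counts: not modeled here */
}

/* Hypothetical model of the second hunk, same condition, cost 2.  */
static int second_hunk_cost (int elements)
{
  if (elements == 2 || elements == 1)
    return 2;   /* cost returned in the second hunk */
  return -1;    /* other element counts: not modeled here */
}
```

Before the patch, a one-element vector matched neither the elements == 2 test
nor the elements == 4 arm that follows it.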


[Bug testsuite/102944] Many gcc.dg/Wstringop-overflow-*.c failures

2021-10-27 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102944

--- Comment #1 from Hongtao.liu  ---
Mine.

Re: [PATCH] AVX512FP16: Optimize _Float16 reciprocal for div and sqrt

2021-10-27 Thread Hongtao Liu via Gcc-patches
On Tue, Oct 26, 2021 at 5:51 PM Hongyu Wang via Gcc-patches
 wrote:
>
> Hi,
>
> For _Float16 type, add insn and expanders to optimize x / y to
> x * rcp (y), and x / sqrt (y) to x * rsqrt (y).
> As half floats only have a minor precision difference between div and
> mul * rcp, there is no need for Newton-Raphson approximation.
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde.
> Ok for master?
Ok.
>
> gcc/ChangeLog:
>
> * config/i386/i386.c (use_rsqrt_p): Add mode parameter, enable
>   HFmode rsqrt without TARGET_SSE_MATH.
> (ix86_optab_supported_p): Refactor rint, adjust floor, ceil,
> btrunc condition to be restricted by -ftrapping-math, adjust
> use_rsqrt_p function call.
> * config/i386/i386.md (rcphf2): New define_insn.
> (rsqrthf2): Likewise.
> * config/i386/sse.md (div3): Change VF2H to VF2.
> (div3): New expander for HF mode.
> (rsqrt2): Likewise.
> (*avx512fp16_vmrcpv8hf2): New define_insn for rpad pass.
> (*avx512fp16_vmrsqrtv8hf2): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512fp16-recip-1.c: New test.
> * gcc.target/i386/avx512fp16-recip-2.c: Ditto.
> * gcc.target/i386/pr102464.c: Add -fno-trapping-math.
> ---
>  gcc/config/i386/i386.c| 29 +++---
>  gcc/config/i386/i386.md   | 44 -
>  gcc/config/i386/sse.md| 63 +++-
>  .../gcc.target/i386/avx512fp16-recip-1.c  | 43 
>  .../gcc.target/i386/avx512fp16-recip-2.c  | 97 +++
>  gcc/testsuite/gcc.target/i386/pr102464.c  |  2 +-
>  6 files changed, 258 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-recip-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-recip-2.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 299e1ab2621..c5789365d3b 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -18905,9 +18905,10 @@ ix86_vectorize_builtin_scatter (const_tree vectype,
> 1.0/sqrt.  */
>
>  static bool
> -use_rsqrt_p ()
> +use_rsqrt_p (machine_mode mode)
>  {
> -  return (TARGET_SSE && TARGET_SSE_MATH
> +  return ((mode == HFmode
> +  || (TARGET_SSE && TARGET_SSE_MATH))
>   && flag_finite_math_only
>   && !flag_trapping_math
>   && flag_unsafe_math_optimizations);
> @@ -23603,29 +23604,27 @@ ix86_optab_supported_p (int op, machine_mode mode1, 
> machine_mode,
>return opt_type == OPTIMIZE_FOR_SPEED;
>
>  case rint_optab:
> -  if (mode1 == HFmode)
> -   return true;
> -  else if (SSE_FLOAT_MODE_P (mode1)
> -  && TARGET_SSE_MATH
> -  && !flag_trapping_math
> -  && !TARGET_SSE4_1)
> +  if (SSE_FLOAT_MODE_P (mode1)
> + && TARGET_SSE_MATH
> + && !flag_trapping_math
> + && !TARGET_SSE4_1
> + && mode1 != HFmode)
> return opt_type == OPTIMIZE_FOR_SPEED;
>return true;
>
>  case floor_optab:
>  case ceil_optab:
>  case btrunc_optab:
> -  if (mode1 == HFmode)
> -   return true;
> -  else if (SSE_FLOAT_MODE_P (mode1)
> -  && TARGET_SSE_MATH
> -  && !flag_trapping_math
> -  && TARGET_SSE4_1)
> +  if (((SSE_FLOAT_MODE_P (mode1)
> +   && TARGET_SSE_MATH
> +   && TARGET_SSE4_1)
> +  || mode1 == HFmode)
> + && !flag_trapping_math)
> return true;
>return opt_type == OPTIMIZE_FOR_SPEED;
>
>  case rsqrt_optab:
> -  return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p ();
> +  return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (mode1);
>
>  default:
>return true;
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index e733a40fc90..11535df5425 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -8417,11 +8417,27 @@
> (match_operand:XF 2 "register_operand")))]
>"TARGET_80387")
>
> +/* There is no more precision loss than Newton-Raphson approximation
> +  when using HFmode rcp/rsqrt, so do the transformation directly under
> +  TARGET_RECIP_DIV and fast-math.  */
>  (define_expand "divhf3"
>[(set (match_operand:HF 0 "register_operand")
> (div:HF (match_operand:HF 1 "register_operand")
>(match_operand:HF 2 "nonimmediate_operand")))]
> -  "TARGET_AVX512FP16")
> +  "TARGET_AVX512FP16"
> +{
> +  if (TARGET_RECIP_DIV
> +  && optimize_insn_for_speed_p ()
> +  && flag_finite_math_only && !flag_trapping_math
> +  && flag_unsafe_math_optimizations)
> +{
> +  rtx op = gen_reg_rtx (HFmode);
> +  operands[2] = force_reg (HFmode, operands[2]);
> +  emit_insn (gen_rcphf2 (op, operands[2]));
> +  emit_insn (gen_mulhf3 (operands[0], operands[1], op));
> +  DONE;
> +}
> +})
>
>  (define_expand "div3"
>[(set 
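
To make the transformation in the patch above concrete, here is a small C
sketch of x / y rewritten as x * rcp (y), together with the Newton-Raphson
refinement step that the patch description says is unnecessary for HFmode.
The function names are hypothetical and float is used in place of _Float16;
this is not code from the patch.

```c
/* One Newton-Raphson step for r ~= 1/y: r' = r * (2 - y * r).
   Shown only to illustrate the refinement the patch skips for HFmode.  */
static float refine_rcp (float y, float r)
{
  return r * (2.0f - y * r);
}

/* x / y rewritten as a multiply by r, where r approximates 1/y.  */
static float div_via_rcp (float x, float r)
{
  return x * r;
}
```

If r is already exact the refinement step leaves it unchanged, which is the
sense in which a sufficiently precise rcp makes the extra step redundant.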

[Bug fortran/91497] -Wconversion warns when doing explicit type conversion

2021-10-27 Thread sandra at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91497

sandra at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sandra at gcc dot gnu.org

--- Comment #23 from sandra at gcc dot gnu.org ---
Created attachment 51686
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51686&action=edit
add dg-require-effective-target to testcase

The new testcase is FAILing on x86 targets configured without REAL*16 support,
like so:

/path/to/gcc/testsuite/gfortran.dg/pr91497.f90:14:14: Error: Old-style type
declaration REAL*16 not supported at (1)
/path/to/gcc/testsuite/gfortran.dg/pr91497.f90:21:31: Error: Invalid real kind
16 at (1)
compiler exited with status 1

I've got this patch to add some dg-require-effective-target tests, but maybe it
would be better to fix the testcase so that it does not depend on
target-specific floating-point types?  Or add a second testcase that doesn't
require all the target restrictions, for broader test coverage on more
platforms?

[Bug target/102976] MMA test case emits wrong code when building a vector pair

2021-10-27 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102976

Peter Bergner  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1
   Last reconfirmed|2021-10-27 00:00:00         |2021-10-28

[Bug target/102976] MMA test case emits wrong code when building a vector pair

2021-10-27 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102976

Peter Bergner  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chip.kerchner at ibm dot com,
                   |                            |dje at gcc dot gnu.org,
                   |                            |raji at linux dot vnet.ibm.com,
                   |                            |segher at gcc dot gnu.org,
                   |                            |wschmidt at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org |bergner at gcc dot gnu.org
             Target|                            |powerpc*-*-*
   Last reconfirmed|                            |2021-10-27

--- Comment #1 from Peter Bergner  ---
Mine.

[Bug target/102976] New: MMA test case emits wrong code when building a vector pair

2021-10-27 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102976

Bug ID: 102976
   Summary: MMA test case emits wrong code when building a vector
pair
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bergner at gcc dot gnu.org
  Target Milestone: ---

The following test case generates wrong code using trunk and GCC11:

[bergner@pike debug]$ cat bug.c 
#include 
void
bug (__vector_pair *dst)
{
  register vector unsigned char vec0 asm ("vs44");
  register vector unsigned char vec1 asm ("vs32");
  __builtin_vsx_build_pair (dst, vec0, vec1);
}
[bergner@pike debug]$ gcc -S -O2 -mcpu=power10 bug.c
[bergner@pike debug]$ cat bug.s 
bug:
xxlor 0,32,32
xxlor 1,32,32
stxvp 0,0(3)
blr

The above only copies one of the inputs into the output pair, when it should
copy both vs44 and vs32, like so:
bug:
xxlor 0,44,44
xxlor 1,32,32
stxvp 0,0(3)
blr

This is due to a missing early clobber in the MMA patterns.  I have a fix I'm
testing.

[PATCH] Add GNU_PROPERTY_1_GLIBC_2_NEEDED

2021-10-27 Thread H.J. Lu via Gcc
Motivations:

1. Some binaries which require new ELF features, like DT_RELR, only
work with the new glibc binary.  They crash at run-time with the older
glibc binaries.
2. Some binaries compiled with the new language features, like C2X
printf specifiers, only generate correct results with the new glibc
binary.  Since we don't add new glibc versions to the printf function
family, they generate incorrect results at run-time with the older
glibc binaries.

Here is a proposal to encode glibc version dependencies in relocatable
objects:

/* The glibc 2 minor versions needed by the object file. */
 #define GNU_PROPERTY_1_GLIBC_2_NEEDED   (GNU_PROPERTY_UINT32_OR_LO + 1)

/* The lowest glibc 2 minor version.  */
 #define GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_BASE 35

/* Set if the object file requires glibc 2 minor version M.  */
 #define GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_VERSION(m)  \
  (1U << ((m) - GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_BASE))

Linker adds glibc versions in GNU_PROPERTY_1_GLIBC_2_NEEDED to the
.gnu.version_r section and removes GNU_PROPERTY_1_GLIBC_2_NEEDED note
when generating shared libraries and executables.
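
As a sketch of how the bit encoding proposed above works, the following C
fragment builds and queries a pr_data value. The macro definitions are copied
from the proposal (with GNU_PROPERTY_UINT32_OR_LO taken from the table in the
patch below); the two helper functions are hypothetical and exist only for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the proposal.  */
#define GNU_PROPERTY_UINT32_OR_LO 0xb0008000
#define GNU_PROPERTY_1_GLIBC_2_NEEDED (GNU_PROPERTY_UINT32_OR_LO + 1)
#define GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_BASE 35
#define GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_VERSION(m) \
  (1U << ((m) - GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_BASE))

/* Hypothetical helper: does PR_DATA request glibc 2.MINOR?  Minors
   outside the 32 representable bits are reported as not requested.  */
static bool needs_glibc_2_minor (uint32_t pr_data, unsigned minor)
{
  if (minor < GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_BASE
      || minor >= GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_BASE + 32)
    return false;
  return (pr_data & GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_VERSION (minor)) != 0;
}

/* Hypothetical example value: an object needing glibc 2.35 and 2.38.  */
static uint32_t example_pr_data (void)
{
  return GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_VERSION (35)
         | GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_VERSION (38);
}
```

With a base of 35 and a 4-byte value, one property can name minor versions 35
through 66; the example value has bits 0 and 3 set.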

[hjl@gnu-cfl-2 elfvers-1]$ ./readelf -n x.o

Displaying notes found in: .note.gnu.property
  Owner                Data size        Description
  GNU  0x0020   NT_GNU_PROPERTY_TYPE_0
  Properties: x86 ISA used:
x86 feature used: x86
[hjl@gnu-cfl-2 elfvers-1]$ ./readelf -n glibc-2-minor-1.o

Displaying notes found in: .note.gnu.property
  Owner                Data size        Description
  GNU  0x0010   NT_GNU_PROPERTY_TYPE_0
  Properties: 1_glibc_2_needed: 2.35, 2.38
  GNU  0x0020   NT_GNU_PROPERTY_TYPE_0
  Properties: x86 ISA used:
x86 feature used: x86
[hjl@gnu-cfl-2 elfvers-1]$ make x
gcc -B./ -o x x.o glibc-2-minor-1.o
[hjl@gnu-cfl-2 elfvers-1]$ ./readelf -n --version-info x
Version symbols section '.gnu.version' contains 4 entries:
 Addr: 0x004004ae  Offset: 0x0004ae  Link: 6 (.dynsym)
  000:   0 (*local*)   2 (GLIBC_2.34)3 (GLIBC_2.2.5)   1 (*global*)

Version needs section '.gnu.version_r' contains 1 entry:
 Addr: 0x004004b8  Offset: 0x0004b8  Link: 7 (.dynstr)
  00: Version: 1  File: libc.so.6  Cnt: 4
  0x0010:   Name: GLIBC_2.38  Flags: none  Version: 5
  0x0020:   Name: GLIBC_2.35  Flags: none  Version: 4
  0x0030:   Name: GLIBC_2.2.5  Flags: none  Version: 3
  0x0040:   Name: GLIBC_2.34  Flags: none  Version: 2
...
[hjl@gnu-cfl-2 elfvers-1]$ ./x
./x: /lib64/libc.so.6: version `GLIBC_2.38' not found (required by ./x)
./x: /lib64/libc.so.6: version `GLIBC_2.35' not found (required by ./x)
[hjl@gnu-cfl-2 elfvers-1]$
---
 object-files.tex | 50 
 1 file changed, 50 insertions(+)

diff --git a/object-files.tex b/object-files.tex
index 834f508..41a434c 100644
--- a/object-files.tex
+++ b/object-files.tex
@@ -444,6 +444,7 @@ The following program property types are defined:
   \texttt{GNU_PROPERTY_UINT32_AND_HI} & \texttt{0xb0007fff} \\
   \texttt{GNU_PROPERTY_UINT32_OR_LO} & \texttt{0xb0008000} \\
   \texttt{GNU_PROPERTY_1_NEEDED} & \texttt{GNU_PROPERTY_UINT32_OR_LO + 0} \\
+  \texttt{GNU_PROPERTY_1_GLIBC_2_NEEDED} & \texttt{GNU_PROPERTY_UINT32_OR_LO + 1} \\
   \texttt{GNU_PROPERTY_UINT32_OR_HI} & \texttt{0xb000} \\
   \texttt{GNU_PROPERTY_LOPROC} & \texttt{0xc000} \\
   \texttt{GNU_PROPERTY_HIPROC} & \texttt{0xdfff} \\
@@ -492,6 +493,11 @@ The following program property types are defined:
  \item[GNU_PROPERTY_1_NEEDED]
The \code{pr_data} field contains a 4-byte integer to indicate the
properties needed by object file.
+ \item[GNU_PROPERTY_1_GLIBC_2_NEEDED]
+   The \code{pr_data} field contains a 4-byte integer to indicate the
+   minor versions of the GNU C library version 2 needed by object file.
+   This property is only valid in relocatable object files.  Linker
+   should not add it to executables nor shared libraries.
  \item[GNU_PROPERTY_LOPROC through GNU_PROPERTY_HIPROC]
Values in this inclusive range are reserved for processor-specific
semantics.
@@ -528,6 +534,50 @@ The following bits are defined for 
\code{GNU_PROPERTY_1_NEEDED}:
\end{sloppypar}
 \end{description}
 
+The following values are defined for \code{GNU_PROPERTY_1_GLIBC_2_NEEDED}:
+
+\begin{table}[H]
+\Hrule
+  \caption{GNU_PROPERTY_1_GLIBC_2_NEEDED Values}
+  \begin{center}
+\begin{footnotesize}
+  \begin{tabular}[t]{l|l}
+\multicolumn{1}{c}{Name} & \multicolumn{1}{c}{Value} \\
+\hline
+ \texttt{GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_BASE} & \texttt{35} \\
+ \texttt{GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_VERSION(m)}
+   & \texttt{1U << (m - 35)} \\
+  \end{tabular}
+\end{footnotesize}
+  \end{center}
+\Hrule
+\end{table}
+
+\begin{description}
+ \item[GNU_PROPERTY_1_GLIBC_2_NEEDED_MINOR_BASE]
+   \begin{sloppypar}
+   This specifies the 

Re: [COMMITTED] Kill second order relations in the path solver.

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
On Wed, 27 Oct 2021 20:13:21 +0200
Aldy Hernandez via Gcc-patches  wrote:

[would have to think about this some more but it's late here. Nits:]

> diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
> index 2acf375ca9a..0ad4f7a9495 100644
> --- a/gcc/value-relation.cc
> +++ b/gcc/value-relation.cc
> @@ -1297,8 +1297,9 @@ path_oracle::killing_def (tree ssa)
>fprintf (dump_file, "\n");
>  }
>  
> +  unsigned v = SSA_NAME_VERSION (ssa);
>bitmap b = BITMAP_ALLOC (_bitmaps);
> -  bitmap_set_bit (b, SSA_NAME_VERSION (ssa));
> +  bitmap_set_bit (b, v);
>equiv_chain *ptr = (equiv_chain *) obstack_alloc (_chain_obstack,
>   sizeof (equiv_chain));
>ptr->m_names = b;
> @@ -1306,6 +1307,24 @@ path_oracle::killing_def (tree ssa)
>ptr->m_next = m_equiv.m_next;
>m_equiv.m_next = ptr;
>bitmap_ior_into (m_equiv.m_names, b);
> +
> +  // Walk the relation list an remove SSA from any relations.

s/an /and /

> +  if (!bitmap_bit_p (m_relations.m_names, v))
> +return;
> +
> +  bitmap_clear_bit (m_relations.m_names, v);

IIRC bitmap_clear_bit returns true if the bit was set, false otherwise,
so should be used as if(!bitmap_clear_bit) above.
I would not be surprised if this generates better code as we probably
do not grok to optimize the !bit_p else clear_bit combo. Shame (?).

> +  relation_chain **prev = &(m_relations.m_head);

s/[()]//
thanks,

> +  relation_chain *next = NULL;
> +  for (relation_chain *ptr = m_relations.m_head; ptr; ptr = next)
> +{
> +  gcc_checking_assert (*prev == ptr);
> +  next = ptr->m_next;
> +  if (SSA_NAME_VERSION (ptr->op1 ()) == v
> +   || SSA_NAME_VERSION (ptr->op2 ()) == v)
> + *prev = ptr->m_next;
> +  else
> + prev = &(ptr->m_next);
> +}
>  }
>  
>  // Register relation K between SSA1 and SSA2, resolving unknowns by
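
The two points raised in this review (using the return value of the clear
operation as the early-exit test, and unlinking list nodes through a pointer
to the previous link) can be sketched in plain C as follows. This is a
stand-alone illustration, not GCC code: the struct, the single-word bitset and
both function names are hypothetical, and v is assumed to be smaller than the
number of bits in unsigned long.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a relation between two SSA versions.  */
struct relation
{
  unsigned op1, op2;
  struct relation *next;
};

/* Clear bit V in *BITS and report whether it was set before, so one
   call serves as both the test and the update.  */
static bool clear_bit (unsigned long *bits, unsigned v)
{
  unsigned long mask = 1UL << v;
  bool was_set = (*bits & mask) != 0;
  *bits &= ~mask;
  return was_set;
}

/* Drop every relation that mentions V.  PREV always points at the link
   that leads to the current node, so unlinking is a single store.  */
static void kill_relations (struct relation **head, unsigned long *names,
                            unsigned v)
{
  if (!clear_bit (names, v))
    return;   /* V took part in no relation.  */

  struct relation **prev = head;
  while (*prev)
    {
      struct relation *cur = *prev;
      if (cur->op1 == v || cur->op2 == v)
        *prev = cur->next;    /* unlink, PREV stays where it is */
      else
        prev = &cur->next;    /* keep the node, advance */
    }
}
```

Nodes are only unlinked here, not freed, on the assumption that they live in
an arena, as the obstack allocation in the quoted code suggests.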



[Bug target/102953] Improvements to CET-IBT and ENDBR generation

2021-10-27 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102953

--- Comment #8 from Andrew Cooper  ---
Actually, there is a (possibly pre-existing) diagnostics issue:

$ cat proto.c
static void __attribute__((cf_check)) foo(void);
static void __attribute__((unused)) foo(void)
{
}
void (*ptr)(void) = foo;

$ gcc -Wall -Os -fcf-protection=branch -mmanual-endbr -fcf-check-attribute=no
-c proto.c -o proto.o
proto.c:2:37: error: conflicting types for 'foo'; have 'void(void)'
2 | static void __attribute__((unused)) foo(void)
  | ^~~
proto.c:1:39: note: previous declaration of 'foo' with type 'void(void)'
1 | static void __attribute__((cf_check)) foo(void);
  |   ^~~


The diagnostic complaining that the forward declaration doesn't match the
definition gives 'void(void)' as the type in both cases, leaving out the fact
that they differ by cf_check-ness.

Re: RISCV: Add zmmul extension

2021-10-27 Thread Jim Wilson
On Wed, Oct 27, 2021 at 12:14 AM Kito Cheng  wrote:

> Otherwise it is LGTM, but I'm just surprised it's still 0.1 and not frozen
> yet.
>

We should have binutils support first before we have gcc support.
Otherwise that may lead to binutils errors later when zmmul gets passed
down to binutils.  I didn't see a binutils patch yet.

Jim


RE: __builtin_addc support??

2021-10-27 Thread sotrdg sotrdg via Gcc
HEY. ZERO COST ABSTRACTIONS lol

From: Segher Boessenkool
Sent: Wednesday, October 27, 2021 19:17
To: sotrdg sotrdg
Cc: gcc@gcc.gnu.org; 
gcc-requ...@gcc.gnu.org; 
gcc-h...@gcc.gnu.org
Subject: Re: __builtin_addc support??

On Wed, Oct 27, 2021 at 04:12:27PM +, sotrdg sotrdg via Gcc-help wrote:
> 79173 – add-with-carry and subtract-with-borrow support (x86_64 and others) 
> (gnu.org)
>
> What I find quite interesting is things like this.
>
> Since llvm clang provides __builtin_addc __builtin_subc for all targets. Can 
> we provide something similar? Since currently no solutions we can access 
> carry flag besides x86

Why?  We have __builtin_add_overflow, which is a smaller factor, and
enough to build up any bigger factors with.  You can easily write the
same thing in standard C of course, which often is a better plan.

If you want exact machine insns as output, write those, i.e., write
assembler code, not C.  Builtins are not there to please the "C is a
portable assembler" crowd: they are there to expose functionality you
cannot (conveniently) get using just pure standard C.


Segher



RE: __builtin_addc support??

2021-10-27 Thread sotrdg sotrdg via Gcc
LOL

https://github.com/tearosccebe/fast_io/blob/4ca355fcbf31aa26a0259ad09671eaab899930fc/include/fast_io_core_impl/intrinsics.h#L366

You are wrong dude.
Give me solution for addcarry


Re: [PATCH] rs6000: Fix bootstrap (libffi)

2021-10-27 Thread Segher Boessenkool
Hi!

On Wed, Oct 27, 2021 at 11:44:59AM -0700, H.J. Lu wrote:
> On Mon, Oct 25, 2021 at 4:39 PM Segher Boessenkool
>  wrote:
> > This fixes bootstrap for the current problems building libffi.
> >
> > I'll work on getting this into upstream as well.  If the maintainers
> > want it done differently, at least we have bootstrap working again
> > until then.

> I am checking in this patch:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582717.html

Ah thanks :-)  I thought I'd get it fixed upstream soon, but that might
not happen (or not in time, etc.)  This is a good idea no matter what.


Segher


Re: __builtin_addc support??

2021-10-27 Thread Segher Boessenkool
On Wed, Oct 27, 2021 at 04:12:27PM +, sotrdg sotrdg via Gcc-help wrote:
> 79173 – add-with-carry and subtract-with-borrow support (x86_64 and others) 
> (gnu.org)
> 
> What I find quite interesting is things like this.
> 
> Since llvm clang provides __builtin_addc __builtin_subc for all targets. Can 
> we provide something similar? Since currently no solutions we can access 
> carry flag besides x86

Why?  We have __builtin_add_overflow, which is a smaller factor, and
enough to build up any bigger factors with.  You can easily write the
same thing in standard C of course, which often is a better plan.

If you want exact machine insns as output, write those, i.e., write
assembler code, not C.  Builtins are not there to please the "C is a
portable assembler" crowd: they are there to expose functionality you
cannot (conveniently) get using just pure standard C.


Segher
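
A minimal sketch of the point made above, that an add-with-carry can be built
from __builtin_add_overflow. The function name addc_u64 is hypothetical; only
the GCC builtin itself is real.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: 64-bit add with carry-in and carry-out, composed
   from two uses of __builtin_add_overflow.  At most one of the two
   additions can wrap, so OR-ing the flags gives the carry-out.  */
static uint64_t addc_u64 (uint64_t a, uint64_t b, bool carry_in,
                          bool *carry_out)
{
  uint64_t sum;
  bool c1 = __builtin_add_overflow (a, b, &sum);
  bool c2 = __builtin_add_overflow (sum, (uint64_t) carry_in, &sum);
  *carry_out = c1 | c2;
  return sum;
}
```

Chaining calls, with each carry_out fed into the next carry_in, gives a
multi-limb addition. Whether the compiler turns this into the target's
add-with-carry instruction is a separate question and is not claimed here.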


[Bug target/102974] GCC optimization is very poor for add carry and multiplication combos

2021-10-27 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974

--- Comment #7 from cqwrteur  ---
(In reply to cqwrteur from comment #6)
> (In reply to Andrew Pinski from comment #5)
> > (In reply to cqwrteur from comment #4)
> > > (In reply to cqwrteur from comment #3)
> > > > (In reply to Andrew Pinski from comment #2)
> > > > > There might be another bug about _addcarryx_u64 already.
> > > > 
> > > > This is 32 bit addcarry.
> > > 
> > > but yeah. GCC does not perform optimizations very well to add carries and
> > > mul + recognize >>64u <<64u patterns
> > 
> > I mean all of _addcarryx_* intrinsics.
> 
> https://godbolt.org/z/qq3nb49Eq
> https://godbolt.org/z/cqoYG35jx
> Also this is weird. just extract part of code into function generates
> different assembly for __builtin_bit_cast. It must be a inliner bug.

My fault for misreading.

(In reply to Andrew Pinski from comment #5)
> (In reply to cqwrteur from comment #4)
> > (In reply to cqwrteur from comment #3)
> > > (In reply to Andrew Pinski from comment #2)
> > > > There might be another bug about _addcarryx_u64 already.
> > > 
> > > This is 32 bit addcarry.
> > 
> > but yeah. GCC does not perform optimizations very well to add carries and
> > mul + recognize >>64u <<64u patterns
> 
> I mean all of _addcarryx_* intrinsics.

This example is also interesting in that -O2, -O3, and -Ofast generate much
worse assembly than -O1. There is no point in using SIMD for things like this.

Re: dejagnu version update?

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc
On Sat, 4 Aug 2018 18:32:24 +0200
Bernhard Reutner-Fischer  wrote:

> On Tue, 16 May 2017 at 21:08, Mike Stump  wrote:
> >
> > On May 16, 2017, at 5:16 AM, Jonathan Wakely  wrote: 
> >  
> > > The change I care about in 1.5.3  
> >
> > So, we haven't talked much about the version people want most.  If we 
> > update, might as well get something that more people care about.  1.5.3 is 
> > in ubuntu LTS 16.04 and Fedora 24, so it's been around awhile.  SUSE is 
> > said to be using 1.6, in the post 1.4.4 systems.  People stated they want 
> > 1.5.2 and 1.5.3, so, I'm inclined to say, let's shoot for 1.5.3 when we do 
> > update.
> >
> > As for the machines in the FSF compile farm, nah, tail wagging the dog.  
> > I'd rather just update the requirement, and the owners or users of those 
> > machines can install a new dejagnu, if they are using one that is too old 
> > and they want to support testing gcc.  
> 
> So.. let me ping that, again, now that another year has passed :)

or another 3 or 4 :)
> 
> PS: Recap: https://gcc.gnu.org/ml/fortran/2012-03/msg00094.html was
> later applied as
> http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5481f29161477520c691d525653323b82fa47ad7
> and was part of the dejagnu-1.5.2 release from 2015. Jonathan requires
> 1.5.3 for libstdc++ testing.
(i.e.
http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5256bd82343000c76bc0e48139003f90b6184347
 )
> The libdirs fix would allow us to remove the 150 occurrences of the
> load_gcc_lib hack, refer to the patch to the fortran list back then.
> AFAIR this is still not fixed: +# BUG: gcc-dg calls
> gcc-set-multilib-library-path but does not load gcc-defs!
> 
> debian-stable (i think 9 ATM), Ubuntu LTS ship versions recent enough
> to contain both fixes. Commercial distros seem to ship fixed versions,
> too.

It seems that in May 2020 there was a thread on the gcc list with about the
same subject: https://gcc.gnu.org/pipermail/gcc/2020-May/232427.html
where Mike indicates he has approved bumping the required minimum
version to 1.5.3.
So who's in a position to update
https://gcc.gnu.org/install/prerequisites.html
with s/1.4.4/1.5.3/g && git commit -m 'bump dejagnu required version' ?

Just asking patiently and politely.
I don't want to rush anybody into such a bump :)

But as you may remember, folks routinely run afoul of using too old
versions (without the 5256bd8 multilib prepending for example, recently
someone doing ARM stuff IIRC) so a bump would just be fair IMHO.

Maybe now, for gcc-12, is the time to bump prerequisites to 1.5.3?

thanks and sorry for my impatience (and, once again, the noise).
cheers,


[Bug target/102952] New code-gen options for retpolines and straight line speculation

2021-10-27 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51684|0                           |1
        is obsolete|                            |

--- Comment #19 from H.J. Lu  ---
Created attachment 51685
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51685&action=edit
The v4 patch to add -mharden-sls=

[Bug target/102952] New code-gen options for retpolines and straight line speculation

2021-10-27 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #18 from Andrew Cooper  ---
Yes to both.

[r12-4744 Regression] FAIL: gcc.dg/guality/pr41616-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects -DPREVENT_OPTIMIZATION execution test on Linux/x86_64

2021-10-27 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

2f0b6a971a051f6e687a15dd2fa4bf431381e551 is the first bad commit
commit 2f0b6a971a051f6e687a15dd2fa4bf431381e551
Author: Aldy Hernandez 
Date:   Wed Oct 27 18:22:29 2021 +0200

Reorder relation calculating code in the path solver.

caused

FAIL: gcc.dg/guality/pr41616-1.c   -O2  -DPREVENT_OPTIMIZATION  execution test
FAIL: gcc.dg/guality/pr41616-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  -DPREVENT_OPTIMIZATION execution test
FAIL: gcc.dg/guality/pr41616-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION execution test
FAIL: gcc.dg/guality/pr41616-1.c   -O3 -g  -DPREVENT_OPTIMIZATION  execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4744/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41616-1.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41616-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41616-1.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr41616-1.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at skpgkp2 at gmail dot com)


[Bug c++/96441] ICE in tree check: expected integer_cst, have cond_expr in get_len, at tree.h:5954

2021-10-27 Thread arthur.j.odwyer at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96441

Arthur O'Dwyer  changed:

   What|Removed |Added

 CC||arthur.j.odwyer at gmail dot com

--- Comment #2 from Arthur O'Dwyer  ---
Still there in trunk. Here's a very slightly reduced version:

// https://godbolt.org/z/8arT4Gn6P
enum a : int;
template;
template<> enum a : int {c};



<source>:2:35: error: expected unqualified-id before ';' token
2 | template;
  |   ^
<source>:3:26: internal compiler error: Segmentation fault
3 | template<> enum a : int {c};
  |  ^
0x20037b9 internal_error(char const*, ...)
???:0
0x8c7150 build_enumerator(tree_node*, tree_node*, tree_node*, tree_node*,
unsigned int)
???:0
0xa0bea5 c_parse_file()
???:0
0xb92e22 c_common_parse_file()
???:0
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug target/102952] New code-gen options for retpolines and straight line speculation

2021-10-27 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #17 from H.J. Lu  ---
[hjl@gnu-tgl-2 pr102952]$ cat z2.i
extern void (*fptr) (int, int);

void
foo (int x, int y)
{
  fptr (x, y);
}
[hjl@gnu-tgl-2 pr102952]$ make z2.s
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -O2
-mindirect-branch=thunk -mindirect-branch-cs-prefix -mharden-sls=all -S z2.i
[hjl@gnu-tgl-2 pr102952]$ cat z2.s
        .file   "z2.i"
        .text
        .p2align 4
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        movq    fptr(%rip), %rax
        jmp     __x86_indirect_thunk_rax

Is int3 needed here?

        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .section        .text.__x86_indirect_thunk_rax,"axG",@progbits,__x86_indirect_thunk_rax,comdat
        .globl  __x86_indirect_thunk_rax
        .hidden __x86_indirect_thunk_rax
        .type   __x86_indirect_thunk_rax, @function
__x86_indirect_thunk_rax:
.LFB1:
        .cfi_startproc
        call    .LIND1
.LIND0:
        pause
        lfence
        jmp     .LIND0
.LIND1:
        .cfi_def_cfa_offset 16
        mov     %rax, (%rsp)
        ret
        int3   <<<<<<<<<<<<<<<<<<<< Is this needed?
        .cfi_endproc
.LFE1:
        .ident  "GCC: (GNU) 12.0.0 20211027 (experimental)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-tgl-2 pr102952]$

[Bug target/102974] GCC optimization is very poor for add carry and multiplication combos

2021-10-27 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974

--- Comment #6 from cqwrteur  ---
(In reply to Andrew Pinski from comment #5)
> (In reply to cqwrteur from comment #4)
> > (In reply to cqwrteur from comment #3)
> > > (In reply to Andrew Pinski from comment #2)
> > > > There might be another bug about _addcarryx_u64 already.
> > > 
> > > This is 32 bit addcarry.
> > 
> > but yeah. GCC does not perform optimizations very well to add carries and
> > mul + recognize >>64u <<64u patterns
> 
> I mean all of _addcarryx_* intrinsics.

https://godbolt.org/z/qq3nb49Eq
https://godbolt.org/z/cqoYG35jx
Also, this is weird: just extracting part of the code into a function generates
different assembly for __builtin_bit_cast. It must be an inliner bug.

[Bug target/102952] New code-gen options for retpolines and straight line speculation

2021-10-27 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #16 from H.J. Lu  ---
(In reply to Andrew Cooper from comment #15)
> So this is the irritating corner case where the two options are linked.
> 
> *If* we are using -mindirect-branch-cs-prefix, then we intend to rewrite
> `jmp __x86_indirect_thunk_*` to `jmp *%reg` or `lfence; jmp *%reg` based on
> boot time configuration/settings.
> 
> In this case, we still need to fit the `int3` for SLS protection in
> somewhere.
> 
> The two options are:
> 1) Special case `jmp __x86_indirect_thunk_*` as if it were an indirect jump
> and write out an `int3` directly, or

I can do this.

> 2) Pad one extra %cs prefix on the jmp, so we've got space to insert one at
> boot time.

[Bug target/102952] New code-gen options for retpolines and straight line speculation

2021-10-27 Thread andrew.cooper3 at citrix dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #15 from Andrew Cooper  ---
So this is the irritating corner case where the two options are linked.

*If* we are using -mindirect-branch-cs-prefix, then we intend to rewrite `jmp
__x86_indirect_thunk_*` to `jmp *%reg` or `lfence; jmp *%reg` based on boot
time configuration/settings.

In this case, we still need to fit the `int3` for SLS protection in somewhere.

The two options are:
1) Special case `jmp __x86_indirect_thunk_*` as if it were an indirect jump and
write out an `int3` directly, or
2) Pad one extra %cs prefix on the jmp, so we've got space to insert one at
boot time.

[Bug target/102952] New code-gen options for retpolines and straight line speculation

2021-10-27 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #14 from H.J. Lu  ---
(In reply to peterz from comment #13)
> (In reply to H.J. Lu from comment #12)
> > (In reply to peterz from comment #9)
> > > Created attachment 51683 [details]
> > > kernel patch to test -mharden-sls=all
> > > 
> > > $ make O=defconfig CC=gcc-12.0.0 arch/x86/entry/common.o
> > > ...
> > > arch/x86/entry/common.o: warning: objtool: do_SYSENTER_32()+0x1b:
> > > unreachable instruction
> > 
> > Please try the v2 patch.
> 
> Per comment #6 this should be v3, no? Anyway, the good news is that I now
> seem to have a kernel image with lots of extra int3 instructions, but all in
> the right place.
> 
> *However*, I seem to be missing a few:
> 
>   36f4:   41 5f                   pop    %r15
>   36f6:   e9 00 00 00 00          jmp    36fb <__do_set_cpus_allowed+0x5b>
>                   36f7: R_X86_64_PLT32    __x86_indirect_thunk_rax-0x4

This is a direct branch.

>   36fb:   48 8b 87 90 02 00 00    mov    0x290(%rdi),%rax
> 
> There should be one after the jmp __x86_indirect_thunk_* thingy. I'll do an
> objtool patch to search for missing int3, but that'll have to wait until
> tomorrow, it's past midnight.

[PATCH] or1k: Add return address argument to _mcount call

2021-10-27 Thread Stafford Horne via Gcc-patches
This fixes an issue in the glibc port I am working on where the build
fails due to the warning:

  error: calling ‘__builtin_return_address’ with a nonzero argument is unsafe [-Werror=frame-address]

This is due to how the current implementation of _mcount in glibc uses
__builtin_return_address with a count argument of 1.

Fix that by passing the value of LR_REGNUM to the _mcount function,
effectively providing the value _mcount is after.

This is an ABI change, but I think it's OK because the glibc port for
or1k is not yet upstreamed.  Also, I think just adding an argument
should not break anything anyway.
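
To illustrate the idea (a sketch only, not the glibc or or1k code): once
the caller's return address arrives as an argument, the profiling entry
point only needs __builtin_return_address (0), which does not trigger
-Wframe-address.  The names mcount_sketch and CallArc are made up for
this example.

```cpp
#include <cassert>

// Hypothetical record of one call-graph arc; real _mcount
// implementations keep their data in very different structures.
struct CallArc {
  void *from;  // return address of the instrumented function (its caller)
  void *self;  // address inside the instrumented function itself
};

static CallArc last_arc{nullptr, nullptr};

// Stand-in for _mcount taking the new argument.  The compiler-generated
// call would pass the value the link register held on function entry.
extern "C" __attribute__((noinline)) void mcount_sketch(void *frompc) {
  last_arc.from = frompc;
  // Level 0 is always safe; only nonzero levels trigger the warning.
  last_arc.self = __builtin_return_address(0);
}
```

With this shape the callee never has to walk up a frame, which is what
the nonzero argument to __builtin_return_address required before.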

gcc/ChangeLog:

* config/or1k/or1k.h (PROFILE_HOOK): Add return address argument
to _mcount.
---
 gcc/config/or1k/or1k.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/or1k/or1k.h b/gcc/config/or1k/or1k.h
index fe01ab81ead..4603cb67160 100644
--- a/gcc/config/or1k/or1k.h
+++ b/gcc/config/or1k/or1k.h
@@ -387,9 +387,10 @@ do {\
profiling a function entry.  */
 #define PROFILE_HOOK(LABEL)\
   {\
-rtx fun;   \
+rtx fun, ra;   \
+ra = get_hard_reg_initial_val (Pmode, LR_REGNUM);  \
 fun = gen_rtx_SYMBOL_REF (Pmode, "_mcount");   \
-emit_library_call (fun, LCT_NORMAL, VOIDmode); \
+emit_library_call (fun, LCT_NORMAL, VOIDmode, ra, Pmode);  \
   }
 
 /* All the work is done in PROFILE_HOOK, but this is still required.  */
-- 
2.31.1



[Bug target/102952] New code-gen options for retpolines and straight line speculation

2021-10-27 Thread peterz at infradead dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #13 from peterz at infradead dot org ---
(In reply to H.J. Lu from comment #12)
> (In reply to peterz from comment #9)
> > Created attachment 51683 [details]
> > kernel patch to test -mharden-sls=all
> > 
> > $ make O=defconfig CC=gcc-12.0.0 arch/x86/entry/common.o
> > ...
> > arch/x86/entry/common.o: warning: objtool: do_SYSENTER_32()+0x1b:
> > unreachable instruction
> 
> Please try the v2 patch.

Per comment #6 this should be v3, no? Anyway, the good news is that I now seem
to have a kernel image with lots of extra int3 instructions, but all in the
right place.

*However*, I seem to be missing a few:

  36f4:   41 5f                   pop    %r15
  36f6:   e9 00 00 00 00          jmp    36fb <__do_set_cpus_allowed+0x5b>
                  36f7: R_X86_64_PLT32    __x86_indirect_thunk_rax-0x4
  36fb:   48 8b 87 90 02 00 00    mov    0x290(%rdi),%rax

There should be one after the jmp __x86_indirect_thunk_* thingy. I'll do an
objtool patch to search for missing int3, but that'll have to wait until
tomorrow, it's past midnight.

[Bug c++/102975] New: Local alias diagnosed as unused when used in failing constraint

2021-10-27 Thread johelegp at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102975

Bug ID: 102975
   Summary: Local alias diagnosed as unused when used in failing
constraint
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: johelegp at gmail dot com
CC: johelegp at gmail dot com
  Target Milestone: ---

See https://godbolt.org/z/aePcW8WjK.

```C++
template<class> concept Never = false;
template<class T> concept C = Never<typename T::type>;
void f() {
  struct X {
    using type = int;
  };
  static_assert(not C<X>);
}
```

```
<source>: In function 'void f()':
<source>:5:11: warning: typedef 'using type = int' locally defined but not used [-Wunused-local-typedefs]
    5 |     using type = int;
      |           ^~~~
```

[Bug tree-optimization/102969] [12 regression] g++.dg/warn/Wstringop-overflow-4.C fails after r12-4726

2021-10-27 Thread msebor at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102969

Martin Sebor  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 Blocks||88443
   Assignee|unassigned at gcc dot gnu.org  |msebor at gcc dot gnu.org


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88443
[Bug 88443] [meta-bug] bogus/missing -Wstringop-overflow warnings

Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-27 Thread Jason Merrill via Gcc-patches

On 10/27/21 17:10, Patrick Palka wrote:

On Wed, 27 Oct 2021, Jason Merrill wrote:


On 10/27/21 14:54, Patrick Palka wrote:

On Tue, 26 Oct 2021, Jakub Jelinek wrote:


On Tue, Oct 26, 2021 at 05:07:43PM -0400, Patrick Palka wrote:

The performance impact of the other calls to cxx_eval_outermost_const_expr
from p_c_e_1 is probably already mostly mitigated by the constexpr call
cache and the fact that we try to evaluate all calls to constexpr
functions during cp_fold_function anyway (at least with -O).  So trial


constexpr function bodies don't go through cp_fold_function (intentionally,
so that we don't optimize away UB), the bodies are copied before the trees
of the normal copy are folded.


Ah right, I had forgotten about that..

Here's another approach that doesn't need to remove trial evaluation for
&&/||.  The idea is to first quietly check if the second operand is
potentially constant _before_ performing trial evaluation of the first
operand.  This speeds up the case we care about (both operands are
potentially constant) without regressing any diagnostics.  We have to be
careful about emitting bogus diagnostics when tf_error is set, hence the
first hunk below which makes p_c_e_1 always proceed quietly first, and
replay noisily in case of error (similar to how satisfaction works).

Would something like this be preferable?
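
For readers following along, the "proceed quietly first, replay noisily
on error" shape can be sketched in isolation as below.  This is only an
illustration under made-up names (Expr, check_1, check, diagnostics);
it is not the constexpr.c code and ignores trial evaluation entirely.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical expression node: `ok` says whether this node alone is
// acceptable; children may be null.
struct Expr {
  bool ok;
  const Expr *left;
  const Expr *right;
};

static std::vector<std::string> diagnostics;

// Recursive worker: emits a diagnostic only when `complain` is set.
static bool check_1(const Expr *e, bool complain) {
  if (e == nullptr)
    return true;
  if (!e->ok) {
    if (complain)
      diagnostics.push_back("not a potential constant expression");
    return false;
  }
  return check_1(e->left, complain) && check_1(e->right, complain);
}

// Outermost entry point: run quietly first, and only on failure replay
// with diagnostics enabled, so a successful check never emits anything.
static bool check(const Expr *e, bool complain) {
  if (check_1(e, /*complain=*/false))
    return true;
  if (complain)
    check_1(e, /*complain=*/true);
  return false;
}
```

Doing the quiet pass only in the outermost wrapper, rather than in the
recursive worker, is what keeps the extra cost to a single additional
walk.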


Seems plausible, though doubling the number of stack frames is a downside.


Whoops, good point..  The noisy -> quiet adjustment only needs to
be performed during the outermost call to p_c_e_1, and not also during
each recursive call.  The revised diff below fixes this thinko, and so
only a single extra stack frame is needed AFAICT.


What did you think of Jakub's suggestion of linearizing the terms?


IIUC that would fix the quadraticness, but it wouldn't address that
we end up evaluating the entire expression twice, once during the trial
evaluation of each term from p_c_e_1 and again during the proper
evaluation of the entire expression.  It'd be nice if we could somehow
avoid the double evaluation, as in the approach below (or in the first
patch).


OK with more comments to explain the tf_error hijinks.


-- >8 --

gcc/cp/ChangeLog:

* constexpr.c (potential_constant_expression_1): When tf_error is
set, proceed quietly first and return true if successful.
: When tf_error is not set, check potentiality
of the second operand before performing trial evaluation of the
first operand rather than after.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 6f83d303cdd..7855a948baf 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -8892,13 +8892,16 @@ potential_constant_expression_1 (tree t, bool 
want_rval, bool strict, bool now,
tmp = boolean_false_node;
  truth:
{
-   tree op = TREE_OPERAND (t, 0);
-   if (!RECUR (op, rval))
+   tree op0 = TREE_OPERAND (t, 0);
+   tree op1 = TREE_OPERAND (t, 1);
+   if (!RECUR (op0, rval))
  return false;
+   if (!(flags & tf_error) && RECUR (op1, rval))
+ return true;
if (!processing_template_decl)
- op = cxx_eval_outermost_constant_expr (op, true);
-   if (tree_int_cst_equal (op, tmp))
- return RECUR (TREE_OPERAND (t, 1), rval);
+ op0 = cxx_eval_outermost_constant_expr (op0, true);
+   if (tree_int_cst_equal (op0, tmp))
+ return (flags & tf_error) ? RECUR (op1, rval) : false;
else
  return true;
}
@@ -9107,6 +9110,14 @@ bool
  potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool 
now,
 tsubst_flags_t flags)
  {
+  if (flags & tf_error)
+{
+  flags &= ~tf_error;
+  if (potential_constant_expression_1 (t, want_rval, strict, now, flags))
+   return true;
+  flags |= tf_error;
+}
+
tree target = NULL_TREE;
return potential_constant_expression_1 (t, want_rval, strict, now,
  flags, &target);




Re: [PATCH,FORTRAN] Fix memory leak of gsymbol

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
ping
[I'll rebase and retest this too since it's been a while.
Ok if it passes?]

On Sun, 21 Oct 2018 16:04:34 +0200
Bernhard Reutner-Fischer  wrote:

> Hi!
> 
> Regtested on x86_64-unknown-linux, installing on
> aldot/fortran-fe-stringpool.
> 
> We did not free global symbols. For a simplified abstract_type_3.f03
> valgrind reports:
> 
> 96 bytes in 1 blocks are still reachable in loss record 461 of 602
>at 0x48377D5: calloc (vg_replace_malloc.c:711)
>by 0x21257C3: xcalloc (xmalloc.c:162)
>by 0x98611B: gfc_get_gsymbol(char const*) (symbol.c:4341)
>by 0x932C58: parse_module() (parse.c:5912)
>by 0x9336F8: gfc_parse_file() (parse.c:6236)
>by 0x991449: gfc_be_parse_file() (f95-lang.c:204)
>by 0x11D8EDE: compile_file() (toplev.c:455)
>by 0x11DB9C3: do_compile() (toplev.c:2170)
>by 0x11DBCAF: toplev::main(int, char**) (toplev.c:2305)
>by 0x2045D37: main (main.c:39)
> 
> This patch reduces leaks to
> 
>  LEAK SUMMARY:
> definitely lost: 344 bytes in 1 blocks
> indirectly lost: 3,024 bytes in 4 blocks
>   possibly lost: 0 bytes in 0 blocks
> -   still reachable: 1,576,174 bytes in 2,277 blocks
> +   still reachable: 1,576,078 bytes in 2,276 blocks
>  suppressed: 0 bytes in 0 blocks
> 
> gcc/fortran/ChangeLog:
> 
> 2018-10-21  Bernhard Reutner-Fischer  
> 
>   * parse.c (clean_up_modules): Free gsym.
> ---
>  gcc/fortran/parse.c | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
> index b7265c42f58..f7c369a17ac 100644
> --- a/gcc/fortran/parse.c
> +++ b/gcc/fortran/parse.c
> @@ -6066,7 +6066,7 @@ resolve_all_program_units (gfc_namespace 
> *gfc_global_ns_list)
>  
>  
>  static void
> -clean_up_modules (gfc_gsymbol *gsym)
> +clean_up_modules (gfc_gsymbol *&gsym)
>  {
>if (gsym == NULL)
>  return;
> @@ -6074,14 +6074,18 @@ clean_up_modules (gfc_gsymbol *gsym)
>clean_up_modules (gsym->left);
>clean_up_modules (gsym->right);
>  
> -  if (gsym->type != GSYM_MODULE || !gsym->ns)
> +  if (gsym->type != GSYM_MODULE)
>  return;
>  
> -  gfc_current_ns = gsym->ns;
> -  gfc_derived_types = gfc_current_ns->derived_types;
> -  gfc_done_2 ();
> -  gsym->ns = NULL;
> -  return;
> +  if (gsym->ns)
> +{
> +  gfc_current_ns = gsym->ns;
> +  gfc_derived_types = gfc_current_ns->derived_types;
> +  gfc_done_2 ();
> +  gsym->ns = NULL;
> +}
> +  free (gsym);
> +  gsym = NULL;
>  }
>  
>  
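
As a rough illustration of the pattern in the quoted patch (free the
children first, then the node, and clear the caller's pointer through a
reference), here is a self-contained sketch.  Node, make_node and
free_tree are hypothetical names; unlike clean_up_modules it frees every
node rather than only GSYM_MODULE entries.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical binary tree node standing in for a global symbol entry.
struct Node {
  Node *left;
  Node *right;
};

static int freed_count = 0;

// Allocate a zero-initialised node, mirroring a calloc-style allocator.
static Node *make_node(Node *left, Node *right) {
  Node *n = static_cast<Node *>(std::calloc(1, sizeof(Node)));
  assert(n != nullptr);
  n->left = left;
  n->right = right;
  return n;
}

// Post-order free.  Taking the pointer by reference lets us reset the
// caller's copy, so no dangling pointer survives the call.
static void free_tree(Node *&n) {
  if (n == nullptr)
    return;
  free_tree(n->left);
  free_tree(n->right);
  std::free(n);
  n = nullptr;
  ++freed_count;
}
```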



[Bug target/102952] New code-gen options for retpolines and straight line speculation

2021-10-27 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #12 from H.J. Lu  ---
(In reply to peterz from comment #9)
> Created attachment 51683 [details]
> kernel patch to test -mharden-sls=all
> 
> $ make O=defconfig CC=gcc-12.0.0 arch/x86/entry/common.o
> ...
> arch/x86/entry/common.o: warning: objtool: do_SYSENTER_32()+0x1b:
> unreachable instruction

Please try the v2 patch.

[Bug target/102952] New code-gen options for retpolines and straight line speculation

2021-10-27 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

H.J. Lu  changed:

   What|Removed |Added

  Attachment #51679|0   |1
is obsolete||

--- Comment #11 from H.J. Lu  ---
Created attachment 51684
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51684&action=edit
The v2 patch to add -mharden-sls=

[Bug fortran/102966] size() returns 0 for an unallocated array, no error message or error exit

2021-10-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102966

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
This was added in r11-5064-g0c81ccc3d87098b93b0e6a2dd76815e4d6e78ff0

Re: [PATCH,FORTRAN] Fix memory leak in finalization wrappers

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
Ping
[hmz. it's been a while, I'll rebase and retest this one.
Ok if it passes?]

On Mon, 15 Oct 2018 10:23:06 +0200
Bernhard Reutner-Fischer  wrote:

> If a finalization is not required we created a namespace containing
> formal arguments for an internal interface definition but never used
> any of these. So the whole sub_ns namespace was not wired up to the
> program and consequently was never freed. The fix is to simply not
> generate any finalization wrappers if we know that it will be unused.
> Note that this reverts back to the original r190869
> (8a96d64282ac534cb597f446f02ac5d0b13249cc) handling for this case
> by reverting this specific part of r194075
> (f1ee56b4be7cc3892e6ccc75d73033c129098e87) for PR fortran/37336.
> 
> Regtests cleanly, installed to the fortran-fe-stringpool branch, sent
> here for reference and later inclusion.
> I might plug a few more leaks in preparation of switching to hash-maps.
> I fear that the leaks around interfaces are another candidate ;)
> 
> Should probably add a tag for the compile-time leak PR68800, shouldn't I?
> 
> valgrind summary for e.g.
> gfortran.dg/abstract_type_3.f03 and gfortran.dg/abstract_type_4.f03
> where ".orig" is pristine trunk and ".mine" contains this fix:
> 
> at3.orig.vg:LEAK SUMMARY:
> at3.orig.vg-   definitely lost: 8,460 bytes in 11 blocks
> at3.orig.vg-   indirectly lost: 13,288 bytes in 55 blocks
> at3.orig.vg- possibly lost: 0 bytes in 0 blocks
> at3.orig.vg-   still reachable: 572,278 bytes in 2,142 blocks
> at3.orig.vg-suppressed: 0 bytes in 0 blocks
> at3.orig.vg-
> at3.orig.vg-Use --track-origins=yes to see where uninitialised values come 
> from
> at3.orig.vg-ERROR SUMMARY: 38 errors from 33 contexts (suppressed: 0 from 0)
> --
> at3.mine.vg:LEAK SUMMARY:
> at3.mine.vg-   definitely lost: 344 bytes in 1 blocks
> at3.mine.vg-   indirectly lost: 7,192 bytes in 18 blocks
> at3.mine.vg- possibly lost: 0 bytes in 0 blocks
> at3.mine.vg-   still reachable: 572,278 bytes in 2,142 blocks
> at3.mine.vg-suppressed: 0 bytes in 0 blocks
> at3.mine.vg-
> at3.mine.vg-ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
> at3.mine.vg-ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
> at4.orig.vg:LEAK SUMMARY:
> at4.orig.vg-   definitely lost: 13,751 bytes in 12 blocks
> at4.orig.vg-   indirectly lost: 11,976 bytes in 60 blocks
> at4.orig.vg- possibly lost: 0 bytes in 0 blocks
> at4.orig.vg-   still reachable: 572,278 bytes in 2,142 blocks
> at4.orig.vg-suppressed: 0 bytes in 0 blocks
> at4.orig.vg-
> at4.orig.vg-Use --track-origins=yes to see where uninitialised values come 
> from
> at4.orig.vg-ERROR SUMMARY: 18 errors from 16 contexts (suppressed: 0 from 0)
> --
> at4.mine.vg:LEAK SUMMARY:
> at4.mine.vg-   definitely lost: 3,008 bytes in 3 blocks
> at4.mine.vg-   indirectly lost: 4,056 bytes in 11 blocks
> at4.mine.vg- possibly lost: 0 bytes in 0 blocks
> at4.mine.vg-   still reachable: 572,278 bytes in 2,142 blocks
> at4.mine.vg-suppressed: 0 bytes in 0 blocks
> at4.mine.vg-
> at4.mine.vg-ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
> at4.mine.vg-ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
> 
> gcc/fortran/ChangeLog:
> 
> 2018-10-12  Bernhard Reutner-Fischer  
> 
>   * class.c (generate_finalization_wrapper): Do not leak finalization
>   wrappers if they will not be used.
>   * expr.c (gfc_free_actual_arglist): Formatting fix.
>   * gfortran.h (gfc_free_symbol): Pass argument by reference.
>   (gfc_release_symbol): Likewise.
>   (gfc_free_namespace): Likewise.
>   * symbol.c (gfc_release_symbol): Adjust accordingly.
>   (free_components): Set procedure pointer components
>   of derived types to NULL after freeing.
>   (free_tb_tree): Likewise.
>   (gfc_free_symbol): Set sym to NULL after freeing.
>   (gfc_free_namespace): Set namespace to NULL after freeing.
> ---
>  gcc/fortran/class.c| 25 +
>  gcc/fortran/expr.c |  2 +-
>  gcc/fortran/gfortran.h |  6 +++---
>  gcc/fortran/symbol.c   | 19 ++-
>  4 files changed, 23 insertions(+), 29 deletions(-)
> 
> diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
> index 69c95fc5dfa..e0bb381a55f 100644
> --- a/gcc/fortran/class.c
> +++ b/gcc/fortran/class.c
> @@ -1533,7 +1533,6 @@ generate_finalization_wrapper (gfc_symbol *derived, 
> gfc_namespace *ns,
>gfc_code *last_code, *block;
>const char *name;
>bool finalizable_comp = false;
> -  bool expr_null_wrapper = false;
>gfc_expr *ancestor_wrapper = NULL, *rank;
>gfc_iterator *iter;
>  
> @@ -1561,13 +1560,17 @@ generate_finalization_wrapper (gfc_symbol *derived, 
> gfc_namespace *ns,
>  }
>  
>/* No wrapper of the ancestor and no own FINAL subroutines and allocatable
> - components: Return a NULL() expression; we defer this a bit to have have
> + components: Return a NULL() expression; we defer this a bit to have
>

[PATCH,Fortran 0/1] Correct CAF locations in simplify

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
Hi!

I found this lying around in an oldish tree.
Regtest running over night, ok for trunk if it passes?

Bernhard Reutner-Fischer (1):
  Tweak locations around CAF simplify

 gcc/fortran/simplify.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

-- 
2.33.0



[PATCH,Fortran 1/1] Tweak locations around CAF simplify

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
From: Bernhard Reutner-Fischer 

addresses: FIXME: gfc_current_locus is wrong
by using the locus of the current intrinsic.
Regtests clean, ok for trunk?
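
The general shape of the change, reporting at an explicitly passed
location instead of first overwriting a global "current" location, can
be sketched as follows.  Locus, current_locus and diag_at here are
hypothetical stand-ins, not gfortran's types or its error machinery.

```cpp
#include <cassert>
#include <string>

// Hypothetical source position.
struct Locus {
  int line;
  int column;
};

// Global parser cursor; by the time a simplifier runs it may point
// somewhere unrelated to the intrinsic being simplified.
static Locus current_locus{1, 1};

// Format a diagnostic for an explicitly supplied position, leaving the
// global cursor untouched.
static std::string diag_at(const Locus &where, const std::string &msg) {
  return std::to_string(where.line) + ":" + std::to_string(where.column) +
         ": " + msg;
}
```

Passing the position as an argument means the caller neither has to
mutate shared state nor restore it afterwards.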

gcc/fortran/ChangeLog:

2018-09-20  Bernhard Reutner-Fischer  

* simplify.c (gfc_simplify_failed_or_stopped_images): Use
current intrinsic where locus.
(gfc_simplify_get_team): Likewise.
(gfc_simplify_num_images): Likewise.
(gfc_simplify_image_status): Likewise.
(gfc_simplify_this_image): Likewise.
---
 gcc/fortran/simplify.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index d675f2c3aef..46e88bb2bf1 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -2985,8 +2985,9 @@ gfc_simplify_failed_or_stopped_images (gfc_expr *team 
ATTRIBUTE_UNUSED,
 {
   if (flag_coarray == GFC_FCOARRAY_NONE)
 {
-  gfc_current_locus = *gfc_current_intrinsic_where;
-  gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to enable");
+  gfc_fatal_error ("Coarrays disabled at %L, use %<-fcoarray=%> to enable",
+ gfc_current_intrinsic_where);
+
   return &gfc_bad_expr;
 }
 
@@ -2999,7 +3000,8 @@ gfc_simplify_failed_or_stopped_images (gfc_expr *team 
ATTRIBUTE_UNUSED,
   else
actual_kind = gfc_default_integer_kind;
 
-  result = gfc_get_array_expr (BT_INTEGER, actual_kind, &gfc_current_locus);
+  result = gfc_get_array_expr (BT_INTEGER, actual_kind,
+ gfc_current_intrinsic_where);
   result->rank = 1;
   return result;
 }
@@ -3015,15 +3017,16 @@ gfc_simplify_get_team (gfc_expr *level ATTRIBUTE_UNUSED)
 {
   if (flag_coarray == GFC_FCOARRAY_NONE)
 {
-  gfc_current_locus = *gfc_current_intrinsic_where;
-  gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to enable");
+  gfc_fatal_error ("Coarrays disabled at %L, use %<-fcoarray=%> to enable",
+ gfc_current_intrinsic_where);
   return &gfc_bad_expr;
 }
 
   if (flag_coarray == GFC_FCOARRAY_SINGLE)
 {
   gfc_expr *result;
-  result = gfc_get_array_expr (BT_INTEGER, gfc_default_integer_kind, &gfc_current_locus);
+  result = gfc_get_array_expr (BT_INTEGER, gfc_default_integer_kind,
+ gfc_current_intrinsic_where);
   result->rank = 0;
   return result;
 }
@@ -6340,7 +6343,8 @@ gfc_simplify_num_images (gfc_expr *distance 
ATTRIBUTE_UNUSED, gfc_expr *failed)
 
   if (flag_coarray == GFC_FCOARRAY_NONE)
 {
-  gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to enable");
+  gfc_fatal_error ("Coarrays disabled at %L, use %<-fcoarray=%> to enable",
+ gfc_current_intrinsic_where);
   return &gfc_bad_expr;
 }
 
@@ -6350,9 +6354,8 @@ gfc_simplify_num_images (gfc_expr *distance 
ATTRIBUTE_UNUSED, gfc_expr *failed)
   if (failed && failed->expr_type != EXPR_CONSTANT)
 return NULL;
 
-  /* FIXME: gfc_current_locus is wrong.  */
   result = gfc_get_constant_expr (BT_INTEGER, gfc_default_integer_kind,
- &gfc_current_locus);
+ gfc_current_intrinsic_where);
 
   if (failed && failed->value.logical != 0)
 mpz_set_si (result->value.integer, 0);
@@ -8345,8 +8348,8 @@ gfc_simplify_image_status (gfc_expr *image, gfc_expr 
*team ATTRIBUTE_UNUSED)
 {
   if (flag_coarray == GFC_FCOARRAY_NONE)
 {
-  gfc_current_locus = *gfc_current_intrinsic_where;
-  gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to enable");
+  gfc_fatal_error ("Coarrays disabled at %L, use %<-fcoarray=%> to enable",
+ gfc_current_intrinsic_where);
   return &gfc_bad_expr;
 }
 
@@ -8383,9 +8386,8 @@ gfc_simplify_this_image (gfc_expr *coarray, gfc_expr *dim,
   if (coarray == NULL || !gfc_is_coarray (coarray))
 {
   gfc_expr *result;
-  /* FIXME: gfc_current_locus is wrong.  */
   result = gfc_get_constant_expr (BT_INTEGER, gfc_default_integer_kind,
- &gfc_current_locus);
+ gfc_current_intrinsic_where);
   mpz_set_si (result->value.integer, 1);
   return result;
 }
-- 
2.33.0



[Bug target/102974] GCC optimization is very poor for add carry and multiplication combos

2021-10-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974

--- Comment #5 from Andrew Pinski  ---
(In reply to cqwrteur from comment #4)
> (In reply to cqwrteur from comment #3)
> > (In reply to Andrew Pinski from comment #2)
> > > There might be another bug about _addcarryx_u64 already.
> > 
> > This is 32 bit addcarry.
> 
> but yeah. GCC does not perform optimizations very well to add carries and
> mul + recognize >>64u <<64u patterns

I mean all of _addcarryx_* intrinsics.

[Bug target/102974] GCC optimization is very poor for add carry and multiplication combos

2021-10-27 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974

--- Comment #4 from cqwrteur  ---
(In reply to cqwrteur from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > There might be another bug about _addcarryx_u64 already.
> 
> This is 32 bit addcarry.

but yeah. GCC does not perform optimizations very well to add carries and mul +
recognize >>64u <<64u patterns
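
For context, the two source-level patterns being discussed look roughly
like this at 32-bit width.  This is a simplified sketch with made-up
names (add_carry_sketch, umul_sketch), not the reporter's code, and it
says nothing about what any particular compiler emits for it.

```cpp
#include <cassert>
#include <cstdint>

// Add with carry-in and carry-out using only unsigned comparisons.
// Each of the two additions can wrap; at most one of them does.
inline bool add_carry_sketch(bool carry, std::uint32_t a, std::uint32_t b,
                             std::uint32_t &out) {
  std::uint32_t temp = a + carry;  // wraps to 0 if a is all ones and carry is set
  out = temp + b;                  // wraps if temp + b exceeds 32 bits
  return (out < b) | (temp < a);
}

// Widening multiply: compute the 64-bit product, then split it into
// low and high halves with a shift and truncation.
inline std::uint32_t umul_sketch(std::uint32_t a, std::uint32_t b,
                                 std::uint32_t &high) {
  std::uint64_t v = static_cast<std::uint64_t>(a) * b;
  high = static_cast<std::uint32_t>(v >> 32u);
  return static_cast<std::uint32_t>(v);
}
```

Ideally both functions map onto a single add-with-carry or widening
multiply instruction on targets that have one.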

[Bug target/102974] GCC optimization is very poor for add carry and multiplication combos

2021-10-27 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974

--- Comment #3 from cqwrteur  ---
(In reply to Andrew Pinski from comment #2)
> There might be another bug about _addcarryx_u64 already.

This is 32 bit addcarry.

[Bug tree-optimization/102960] [10/11/12 Regression] ICE: in sign_mask, at wide-int.h:855 in GCC 10.3.0

2021-10-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102960

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||ice-on-valid-code
   Target Milestone|--- |10.4

[Bug target/102974] GCC optimization is very poor for add carry and multiplication combos

2021-10-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974

Andrew Pinski  changed:

   What|Removed |Added

  Component|tree-optimization   |target

--- Comment #2 from Andrew Pinski  ---
There might be another bug about _addcarryx_u64 already.

[Bug tree-optimization/102974] GCC optimization is very poor for add carry and multiplication combos

2021-10-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974

--- Comment #1 from Andrew Pinski  ---
#include <bit>
#include <concepts>
#include <cstdint>
#include <type_traits>

struct ul32x2
{
std::uint_least32_t low,high;
};

inline constexpr std::uint_least32_t umul_least_32(std::uint_least32_t a,std::uint_least32_t b,std::uint_least32_t& high) noexcept
{
if constexpr(std::endian::native==std::endian::little||std::endian::native==std::endian::big)
{
struct ul32x2_little_endian_t
{
std::uint_least32_t low,high;
};
struct ul32x2_big_endian_t
{
std::uint_least32_t high,low;
};
using ul32x2_t = std::conditional_t<std::endian::native==std::endian::little,ul32x2_little_endian_t,ul32x2_big_endian_t>;
auto ret{__builtin_bit_cast(ul32x2_t,static_cast<std::uint_least64_t>(a)*b)};
high=ret.high;
return ret.low;
}
else
{
std::uint_least64_t v{static_cast<std::uint_least64_t>(a)*b};
high=static_cast<std::uint_least32_t>(v>>32u);
return static_cast<std::uint_least32_t>(v);
}
}
template<typename T>
#if __cpp_lib_concepts >= 202002L
requires (std::unsigned_integral<T>)
#endif
inline constexpr bool add_carry_naive(bool carry,T a,T b,T& out) noexcept
{
T temp{carry+a};
out=temp+b;
return (out < b) | (temp < a);
}

template<typename T>
#if __cpp_lib_concepts >= 202002L
requires (std::unsigned_integral<T>)
#endif
inline constexpr bool add_carry(bool carry,T a,T b,T& out) noexcept
{
#if __cpp_lib_is_constant_evaluated >= 201811L
if(std::is_constant_evaluated())
return add_carry_naive(carry,a,b,out);
else
#endif
{
#if defined(_MSC_VER) && !defined(__clang__)
#if (defined(_M_IX86) || defined(_M_AMD64))
if constexpr(sizeof(T)==8)
{
#if defined(_M_AMD64)
return _addcarryx_u64(carry,a,b,reinterpret_cast<std::uint64_t*>(__builtin_addressof(out)));
#else
return _addcarryx_u32(_addcarryx_u32(carry,
*reinterpret_cast<std::uint32_t*>(__builtin_addressof(a)),*reinterpret_cast<std::uint32_t*>(__builtin_addressof(b)),reinterpret_cast<std::uint32_t*>(__builtin_addressof(out))),
reinterpret_cast<std::uint32_t*>(__builtin_addressof(a))[1],reinterpret_cast<std::uint32_t*>(__builtin_addressof(b))[1],reinterpret_cast<std::uint32_t*>(__builtin_addressof(out))+1);
#endif
}
else if constexpr(sizeof(T)==4)
return _addcarryx_u32(carry,a,b,reinterpret_cast<std::uint32_t*>(__builtin_addressof(out)));
else if constexpr(sizeof(T)==2)
return _addcarry_u16(carry,a,b,reinterpret_cast<std::uint16_t*>(__builtin_addressof(out)));
else if constexpr(sizeof(T)==1)
return _addcarry_u8(carry,a,b,reinterpret_cast<std::uint8_t*>(__builtin_addressof(out)));
else
return add_carry_naive(carry,a,b,out);
#else
return add_carry_naive(carry,a,b,out);
#endif
#elif defined(__has_builtin) && (__has_builtin(__builtin_addcb)&&__has_builtin(__builtin_addcs)&&__has_builtin(__builtin_addc)&&__has_builtin(__builtin_addcl)&&__has_builtin(__builtin_addcll))
if constexpr(sizeof(T)==sizeof(long long unsigned))
{
long long unsigned carryout;
out=__builtin_addcll(a,b,carry,__builtin_addressof(carryout));
return carryout;
}
else if constexpr(sizeof(T)==sizeof(long unsigned))
{
long unsigned carryout;
out=__builtin_addcl(a,b,carry,__builtin_addressof(carryout));
return carryout;
}
else if constexpr(sizeof(T)==sizeof(unsigned))
{
unsigned carryout;
out=__builtin_addc(a,b,carry,__builtin_addressof(carryout));
return carryout;
}
else if constexpr(sizeof(T)==sizeof(short unsigned))
{
short unsigned carryout;
out=__builtin_addcs(a,b,carry,__builtin_addressof(carryout));
return carryout;
}
else if constexpr(sizeof(T)==sizeof(char unsigned))
{
char unsigned carryout;
out=__builtin_addcb(a,b,carry,__builtin_addressof(carryout));
return carryout;
}
else
{
return add_carry_naive(carry,a,b,out);
}
#elif defined(__has_builtin) && (__has_builtin(__builtin_ia32_addcarryx_u32)||__has_builtin(__builtin_ia32_addcarry_u32)||__has_builtin(__builtin_ia32_addcarryx_u64))
if constexpr(sizeof(T)==8)
{
#if __has_builtin(__builtin_ia32_addcarryx_u64)
using may_alias_ptr_type [[gnu::may_alias]] = unsigned long long*;
return __builtin_ia32_addcarryx_u64(carry,a,b,reinterpret_cast<may_alias_ptr_type>(__builtin_addressof(out)));
#else
std::uint32_t a_low;
std::uint32_t a_high;
__builtin_memcpy(__builtin_addressof(a_low),__builtin_addressof(a),4);
__builtin_memcpy(__builtin_addressof(a_high),reinterpret_cast<char const*>(__builtin_addressof(a))+4,4);
std::uint32_t b_low;

[Bug tree-optimization/102974] New: GCC optimization is very poor for add carry and multiplication combos

2021-10-27 Thread unlvsur at live dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102974

Bug ID: 102974
   Summary: GCC optimization is very poor for add carry and
multiplication combos
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: unlvsur at live dot com
  Target Milestone: ---

GCC:
https://godbolt.org/z/6sc6v5YcG

clang:
https://godbolt.org/z/eP8fTrWzd
msvc:
https://godbolt.org/z/snzoEe5he
GCC generates 16 more instructions than msvc and clang for carry flag
optimizations.
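
For reference, the kind of carry chain being discussed can be written without the x86 intrinsics. What follows is only a minimal sketch, assuming a compiler that provides `__builtin_add_overflow` (GCC and Clang do); `add_carry_sketch`, `u128_sketch` and `add_u128` are hypothetical names used for illustration and are not taken from the report or the godbolt links.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: one add-with-carry step on 64-bit limbs, built on
// __builtin_add_overflow (a GCC/Clang builtin).
inline bool add_carry_sketch(bool carry, std::uint64_t a, std::uint64_t b,
                             std::uint64_t& out) noexcept
{
    std::uint64_t t;
    bool c1 = __builtin_add_overflow(a, b, &t);
    bool c2 = __builtin_add_overflow(t, static_cast<std::uint64_t>(carry), &out);
    // At most one of the two additions can wrap, so OR is enough.
    return c1 | c2;
}

// Hypothetical 128-bit value stored as two 64-bit limbs.
struct u128_sketch
{
    std::uint64_t low, high;
};

// Adds two 128-bit values limb by limb and returns the final carry out.
inline bool add_u128(const u128_sketch& x, const u128_sketch& y,
                     u128_sketch& r) noexcept
{
    bool c = add_carry_sketch(false, x.low, y.low, r.low);
    return add_carry_sketch(c, x.high, y.high, r.high);
}
```

Ideally a chain like `add_u128` would compile to an add followed by an adc on x86-64; whether a given compiler version actually does that is exactly what the report is about, so the generated code should be checked rather than assumed.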

[PATCH,Fortran] Fortran: Delete unused decl in gfortran.h

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
From: Bernhard Reutner-Fischer 

Hi!

Delete some more declarations without definitions and make some
functions static.
Bootstrapped and regtested on x86_64-unknown-linux without regressions.
Ok for trunk?

gcc/fortran/ChangeLog:

* decl.c (gfc_insert_kind_parameter_exprs): Make static.
* expr.c (gfc_build_init_expr): Make static.
(gfc_build_default_init_expr): Move below its static helper.
* gfortran.h (gfc_insert_kind_parameter_exprs, gfc_add_saved_common,
gfc_add_common, gfc_use_derived_tree, gfc_free_charlen,
gfc_get_ultimate_derived_super_type,
gfc_resolve_oacc_parallel_loop_blocks, gfc_build_init_expr,
gfc_iso_c_sub_interface): Delete.
* symbol.c (gfc_new_charlen, gfc_get_derived_super_type): Make
static.
---
 gcc/fortran/decl.c |  2 +-
 gcc/fortran/expr.c | 20 ++--
 gcc/fortran/gfortran.h |  9 -
 gcc/fortran/symbol.c   |  4 ++--
 4 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 2788348d1be..e9e23fe1acb 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -3713,7 +3713,7 @@ insert_parameter_exprs (gfc_expr* e, gfc_symbol* sym 
ATTRIBUTE_UNUSED,
 }
 
 
-bool
+static bool
 gfc_insert_kind_parameter_exprs (gfc_expr *e)
 {
   return gfc_traverse_expr (e, NULL, &insert_parameter_exprs, 0);
diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index 4dea840e348..087d822021a 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -4587,21 +4587,12 @@ gfc_check_assign_symbol (gfc_symbol *sym, gfc_component 
*comp, gfc_expr *rvalue)
   return true;
 }
 
-/* Invoke gfc_build_init_expr to create an initializer expression, but do not
- * require that an expression be built.  */
-
-gfc_expr *
-gfc_build_default_init_expr (gfc_typespec *ts, locus *where)
-{
-  return gfc_build_init_expr (ts, where, false);
-}
-
 /* Build an initializer for a local integer, real, complex, logical, or
character variable, based on the command line flags finit-local-zero,
finit-integer=, finit-real=, finit-logical=, and finit-character=.
With force, an initializer is ALWAYS generated.  */
 
-gfc_expr *
+static gfc_expr *
 gfc_build_init_expr (gfc_typespec *ts, locus *where, bool force)
 {
   gfc_expr *init_expr;
@@ -4758,6 +4749,15 @@ gfc_build_init_expr (gfc_typespec *ts, locus *where, 
bool force)
   return init_expr;
 }
 
+/* Invoke gfc_build_init_expr to create an initializer expression, but do not
+ * require that an expression be built.  */
+
+gfc_expr *
+gfc_build_default_init_expr (gfc_typespec *ts, locus *where)
+{
+  return gfc_build_init_expr (ts, where, false);
+}
+
 /* Apply an initialization expression to a typespec. Can be used for symbols or
components. Similar to add_init_expr_to_sym in decl.c; could probably be
combined with some effort.  */
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index f7662c59a5d..8c11cf6d18d 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3116,7 +3116,6 @@ struct gfc_vect_builtin_tuple
 extern hash_map *gfc_vectorized_builtins;
 
 /* Handling Parameterized Derived Types  */
-bool gfc_insert_kind_parameter_exprs (gfc_expr *);
 bool gfc_insert_parameter_exprs (gfc_expr *, gfc_actual_arglist *);
 match gfc_get_pdt_instance (gfc_actual_arglist *, gfc_symbol **,
gfc_actual_arglist **);
@@ -3348,11 +3347,9 @@ bool gfc_add_threadprivate (symbol_attribute *, const 
char *, locus *);
 bool gfc_add_omp_declare_target (symbol_attribute *, const char *, locus *);
 bool gfc_add_omp_declare_target_link (symbol_attribute *, const char *,
  locus *);
-bool gfc_add_saved_common (symbol_attribute *, locus *);
 bool gfc_add_target (symbol_attribute *, locus *);
 bool gfc_add_dummy (symbol_attribute *, const char *, locus *);
 bool gfc_add_generic (symbol_attribute *, const char *, locus *);
-bool gfc_add_common (symbol_attribute *, locus *);
 bool gfc_add_in_common (symbol_attribute *, const char *, locus *);
 bool gfc_add_in_equivalence (symbol_attribute *, const char *, locus *);
 bool gfc_add_data (symbol_attribute *, const char *, locus *);
@@ -3387,7 +3384,6 @@ bool gfc_copy_attr (symbol_attribute *, symbol_attribute 
*, locus *);
 int gfc_copy_dummy_sym (gfc_symbol **, gfc_symbol *, int);
 bool gfc_add_component (gfc_symbol *, const char *, gfc_component **);
 gfc_symbol *gfc_use_derived (gfc_symbol *);
-gfc_symtree *gfc_use_derived_tree (gfc_symtree *);
 gfc_component *gfc_find_component (gfc_symbol *, const char *, bool, bool,
gfc_ref **);
 
@@ -3428,7 +3424,6 @@ void gfc_undo_symbols (void);
 void gfc_commit_symbols (void);
 void gfc_commit_symbol (gfc_symbol *);
 gfc_charlen *gfc_new_charlen (gfc_namespace *, gfc_charlen *);
-void gfc_free_charlen (gfc_charlen *, gfc_charlen *);
 void gfc_free_namespace (gfc_namespace *);
 
 void gfc_symbol_init_2 (void);
@@ -3448,7 +3443,6 

Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-27 Thread Patrick Palka via Gcc-patches
On Wed, 27 Oct 2021, Jason Merrill wrote:

> On 10/27/21 14:54, Patrick Palka wrote:
> > On Tue, 26 Oct 2021, Jakub Jelinek wrote:
> > 
> > > On Tue, Oct 26, 2021 at 05:07:43PM -0400, Patrick Palka wrote:
> > > > The performance impact of the other calls to
> > > > cxx_eval_outermost_const_expr
> > > > from p_c_e_1 is probably already mostly mitigated by the constexpr call
> > > > cache and the fact that we try to evaluate all calls to constexpr
> > > > functions during cp_fold_function anyway (at least with -O).  So trial
> > > 
> > > constexpr function bodies don't go through cp_fold_function
> > > (intentionally,
> > > so that we don't optimize away UB), the bodies are copied before the trees
> > > of the
> > > normal copy are folded.
> > 
> > Ah right, I had forgotten about that..
> > 
> > Here's another approach that doesn't need to remove trial evaluation for
> > &&/||.  The idea is to first quietly check if the second operand is
> > potentially constant _before_ performing trial evaluation of the first
> > operand.  This speeds up the case we care about (both operands are
> > potentially constant) without regressing any diagnostics.  We have to be
> > careful about emitting bogus diagnostics when tf_error is set, hence the
> > first hunk below which makes p_c_e_1 always proceed quietly first, and
> > replay noisily in case of error (similar to how satisfaction works).
> > 
> > Would something like this be preferable?
> 
> Seems plausible, though doubling the number of stack frames is a downside.

Whoops, good point..  The noisy -> quiet adjustment only needs to
be performed during the outermost call to p_c_e_1, and not also during
each recursive call.  The revised diff below fixes this thinko, and so
only a single extra stack frame is needed AFAICT.
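
The "proceed quietly first, replay noisily on error" idea can be sketched outside of GCC. This is only an illustration under assumed names: `Expr`, `check_1`, `check` and `diagnostics` are hypothetical and do not correspond to GCC's data structures. The point it shows is that the quiet pass is done once in the outermost entry point and not in every recursive call.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical expression node: `ok` says whether this node alone is acceptable.
struct Expr
{
    bool ok;
    const Expr* lhs;
    const Expr* rhs;
};

// Collected error messages (stand-in for real diagnostics).
static std::vector<std::string> diagnostics;

// Recursive worker: reports only when `complain` is set.
static bool check_1(const Expr* e, bool complain)
{
    if (!e)
        return true;
    if (!e->ok)
    {
        if (complain)
            diagnostics.push_back("node is not acceptable");
        return false;
    }
    return check_1(e->lhs, complain) && check_1(e->rhs, complain);
}

// Outermost entry point: when asked to complain, try a quiet pass first and
// replay noisily only if that fails. The recursion itself is unchanged, so
// this costs one extra frame rather than doubling the depth.
static bool check(const Expr* e, bool complain)
{
    if (complain && check_1(e, /*complain=*/false))
        return true;
    return check_1(e, complain);
}
```

On success nothing is reported and the tree is walked once; on failure it is walked twice, which is the trade-off accepted for keeping diagnostics unchanged.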

> 
> What did you think of Jakub's suggestion of linearizing the terms?

IIUC that would fix the quadraticness, but it wouldn't address that
we end up evaluating the entire expression twice, once during the trial
evaluation of each term from p_c_e_1 and again during the proper
evaluation of the entire expression.  It'd be nice if we could somehow
avoid the double evaluation, as in the approach below (or in the first
patch).
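
To make the cost argument concrete, here is a toy model of a left-associated `&&` chain. It is a sketch under stated assumptions, not GCC code: `Node`, `evaluate`, `potential_old`, `potential_new` and `build_chain` are hypothetical names, every leaf is treated as a constant `true`, and `evaluate` stands in for trial evaluation by walking the whole subtree.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical node: a leaf has null children; an inner node means lhs && rhs.
struct Node
{
    const Node* lhs;
    const Node* rhs;
};

static std::size_t eval_visits = 0;

// Stand-in for trial evaluation: visits every node of the subtree.
static bool evaluate(const Node* n)
{
    ++eval_visits;
    if (!n->lhs)
        return true;
    return evaluate(n->lhs) && evaluate(n->rhs);
}

// Old order: recurse into lhs, trial-evaluate lhs, then look at rhs.
static bool potential_old(const Node* n)
{
    if (!n->lhs)
        return true;
    if (!potential_old(n->lhs))
        return false;
    if (evaluate(n->lhs))
        return potential_old(n->rhs);
    return true;
}

// New order (quiet mode): if rhs is potentially constant, skip the trial
// evaluation of lhs entirely.
static bool potential_new(const Node* n)
{
    if (!n->lhs)
        return true;
    if (!potential_new(n->lhs))
        return false;
    if (potential_new(n->rhs))
        return true;
    return !evaluate(n->lhs);
}

// Builds ((leaf && leaf) && leaf) && ... with `terms` leaves.
static const Node* build_chain(std::vector<std::unique_ptr<Node>>& pool, int terms)
{
    pool.push_back(std::unique_ptr<Node>(new Node{nullptr, nullptr}));
    const Node* cur = pool.back().get();
    for (int i = 1; i < terms; ++i)
    {
        pool.push_back(std::unique_ptr<Node>(new Node{nullptr, nullptr}));
        const Node* leaf = pool.back().get();
        pool.push_back(std::unique_ptr<Node>(new Node{cur, leaf}));
        cur = pool.back().get();
    }
    return cur;
}
```

In this model a chain of n terms costs (n-1)^2 evaluation visits with the old order, because each level re-walks its entire left operand, and none with the new order when every right operand is potentially constant.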

-- >8 --

gcc/cp/ChangeLog:

* constexpr.c (potential_constant_expression_1): When tf_error is
set, proceed quietly first and return true if successful.
: When tf_error is not set, check potentiality
of the second operand before performing trial evaluation of the
first operand rather than after.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 6f83d303cdd..7855a948baf 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -8892,13 +8892,16 @@ potential_constant_expression_1 (tree t, bool 
want_rval, bool strict, bool now,
   tmp = boolean_false_node;
 truth:
   {
-   tree op = TREE_OPERAND (t, 0);
-   if (!RECUR (op, rval))
+   tree op0 = TREE_OPERAND (t, 0);
+   tree op1 = TREE_OPERAND (t, 1);
+   if (!RECUR (op0, rval))
  return false;
+   if (!(flags & tf_error) && RECUR (op1, rval))
+ return true;
if (!processing_template_decl)
- op = cxx_eval_outermost_constant_expr (op, true);
-   if (tree_int_cst_equal (op, tmp))
- return RECUR (TREE_OPERAND (t, 1), rval);
+ op0 = cxx_eval_outermost_constant_expr (op0, true);
+   if (tree_int_cst_equal (op0, tmp))
+ return (flags & tf_error) ? RECUR (op1, rval) : false;
else
  return true;
   }
@@ -9107,6 +9110,14 @@ bool
 potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
 tsubst_flags_t flags)
 {
+  if (flags & tf_error)
+{
+  flags &= ~tf_error;
+  if (potential_constant_expression_1 (t, want_rval, strict, now, flags))
+   return true;
+  flags |= tf_error;
+}
+
   tree target = NULL_TREE;
   return potential_constant_expression_1 (t, want_rval, strict, now,
  flags, &target);



[Bug tree-optimization/102971] GCC cannot understand >>32 pattern

2021-10-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102971

--- Comment #2 from Andrew Pinski  ---
#include <cstdint>

std::uint64_t umul_naive(std::uint64_t a,std::uint64_t b,std::uint64_t& high) noexcept
{
std::uint32_t a0(static_cast<std::uint32_t>(a));
std::uint32_t a1(static_cast<std::uint32_t>(a>>32));
std::uint32_t b0(static_cast<std::uint32_t>(b));
std::uint32_t b1(static_cast<std::uint32_t>(b>>32));
std::uint64_t c00(static_cast<std::uint64_t>(a0)*b0);
std::uint64_t c01(static_cast<std::uint64_t>(a0)*b1);
std::uint64_t c10(static_cast<std::uint64_t>(a1)*b0);
std::uint64_t c11(static_cast<std::uint64_t>(a1)*b1);

std::uint64_t d0{static_cast<std::uint32_t>(c00)};
std::uint64_t c00_high{c00>>32};
std::uint64_t c01_low{static_cast<std::uint32_t>(c01)};
std::uint64_t c01_high{c01>>32};
std::uint64_t c10_low{static_cast<std::uint32_t>(c10)};
std::uint64_t c10_high{c10>>32};

std::uint64_t d1{c00_high+c01_low+c10_low};
std::uint64_t d2{(d1>>32)+c10_high+c01_high};
std::uint64_t d3{(d2>>32)+c11};
high=d2|(d3<<32);
return d0|(d1<<32);
}
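
One way to sanity-check a 64x64 -> 128 multiply built from 32-bit halves is to compare it against a reference. The sketch below assumes the compiler provides `unsigned __int128` (GCC and Clang do on 64-bit targets); `umul_ref` and `umul_halves` are hypothetical names, and `umul_halves` is a variant that accumulates all carries into a single 64-bit high word, not the function from the comment above.

```cpp
#include <cassert>
#include <cstdint>

// Reference result via the compiler's 128-bit integer extension.
inline std::uint64_t umul_ref(std::uint64_t a, std::uint64_t b,
                              std::uint64_t& high) noexcept
{
    unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
    high = static_cast<std::uint64_t>(p >> 64);
    return static_cast<std::uint64_t>(p);
}

// Schoolbook multiply from 32-bit halves, masking each partial product.
inline std::uint64_t umul_halves(std::uint64_t a, std::uint64_t b,
                                 std::uint64_t& high) noexcept
{
    const std::uint64_t mask = 0xffffffffu;
    std::uint64_t a0 = a & mask, a1 = a >> 32;
    std::uint64_t b0 = b & mask, b1 = b >> 32;
    std::uint64_t c00 = a0 * b0, c01 = a0 * b1, c10 = a1 * b0, c11 = a1 * b1;
    // Middle column: three 32-bit values, so the sum stays below 3 * 2^32.
    std::uint64_t mid = (c00 >> 32) + (c01 & mask) + (c10 & mask);
    // The true product is below 2^128, so this sum cannot wrap.
    high = c11 + (c01 >> 32) + (c10 >> 32) + (mid >> 32);
    // Shifting mid left by 32 drops its carry bits, which already went into high.
    return (c00 & mask) | (mid << 32);
}
```

Comparing the two on a few edge values, such as both operands equal to 2^64-1, is a cheap check before looking at the generated assembly.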

[Bug tree-optimization/102969] [12 regression] g++.dg/warn/Wstringop-overflow-4.C fails after r12-4726

2021-10-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102969

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Host|powerpc64-linux-gnu,|
   |powerpc64le-linux-gnu   |
   Last reconfirmed||2021-10-27
  Component|other   |tree-optimization
   Target Milestone|--- |12.0
 Target|powerpc64-linux-gnu,|powerpc64{,le}-linux-gnu,
   |powerpc64le-linux-gnu   |aarch64-linux-gnu
   Keywords||diagnostic
  Build|powerpc64-linux-gnu,|
   |powerpc64le-linux-gnu   |

--- Comment #1 from Andrew Pinski  ---
Confirmed. I noticed this too.
I debugged it a little bit too. The reason it shows up in C++98 only is
that C++98 does the size check slightly differently and does not throw an
exception in the case of an overflow.
Also, you need all 3 checks in the code to get the warning in the first place,
that is:
  ptrdiff_t r_dmin_dmax = SR (DIFF_MIN, DIFF_MAX);
  T (S (1), new int16_t[r_dmin_dmax]);
  T (S (2), new int16_t[r_dmin_dmax + 1]);
  T (S (9), new int16_t[r_dmin_dmax * 2 + 1]);

Having just the one that warns does not cause a warning.  I didn't look further
than that.

Re: [PATCH] c++: Implement DR2351 - void{} [PR102820]

2021-10-27 Thread Jason Merrill via Gcc-patches

On 10/21/21 04:42, Jakub Jelinek wrote:

Hi!

Here is an attempt to implement DR2351 - void{} - where void{} after
pack expansion is considered valid and the same thing as void().
For templates, dunno if we have some better way to check if a CONSTRUCTOR
might be empty after pack expansion.  Would that be only if the constructor
contains only EXPR_PACK_EXPANSION elements and nothing else, or something
else too?


I think that's the only case.  For template args there's the 
pack_expansion_args_count function, but I don't think there's anything 
similar for constructor elts; please feel free to add it.



With the patch as is we wouldn't diagnose
template <typename... T>
void
bar (T... t)
{
   void{1, t...};
}
at parsing time, only at instantiation time, even when it will always
expand to at least one CONSTRUCTOR elt.

Bootstrapped/regtested on x86_64-linux and i686-linux.

2021-10-21  Jakub Jelinek  

PR c++/102820
* semantics.c (finish_compound_literal): Implement DR2351 - void{}.
If type is cv void and compound_literal has no elements, return
void_node.  If type is cv void and compound_literal is instantiation
dependent, handle it like other dependent compound literals.

* g++.dg/cpp0x/dr2351.C: New test.

--- gcc/cp/semantics.c.jj   2021-10-15 11:58:45.079131947 +0200
+++ gcc/cp/semantics.c  2021-10-20 17:00:38.586705600 +0200
@@ -3104,9 +3104,20 @@ finish_compound_literal (tree type, tree
  
if (!TYPE_OBJ_P (type))

  {
-  if (complain & tf_error)
-   error ("compound literal of non-object type %qT", type);
-  return error_mark_node;
+  /* DR2351 */
+  if (VOID_TYPE_P (type) && CONSTRUCTOR_NELTS (compound_literal) == 0)
+   return void_node;
+  else if (VOID_TYPE_P (type)
+  && processing_template_decl
+  && instantiation_dependent_expression_p (compound_literal))
+   /* If there are packs in compound_literal, it could
+  be void{} after pack expansion.  */;
+  else
+   {
+ if (complain & tf_error)
+   error ("compound literal of non-object type %qT", type);
+ return error_mark_node;
+   }
  }
  
if (template_placeholder_p (type))

--- gcc/testsuite/g++.dg/cpp0x/dr2351.C.jj  2021-10-20 17:06:02.399162937 +0200
+++ gcc/testsuite/g++.dg/cpp0x/dr2351.C 2021-10-20 17:05:54.294276511 +0200
@@ -0,0 +1,36 @@
+// DR2351
+// { dg-do compile { target c++11 } }
+
+void
+foo ()
+{
+  void{};
+  void();
+}
+
+template <typename... T>
+void
+bar (T... t)
+{
+  void{t...};
+  void(t...);
+}
+
+void
+baz ()
+{
+  bar ();
+}
+
+template <typename... T>
+void
+qux (T... t)
+{
+  void{t...};  // { dg-error "compound literal of non-object type" }
+}
+
+void
+corge ()
+{
+  qux (1, 2);
+}

Jakub





[Bug middle-end/102970] [11/12 Regression] stable_sort uninitialized value with -funroll-loops -fno-tree-vectorize

2021-10-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102970

--- Comment #2 from Andrew Pinski  ---
The gimple level looks correct:
   [local count: 8209314308]:
  # __cur_2 = PHI <__cur_149(5), _141(3)>
  # __first_156 = PHI <__first_148(5), (3)>
  # prephitmp_155 = PHI 
  # prephitmp_153 = PHI 
  # prephitmp_108 = PHI 
  # prephitmp_105 = PHI 
  MEM[(int *)__cur_2] = prephitmp_155;
  MEM[(int *)__cur_2 + 4B] = prephitmp_153;
  MEM[(int *)__cur_2 + 8B] = prephitmp_108;
  MEM[(int *)__cur_2 + 12B] = prephitmp_105;
  __first_148 = __first_156 + 16;
  __cur_149 = __cur_2 + 16;
  if (  [(void *) + 304B] != __first_148)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 7306289739]:
  pretmp_3 = MEM[(int *)__first_148];
  pretmp_142 = MEM[(int *)__first_148 + 4B];
  pretmp_109 = MEM[(int *)__first_148 + 8B];
  pretmp_106 = MEM[(int *)__first_148 + 12B];
  goto ; [100.00%]

I suspect the rtl level unroller messes up the first iteration for the stores
in the above case.

Re: [PATCH 3/5] gcc: Add --nostdlib++ option

2021-10-27 Thread Bernhard Reutner-Fischer via Gcc-patches
On Wed, 27 Oct 2021 21:05:03 +0100
Richard Purdie via Gcc-patches  wrote:

> OpenEmbedded/Yocto Project builds libgcc and the other gcc runtime libraries
> separately from the compiler and slightly differently to the standard gcc 
> build.
> 
> In general this works well but in trying to build them separately we run into
> an issue since we're using our gcc, not xgcc and there is no way to tell 
> configure
> to use libgcc but not look for libstdc++.
> 
> This adds such an option allowing such configurations to work.

But shouldn't it be called --nostdlibc++ then?
thanks,

