[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #10 from Andrew Pinski  ---
(In reply to Alexander Monakov from comment #8)
> I think the following testcase indicates that GCC assumes that tail padding
> is accessible: 

Well, aligned accesses are always accessible;
`struct S` in this case is 4-byte aligned after all.

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #9 from Martin Uecker  ---
Clang as well, but that would be only padding inside the first part without
taking into account extra element in the FAM. 

I am more concerned about programmers using the formula sizeof(.) + n * sizeof
for memcpy etc.  (and we have an example in the standard using this formula).
Creating objects smaller than this seems a bit dangerous.
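
To make the two size formulas discussed in this thread concrete, here is a minimal C sketch. The helper names (`size_sizeof_formula`, `size_offsetof_formula`, `clone_s`) are hypothetical and only illustrate the concern; they are not taken from GCC or from the standard's example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct s { int a; char b; char t[]; };

/* The commonly used formula: sizeof the struct plus n array elements. */
static size_t size_sizeof_formula(size_t n)
{
    return sizeof(struct s) + n * sizeof(char);
}

/* The smallest size that still covers t[0..n-1], with no padding after it. */
static size_t size_offsetof_formula(size_t n)
{
    return offsetof(struct s, t) + n * sizeof(char);
}

/* Hypothetical helper: clone an object holding n array elements using the
   sizeof formula.  memcpy reads `bytes` bytes from src, so src must be at
   least that large. */
static struct s *clone_s(const struct s *src, size_t n)
{
    size_t bytes = size_sizeof_formula(n);
    struct s *dst = malloc(bytes);
    assert(src != NULL);
    if (dst != NULL)
        memcpy(dst, src, bytes);
    return dst;
}
```

On an ABI where `int` is 4 bytes with 4-byte alignment, one would expect the first formula to give 9 and the second 6 for `n == 1`; a source object created with only the smaller size would be over-read by `clone_s`.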

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #8 from Alexander Monakov  ---
(In reply to jos...@codesourcery.com from comment #6)
> For the standard, dynamically allocated case, you should only need to 
> allocate enough memory to contain the initial part of the struct and the 
> array members being accessed - not any padding after that array.  (There 
> were wording problems before C99 TC2; see DR#282.)

I think the following testcase indicates that GCC assumes that tail padding is
accessible:

struct S {
    int i;
    char c;
    char fam[];
};

void f(struct S *p, struct S *q)
{
    *p = *q;
}

f:
        movq    (%rsi), %rax
        movq    %rax, (%rdi)
        ret

Sorry for the tangential remark, but there seems to be a contradiction.
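
A small C sketch of why the structure assignment touches bytes past `c`. The helpers `tail_bytes` and `copy_header` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct S {
    int i;
    char c;
    char fam[];
};

/* Number of bytes between the start of fam and sizeof(struct S). */
static size_t tail_bytes(void)
{
    return sizeof(struct S) - offsetof(struct S, fam);
}

/* Same operation as f() in the testcase: a plain structure assignment.
   The compiler is free to copy all sizeof(struct S) bytes at once. */
static void copy_header(struct S *p, const struct S *q)
{
    assert(p != NULL && q != NULL);
    *p = *q;
}
```

On a typical x86-64 ABI one would expect `sizeof(struct S)` to be 8 and `offsetof(struct S, fam)` to be 5, so the single 8-byte move covers 3 bytes beyond `c`.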

[Bug fortran/90504] Improved NORM2 algorithm

2023-05-24 Thread jb at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90504

--- Comment #2 from Janne Blomqvist  ---
(In reply to anlauf from comment #1)
> (In reply to Janne Blomqvist from comment #0)
> > Hanson, Hopkins, Remark on Algorithm 539: A Modern Fortran Reference
> > Implementation for Carefully Computing the Euclidean Norm,
> > https://dl.acm.org/citation.cfm?id=3134441
> > 
> > Above article tests different algorithms for NORM2 and tests performance and
> > numerical accuracy.
> 
> This article is behind a paywall.
> 
> Is there a publicly available description?

https://kar.kent.ac.uk/67205/1/remark.pdf

(Found via the https://unpaywall.org/ browser extension)

[Bug tree-optimization/109959] `(a > 1) ? 0 : (a == 1)` is not optimized when spelled out at -O2+

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109959

--- Comment #4 from Andrew Pinski  ---
Note the underlying issue with VRP is similar to PR 109959, but it is about a
slightly different optimization.

[Bug fortran/109948] [13/14 Regression] ICE(segfault) in gfc_expression_rank() from gfc_op_rank_conformable()

2023-05-24 Thread rimvydas.jas at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109948

--- Comment #5 from Rimvydas (RJ)  ---
(In reply to anlauf from comment #4)
> Can you check if this works for you?

This patch avoids the issue in all other associate use cases (tried on the
gcc-13 branch).

However, it is a bit suspicious that using variable name abbreviations (to dig
out arrays from deeply nested types) is enough to change how the internal
gfc_array_ref is populated.  The ICE was triggered only by patterns that first
use an indexed access through the abbreviated name (like k(1)) and then perform
any operation involving the whole array.

[Bug tree-optimization/109960] [10/11/12/13/14 Regression] missing combining of `(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109960

--- Comment #4 from Andrew Pinski  ---
I happened to notice this because I am working on a match patch that transforms
`a ? 1 : b` into `a | b`.

In the case of stmt_can_terminate_bb_p, I noticed we had:
   [local count: 330920071]:
  _48 = MEM[(const struct gasm *)t_22(D)].D.129035.D.128905.D.128890.subcode;
  _49 = _48 & 2;
  if (_49 != 0)
goto ; [34.00%]
  else
goto ; [66.00%]

   [local count: 218407246]:
  _50 = (bool) _48;

   [local count: 940291388]:
  # _13 = PHI <0(14), _50(32), _12(29), 0(11), 0(30), 1(2), 1(31), 0(25)>

And the patch to match would do:
   [local count: 330920071]:
  _48 = MEM[(const struct gasm *)t_22(D)].D.129035.D.128905.D.128890.subcode;
  _49 = _48 & 2;
  _50 = (bool) _48;
  _127 = _49 != 0;
  _44 = _50 | _127;

   [local count: 940291388]:
  # _13 = PHI <0(14), 0(25), _12(29), 0(11), 0(30), 1(2), _44(31)>

Which is definitely better than before, but isn't that the same as:
  _49 = _48 & 3;
  _44 = _49 != 0;

[Bug target/100106] [10 Regression] ICE in gen_movdi, at config/arm/arm.md:6187 since r10-2840-g70cdb21e

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100106

--- Comment #10 from CVS Commits  ---
The master branch has been updated by Alexandre Oliva :

https://gcc.gnu.org/g:d6b756447cd58bcca20e6892790582308b869817

commit r14-1187-gd6b756447cd58bcca20e6892790582308b869817
Author: Alexandre Oliva 
Date:   Wed May 24 03:07:56 2023 -0300

[PR100106] Reject unaligned subregs when strict alignment is required

The testcase for pr100106, compiled with optimization for 32-bit
powerpc -mcpu=604 with -mstrict-align expands the initialization of a
union from a float _Complex value into a load from an SCmode
constant pool entry, aligned to 4 bytes, into a DImode pseudo,
requiring 8-byte alignment.

The patch that introduced the testcase modified simplify_subreg to
avoid changing the MEM to outermode, but simplify_gen_subreg still
creates a SUBREG or a MEM that would require stricter alignment than
MEM's, and lra_constraints appears to get confused by that, repeatedly
creating unsatisfiable reloads for the SUBREG until it exceeds the
insn count.

Avoiding the unaligned SUBREG, expand splits the DImode dest into
SUBREGs and loads each SImode word of the constant pool with the
proper alignment.
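
As a loose C-level analogy (not the RTL the patch produces), the idea of loading an 8-byte value word by word from storage that is only 4-byte aligned can be sketched as follows. The function name `load_u64_by_words` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read 8 bytes from storage that is only guaranteed to be 4-byte aligned,
   using two naturally aligned 32-bit loads instead of one 64-bit load. */
static uint64_t load_u64_by_words(const uint32_t *src)
{
    uint32_t words[2];
    uint64_t value;

    assert(src != NULL);
    words[0] = src[0];                     /* first 32-bit word  */
    words[1] = src[1];                     /* second 32-bit word */
    memcpy(&value, words, sizeof value);   /* reassemble in memory order */
    return value;
}
```

Because the two words are reassembled in memory order, the result matches a byte-wise copy of the same 8 bytes regardless of endianness.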


for  gcc/ChangeLog

PR target/100106
* emit-rtl.cc (validate_subreg): Reject a SUBREG of a MEM that
requires stricter alignment than MEM's.

for  gcc/testsuite/ChangeLog

PR target/100106
* gcc.target/powerpc/pr100106-sa.c: New.

[Bug target/109933] __atomic_test_and_set is broken for BIG ENDIAN riscv targets

2023-05-24 Thread rory.bolt at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109933

--- Comment #9 from Rory Bolt  ---
Created attachment 55153
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55153=edit
patch

Tested fix for big endian, NOT tested on little endian

[Bug tree-optimization/109960] [10/11/12/13/14 Regression] missing combining of `(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109960

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|1   |0
 Status|ASSIGNED|UNCONFIRMED
   Assignee|pinskia at gcc dot gnu.org |unassigned at gcc dot gnu.org

--- Comment #3 from Andrew Pinski  ---
Nope, not working. I even tried to figure out how to modify tree-ssa-reassoc.cc
to teach it about `(bool)a` being the same as `(a & 1) != 0`, but I could not
figure out how.

[Bug c++/109961] auto assigned from requires and lambda inside

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109961

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
Summary|storage size of 'variable   |auto assigned from requires
   |name' isn't known   |and lambda inside
   Keywords||c++-lambda, rejects-valid
   Last reconfirmed||2023-05-25
 Status|UNCONFIRMED |NEW

--- Comment #1 from Andrew Pinski  ---
Confirmed.
Reduced all the way:
```
auto a = requires{  []()  {}; };
```

[Bug tree-optimization/109960] [10/11/12/13/14 Regression] missing combining of `(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109960

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-05-25
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org

--- Comment #2 from Andrew Pinski  ---
Or maybe extend recognize_single_bit_test to recognize that `(bool)a != 0` is
the same as `(a & 1) != 0`.

Let me try that.

[Bug c++/109961] New: storage size of 'variable name' isn't known

2023-05-24 Thread Darrell.Wright at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109961

Bug ID: 109961
   Summary: storage size of 'variable name' isn't known
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: Darrell.Wright at gmail dot com
  Target Milestone: ---

The following valid code fails to compile in gcc-trunk on
https://foo.godbolt.org/z/vGMGbv8oP 

auto a = requires{ 
[]( int b ) consteval {
   if( b ) {
throw b;
   }
}( 0 );
};

With the following error

:3:6: error: storage size of 'a' isn't known
3 | auto a = requires{
  |  ^
Compiler returned: 1

[Bug tree-optimization/109960] [10/11/12/13/14 Regression] missing combining of `(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109960

--- Comment #1 from Andrew Pinski  ---
We could have a pattern that does:

`(a & CST) != 0 ? 1 : (bool)a` -> `(a & (CST|1)) != 0` to fix this, I think.
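
The proposed pattern can be checked by brute force with a small C sketch. It assumes that `(bool)a` here means the low-bit test `(a & 1) != 0`, as stated elsewhere in this bug; in plain C, `(bool)a` would instead mean `a != 0`. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Left-hand side of the pattern, with (bool)a spelled as a low-bit test. */
static bool before(unsigned a, unsigned cst)
{
    return (a & cst) != 0 ? true : (a & 1u) != 0;
}

/* Right-hand side: a single mask test. */
static bool after(unsigned a, unsigned cst)
{
    return (a & (cst | 1u)) != 0;
}

/* Returns true when both forms agree for every a and cst below limit. */
static bool forms_agree(unsigned limit)
{
    assert(limit > 0);
    for (unsigned cst = 0; cst < limit; cst++)
        for (unsigned a = 0; a < limit; a++)
            if (before(a, cst) != after(a, cst))
                return false;
    return true;
}
```

With `CST == 2` this is the `(a&3) != 0` form from the bug title.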

[Bug tree-optimization/109960] [10/11/12/13/14 Regression] missing combining of `(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109960

Andrew Pinski  changed:

   What|Removed |Added

  Known to work||8.5.0
  Known to fail||9.1.0
   Target Milestone|--- |10.5

[Bug tree-optimization/109960] New: [10/11/12/13/14 Regression] missing combining of `(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109960

Bug ID: 109960
   Summary: [10/11/12/13/14 Regression] missing combining of
`(a&1) != 0 || (a&2)!=0` into `(a&3)!=0`
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take the following C++ code (reduced from stmt_can_terminate_bb_p):
```
static inline bool f1(unsigned *a)
{
  return (*a&1);
}
static inline bool f2(unsigned *a)
{
  return (*a&2);
}

bool f(int c, unsigned *a)
{
  if (c)
    return 0;
  return f2(a) || f1(a) ;
}
```

At -O1 we can produce:
```
        movl    $0, %eax
        testl   %edi, %edi
        jne     .L1
        testb   $3, (%rsi)
        setne   %al
.L1:
        ret
```
But at -O2 we get:
        xorl    %eax, %eax
        testl   %edi, %edi
        jne     .L1
        movl    (%rsi), %edx
        movl    %edx, %eax
        andl    $1, %eax
        andl    $2, %edx
        movl    $1, %edx
        cmovne  %edx, %eax
.L1:
        ret

Which is just so much worse.
This started in GCC 9.

[Bug target/109927] Bootstrap fails for m68k in stage2 compilation of gimple-match.cc

2023-05-24 Thread userm57 at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109927

--- Comment #18 from Stan Johnson  ---
$ git clone git://gcc.gnu.org/git/gcc.git
$ cd gcc
$ git checkout master

I'm testing a manual bootstrap of "gcc version 14.0.0 20230524 (experimental)
(GCC)" now, accessed via git as shown above.

It will still take about 24 more hours for the bootstrap to finish (I'll send
an update if it fails), but with gimple-match.cc (and generic-match.cc, which
was not affected in my tests) split up, it looks like it will finish ok (it's
currently in about the middle of stage 2 and has successfully compiled all the
gimple-match-n.cc files).

Note that Gentoo's emerge of gcc-13 behaves a little differently than a manual
bootstrap. I don't know why, since I think I'm using Gentoo's ./configure
options in the manual bootstrap, but in Gentoo's emerge of gcc, they seem to
run cc1plus and "as" simultaneously for each compilation, perhaps aggravating
the memory issue for gimple-match.cc (or maybe not, since the problem is
virtual memory exhausted, not swap space exhausted).

Anyway, it looks like the solution was already close. Does anyone know whether
the change will be backported to gcc-12 or gcc-13 available from
ftp.gnu.org/pub/gnu/gcc?

Thanks to all of the GNU developers who continue to make modern tools available
for use on old hardware!

[Bug tree-optimization/109959] `(a > 1) ? 0 : (a == 1)` is not optimized when spelled out at -O2+

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109959

--- Comment #3 from Andrew Pinski  ---
Here is another related testcase; this one is the exact reduction from
bitmap_single_bit_set_p:

```
_Bool f(unsigned a, int t)
{
  void g(void);
  if (t)
    return 0;
  g();
  if (a > 1)
    return 0;
  return a == 1;
}
```

this should be optimized down to:
```
_Bool f(unsigned a, int t)
{
  void g(void);
  if (t)
    return 0;
  g();
  return a == 1;
}
```

[Bug tree-optimization/109959] `(a > 1) ? 0 : (a == 1)` is not optimized when spelled out at -O2+

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109959

--- Comment #2 from Andrew Pinski  ---
I should note I found this while looking at code generation of
bitmap_single_bit_set_p after a match pattern addition.

[Bug tree-optimization/109959] `(a > 1) ? 0 : (a == 1)` is not optimized when spelled out at -O2+

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109959

Andrew Pinski  changed:

   What|Removed |Added

Summary|`(a > 1) ? 0 : (a == 1)` is |`(a > 1) ? 0 : (a == 1)` is
   |not optimized when spelled  |not optimized when spelled
   |out |out at -O2+

--- Comment #1 from Andrew Pinski  ---
I should say this is at -O2.

Part of the reason is that VRP changes `a == 1` to `(bool)a`, and then phiopt
comes along and decides to factor out the conversion (phiopt did that even
before my recent changes).

At -O1, it has actually been optimized during reassoc1 (because the above is
not done) since GCC 7.
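
A small C sketch of why the VRP rewrite is valid only under the known range. It assumes that the `(bool)a` in the GIMPLE is a truncation to the low bit, which is written here as `(a & 1) != 0`; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Low-bit test, standing in for the truncating (bool)a conversion. */
static bool low_bit(unsigned a)
{
    return (a & 1u) != 0;
}

/* Once a is known to be in [0, 1] (the path where a > 1 is false),
   a == 1 and the low-bit test give the same answer. */
static bool agree_in_range(void)
{
    for (unsigned a = 0; a <= 1; a++)
        if ((a == 1) != low_bit(a))
            return false;
    return true;
}
```

Outside that range the two differ, for example at `a == 3`, which is why the rewrite depends on the range information.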

[Bug tree-optimization/109959] New: `(a > 1) ? 0 : (a == 1)` is not optimized when spelled out

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109959

Bug ID: 109959
   Summary: `(a > 1) ? 0 : (a == 1)` is not optimized when spelled
out
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
_Bool f(unsigned a)
{
  if (a > 1)
    return 0;
  return a == 1;
}


_Bool f0(unsigned a)
{
  return (a > 1) ? 0 : (a == 1);
}
```
Both of these should just optimize to `return a == 1`; f0 currently is.
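
The claimed equivalence can be confirmed with a brute-force C sketch. The functions `f` and `f0` are copied from the report; `expected` and `all_agree` are hypothetical helpers added for the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Spelled-out form from the report. */
static bool f(unsigned a)
{
    if (a > 1)
        return 0;
    return a == 1;
}

/* Conditional-expression form from the report. */
static bool f0(unsigned a)
{
    return (a > 1) ? 0 : (a == 1);
}

/* The form both should reduce to. */
static bool expected(unsigned a)
{
    return a == 1;
}

/* Returns true when all three forms agree for every a below limit. */
static bool all_agree(unsigned limit)
{
    assert(limit > 0);
    for (unsigned a = 0; a < limit; a++)
        if (f(a) != expected(a) || f0(a) != expected(a))
            return false;
    return true;
}
```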

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread joseph at codesourcery dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #7 from joseph at codesourcery dot com  ---
I suppose the question is how to interpret "the longest array (with the 
same element type) that would not make the structure larger than the 
object being accessed".  The difficulty of interpreting "make the 
structure larger" in terms of including post-array padding in the 
replacement structure is that there might not be a definition of what that 
post-array padding should be given the offset of the array need not be the 
same as the offset with literal replacement in the struct definition.

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread joseph at codesourcery dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #6 from joseph at codesourcery dot com  ---
For the standard, dynamically allocated case, you should only need to 
allocate enough memory to contain the initial part of the struct and the 
array members being accessed - not any padding after that array.  (There 
were wording problems before C99 TC2; see DR#282.)

[Bug tree-optimization/107986] [12/13/14 Regression] Bogus -Warray-bounds diagnostic with std::sort

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107986

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:1cd5bc387c453126fdb4c9400096180484ecddee

commit r14-1179-g1cd5bc387c453126fdb4c9400096180484ecddee
Author: Andrew MacLeod 
Date:   Wed May 24 09:52:26 2023 -0400

Gimple range PHI analyzer and testcases

Provide a PHI analyzer framework to provide better initial values for
PHI nodes which form groups with initial values and single statements
which modify the PHI values in some predictable way.

PR tree-optimization/107822
PR tree-optimization/107986
gcc/
* Makefile.in (OBJS): Add gimple-range-phi.o.
* gimple-range-cache.h (ranger_cache::m_estimate): New
phi_analyzer pointer member.
* gimple-range-fold.cc (fold_using_range::range_of_phi): Use
phi_analyzer if no loop info is available.
* gimple-range-phi.cc: New file.
* gimple-range-phi.h: New file.
* tree-vrp.cc (execute_ranger_vrp): Utilize a phi_analyzer.

gcc/testsuite/
* gcc.dg/pr107822.c: New.
* gcc.dg/pr107986-1.c: New.

[Bug tree-optimization/107822] [13/14/14 Regression] Dead Code Elimination Regression at -Os (trunk vs. 12.2.0)

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107822

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:1cd5bc387c453126fdb4c9400096180484ecddee

commit r14-1179-g1cd5bc387c453126fdb4c9400096180484ecddee
Author: Andrew MacLeod 
Date:   Wed May 24 09:52:26 2023 -0400

Gimple range PHI analyzer and testcases

Provide a PHI analyzer framework to provide better initial values for
PHI nodes which form groups with initial values and single statements
which modify the PHI values in some predictable way.

PR tree-optimization/107822
PR tree-optimization/107986
gcc/
* Makefile.in (OBJS): Add gimple-range-phi.o.
* gimple-range-cache.h (ranger_cache::m_estimate): New
phi_analyzer pointer member.
* gimple-range-fold.cc (fold_using_range::range_of_phi): Use
phi_analyzer if no loop info is available.
* gimple-range-phi.cc: New file.
* gimple-range-phi.h: New file.
* tree-vrp.cc (execute_ranger_vrp): Utilize a phi_analyzer.

gcc/testsuite/
* gcc.dg/pr107822.c: New.
* gcc.dg/pr107986-1.c: New.

[Bug libstdc++/109947] std::expected monadic operations do not support move-only error types yet

2023-05-24 Thread aemseemann at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109947

Martin Seemann  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #5 from Martin Seemann  ---
Thanks for the clarification! Now I am convinced that it is not a bug in
libstdc++ (although I still doubt that the side-effects were intended when the
committee formulated the "Effects" for monadic operations, but that's not
relevant here).

Marking as resolved and sorry for the noise.

[Bug fortran/90504] Improved NORM2 algorithm

2023-05-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90504

--- Comment #1 from anlauf at gcc dot gnu.org ---
(In reply to Janne Blomqvist from comment #0)
> Hanson, Hopkins, Remark on Algorithm 539: A Modern Fortran Reference
> Implementation for Carefully Computing the Euclidean Norm,
> https://dl.acm.org/citation.cfm?id=3134441
> 
> Above article tests different algorithms for NORM2 and tests performance and
> numerical accuracy.

This article is behind a paywall.

Is there a publicly available description?

[Bug fortran/87270] "FINAL" subroutine is called when compiled with "gfortran -O1", but not "gfortran -O0"

2023-05-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87270

--- Comment #6 from anlauf at gcc dot gnu.org ---
All current compilers seem to give the same, apparently correct result,
even at different optimization levels.

So can we close this finally?

[Bug c++/109876] [10/11/12/13/14 Regression] initializer_list not usable in constant expressions in a template

2023-05-24 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109876

--- Comment #9 from Jason Merrill  ---
(In reply to Marek Polacek from comment #8)
> > Instead, we should probably treat num as value-dependent even though it 
> > actually isn't.
> 
> An attempt to implement that:
> 
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -27969,6 +27969,12 @@ value_dependent_expression_p (tree expression)
>else if (TYPE_REF_P (TREE_TYPE (expression)))
> /* FIXME cp_finish_decl doesn't fold reference initializers.  */
> return true;
> +  else if (DECL_DECLARED_CONSTEXPR_P (expression)
> +  && TREE_STATIC (expression)

I'd expect we could get a similar issue with non-static constexprs.

> +  && !DECL_NAMESPACE_SCOPE_P (expression)

This seems an unnecessary optimization?

> +  && DECL_INITIAL (expression)

Perhaps we also want to return true if DECL_INITIAL is null?

> +  && TREE_CODE (DECL_INITIAL (expression)) == IMPLICIT_CONV_EXPR)

Maybe !TREE_CONSTANT?

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #5 from Martin Uecker  ---
Clang bug:
https://github.com/llvm/llvm-project/issues/62929

[Bug libstdc++/109947] std::expected monadic operations do not support move-only error types yet

2023-05-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109947

--- Comment #4 from Jonathan Wakely  ---
(In reply to Martin Seemann from comment #3)
> So it comes down to how to interpret the "Effects:" clause: Does "Equivalent
> to " mean  that all restrictions of
> `value()` apply transitively or is it merely an implementation hint?

The former.  The standard says:

Whenever the Effects element specifies that the semantics of some function F
are Equivalent to some code sequence, then the various elements are interpreted
as follows. If F’s semantics specifies any Constraints or Mandates elements,
then those requirements are logically imposed prior to the equivalent-to
semantics. Next, the semantics of the code sequence are determined by the
Constraints, Mandates, Preconditions, Effects, Synchronization, Postconditions,
Returns, Throws, Complexity, Remarks, and Error conditions specified for the
function invocations contained in the code sequence. The value returned from F
is specified by F’s Returns element, or if F has no Returns element, a non-void
return from F is specified by the return statements (8.7.4) in the code
sequence. If F’s semantics contains a Throws, Postconditions, or Complexity
element, then that supersedes any occurrences of that element in the code
sequence.


> (Strangely enough, in the "Effects:" clause of `value_or()&&` the expression
> `std::move(**this)` is used  instead of `std::move(value())`. Maybe this is
> an oversight/inconsistency of the standard.)

Yes. The specs were written by different people at different times.

[Bug c++/109876] [10/11/12/13/14 Regression] initializer_list not usable in constant expressions in a template

2023-05-24 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109876

--- Comment #8 from Marek Polacek  ---
> Instead, we should probably treat num as value-dependent even though it 
> actually isn't.

An attempt to implement that:

--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -27969,6 +27969,12 @@ value_dependent_expression_p (tree expression)
   else if (TYPE_REF_P (TREE_TYPE (expression)))
/* FIXME cp_finish_decl doesn't fold reference initializers.  */
return true;
+  else if (DECL_DECLARED_CONSTEXPR_P (expression)
+  && TREE_STATIC (expression)
+  && !DECL_NAMESPACE_SCOPE_P (expression)
+  && DECL_INITIAL (expression)
+  && TREE_CODE (DECL_INITIAL (expression)) == IMPLICIT_CONV_EXPR)
+   return true;
   return false;

 case DYNAMIC_CAST_EXPR:

[Bug fortran/104350] ICE in gfc_array_dimen_size(): Bad dimension

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104350

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Harald Anlauf :

https://gcc.gnu.org/g:ec2e86274427a402d2de2199ba550f7295ea9b5f

commit r14-1175-gec2e86274427a402d2de2199ba550f7295ea9b5f
Author: Harald Anlauf 
Date:   Wed May 24 21:04:43 2023 +0200

Fortran: reject bad DIM argument of SIZE intrinsic in simplification [PR104350]

gcc/fortran/ChangeLog:

PR fortran/104350
* simplify.cc (simplify_size): Reject DIM argument of intrinsic SIZE
with error when out of valid range.

gcc/testsuite/ChangeLog:

PR fortran/104350
* gfortran.dg/size_dim_2.f90: New test.

[Bug fortran/103794] ICE in gfc_check_reshape, at fortran/check.c:4727

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103794

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Harald Anlauf :

https://gcc.gnu.org/g:5fd5d8fb744fd9251d04e4b17d04f2340e6a283b

commit r14-1174-g5fd5d8fb744fd9251d04e4b17d04f2340e6a283b
Author: Harald Anlauf 
Date:   Sun May 21 22:25:29 2023 +0200

Fortran: checking and simplification of RESHAPE intrinsic [PR103794]

gcc/fortran/ChangeLog:

PR fortran/103794
* check.cc (gfc_check_reshape): Expand constant arguments SHAPE and
ORDER before checking.
* gfortran.h (gfc_is_constant_array_expr): Add prototype.
* iresolve.cc (gfc_resolve_reshape): Expand constant argument
SHAPE.
* simplify.cc (is_constant_array_expr): If array is determined to be
constant, expand small array constructors if needed.
(gfc_is_constant_array_expr): Wrapper for is_constant_array_expr.
(gfc_simplify_reshape): Fix check for insufficient elements in SOURCE
when no padding specified.

gcc/testsuite/ChangeLog:

PR fortran/103794
* gfortran.dg/reshape_10.f90: New test.
* gfortran.dg/reshape_11.f90: New test.

[Bug libstdc++/109261] std::experimental::simd is not usable in several constant expressions

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109261

--- Comment #13 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Matthias Kretz:

https://gcc.gnu.org/g:2b502c3119c91fe3ba2313f0842a3bedd395bc91

commit r12-9651-g2b502c3119c91fe3ba2313f0842a3bedd395bc91
Author: Matthias Kretz 
Date:   Wed May 24 12:50:46 2023 +0200

libstdc++: Fix SFINAE for __is_intrinsic_type on ARM

On ARM NEON doesn't support double, so __is_intrinsic_type_v should say false (instead of being ill-formed).

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/109261
* include/experimental/bits/simd.h (__intrinsic_type):
Specialize __intrinsic_type and
__intrinsic_type in any case, but provide the member
type only with __aarch64__.

(cherry picked from commit aa8b363171a95b8f867a74f29c75f9577e9087e1)

[Bug libstdc++/109949] new test case experimental/simd/pr109261_constexpr_simd.cc in r12-9647-g3acbaf1b253215 fails

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109949

--- Comment #10 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Matthias Kretz:

https://gcc.gnu.org/g:ff7360dafe209b960535eaaa3efcfbaaa44daff9

commit r12-9652-gff7360dafe209b960535eaaa3efcfbaaa44daff9
Author: Matthias Kretz 
Date:   Wed May 24 16:43:07 2023 +0200

libstdc++: Fix type of first argument to vec_cntm call

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/109949
* include/experimental/bits/simd.h (__intrinsic_type): If
__ALTIVEC__ is defined, map gnu::vector_size types to their
corresponding __vector T types without losing unsignedness of
integer types. Also prefer long long over long.
* include/experimental/bits/simd_ppc.h (_S_popcount): Cast mask
object to the expected unsigned vector type.

(cherry picked from commit efd2b55d8562c6e80cb7ee8b9b1f9418f0c00cd9)

[Bug libstdc++/109261] std::experimental::simd is not usable in several constant expressions

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109261

--- Comment #12 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Matthias Kretz:

https://gcc.gnu.org/g:8be71168f7bbafa04f592a7524432351ffea71ba

commit r12-9650-g8be71168f7bbafa04f592a7524432351ffea71ba
Author: Matthias Kretz 
Date:   Tue May 23 23:48:49 2023 +0200

libstdc++: Add missing constexpr to simd_neon

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/109261
* include/experimental/bits/simd_neon.h (_S_reduce): Add
constexpr and make NEON implementation conditional on
not __builtin_is_constant_evaluated.

(cherry picked from commit b0a483b0a011f9cbc8b25053eae809c77dae2a12)

[Bug libstdc++/109949] new test case experimental/simd/pr109261_constexpr_simd.cc in r12-9647-g3acbaf1b253215 fails

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109949

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Matthias Kretz :

https://gcc.gnu.org/g:efd2b55d8562c6e80cb7ee8b9b1f9418f0c00cd9

commit r14-1173-gefd2b55d8562c6e80cb7ee8b9b1f9418f0c00cd9
Author: Matthias Kretz 
Date:   Wed May 24 16:43:07 2023 +0200

libstdc++: Fix type of first argument to vec_cntm call

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/109949
* include/experimental/bits/simd.h (__intrinsic_type): If
__ALTIVEC__ is defined, map gnu::vector_size types to their
corresponding __vector T types without losing unsignedness of
integer types. Also prefer long long over long.
* include/experimental/bits/simd_ppc.h (_S_popcount): Cast mask
object to the expected unsigned vector type.

[Bug libstdc++/109947] std::expected monadic operations do not support move-only error types yet

2023-05-24 Thread aemseemann at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109947

--- Comment #3 from Martin Seemann  ---
Thanks for pointing me to the LWG issue. It makes sense that the error type
must be copyable for the `value()` overloads due to potentially throwing a
`bad_expected_access` with the error embedded.

However, the monadic operations will never throw this exception.
Consequently, the standard draft for the monadic operations
(https://eel.is/c++draft/expected.object.monadic) does not contain any
"Throws:" clause nor is copyability of the error type included in the
"Constraints:" clause.

So it comes down to how to interpret the "Effects:" clause: Does "Equivalent to
" mean  that all restrictions of
`value()` apply transitively or is it merely an implementation hint?

(Strangely enough, in the "Effects:" clause of `value_or()&&` the expression
`std::move(**this)` is used  instead of `std::move(value())`. Maybe this is an
oversight/inconsistency of the standard.)

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #4 from Martin Uecker  ---

The concern would be that a program relying on the size of an object being
larger may then have out of bounds accesses.  But rereading the standard, I am
also not seeing that this is required. (for the extension nothing is
required anyway, but it should be consistent with it).

[Bug fortran/104350] ICE in gfc_array_dimen_size(): Bad dimension

2023-05-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104350

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |anlauf at gcc dot gnu.org

--- Comment #3 from anlauf at gcc dot gnu.org ---
Submitted: https://gcc.gnu.org/pipermail/fortran/2023-May/059322.html

[Bug rtl-optimization/101188] [AVR] Miscompilation and function pointers

2023-05-24 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188

--- Comment #6 from Georg-Johann Lay  ---
Created attachment 55152
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55152=edit
diff testcase by v4.9.2 vs v5.2.1

Code from v4.9.2 is correct, but from v5.2.1 is bogus:

--- fail1-4.9.2.sx  2023-05-24 17:20:46.508698338 +0200
+++ fail1-5.2.1.sx  2023-05-24 17:19:50.019976879 +0200
@@ -39,11 +39,11 @@
adiw r24,1   ;  13  addhi3_clobber/1[length = 1]
std Z+1,r25  ;  14  *movhi/4[length = 2]
st Z,r24
-   adiw r30,2   ;  15  *addhi3/3   [length = 1]
-   movw r14,r16 ;  39  *movhi/1[length = 1]
-   ldi r24,68   ;  16  addhi3_clobber/3[length = 3]
-   add r14,r24
+   movw r14,r16 ;  38  *movhi/1[length = 1]
+   ldi r31,68   ;  15  addhi3_clobber/3[length = 3]
+   add r14,r31
adc r15,__zero_reg__
+   adiw r30,2   ;  17  *addhi3/3   [length = 1]
ld __tmp_reg__,Z+;  18  *movhi/3[length = 3]
ld r31,Z
mov r30,__tmp_reg__

[Bug c++/109958] [10/11/12/13/14 Regression] ICE: in build_ptrmem_type, at cp/decl.cc:11066 taking the address of bound static member function brought into derived class by using-declaration

2023-05-24 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109958

Marek Polacek  changed:

   What|Removed |Added

   Keywords||ice-on-valid-code
   Priority|P3  |P2
Summary|ICE: in build_ptrmem_type,  |[10/11/12/13/14 Regression]
   |at cp/decl.cc:11066 taking  |ICE: in build_ptrmem_type,
   |the address of bound static |at cp/decl.cc:11066 taking
   |member function brought |the address of bound static
   |into derived class by   |member function brought
   |using-declaration   |into derived class by
   ||using-declaration
   Target Milestone|--- |10.5

[Bug c++/109958] ICE: in build_ptrmem_type, at cp/decl.cc:11066 taking the address of bound static member function brought into derived class by using-declaration

2023-05-24 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109958

Marek Polacek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
 CC||mpolacek at gcc dot gnu.org
   Last reconfirmed||2023-05-24

--- Comment #1 from Marek Polacek  ---
Confirmed.  r0-115460-g57910f3a9a81e9:

commit 57910f3a9a81e9ad122a814255197f6f24c6af08
Author: Jason Merrill 
Date:   Sat Mar 3 19:53:30 2012 -0500

class.c (add_method): Always build an OVERLOAD for using-decls.

* class.c (add_method): Always build an OVERLOAD for using-decls.
* search.c (lookup_member): Handle getting an OVERLOAD for a
single function.

From-SVN: r184873

[Bug c++/109958] New: ICE: in build_ptrmem_type, at cp/decl.cc:11066 taking the address of bound static member function brought into derived class by using-declaration

2023-05-24 Thread ed at catmur dot uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109958

Bug ID: 109958
   Summary: ICE: in build_ptrmem_type, at cp/decl.cc:11066 taking
the address of bound static member function brought
into derived class by using-declaration
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ed at catmur dot uk
  Target Milestone: ---

struct B { static int f(); };
struct D : B { using B::f; };
void f(D d) { &d.f; }

<source>: In function 'void f(D)':
<source>:3:18: error: ISO C++ forbids taking the address of a bound member
function to form a pointer to member function.  Say '&D::f' [-fpermissive]
3 | void f(D d) { &d.f; }
  |~~^
<source>:3:18: internal compiler error: in build_ptrmem_type, at
cp/decl.cc:11066
3 | void f(D d) { &d.f; }
  |  ^
0x23a0cee internal_error(char const*, ...)
???:0
0xa95fae fancy_abort(char const*, int, char const*)
???:0
0xd31f7f build_x_unary_op(unsigned int, tree_code, cp_expr, tree_node*, int)
???:0
0xc7ab2f c_parse_file()
???:0
0xdb9519 c_common_parse_file()
???:0

This appears to have been broken somewhere between 4.7.4 and 4.8.1.

[Bug fortran/109948] [13/14 Regression] ICE(segfault) in gfc_expression_rank() from gfc_op_rank_conformable()

2023-05-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109948

--- Comment #4 from anlauf at gcc dot gnu.org ---
The following patch fixes NULL pointer dereference with the reduced
testcases:

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 83e45f1b693..89c62b3eb1e 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -5640,7 +5643,7 @@ gfc_expression_rank (gfc_expr *e)
   if (ref->type != REF_ARRAY)
continue;

-  if (ref->u.ar.type == AR_FULL)
+  if (ref->u.ar.type == AR_FULL && ref->u.ar.as)
{
  rank = ref->u.ar.as->rank;
  break;

Can you check if this works for you?

Still needs regtesting.

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread pascal_cuoq at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

--- Comment #3 from Pascal Cuoq  ---
@Andrew Pinski

You don't even need to invoke the fact that this is an extension. GCC could
reserve 17 bytes for each variable i of type “int”, and as long as “sizeof i”
continued to evaluate to 4 (4 being the value of “sizeof(int)” for x86), no-one
would be able to claim that GCC is not generating “correct” assembly code.

This ticket is pointing out that the current behavior for initialized FAMs is
suboptimal for programs that rely on the GCC extension, just like it would be
suboptimal to reserve 17 bytes for each “int” variable for standard C programs
(and I would open a ticket for it if I noticed such a behavior).

It's not breaking anything and it may be inconvenient to change, and as a
ticket that does not affect correctness, it can be ignored indefinitely. It's
just a suggestion for smaller binaries that might also end up marginally faster
as a result.

@Martin Uecker

Considering how casually phrased the description of FAMs was in C99 and
remained in later standards (see https://stackoverflow.com/q/73497572/139746
for me trying to make sense of some of the relevant words), I doubt that the
standard has anything to say about the compiler extension being discussed. But
if you have convincing arguments, you could spend a few minutes filing a bug
against Clang to tell them that they are making the binaries they generate too
small and efficient.

[Bug jit/66594] jitted code should use -mtune=native

2023-05-24 Thread schuchart at icl dot utk.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66594

Joseph  changed:

   What|Removed |Added

 CC||schuchart at icl dot utk.edu

--- Comment #10 from Joseph  ---
The lack of target-specific optimizations is biting us quite a bit and manually
specifying an architecture is not really an option, unless we duplicate the
detection mechanism of GCC, which is not ideal. I am not familiar with the GCC
code base and from the discussion below it's not clear what would be needed to
advance this. If someone could provide some hints on what is missing and
how/where it could be implemented we could probably take a stab at it. 

Would it be sufficient to add a macro to the header of the targets (as
suggested here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66594#c6) that
provide host_detect_local_cpu and ignore the ones that do not provide it? Or
would it be better to hard-code calls for the architectures that provide them,
like in the referenced patch but with architecture-specific pre-processor
guards? We mostly care about i386 and arm/aarch64 but covering all available
bases would be necessary, I guess.

[Bug tree-optimization/109957] New: Missing loop PHI optimization

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109957

Bug ID: 109957
   Summary: Missing loop PHI optimization
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
void foo();
int main() {
  _Bool c = 0;
  _Bool e = 1;
  int i;
  for (i = 0; i < 1; i++)
  {
c |= (e!=0);
e = 0;
  }
  if (c == 0)
foo();
  return 0;
}
```

This should be optimized to just `return 0`.
The reason is once c is 1, it will always stay 1.
But currently we don't notice that.

Note this code is reduced from PR 108352 testcase after a phiopt improvement
that provided the above form and ran into a testcase failure because of that.
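
The reasoning above can be checked by hand with a small sketch. `loop_c` is a hypothetical helper (not part of the report) that runs the same loop body with a caller-chosen trip count and returns the final value of c; since the OR can only set c and e is 1 on the first iteration, any trip count of at least one leaves c set.

```c
#include <stdbool.h>

/* Hypothetical helper: the loop from the report, with the trip count
   made a parameter so the behaviour can be observed from outside. */
static bool loop_c(int trips)
{
    bool c = false;
    bool e = true;
    for (int i = 0; i < trips; i++) {
        c |= (e != 0);  /* OR can set c but never clear it */
        e = false;
    }
    return c;
}
```

With the trip count of 1 used in the report, c is 1 after the loop, so the `if (c == 0)` branch is dead.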

[Bug fortran/109948] [13/14 Regression] ICE(segfault) in gfc_expression_rank() from gfc_op_rank_conformable()

2023-05-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109948

--- Comment #3 from anlauf at gcc dot gnu.org ---
(In reply to Rimvydas (RJ) from comment #1)
> More trivial testcase resulting in similar ICE.

Yep, even smaller:

subroutine foo(k_2d)
  implicit none
  integer :: k_2d(:)
  integer :: i
  associate(k=>k_2d)
i = k(1)
if (any(k==1)) i = 1
  end associate
end subroutine foo

The associate is apparently one of the common components that is needed.

[Bug fortran/109948] [13/14 Regression] ICE(segfault) in gfc_expression_rank() from gfc_op_rank_conformable()

2023-05-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109948

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||ice-on-valid-code
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 CC||anlauf at gcc dot gnu.org
Summary|ICE(segfault) in|[13/14 Regression]
   |gfc_expression_rank() from  |ICE(segfault) in
   |gfc_op_rank_conformable()   |gfc_expression_rank() from
   ||gfc_op_rank_conformable()
   Last reconfirmed||2023-05-24

--- Comment #2 from anlauf at gcc dot gnu.org ---
Confirmed.

Further reduced:

subroutine foo(y, x)
  implicit none
  real :: y(:)
  real :: x(:)

  associate(z=>y)
where ( z < 0.0 ) x(:) = z(:)
where ( z < 0.0 ) x(:) = z(:)
  end associate

end subroutine foo

[Bug fortran/109948] ICE(segfault) in gfc_expression_rank() from gfc_op_rank_conformable()

2023-05-24 Thread rimvydas.jas at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109948

--- Comment #1 from Rimvydas (RJ)  ---
More trivial testcase resulting in similar ICE.

$ cat test_associate2.f90 
subroutine foo(grib)
implicit none
type b
  integer, allocatable :: k_2d(:)
end type
type(b) :: grib
integer :: i
associate(k=>grib%k_2d)
i = k(1)
if (any(k==1)) i = 1
end associate
end subroutine foo

[Bug middle-end/109840] [14 Regression] internal compiler error: in expand_fn_using_insn, at internal-fn.cc:153 when building graphite2

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109840

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:2738955004256c2e9753364d78a7be340323b74b

commit r14-1171-g2738955004256c2e9753364d78a7be340323b74b
Author: Roger Sayle 
Date:   Wed May 24 17:32:20 2023 +0100

PR middle-end/109840: Preserve popcount/parity type in match.pd.

PR middle-end/109840 is a regression introduced by my recent patch to
fold popcount(bswap(x)) as popcount(x).  When the bswap and the popcount
have the same precision, everything works fine, but this optimization also
allowed a zero-extension between the two.  The oversight is that we need
to be strict with type conversions, both to avoid accidentally changing
the argument type to popcount, and also to reflect the effects of
argument/return-value promotion in the call to bswap, so this zero extension
needs to be preserved/explicit in the optimized form.

Interestingly, match.pd should (in theory) be able to narrow calls to
popcount and parity, removing a zero-extension from its argument, but
that is an independent optimization, that needs to check IFN_ support.
Many thanks to Andrew Pinski for his help/fixes with these transformations.

2023-05-24  Roger Sayle  

gcc/ChangeLog
PR middle-end/109840
* match.pd : Preserve zero-extension when
optimizing popcount((T)bswap(x)) and popcount((T)rotate(x,y)) as
popcount((T)x), so the popcount's argument keeps the same type.
:  Likewise preserve extensions when
simplifying parity((T)bswap(x)) and parity((T)rotate(x,y)) as
parity((T)x), so that the parity's argument type is the same.

gcc/testsuite/ChangeLog
PR middle-end/109840
* gcc.dg/fold-parity-8.c: New test.
* gcc.dg/fold-popcount-11.c: Likewise.
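
As an illustration of the transformation the commit message describes (not the match.pd pattern itself), here is a minimal sketch at the source level. The function names are hypothetical; `__builtin_bswap32` and `__builtin_popcountll` are the GCC/Clang built-ins. The point is that the simplified form drops the bswap but keeps the widening cast, so popcountll still sees a 64-bit argument.

```c
#include <stdint.h>

/* popcount of a byte-swapped 32-bit value, zero-extended to 64 bits
   before the popcount call. */
static int popcount_of_swapped(uint32_t x)
{
    return __builtin_popcountll((uint64_t)__builtin_bswap32(x));
}

/* Simplified form: byte order does not change the number of set bits,
   so the bswap can go, but the (uint64_t) zero-extension stays. */
static int popcount_simplified(uint32_t x)
{
    return __builtin_popcountll((uint64_t)x);
}
```

Both functions return the same value for every input, which is what makes the fold valid once the argument type is preserved.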

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread muecker at gwdg dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

Martin Uecker  changed:

   What|Removed |Added

 CC||muecker at gwdg dot de

--- Comment #2 from Martin Uecker  ---
To me it seems that the C standard requires that the object has 
size sizeof(struct s) + n * sizeof(struct t) if you want to store n elements
even when the array then starts at a smaller offset.
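
For reference, a minimal sketch of the allocation and copy idiom that relies on this formula. `s_bytes`, `s_alloc` and `s_copy` are hypothetical helper names, and this only illustrates the pattern being discussed; it is not a statement about what the standard requires.

```c
#include <stdlib.h>
#include <string.h>

struct s { int a; char b; char t[]; };

/* Size by the traditional formula: sizeof(struct s) + n elements. */
static size_t s_bytes(size_t n)
{
    return sizeof(struct s) + n * sizeof(char);
}

/* Allocate a zeroed struct s with room for n elements in t.
   Returns NULL on allocation failure. */
static struct s *s_alloc(size_t n)
{
    return calloc(1, s_bytes(n));
}

/* Copy using the same formula.  This touches the full sizeof-based
   extent, so both objects must be at least s_bytes(n) bytes long. */
static void s_copy(struct s *dst, const struct s *src, size_t n)
{
    memcpy(dst, src, s_bytes(n));
}
```

Code written this way would read or write past the end of an object that was sized with a smaller formula.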

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |trivial

--- Comment #1 from Andrew Pinski  ---
Considering this is an extension, I think GCC is still correct.

[Bug c/109956] New: GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread pascal_cuoq at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956

Bug ID: 109956
   Summary: GCC reserves 9 bytes for struct s { int a; char b;
char t[]; } x = {1, 2, 3};
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pascal_cuoq at hotmail dot com
  Target Milestone: ---

Static-lifetime variables of type “struct with FAM” (flexible array member)
with an initializer for the FAM are a GCC extension.

As of GCC 13.1 and Compiler Explorer “trunk”, targeting x86, the definition
“struct s { int a; char b; char t[]; } x = {1, 2, 3};” reserves 9 bytes for x,
and in fact, with various initializers, the trailing padding for variables of
type “struct s” is always 3, as if the size to reserve for the variable was
computed as “sizeof (struct s) + n * sizeof(element)”.

Input file:
struct s { int a; char b; char t[]; } x = {1, 2, 3};

Command:
gcc -S fam_init.c

Result (with Ubuntu 9.4.0-1ubuntu1~20.04.1 which exhibits the same behavior as
the recent versions on Compiler Explorer):
.align 8
.type   x, @object
.size   x, 9
x:
.long   1
.byte   2
.byte   3
.zero   3


Clang up to version 14 used to round up the size of the variable to a multiple
of the alignment of the struct, but even this is not necessary. It is only
necessary that the size reserved for a variable of type t is at least
“sizeof(t)” bytes, and also to reserve enough space for the initializer. Clang
15 and later use the optimal formula:

max(sizeof (struct s), offsetof(struct s, t[n]))

Compiler Explorer link: https://gcc.godbolt.org/z/5W7h4KWT1

This ticket is to suggest that GCC use the same optimal formula as Clang 15
and later.
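
To make the two formulas concrete, here is a small sketch that computes both. `size_padded` and `size_minimal` are hypothetical names; the second one writes offsetof(struct s, t) + n rather than offsetof(struct s, t[n]) on the assumption that a non-constant index inside offsetof is not portable.

```c
#include <stddef.h>

struct s { int a; char b; char t[]; };

/* Size as described for current GCC: sizeof(struct s) + n elements. */
static size_t size_padded(size_t n)
{
    return sizeof(struct s) + n * sizeof(char);
}

/* Size under the formula attributed to Clang 15 and later:
   max(sizeof(struct s), offsetof(struct s, t[n])). */
static size_t size_minimal(size_t n)
{
    size_t end = offsetof(struct s, t) + n * sizeof(char);
    return end > sizeof(struct s) ? end : sizeof(struct s);
}
```

On x86-64, where sizeof(struct s) is 8 and t starts at offset 5, one initialized element gives 9 bytes with the first formula and 8 with the second.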

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2023-05-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #55148|0   |1
is obsolete||

--- Comment #49 from Jakub Jelinek  ---
Created attachment 55151
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55151=edit
gcc14-bitint-wip.patch

Added a testcase with various operations with _BitInt(N) operands and tweaked
c-typeck.cc/fold-const.cc to accept those.

[Bug libstdc++/109949] new test case experimental/simd/pr109261_constexpr_simd.cc in r12-9647-g3acbaf1b253215 fails

2023-05-24 Thread mkretz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109949

--- Comment #8 from Matthias Kretz (Vir)  ---
Created attachment 55150
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55150=edit
proposed solution

This patch allows unsigned intrinsic types and calls vec_cntm correctly.

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-24 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #38 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #37)
> 
> As far as I understand from GCC sources, function I patched
> 'expand_ashiftrt' process only constant values of shift. As you can see
> earlier, I added your and other examples to tests.

OK, thanks for the additional test cases.  It really looks like the way the
constant shift is expanded (via ashrsi3_n insn) on SH1/SH2 is getting in the
way.

The tst insn is mainly formed by the combine pass, which relies on certain insn
patterns and combinations thereof.  See also sh.md, around line 530.

You can look at the debug output with the -fdump-rtl-all option to see what's
happening in the RTL passes.

What your patch is doing is to make it not emit the ashrsi3_n insn for constant
shifts altogether?  I guess it will make code that actually needs those real
shifts larger, as it will always emit the whole shift stitching sequence.  That
might be a good thing or not.


> It looks like really
> dynamic shifts translate to library calls.

So the option name '-mdisable-dynshift-libcall' doesn't make sense.
What it actually does is more like '-mdisable-constshift-libcall'.

> 
> Should I test more exotic situations? If so, could you please help me with
> really exotic or weird examples?

Have you had a look at the existing test cases for this in
gcc/testsuite/gcc.target/sh ?

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2023-05-24 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

--- Comment #48 from rguenther at suse dot de  ---
> On 24.05.2023 at 16:18, jakub at gcc dot gnu.org wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989
> 
> --- Comment #47 from Jakub Jelinek  ---
> But then the pass effectively has to do lifetime analysis of the _BitInt(N) for
> N > 128 etc. SSA_NAMEs and perform the partitioning of those SSA_NAMEs into
> VAR_DECLs/PARM_DECLs/RESULT_DECLs, so that we don't blow away the local stack;
> perhaps as you wrote with some local subgraphs turned into a loop which will
> handle multiple operations together instead of just one operation per loop.
> Or just use different VAR_DECLs but stick in clobbers where they will be dead
> and hope out of ssa can merge those.
> Anyway, more work than I hoped.
> Though, perhaps it can be also done incrementally, with bare minimum first and
> improvements later.

Sure, this is just what I think users will expect.  We don’t have the high
level infrastructure to do this afterwards such as loop fusion and variable
contraction (well, in theory graphite can do it but even there we lack actual
transform bits).

[Bug target/109949] new test case experimental/simd/pr109261_constexpr_simd.cc in r12-9647-g3acbaf1b253215 fails

2023-05-24 Thread mkretz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109949

--- Comment #7 from Matthias Kretz (Vir)  ---
> You should backport to N-1 first [...]

That was my intent. My workflow had not yet adapted to the existence of
releases/gcc-13. Fixed.

> never use -mpower9-vector and friends

I use -mcpu in my dejagnu boards (and the equivalent for 'check-simd'). IIUC
the -maltivec -mpower9-vector flags are added by
check_vect_support_and_set_flags in lib/target-supports.exp.

The problem was a branch that I apparently never tested (because the check-simd
testsuite wants to compile *and* run).

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/experimental/bits/simd_ppc.h;h=eca1b34241bb4efdbbb6490550750d81aee248b3;hb=HEAD#l133

The `vec_cntm(__to_intrin(__kv), 1)` call uses an incorrect type for the first
argument. The compiler message isn't very helpful, though. Patch coming up.

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2023-05-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

--- Comment #47 from Jakub Jelinek  ---
But then the pass effectively has to do lifetime analysis of the _BitInt(N) for
N > 128 etc. SSA_NAMEs and perform the partitioning of those SSA_NAMEs into
VAR_DECLs/PARM_DECLs/RESULT_DECLs, so that we don't blow away the local stack;
perhaps as you wrote with some local subgraphs turned into a loop which will
handle multiple operations together instead of just one operation per loop.
Or just use different VAR_DECLs but stick in clobbers where they will be dead
and hope out of ssa can merge those.
Anyway, more work than I hoped.
Though, perhaps it can be also done incrementally, with bare minimum first and
improvements later.

[Bug target/109933] __atomic_test_and_set is broken for BIG ENDIAN riscv targets

2023-05-24 Thread rory.bolt at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109933

--- Comment #8 from Rory Bolt  ---
So...

The logic for this is simple:

For little endian the shift amount is ((address & 3) * 8)

For big endian the shift amount is ((3 -(address & 3)) * 8)

Unfortunately I have ZERO experience modifying GCC, and the mechanism to
determine if it is generating big endian code or little endian code is not
obvious to me...

So working on this in my spare time it will be a while for me to create a
patch. That said, I do have a full big endian linux environment so I can test a
patch (relatively quickly - it takes a while to build GCC ;-)) if someone
beats me to this.
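
The two shift formulas above can be written as one C function. This is only a sketch of the arithmetic, not of the RISC-V back end code; `byte_shift` and its `big_endian` parameter are hypothetical and stand in for however GCC itself distinguishes the two byte orders.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit offset of the byte at `address` inside its aligned 32-bit word.
   Little endian: byte 0 is the least significant byte.
   Big endian:    byte 0 is the most significant byte. */
static unsigned byte_shift(uintptr_t address, bool big_endian)
{
    unsigned byte = (unsigned)(address & 3);
    return big_endian ? (3 - byte) * 8 : byte * 8;
}
```

For a byte at a word-aligned address this gives a shift of 0 on little endian and 24 on big endian.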

[Bug target/109949] new test case experimental/simd/pr109261_constexpr_simd.cc in r12-9647-g3acbaf1b253215 fails

2023-05-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109949

--- Comment #6 from Segher Boessenkool  ---
(In reply to Matthias Kretz (Vir) from comment #4)
> With -mcpu=power10 I see the issue. The problem has been there all the time
> and only surfaced with this test. (It should also have shown on `make
> check-simd` in libstdc++.)

Yup, you should never use -mpower9-vector and friends.  Such options are handy
*during development* but are heavily problematic later; they should never have
existed in mainline.

What is the actual problem here?  Or do you want to build up the suspense and
only show it in the patch you will send :-)

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-24 Thread amacleod at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

Andrew Macleod  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #42 from Andrew Macleod  ---
I think we can close this now, I think everything we plan to do has been done.

[Bug target/109949] new test case experimental/simd/pr109261_constexpr_simd.cc in r12-9647-g3acbaf1b253215 fails

2023-05-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109949

--- Comment #5 from Segher Boessenkool  ---
(In reply to Matthias Kretz (Vir) from comment #2)
> Yes, I stopped my backporting efforts when I became aware that it's failing
> on ARM. I'll get to PPC ASAP and then continue with the backports.

You should backport to N-1 first, only then to N-2, etc.  Sanity is nice :-)

Next time :-)

[Bug target/99195] Optimise away vec_concat of 64-bit AdvancedSIMD operations with zeroes in aarch64

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99195

--- Comment #16 from CVS Commits  ---
The master branch has been updated by Kyrylo Tkachov :

https://gcc.gnu.org/g:b30ab0dcf9db2ac6d81fb3743add1fbfa0d18f6e

commit r14-1167-gb30ab0dcf9db2ac6d81fb3743add1fbfa0d18f6e
Author: Kyrylo Tkachov 
Date:   Wed May 24 14:52:34 2023 +0100

aarch64: PR target/99195 Annotate vector shift patterns for vec-concat-zero

Continuing the series of straightforward annotations, this one handles the
normal (not widening or narrowing) vector shifts.
Tests included.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_simd_lshr): Rename
to...
(aarch64_simd_lshr): ... This.
(aarch64_simd_ashr): Rename to...
(aarch64_simd_ashr): ... This.
(aarch64_simd_imm_shl): Rename to...
(aarch64_simd_imm_shl): ... This.
(aarch64_simd_reg_sshl): Rename to...
(aarch64_simd_reg_sshl): ... This.
(aarch64_simd_reg_shl_unsigned): Rename to...
(aarch64_simd_reg_shl_unsigned): ... This.
(aarch64_simd_reg_shl_signed): Rename to...
(aarch64_simd_reg_shl_signed): ... This.
(vec_shr_): Rename to...
(vec_shr_): ... This.
(aarch64_shl): Rename to...
(aarch64_shl): ... This.
(aarch64_qshl): Rename to...
(aarch64_qshl): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add testing for shifts.
* gcc.target/aarch64/simd/pr99195_6.c: Likewise.
* gcc.target/aarch64/simd/pr99195_8.c: New test.

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-24 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #37 from Alexander Klepikov  ---
> Can you also compile for little endian, and most of all, use -O2
> optimization level.  Some optimizations are not done below -O2.

Here's source file, I added functions with non-constant shifts

$ cat f.c
#define ADDR 0x
#define P ((unsigned char *)ADDR)
#define FLAG 0x40
#define S 7

unsigned char f_char_var(char v){
return (v & FLAG) == FLAG;
}

unsigned char f_unsigned_char_var(unsigned char v){
return (v & FLAG) == FLAG;
}

unsigned char f_symbol(void){
return (*P & FLAG) == FLAG;
}

unsigned char f_symbol_zero(void){
return (*P & FLAG) == 0;
}

unsigned char f_symbol_non_zero(void){
return (*P & FLAG) != 0;
}

unsigned int dyn_lshift (unsigned int x, unsigned int y)
{
  return x << (y & 31);
}

unsigned int dyn_rshift (unsigned int x, unsigned int y)
{
  return x >> (y & 31);
}

unsigned int really_dyn_lshift (unsigned int x, unsigned int y)
{
  return x << y;
}

unsigned int really_dyn_rshift (unsigned int x, unsigned int y)
{
  return x >> y;
}

With patch disabled, -O2 -mb:

$ cat f.s
.file   "f.c"
.text
.text
.align 1
.align 2
.global _f_char_var
.type   _f_char_var, @function
_f_char_var:
mov.l   .L4,r1
sts.l   pr,@-r15
jsr @r1
exts.b  r4,r4
mov r4,r0
and #1,r0
lds.l   @r15+,pr
rts
nop
.L5:
.align 2
.L4:
.long   ___ashiftrt_r4_6
.size   _f_char_var, .-_f_char_var
.align 1
.align 2
.global _f_unsigned_char_var
.type   _f_unsigned_char_var, @function
_f_unsigned_char_var:
mov.l   .L8,r1
sts.l   pr,@-r15
jsr @r1
exts.b  r4,r4
mov r4,r0
and #1,r0
lds.l   @r15+,pr
rts
nop
.L9:
.align 2
.L8:
.long   ___ashiftrt_r4_6
.size   _f_unsigned_char_var, .-_f_unsigned_char_var
.align 1
.align 2
.global _f_symbol
.type   _f_symbol, @function
_f_symbol:
mov.l   .L12,r1
sts.l   pr,@-r15
mov.b   @r1,r4
mov.l   .L13,r1
jsr @r1
nop
mov r4,r0
and #1,r0
lds.l   @r15+,pr
rts
nop
.L14:
.align 2
.L12:
.long   -65536
.L13:
.long   ___ashiftrt_r4_6
.size   _f_symbol, .-_f_symbol
.align 1
.align 2
.global _f_symbol_zero
.type   _f_symbol_zero, @function
_f_symbol_zero:
mov.l   .L16,r1
mov.b   @r1,r0
tst #64,r0
rts
movt    r0
.L17:
.align 2
.L16:
.long   -65536
.size   _f_symbol_zero, .-_f_symbol_zero
.align 1
.align 2
.global _f_symbol_non_zero
.type   _f_symbol_non_zero, @function
_f_symbol_non_zero:
mov.l   .L20,r1
sts.l   pr,@-r15
mov.b   @r1,r4
mov.l   .L21,r1
jsr @r1
nop
mov r4,r0
and #1,r0
lds.l   @r15+,pr
rts
nop
.L22:
.align 2
.L20:
.long   -65536
.L21:
.long   ___ashiftrt_r4_6
.size   _f_symbol_non_zero, .-_f_symbol_non_zero
.align 1
.align 2
.global _dyn_lshift
.type   _dyn_lshift, @function
_dyn_lshift:
mov.l   .L25,r1
sts.l   pr,@-r15
jsr @r1
mov r5,r0
lds.l   @r15+,pr
rts
nop
.L26:
.align 2
.L25:
.long   ___ashlsi3_r0
.size   _dyn_lshift, .-_dyn_lshift
.align 1
.align 2
.global _dyn_rshift
.type   _dyn_rshift, @function
_dyn_rshift:
mov.l   .L29,r1
sts.l   pr,@-r15
jsr @r1
mov r5,r0
lds.l   @r15+,pr
rts
nop
.L30:
.align 2
.L29:
.long   ___lshrsi3_r0
.size   _dyn_rshift, .-_dyn_rshift
.align 1
.align 2
.global _really_dyn_lshift
.type   _really_dyn_lshift, @function
_really_dyn_lshift:
mov.l   .L33,r1
sts.l   pr,@-r15
jsr @r1
mov r5,r0
lds.l   @r15+,pr
rts
nop
.L34:
.align 2
.L33:
.long   ___ashlsi3_r0
.size   _really_dyn_lshift, .-_really_dyn_lshift
.align 1
.align 2
.global _really_dyn_rshift
.type   _really_dyn_rshift, @function
_really_dyn_rshift:
mov.l   .L37,r1
sts.l   pr,@-r15
jsr @r1
mov r5,r0
lds.l   @r15+,pr
rts
nop
.L38:
.align 2
.L37:
.long   ___lshrsi3_r0
.size   _really_dyn_rshift, .-_really_dyn_rshift
.ident  "GCC: (GNU) 12.3.0"

With patch disabled, -O2 -ml

$ cat f.s
.file   "f.c"
.text
.little
.text
.align 1
.align 2

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2023-05-24 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

--- Comment #46 from rguenther at suse dot de  ---
On Wed, 24 May 2023, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989
> 
> --- Comment #45 from Jakub Jelinek  ---
> Let's consider some simple testcase (where one doesn't really mix different
> _BitInt sizes etc.).
> _BitInt(512)
> foo (_BitInt(512) a, _BitInt(512) b, _BitInt(512) c, _BitInt(512) d)
> {
>   return (a + b) - (c + d);
> }
> With the patch, this now ICEs during expansion, because while we can handle
> copying of even the larger _BitInt vars, we don't handle (nor plan to) +/- etc.
> during expansion for that, it would be in the earlier lowering pass.
> If I'd emit straight line code here, I suppose I could use
> BIT_FIELD_REFs/BIT_INSERT_EXPRs, but if I want loopy code, as you wrote perhaps
> ARRAY_REF on VCE could work fine for the input operands, but dunno what to use
> for the result of the operation, forcing it into a VAR_DECL I'm afraid will mean we
> can't coalesce it much, the above would force the 2 + results and 1 - result
> into VAR_DECLs.
> Could we e.g. allow BIT_INSERT_EXPRs or have some new ref for this purpose to
> update a single limb in a BITTYPE_INT SSA_NAME?

I think for complex expressions that involve SSA temporaries the lowering
pass has to be more complex as well and gather as much of the expression
as possible so it can avoid _BitInt typed temporaries but instead create

 for (...)
  {
limb_t tem1 = a[i] + b[i];
limb_t tem2 = c[i] + d[i];
limb_t tem3 = tem1 - tem2;
res[i] = tem3;
  }

but yes, for the result you want to force a VAR_DECL (I suppose
DECL_RESULT for the above example will be one).  I'd probably avoid
rewriting user variables into SSA form and only have temporaries
created by gimplifications in SSA form.  You should be able to use
DECL_NOT_GIMPLE_REG_P to force this and make sure update-address-taken
leaves things this way unless, say, the user variable is only
initialized by a constant?
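
A minimal C sketch of the fused per-limb loop described above, assuming
64-bit limbs stored least significant first.  The loop in the comment is
schematic and leaves out carries; the sketch adds one carry chain per
addition and one borrow chain for the subtraction.  The names limb_t,
NLIMBS and fused_add_sub are illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumed limb type: 64 bits */
#define NLIMBS 8           /* 512 bits / 64 bits per limb */

/* res = (a + b) - (c + d) modulo 2^512, limbs least significant first.
   Only per-limb scalar temporaries are used, no multi-limb temporary.  */
static void
fused_add_sub (limb_t *res, const limb_t *a, const limb_t *b,
               const limb_t *c, const limb_t *d)
{
  limb_t carry_ab = 0, carry_cd = 0, borrow = 0;
  for (size_t i = 0; i < NLIMBS; i++)
    {
      /* tem1 = a[i] + b[i] + carry from the previous limb.  */
      limb_t t = a[i] + carry_ab;
      limb_t wrapped = t < carry_ab;
      limb_t tem1 = t + b[i];
      carry_ab = wrapped | (tem1 < t);

      /* tem2 = c[i] + d[i] + carry from the previous limb.  */
      t = c[i] + carry_cd;
      wrapped = t < carry_cd;
      limb_t tem2 = t + d[i];
      carry_cd = wrapped | (tem2 < t);

      /* tem3 = tem1 - tem2 - borrow from the previous limb.  */
      limb_t diff = tem1 - tem2;
      limb_t next_borrow = (tem1 < tem2) | (diff < borrow);
      res[i] = diff - borrow;
      borrow = next_borrow;
    }
}

/* Usage example: (2^64 - 1 + 1) - (0 + 1) == 2^64 - 1.  */
static void
fused_add_sub_example (void)
{
  limb_t a[NLIMBS] = { UINT64_MAX }, b[NLIMBS] = { 1 };
  limb_t c[NLIMBS] = { 0 }, d[NLIMBS] = { 1 };
  limb_t res[NLIMBS];
  fused_add_sub (res, a, b, c, d);
  assert (res[0] == UINT64_MAX);
  for (size_t i = 1; i < NLIMBS; i++)
    assert (res[i] == 0);
}
```

Each res[i] is written only after all inputs at index i have been read, so
in this sketch the result may share storage with one of the operands.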

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2023-05-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

--- Comment #45 from Jakub Jelinek  ---
Let's consider some simple testcase (where one doesn't really mix different
_BitInt sizes etc.).
_BitInt(512)
foo (_BitInt(512) a, _BitInt(512) b, _BitInt(512) c, _BitInt(512) d)
{
  return (a + b) - (c + d);
}
With the patch, this now ICEs during expansion, because while we can handle
copying of even the larger _BitInt vars, we don't handle (nor plan to) +/- etc.
during expansion for that, it would be in the earlier lowering pass.
If I'd emit straight line code here, I suppose I could use
BIT_FIELD_REFs/BIT_INSERT_EXPRs, but if I want loopy code, as you wrote perhaps
ARRAY_REF on VCE could work fine for the input operands, but dunno what to use
for the result of the operation, forcing it into a VAR_DECL I'm afraid will mean
we can't coalesce it much, the above would force the 2 + results and 1 - result
into VAR_DECLs.
Could we e.g. allow BIT_INSERT_EXPRs or have some new ref for this purpose to
update a single limb in a BITTYPE_INT SSA_NAME?

Now, looking what we do right now, detailed expand dump before emergency dump
shows:
Partition map

Partition 0 (_1 - 1 )
Partition 1 (_2 - 2 )
Partition 2 (_3 - 3 )
Partition 3 (a_4(D) - 4 )
Partition 4 (b_5(D) - 5 )
Partition 5 (c_6(D) - 6 )
Partition 6 (d_7(D) - 7 )
which I believe means it didn't actually coalesce anything at all.  For the
larger BITINT_TYPEs it will be very much desirable to coalesce as much as
possible, given that none of the default def SSA_NAMEs are really used, I'd think
ideally we'd do
a += b
c += d
result = a - c

For at least multiplication/division and I assume conversions to/from floating
point (and decimal), we'll need some library calls.
One question is what ABI to use for them, whether to e.g. pass pointer to the
limbs (and when -fbuilding-libgcc predefine macros on what mode is the limb
mode, whether the limbs are ordered from least significant to most or vice
versa, etc.) and in addition to that precision in bits for each argument and
whether it is zero or sign extended from that, so that we could e.g. handle
more efficiently
_BitInt(16384)
foo (unsigned _BitInt(2048) a, _BitInt(1024) b)
{
  return (_BitInt(16384)) a * b;
}
by passing e.g. _mulwhatever (, 16384, , 2048, , -1024)
where -1024 would mean 1024 bits sign extended, 2048 would mean 2048 bits zero
extended, and the result is 16384 bits.  And for GIMPLE a question is how to
express it before expansion, whether we use some ifn that is then lowered.
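
A rough C sketch of what such a calling convention could look like, purely
as an illustration: each operand is passed as a pointer to its limbs plus a
precision in bits, and a negative precision means "sign extended from that
many bits".  Everything here is an assumption made for the example: the
names mul_sketch and ext_limb, the 32-bit limb type, limbs ordered least
significant first, a result precision that is a multiple of the limb size,
and a result buffer that does not overlap the inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;          /* assumed limb type for this sketch */
#define LIMB_BITS 32

/* Limb I of an operand that holds |PREC| significant bits at P, extended
   to arbitrary width: sign extended if PREC < 0, zero extended otherwise.  */
static limb_t
ext_limb (const limb_t *p, int prec, size_t i)
{
  size_t bits = (size_t) (prec < 0 ? -prec : prec);
  size_t nlimbs = (bits + LIMB_BITS - 1) / LIMB_BITS;
  size_t top_bits = bits - (nlimbs - 1) * LIMB_BITS;   /* 1 .. LIMB_BITS */
  limb_t top = p[nlimbs - 1];
  int negative = prec < 0 && ((top >> (top_bits - 1)) & 1);
  limb_t fill = negative ? ~(limb_t) 0 : 0;

  if (i >= nlimbs)
    return fill;
  if (i < nlimbs - 1 || top_bits == LIMB_BITS)
    return p[i];
  limb_t mask = ((limb_t) 1 << top_bits) - 1;
  return (top & mask) | (fill & ~mask);
}

/* Schoolbook multiplication truncated to RESPREC bits.  */
static void
mul_sketch (limb_t *res, int resprec,
            const limb_t *a, int aprec, const limb_t *b, int bprec)
{
  assert (resprec > 0 && resprec % LIMB_BITS == 0);
  size_t n = (size_t) resprec / LIMB_BITS;
  for (size_t i = 0; i < n; i++)
    res[i] = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t carry = 0;
      limb_t ai = ext_limb (a, aprec, i);
      for (size_t j = 0; i + j < n; j++)
        {
          /* Cannot overflow: (2^32-1)^2 + 2*(2^32-1) == 2^64 - 1.  */
          uint64_t t = (uint64_t) ai * ext_limb (b, bprec, j)
                       + res[i + j] + carry;
          res[i + j] = (limb_t) t;
          carry = t >> LIMB_BITS;
        }
    }
}

/* Usage example: unsigned 64-bit 3 times signed 32-bit -2 gives -6
   in 128 bits.  */
static void
mul_sketch_example (void)
{
  limb_t a[2] = { 3, 0 };
  limb_t b[1] = { 0xFFFFFFFEu };
  limb_t res[4];
  mul_sketch (res, 128, a, 64, b, -32);
  for (size_t i = 0; i < 4; i++)
    assert (res[i] == (i == 0 ? 0xFFFFFFFAu : 0xFFFFFFFFu));
}
```

Encoding the extension in the sign of the precision keeps the argument list
short, and it lets the callee skip work for operands that are much narrower
than the result.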

[Bug libstdc++/109921] c++17/floating_from_chars.cc: compile error: ‘from_chars_strtod’ was not declared in this scope

2023-05-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109921

--- Comment #1 from Jonathan Wakely  ---
The proposed change would result in ABI changes for some targets.

I think the correct fix is something more like this:

--- a/libstdc++-v3/src/c++17/floating_from_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
@@ -64,7 +64,7 @@
 // strtold for __ieee128
 extern "C" __ieee128 __strtoieee128(const char*, char**);
 #elif __FLT128_MANT_DIG__ == 113 && __LDBL_MANT_DIG__ != 113 \
-  && defined(__GLIBC_PREREQ)
+  && defined(__GLIBC_PREREQ) && defined(USE_STRTOD_FOR_FROM_CHARS)
 #define USE_STRTOF128_FOR_FROM_CHARS 1
 extern "C" _Float128 __strtof128(const char*, char**)
   __asm ("strtof128")
@@ -77,10 +77,6 @@ extern "C" _Float128 __strtof128(const char*, char**)
 #if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 \
 && __SIZE_WIDTH__ >= 32
 # define USE_LIB_FAST_FLOAT 1
-# if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__
-// No need to use strtold.
-#  undef USE_STRTOD_FOR_FROM_CHARS
-# endif
 #endif

 #if USE_LIB_FAST_FLOAT
@@ -1261,7 +1257,7 @@ from_chars_result
 from_chars(const char* first, const char* last, long double& value,
   chars_format fmt) noexcept
 {
-#if ! USE_STRTOD_FOR_FROM_CHARS
+#if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__ || !defined USE_STRTOD_FOR_FROM_CHARS
   // Either long double is the same as double, or we can't use strtold.
   // In the latter case, this might give an incorrect result (e.g. values
   // out of range of double give an error, even if they fit in long double).
@@ -1329,13 +1325,23 @@ _ZSt10from_charsPKcS0_RDF128_St12chars_format(const char* first,
  __ieee128& value,
  chars_format fmt) noexcept
 __attribute__((alias ("_ZSt10from_charsPKcS0_Ru9__ieee128St12chars_format")));
-#elif defined(USE_STRTOF128_FOR_FROM_CHARS)
+#else
 from_chars_result
 from_chars(const char* first, const char* last, _Float128& value,
   chars_format fmt) noexcept
 {
+#ifdef USE_STRTOF128_FOR_FROM_CHARS
   // fast_float doesn't support IEEE binary128 format, but we can use strtold.
   return from_chars_strtod(first, last, value, fmt);
+#else
+  // Read a long double. This might give an incorrect result (e.g. values
+  // out of range of long double give an error, even if they fit in _Float128).
+  long double ldbl_val;
+  auto res = std::from_chars(first, last, ldbl_val, fmt);
+  if (res.ec == errc{})
+    value = ldbl_val;
+  return res;
+#endif
 }
 #endif


We should not use strtof128 unless we can use strtod.

We should not #undef USE_STRTOD_FOR_FROM_CHARS on line 82 just because we don't
need it for long double, as we might still need it for _Float128.

[Bug target/109944] vector CTOR with byte elements and SSE2 has STLF fail

2023-05-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109944

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Richard Biener  ---
Summary is fixed now.  Any other changes require actual benchmarking I think.

[Bug target/109944] vector CTOR with byte elements and SSE2 has STLF fail

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109944

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:affee7dcfa1ee272d43ac7cb68cf423dbd956fd8

commit r14-1166-gaffee7dcfa1ee272d43ac7cb68cf423dbd956fd8
Author: Richard Biener 
Date:   Wed May 24 10:07:36 2023 +0200

target/109944 - avoid STLF fail for V16QImode CTOR expansion

The following dispatches to V2DImode CTOR expansion instead of
using sets of (subreg:DI (reg:V16QI 146) [08]) which causes
LRA to spill DImode and reload V16QImode.  The same applies for
V8QImode or V4HImode construction from SImode parts which happens
during 32bit libgcc build.

PR target/109944
* config/i386/i386-expand.cc (ix86_expand_vector_init_general):
Perform final vector composition using
ix86_expand_vector_init_general instead of setting
the highpart and lowpart which causes spilling.

* gcc.target/i386/pr109944-1.c: New testcase.
* gcc.target/i386/pr109944-2.c: Likewise.

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2023-05-24 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

--- Comment #44 from rguenther at suse dot de  ---
On Wed, 24 May 2023, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989
> 
> Jakub Jelinek  changed:
> 
>What|Removed |Added
> 
>   Attachment #55141|0   |1
> is obsolete||
> 
> --- Comment #43 from Jakub Jelinek  ---
> Created attachment 55148
>   --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55148&action=edit
> gcc14-bitint-wip.patch
> 
> Another update.  This version can emit _BitInt(N) values in non-automatic
> variable initializers, handles passing/returning _BitInt(N) and for N <= 64
> (i.e. what fits into a single limb) from what I can see handling it in GIMPLE
> passes and even expansion/RTL seems to work.
> Now, as discussed earlier, for N > GET_MODE_PRECISION (limb_mode) I think we
> want to lower it in some pass in between IPA and vectorization.  For N which
> fits into DImode if limb is 32-bit (currently no target does that as we have
> just x86-64 support) or which fits into TImode for 64-bit if TImode is
> supported, I guess we want to map arithmetics
> to TImode arithmetics, for say 2-4x larger emit code for arithmetics (except
> perhaps multiplication/division) inline as straight line code and for even
> larger as loops.
> In the last case, a question is if we could use e.g. TARGET_MEM_REF for the
> variable offset in those loops on the vars even when they aren't
> TREE_ADDRESSABLE (but would force them into memory during expansion).

Note you should use TARGET_MEM_REF only when it describes the actual
addressing mode you want to use.  Otherwise just synthesize ARRAY_REFs
like ARRAY_REF , index> with
an appropriate VLA libm[] array type.

I'd do the lowering right before pass_complete_unrolli and generally
emit loopy form (another pass placement required in the -Og pipeline).

[Bug target/109955] Should be possible to remove vcond{,u,eq} expanders

2023-05-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109955

--- Comment #2 from Richard Biener  ---
One thing I see is

-(insn 11 10 15 2 (set (subreg:V16QI (reg:V2DI 83 [  ]) 0)
-(unspec:V16QI [
-(reg:V16QI 92)
-(reg:V16QI 91)
-(lt:V16QI (reg:V16QI 90)
-(const_vector:V16QI [
-(const_int 0 [0]) repeated x16
-]))
-] UNSPEC_BLENDV))
"/space/rguenther/src/gcc/gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c":22:10
discrim 1 7431 {*sse4_1_pblendvb_lt}
 (nil)

vs

+(insn 8 5 9 2 (set (reg:V16QI 89)
+(const_vector:V16QI [
+(const_int -1 [0x]) repeated x16
+]))
"/spc/abuild/rguenther/obj-gcc-g/gcc/include/smmintrin.h":181:20 1838
{movv16qi_internal}
+ (nil))
+(insn 9 8 11 2 (set (reg:V16QI 90)
+(gt:V16QI (reg:V16QI 92)
+(reg:V16QI 89)))
"/spc/abuild/rguenther/obj-gcc-g/gcc/include/smmintrin.h":181:20 6749
{*sse2_gtv16qi3}
  (expr_list:REG_DEAD (reg:V16QI 92)
+(expr_list:REG_DEAD (reg:V16QI 89)
+(nil
+(note 11 9 12 2 NOTE_INSN_DELETED)
+(insn 12 11 16 2 (set (subreg:V16QI (reg:V2DI 84 [  ]) 0)
+(unspec:V16QI [
+(reg:V16QI 93)
+(reg:V16QI 94)
+(reg:V16QI 90)
+] UNSPEC_BLENDV))
"/space/rguenther/src/gcc/gcc/testsuite/gcc.target/i386/sse4_1-pr99908.c":22:10
discrim 1 7429 {sse4_1_pblendvb}
+ (expr_list:REG_DEAD (reg:V16QI 93)
+(expr_list:REG_DEAD (reg:V16QI 90)
+(expr_list:REG_DEAD (reg:V16QI 94)
 (nil)

after the combiner which seems to be a missing simplification of


(insn 8 5 9 2 (set (reg:V16QI 89)
(const_vector:V16QI [
(const_int -1 [0x]) repeated x16
]))
(insn 9 8 11 2 (set (reg:V16QI 90)
   (gt:V16QI (reg:V16QI 92)
(reg:V16QI 89)))

to

(lt:V16QI (reg:V16QI 90)
(const_vector:V16QI [
(const_int 0 [0]) repeated x16
])

Trying 8 -> 9:
8: r89:V16QI=const_vector
9: r90:V16QI=r92:V16QI>r89:V16QI
  REG_DEAD r92:V16QI
  REG_DEAD r89:V16QI
Failed to match this instruction:
(set (reg:V16QI 90)
(gt:V16QI (reg:V16QI 92)
(const_vector:V16QI [
(const_int -1 [0x]) repeated x16
])))

Trying 8, 9 -> 12:
8: r89:V16QI=const_vector
9: r90:V16QI=r92:V16QI>r89:V16QI
  REG_DEAD r92:V16QI
  REG_DEAD r89:V16QI
   12: r84:V2DI#0=unspec[r93:V16QI,r94:V16QI,r90:V16QI] 47
  REG_DEAD r93:V16QI
  REG_DEAD r90:V16QI
  REG_DEAD r94:V16QI
Failed to match this instruction:
(set (subreg:V16QI (reg:V2DI 84 [  ]) 0)
(unspec:V16QI [
(reg:V16QI 93)
(reg:V16QI 94) 
(gt:V16QI (reg:V16QI 92)
(const_vector:V16QI [
(const_int -1 [0x]) repeated x16
]))
] UNSPEC_BLENDV))

not sure if the lt is a standalone thing.  Maybe we just need a
define-insn-and-split for _gt as well.  All those seem to be somewhat
tuned to the exact way RTL expansion works when the vcond patterns are there.

Getting rid of vcond* (but not vcond_mask) would allow quite some
simplification
in middle-end code and the vectorizer.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #41 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:257c2be7ff8dfdc610202a1e1f5a8a668b939bdb

commit r14-1165-g257c2be7ff8dfdc610202a1e1f5a8a668b939bdb
Author: Andrew MacLeod 
Date:   Tue May 23 15:41:03 2023 -0400

Only update global value if it changes.

Do not update and propagate a global value if it hasn't changed.

PR tree-optimization/109695
* gimple-range-cache.cc (ranger_cache::get_global_range): Add
changed param.
* gimple-range-cache.h (ranger_cache::get_global_range): Ditto.
* gimple-range.cc (gimple_ranger::range_of_stmt): Pass changed
flag to set_global_range.
(gimple_ranger::prefill_stmt_dependencies): Ditto.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #40 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:cfd6569e9c41181231a8427235d0c0a7ad9262e4

commit r14-1164-gcfd6569e9c41181231a8427235d0c0a7ad9262e4
Author: Andrew MacLeod 
Date:   Tue May 23 15:20:56 2023 -0400

Use negative values to reflect always_current in the temporal cache.

Instead of using 0, use negative timestamps to reflect always_current
state.
If the value doesn't change, keep the timestamp rather than creating a new
one and invalidating any dependencies.

PR tree-optimization/109695
* gimple-range-cache.cc (temporal_cache::temporal_value): Return
a positive int.
(temporal_cache::current_p): Check always_current method.
(temporal_cache::set_always_current): Add param and set value
appropriately.
(temporal_cache::always_current_p): New.
(ranger_cache::get_global_range): Adjust.
(ranger_cache::set_global_range): set always current first.

[Bug tree-optimization/109695] [14 Regression] crash in gimple_ranger::range_of_expr since r14-377-gc92b8be9b52b7e

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #39 from CVS Commits  ---
The master branch has been updated by Andrew Macleod :

https://gcc.gnu.org/g:d8b058d3ca4ebbef5575105164417f125696f5ce

commit r14-1163-gd8b058d3ca4ebbef5575105164417f125696f5ce
Author: Andrew MacLeod 
Date:   Tue May 23 15:11:44 2023 -0400

Choose better initial values for ranger.

Instead of defaulting to VARYING, fold the stmt using just global ranges.

PR tree-optimization/109695
* gimple-range-cache.cc (ranger_cache::get_global_range): Call
fold_range with global query to choose an initial value.

[Bug fortran/109684] compiling failure: complaining about a final subroutine of a type being not PURE (while it is indeed PURE)

2023-05-24 Thread neil.n.carlson at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109684

--- Comment #8 from Neil Carlson  ---
We've been bitten by what looks to be the same bug in our large Fortran code:

  245 | end module kuprat_mapper_type
  | 1
Error: Contained procedure ‘__final_integer_set_type_wavl_Integer_set’ at (1)
of a PURE procedure must also be PURE

This one really had me baffled.  The kuprat_mapper type has no component (or
component of component) of the integer_set type, nor any pure procedures. At
most, some procedure associated with the kuprat_mapper type has a local
integer_set variable. In any event, the integer_set type does have a final
procedure and it is pure! What's more baffling is why this error occurred at
this point; the integer_set module compiled without error as did many other
module files that use it.  Note that the code compiles fine with the oneAPI
ifort and NAG compilers (and also with gfortran 12.2 and earlier).

I haven't attempted yet to try and pare things down to a reportable reproducer,
but if it would help I could try to do so.

[Bug libstdc++/109261] std::experimental::simd is not usable in several constant expressions

2023-05-24 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109261

--- Comment #11 from Christophe Lyon  ---
Thanks, trunk is now OK on both arm and aarch64.

[Bug target/109944] vector CTOR with byte elements and SSE2 has STLF fail

2023-05-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109944

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov  ---
(In reply to Richard Biener from comment #3)
> so we're building SImode elements in %xmm regs and then
> unpack them - that's probably better than a series of
> pinsrw due to dependences.  For uarchs where gpr->xmm
> moves are costly it might be better to do
> 
>   pxor %xmm0, %xmm0
>   pinsrw $0, (%rsi), %xmm0
>   pinsrw $1, 32(%rsi), %xmm0
> 
> though?

I'm afraid that is impossible, pinsrw will attempt to load 2 bytes, but only 1
is accessible (if at end of page).

[Bug target/109955] Should be possible to remove vcond{,u,eq} expanders

2023-05-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109955

--- Comment #1 from Richard Biener  ---
Created attachment 55149
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55149&action=edit
patch I tested

This is the patch I tested.  I have not yet investigated any of the FAILs.

Causes might be missing/differing vec_cmp or vcond_mask patterns or different
behavior of the vectorizer or RTL expander.

[Bug target/109955] New: Should be possible to remove vcond{,u,eq} expanders

2023-05-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109955

Bug ID: 109955
   Summary: Should be possible to remove vcond{,u,eq} expanders
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

It should be possible to remove all vcond, vcondu and vcondeq expanders and
have the functionality be implemented via the vec_cmp and vcond_mask expanders.
But when removing them a bootstrap & regtest reveals

=== g++ tests ===


Running target unix
FAIL: g++.target/i386/avx-pr54700-1.C   scan-assembler-not vpcmpgt[bdq]
FAIL: g++.target/i386/avx-pr54700-1.C   scan-assembler-times vblendvpd 4
FAIL: g++.target/i386/avx-pr54700-1.C   scan-assembler-times vblendvps 4
FAIL: g++.target/i386/avx-pr54700-1.C   scan-assembler-times vpblendvb 2
FAIL: g++.target/i386/avx2-pr54700-1.C   scan-assembler-not vpcmpgt[bdq]
FAIL: g++.target/i386/avx2-pr54700-1.C   scan-assembler-times vblendvpd 4
FAIL: g++.target/i386/avx2-pr54700-1.C   scan-assembler-times vblendvps 4
FAIL: g++.target/i386/avx2-pr54700-1.C   scan-assembler-times vpblendvb 2
FAIL: g++.target/i386/avx512fp16-vcondmn-minmax.C  -std=gnu++14  scan-assembler-times vmaxph 3
FAIL: g++.target/i386/avx512fp16-vcondmn-minmax.C  -std=gnu++14  scan-assembler-times vminph 3
FAIL: g++.target/i386/pr100738-1.C  -std=gnu++14  scan-assembler-not vpcmpeqd[ t]
FAIL: g++.target/i386/pr100738-1.C  -std=gnu++14  scan-assembler-not vpxor[ t]
FAIL: g++.target/i386/pr100738-1.C  -std=gnu++14  scan-assembler-times vblendvps[ t] 2
FAIL: g++.target/i386/sse4_1-pr54700-1.C   scan-assembler-not pcmpgt[bdq]
FAIL: g++.target/i386/sse4_1-pr54700-1.C   scan-assembler-times blendvpd 4
FAIL: g++.target/i386/sse4_1-pr54700-1.C   scan-assembler-times blendvps 4
FAIL: g++.target/i386/sse4_1-pr54700-1.C   scan-assembler-times pblendvb 2

=== gcc tests ===


Running target unix
FAIL: gcc.dg/vect/pr109011-3.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .POPCOUNT (vect" 3
FAIL: gcc.dg/vect/pr109011-3.c scan-tree-dump-times optimized " = .POPCOUNT (vect" 3
FAIL: gcc.dg/vect/pr109011-5.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .POPCOUNT (vect" 3
FAIL: gcc.dg/vect/pr109011-5.c scan-tree-dump-times optimized " = .POPCOUNT (vect" 3
FAIL: gcc.target/i386/avx2-pr99908.c scan-assembler-not \\tvpcmpeq
FAIL: gcc.target/i386/avx512bw-pr96891-1.c scan-assembler-not %k[0-7]
FAIL: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-not %k[0-9]
FAIL: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsb[\\t ] 2
FAIL: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsd[\\t ] 2
FAIL: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsq[\\t ] 2
FAIL: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminsw[\\t ] 2
FAIL: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminub[\\t ] 2
FAIL: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminud[\\t ] 2
FAIL: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminuq[\\t ] 2
FAIL: gcc.target/i386/avx512vl-pr88547-1.c scan-assembler-times vpminuw[\\t ] 2
FAIL: gcc.target/i386/pr109011-b1.c scan-assembler-times vpopcntb[ \\t]+ 4
FAIL: gcc.target/i386/pr109011-w1.c scan-assembler-times vpopcntw[ \\t]+ 4
FAIL: gcc.target/i386/sse4_1-pr99908.c scan-assembler-not \\tpcmpeq

[Bug libstdc++/109889] [13/14 Regression] Segfault in __run_exit_handlers since r13-5309-gc3c6c307792026

2023-05-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109889

--- Comment #12 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #0)
> I see this on power 9 fedora 37 (glibc-2.36) but not on power 8 centos 7.9
> (glibc-2.17).

Also seen on power 9 rhel 9 (glibc-2.34-60.el9.ppc64le)

Not reproduced on Fedora 38 (glibc-2.37-4.fc38.ppc64le) for power 8 or power 9.

[Bug target/109949] new test case experimental/simd/pr109261_constexpr_simd.cc in r12-9647-g3acbaf1b253215 fails

2023-05-24 Thread mkretz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109949

--- Comment #4 from Matthias Kretz (Vir)  ---
With -mcpu=power10 I see the issue. The problem has been there all the time and
only surfaced with this test. (It should also have shown on `make check-simd`
in libstdc++.)

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-24 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #36 from Oleg Endo  ---
(In reply to Alexander Klepikov from comment #35)
> 
> As I understand, you meant the following (I added new functions at the end
> of file):
> 
> $ cat f.c
> #define ADDR 0x
> #define P ((unsigned char *)ADDR)
> #define FLAG 0x40
> #define S 7
> 

Yes, that's what I meant, thanks.

Can you also compile for little endian, and most of all, use -O2 optimization
level.  Some optimizations are not done below -O2.

> 
> I chose that name because I wanted to disable dynamic shift instructions
> for all CPUs. I did not expect that it would affect SH-2E code in such a way.
> 
> I can rewrite the patch so that it only affects CPUs that do not support
> dynamic shifts and disables library call for dynamic shifts. I'll do it
> anyway because I need it badly. What do you think, which option name would
> be better: '-mdisable-dynshift-libcall' or '-mhw-shift'? Or if you want,
> please suggest another one. Thank you!

'-mdisable-dynshift-libcall' would be more appropriate for what it tries to do,
I think.  Although that is a whole different issue ... but what is it going to
do for real dynamic shifts on SH2?

What kind of code is it supposed to emit for things like

unsigned int dyn_shift (unsigned int x, unsigned int y)
{
  return x << (y & 31);
}
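
For illustration only, one way such a shift could be expressed without a
dynamic shift instruction and without a library call is a chain of
conditional constant shifts, one per bit of the shift count.  This is a
C-level sketch of the idea, not what the patch or GCC emits;
dyn_shift_by_steps is a hypothetical name and a 32-bit unsigned int is
assumed.

```c
#include <assert.h>

/* Same result as x << (y & 31), built only from constant shift amounts.  */
static unsigned int
dyn_shift_by_steps (unsigned int x, unsigned int y)
{
  y &= 31;
  if (y & 1)
    x <<= 1;
  if (y & 2)
    x <<= 2;
  if (y & 4)
    x <<= 4;
  if (y & 8)
    x <<= 8;
  if (y & 16)
    x <<= 16;
  return x;
}

/* Usage example: compare against the plain dynamic shift for every count.  */
static void
dyn_shift_by_steps_example (void)
{
  for (unsigned int y = 0; y < 64; y++)
    assert (dyn_shift_by_steps (0x80000001u, y) == 0x80000001u << (y & 31));
}
```

The cost is up to five test-and-shift steps per shift, which is the kind of
trade-off the question above is asking about.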

[Bug c/102989] Implement C2x's n2763 (_BitInt)

2023-05-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #55141|0   |1
is obsolete||

--- Comment #43 from Jakub Jelinek  ---
Created attachment 55148
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55148&action=edit
gcc14-bitint-wip.patch

Another update.  This version can emit _BitInt(N) values in non-automatic
variable initializers, handles passing/returning _BitInt(N) and for N <= 64
(i.e. what fits into a single limb) from what I can see handling it in GIMPLE
passes and even expansion/RTL seems to work.
Now, as discussed earlier, for N > GET_MODE_PRECISION (limb_mode) I think we
want to lower it in some pass in between IPA and vectorization.  For N which
fits into DImode if limb is 32-bit (currently no target does that as we have
just x86-64 support) or which fits into TImode for 64-bit if TImode is
supported, I guess we want to map arithmetics
to TImode arithmetics, for say 2-4x larger emit code for arithmetics (except
perhaps multiplication/division) inline as straight line code and for even
larger as loops.
In the last case, a question is if we could use e.g. TARGET_MEM_REF for the
variable offset in those loops on the vars even when they aren't
TREE_ADDRESSABLE (but would force them into memory during expansion).

[Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction

2023-05-24 Thread klepikov.alex+bugs at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #35 from Alexander Klepikov  ---
(In reply to Oleg Endo from comment #34)
> Bit-tests of char and unsigned char should be covered by the test-suite and
> should work -- at least originally.  However, what might be triggering this
> problem is the '== FLAG' comparison.  When I was working on this issue I
> only used '== 0' or '!= 0' comparison.  I can imagine that your test code
> triggers some other middle end optimizations and hence we get this.

Yes, I am sure that the problem is the '== FLAG' comparison. Before I reported
that bug, I tried to bypass it and this macro does not produce shift
instructions even on GCC 4.7:

#define BIT_MASK_IS_SET_(VALUE, BITMASK)\
({int _value = VALUE & BITMASK,\
_result;\
if (_value == BITMASK){\
_result = 1;\
}\
else {\
_result = 0;\
}\
_result;})

So this is definitely the comparison.
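
Since FLAG (0x40) has exactly one bit set, '(v & FLAG) == FLAG' and
'(v & FLAG) != 0' always give the same value, so a single tst would be
expected to serve both forms.  A small C check of that equivalence; the
function names are only illustrative:

```c
#include <assert.h>

#define FLAG 0x40

static unsigned char
is_set_eq_flag (unsigned char v)
{
  return (v & FLAG) == FLAG;
}

static unsigned char
is_set_ne_zero (unsigned char v)
{
  return (v & FLAG) != 0;
}

/* Exhaustively compare both forms over every unsigned char value.  */
static void
check_equivalence (void)
{
  for (unsigned int v = 0; v <= 0xFF; v++)
    assert (is_set_eq_flag ((unsigned char) v)
            == is_set_ne_zero ((unsigned char) v));
}
```

For a mask with more than one bit set the two forms differ, so the
equivalence only holds for single-bit flags such as this one.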

> 
> Can you try to rewrite your test code to something like this?
> 
> unsigned int f(char v){
> return (v & FLAG) != 0;
> }
> 
> ... and see if it generates the tst instruction as expected?
> 

As I understand, you meant the following (I added new functions at the end of
file):

$ cat f.c
#define ADDR 0x
#define P ((unsigned char *)ADDR)
#define FLAG 0x40
#define S 7

unsigned char f_char_var(char v){
return (v & FLAG) == FLAG;
}

unsigned char f_unsigned_char_var(unsigned char v){
return (v & FLAG) == FLAG;
}

unsigned char f_symbol(void){
return (*P & FLAG) == FLAG;
}

unsigned char f_symbol_zero(void){
return (*P & FLAG) == 0;
}

unsigned char f_symbol_non_zero(void){
return (*P & FLAG) != 0;
}

Compiler flags: -c -mrenesas -m2e -mb -O -fno-toplevel-reorder

With patch disabled:

$ cat f_clean.s
.file   "f.c"
.text
.text
.align 1
.global _f_char_var
.type   _f_char_var, @function
_f_char_var:
sts.l   pr,@-r15
mov.l   .L3,r1
jsr @r1
exts.b  r4,r4
mov r4,r0
and #1,r0
lds.l   @r15+,pr
rts
nop
.L4:
.align 2
.L3:
.long   ___ashiftrt_r4_6
.size   _f_char_var, .-_f_char_var
.align 1
.global _f_unsigned_char_var
.type   _f_unsigned_char_var, @function
_f_unsigned_char_var:
sts.l   pr,@-r15
mov.l   .L7,r1
jsr @r1
exts.b  r4,r4
mov r4,r0
and #1,r0
lds.l   @r15+,pr
rts
nop
.L8:
.align 2
.L7:
.long   ___ashiftrt_r4_6
.size   _f_unsigned_char_var, .-_f_unsigned_char_var
.align 1
.global _f_symbol
.type   _f_symbol, @function
_f_symbol:
sts.l   pr,@-r15
mov.l   .L11,r1
mov.b   @r1,r4
mov.l   .L12,r1
jsr @r1
nop
mov r4,r0
and #1,r0
lds.l   @r15+,pr
rts
nop
.L13:
.align 2
.L11:
.long   -65536
.L12:
.long   ___ashiftrt_r4_6
.size   _f_symbol, .-_f_symbol
.align 1
.global _f_symbol_zero
.type   _f_symbol_zero, @function
_f_symbol_zero:
mov.l   .L15,r1
mov.b   @r1,r0
tst #64,r0
rts
movt    r0
.L16:
.align 2
.L15:
.long   -65536
.size   _f_symbol_zero, .-_f_symbol_zero
.align 1
.global _f_symbol_non_zero
.type   _f_symbol_non_zero, @function
_f_symbol_non_zero:
sts.l   pr,@-r15
mov.l   .L19,r1
mov.b   @r1,r4
mov.l   .L20,r1
jsr @r1
nop
mov r4,r0
and #1,r0
lds.l   @r15+,pr
rts
nop
.L21:
.align 2
.L19:
.long   -65536
.L20:
.long   ___ashiftrt_r4_6
.size   _f_symbol_non_zero, .-_f_symbol_non_zero
.ident  "GCC: (GNU) 12.3.0"

With patch enabled:

$ cat f.s
.file   "f.c"
.text
.text
.align 1
.global _f_char_var
.type   _f_char_var, @function
_f_char_var:
mov r4,r0
tst #64,r0
mov #-1,r0
rts
negc    r0,r0
.size   _f_char_var, .-_f_char_var
.align 1
.global _f_unsigned_char_var
.type   _f_unsigned_char_var, @function
_f_unsigned_char_var:
mov r4,r0
tst #64,r0
mov #-1,r0
rts
negc    r0,r0
.size   _f_unsigned_char_var, .-_f_unsigned_char_var
.align 1
.global _f_symbol
.type   _f_symbol, @function
_f_symbol:
mov.l   .L4,r1
mov.b   @r1,r0
tst #64,r0
mov #-1,r0
rts
negc    r0,r0
.L5:
.align 2
.L4:
.long   -65536
.size   _f_symbol, .-_f_symbol
.align 1
.global _f_symbol_zero
.type   _f_symbol_zero, @function
_f_symbol_zero:

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #13 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:5476de2618ffb77f3a52e59e2c9f10b018329689

commit r14-1161-g5476de2618ffb77f3a52e59e2c9f10b018329689
Author: Richard Biener 
Date:   Wed May 24 12:36:28 2023 +0200

tree-optimization/109849 - fix fallout of PRE hoisting change

The PR109849 fix made us no longer hoist some memory loads because
of the expression set intersection.  We can still avoid to compute
the union by simply taking the first sets expressions and leave
the pruning of expressions with values not suitable for hoisting
to sorted_array_from_bitmap_set.

PR tree-optimization/109849
* tree-ssa-pre.cc (do_hoist_insertion): Do not intersect
expressions but take the first sets.

* gcc.dg/tree-ssa/ssa-hoist-9.c: New testcase.

[Bug libstdc++/109921] c++17/floating_from_chars.cc: compile error: ‘from_chars_strtod’ was not declared in this scope

2023-05-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109921

Jonathan Wakely  changed:

   What|Removed |Added

   Last reconfirmed||2023-05-24
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

[Bug rtl-optimization/101188] [AVR] Miscompilation and function pointers

2023-05-24 Thread gjl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101188

--- Comment #5 from Georg-Johann Lay  ---
It happens in postreload.cc::reload_cse_move2add() when


(insn 45 16 17 2 (set (reg/f:HI 30 r30 [60])
(reg/v/f:HI 16 r16 [orig:51 self ] [51])) "fail1.c":29:9 101
{*movhi_split} (nil))
(insn 17 45 18 2 (parallel [
(set (reg/f:HI 30 r30 [60])
(plus:HI (reg/f:HI 30 r30 [60])
(const_int 66 [0x42])))
(clobber (scratch:QI))
]) "fail1.c":29:9 175 {addhi3_clobber} (nil))

is transformed to:

(insn 17 16 18 2 (set (reg/f:HI 30 r30 [60])
(plus:HI (reg/f:HI 30 r30 [60])
(const_int 2 [0x2]))) "fail1.c":29:9 165 {*addhi3_split} (nil))

The wrong setting of "success" is in postreload.cc:2028 as of the following, so
the condition that leads to there is bogus.

https://gcc.gnu.org/git/?p=gcc.git;a=blame;f=gcc/postreload.cc;h=fb392651e1b6a60e12bf3d36bc302bf9be8bc608;hb=03c7c418baa01f0642817bc9b44192d134102aa9#l2028

[Bug libstdc++/109261] std::experimental::simd is not usable in several constant expressions

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109261

--- Comment #10 from CVS Commits  ---
The master branch has been updated by Matthias Kretz :

https://gcc.gnu.org/g:aa8b363171a95b8f867a74f29c75f9577e9087e1

commit r14-1160-gaa8b363171a95b8f867a74f29c75f9577e9087e1
Author: Matthias Kretz 
Date:   Wed May 24 12:50:46 2023 +0200

libstdc++: Fix SFINAE for __is_intrinsic_type on ARM

On ARM, NEON doesn't support double, so __is_intrinsic_type_v should say false (instead of being ill-formed).

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/109261
* include/experimental/bits/simd.h (__intrinsic_type):
Specialize __intrinsic_type and
__intrinsic_type in any case, but provide the member
type only with __aarch64__.
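
The pattern described in this ChangeLog entry can be sketched roughly as
follows. This is not the libstdc++ code: intrinsic_type_sketch and
has_intrinsic_type are hypothetical stand-ins, and the member types are
placeholders rather than real NEON vector types. The point is only that a
specialization which always exists, but provides its member type
conditionally, lets a detection trait answer false instead of failing to
compile.

```cpp
#include <type_traits>

// Hypothetical mapping from an element type to a "vector register" type.
// The primary template has no member `type`.
template <typename T>
struct intrinsic_type_sketch {};

template <>
struct intrinsic_type_sketch<float> {
    using type = float;  // placeholder for a real vector type
};

// The double specialization exists on every target, but the member type
// is provided only where the hardware supports it.
template <>
struct intrinsic_type_sketch<double> {
#if defined(__aarch64__)
    using type = double;  // placeholder for a real vector type
#endif
};

// SFINAE-friendly detection (C++17): true only if the member type exists.
template <typename T, typename = void>
struct has_intrinsic_type : std::false_type {};

template <typename T>
struct has_intrinsic_type<
    T, std::void_t<typename intrinsic_type_sketch<T>::type>>
    : std::true_type {};

static_assert(has_intrinsic_type<float>::value, "float is mapped");
static_assert(!has_intrinsic_type<int>::value, "int is not mapped");
```

Asking has_intrinsic_type<double>::value compiles on every target; only the
answer differs.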

[Bug libstdc++/109261] std::experimental::simd is not usable in several constant expressions

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109261

--- Comment #9 from CVS Commits  ---
The master branch has been updated by Matthias Kretz :

https://gcc.gnu.org/g:b0a483b0a011f9cbc8b25053eae809c77dae2a12

commit r14-1159-gb0a483b0a011f9cbc8b25053eae809c77dae2a12
Author: Matthias Kretz 
Date:   Tue May 23 23:48:49 2023 +0200

libstdc++: Add missing constexpr to simd_neon

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/109261
* include/experimental/bits/simd_neon.h (_S_reduce): Add
constexpr and make NEON implementation conditional on
not __builtin_is_constant_evaluated.
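
As a rough illustration of the technique named in this entry (not the actual
_S_reduce implementation), a function can be made constexpr while keeping a
non-constexpr fast path by branching on __builtin_is_constant_evaluated().
The names sum4 and sum4_runtime are hypothetical, and the "fast path" here is
plain C++ standing in for target intrinsics.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a target-specific routine that cannot be
// evaluated at compile time (for example, one built on intrinsics).
inline int sum4_runtime(const std::array<int, 4>& v) {
    return (v[0] + v[1]) + (v[2] + v[3]);
}

// constexpr reduction: use a generic loop during constant evaluation and
// the target-specific routine otherwise.
constexpr int sum4(const std::array<int, 4>& v) {
    if (__builtin_is_constant_evaluated()) {
        int total = 0;
        for (std::size_t i = 0; i < v.size(); ++i)
            total += v[i];
        return total;
    }
    return sum4_runtime(v);
}

// Usable in a constant expression because the generic branch is taken.
static_assert(sum4(std::array<int, 4>{{1, 2, 3, 4}}) == 10,
              "constant evaluation uses the generic loop");
```

Note that the branch must be a plain `if`, not `if constexpr`, since the
latter would always be evaluated as a constant expression.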

[Bug rtl-optimization/109940] [14 Regression] ICE in decide_candidate_validity since g:53dddbfeb213ac4ec39f550aa81eaa4264375d2c

2023-05-24 Thread peter.waller at arm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109940

--- Comment #7 from Peter Waller  ---
I can confirm that the original (not reduced) program no longer hits an ICE
with ee2a8b373a88bae4c533aa68bed56bf01afea0e2 (but does with the parent commit).
Thanks.

[Bug testsuite/109951] [14 Regression] libgomp, testsuite: non-native multilib c++ tests fail on Darwin.

2023-05-24 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109951

--- Comment #2 from Iain Sandoe  ---
OK, so this is the best bracket I've been able to get without doing surgery to
make a branch with a backport for the bootstrap break:

r14-803-g20ca33db817cec OK
r14-857-g30adfb85ff994c NOT OK

My analysis could well also be flawed:
 * perhaps the bug is actually that GXX_UNDER_TEST should not contain
multi-lib-specific paths.
 * also maybe the include paths are not problematical - the issue might be
limited to the -L ones.

[Bug modula2/109952] Inconsistent HIGH values with 'ARRAY OF CHAR'

2023-05-24 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109952

Gaius Mulley  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #3 from Gaius Mulley  ---
Closing as patch has been applied.

[Bug modula2/109952] Inconsistent HIGH values with 'ARRAY OF CHAR'

2023-05-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109952

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Gaius Mulley :

https://gcc.gnu.org/g:b4df098647b687ca4e43952ec4a198b2816732ba

commit r14-1158-gb4df098647b687ca4e43952ec4a198b2816732ba
Author: Gaius Mulley 
Date:   Wed May 24 11:14:07 2023 +0100

PR modula2/109952 Inconsistent HIGH values with 'ARRAY OF CHAR'

This patch fixes the case when a single character constant literal is
passed as a string actual parameter to an ARRAY OF CHAR formal parameter.
To be consistent a single character is promoted to a string and nul
terminated (and its high value is 1).  Previously a single character
string would not be nul terminated and the high value was 0.
The documentation now includes a section describing the expected behavior
and included in this patch is some regression test code matching the
table inside the documentation.

gcc/ChangeLog:

PR modula2/109952
* doc/gm2.texi (High procedure function): New node.
(Using): New menu entry for High procedure function.

gcc/m2/ChangeLog:

PR modula2/109952
* Make-maintainer.in: Change header to include emacs file mode.
* gm2-compiler/M2GenGCC.mod (BuildHighFromChar): Check whether
operand is a constant string and is nul terminated then return one.
* gm2-compiler/PCSymBuild.mod (WalkFunction): Add default return
TRUE.  Static analysis missing return path fix.
* gm2-libs/IO.mod (Init): Rewrite to help static analysis.
* target-independent/m2/gm2-libs.texi: Rebuild.

gcc/testsuite/ChangeLog:

PR modula2/109952
* gm2/pim/run/pass/hightests.mod: New test.

Signed-off-by: Gaius Mulley 
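
The change in HIGH described above can be checked with a little arithmetic.
The sketch below models it in C++ rather than Modula-2; high_of and
single_char_elements are hypothetical helper names, and the only rule assumed
is that HIGH of an open array is its last index, i.e. the element count
minus one.

```cpp
#include <cstddef>

// HIGH of a zero-based open array is the index of its last element.
constexpr std::size_t high_of(std::size_t element_count) {
    return element_count - 1;
}

// Number of elements passed for a single character literal:
// just the character, or the character plus a nul terminator.
constexpr std::size_t single_char_elements(bool nul_terminated) {
    return nul_terminated ? 2 : 1;
}

// Before the patch: one element, no terminator, so HIGH was 0.
static_assert(high_of(single_char_elements(false)) == 0,
              "previous behaviour");

// After the patch: character plus nul terminator, so HIGH is 1.
static_assert(high_of(single_char_elements(true)) == 1,
              "patched behaviour");
```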

[Bug fortran/109684] compiling failure: complaining about a final subroutine of a type being not PURE (while it is indeed PURE)

2023-05-24 Thread trnka at scm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109684

--- Comment #7 from Tomáš Trnka  ---
(In reply to Paul Thomas from comment #5)
> Created attachment 55144 [details]
> Fix for this PR
> 
> Thanks for reporting this. The patch "fingered" in comment #4 is certainly
> responsible for this regression. In particular, it is the first chunk in
> resolve.cc that is the culprit.
> 
> The attached patch feels to be a bit of sticking plaster on top of sticking
> plaster and so I will go back to hunt down the root cause of these
> namespace-less symbols.

Thanks for the patch. It seems to mostly do the trick for our huge proprietary
F2008 codebase, but some files ultimately fail to compile with the following
error (not sure if related or a different bug):

in gfc_format_decoder, at fortran/error.cc:1078
0xb01b5a gfc_format_decoder
../../gcc/fortran/error.cc:1078
0x1594c0c pp_format(pretty_printer*, text_info*)
../../gcc/pretty-print.cc:1475
0x10f0c5e diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*)
../../gcc/diagnostic.cc:1592
0x1789c5d gfc_report_diagnostic
../../gcc/fortran/error.cc:890
0x1789c5d gfc_warning
../../gcc/fortran/error.cc:923
0x1789da7 gfc_warning(int, char const*, ...)
../../gcc/fortran/error.cc:954
0x1852c41 resolve_procedure_expression
../../gcc/fortran/resolve.cc:1957
0x1852c41 resolve_variable
../../gcc/fortran/resolve.cc:6030
0x1852c41 gfc_resolve_expr(gfc_expr*)
../../gcc/fortran/resolve.cc:7266
0x1806880 gfc_resolve_expr(gfc_expr*)
../../gcc/fortran/resolve.cc:7231
0x1806880 resolve_structure_cons
../../gcc/fortran/resolve.cc:1341
0x1858969 resolve_values
../../gcc/fortran/resolve.cc:12771
0x1869492 do_traverse_symtree
../../gcc/fortran/symbol.cc:4190
0x185b02f gfc_traverse_ns(gfc_namespace*, void (*)(gfc_symbol*))
../../gcc/fortran/symbol.cc:4215
0x185b02f resolve_types
../../gcc/fortran/resolve.cc:17899
0x184cf93 gfc_resolve(gfc_namespace*)
../../gcc/fortran/resolve.cc:17996
0x184fb47 resolve_symbol
../../gcc/fortran/resolve.cc:16567
0x1869492 do_traverse_symtree
../../gcc/fortran/symbol.cc:4190
0x185aee0 gfc_traverse_ns(gfc_namespace*, void (*)(gfc_symbol*))
../../gcc/fortran/symbol.cc:4215
0x185aee0 resolve_types
../../gcc/fortran/resolve.cc:17880


This seems to be the following assert:

gcc_assert (loc->nextc - loc->lb->line >= 0);
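
Read literally, the assert requires that the locus's character pointer does
not lie before the start of its line buffer. A simplified sketch of that
condition, with hypothetical stand-in types (line_buffer_sketch and
locus_sketch are not the gfortran structures, and plain char is an
assumption):

```cpp
#include <cstddef>

// Hypothetical stand-ins: a line buffer and a locus pointing into it.
struct line_buffer_sketch {
    const char* line;  // start of the source line
};

struct locus_sketch {
    const char* nextc;             // current character position
    const line_buffer_sketch* lb;  // line the position should belong to
};

// Mirrors the asserted condition: the column offset must not be negative.
// Both pointers must refer to the same buffer for the subtraction to be
// meaningful.
bool column_is_valid(const locus_sketch& loc) {
    std::ptrdiff_t column = loc.nextc - loc.lb->line;
    return column >= 0;
}
```

A locus whose nextc points before lb->line would fail this check, which would
be consistent with a stale or mismatched locus being passed to gfc_warning.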

The backtrace I get from gdb is a little different (there's no
resolve_structure_cons in it, for example; I guess that it might be due to
LTO):

#0  gfc_warning (opt=0,
gmsgid=0x1e55748 "Non-RECURSIVE procedure %qs at %L is possibly calling
itself recursively.  Declare it RECURSIVE or use %<-frecursive%>")
at ../../gcc/fortran/error.cc:950
#1  0x01852c42 in resolve_procedure_expression (expr=0x2aefc80) at
../../gcc/fortran/resolve.cc:1957
#2  resolve_variable (e=0x2aefc80) at ../../gcc/fortran/resolve.cc:6030
#3  gfc_resolve_expr (e=0x2aefc80) at ../../gcc/fortran/resolve.cc:7266
#4  0x01806881 in gfc_resolve_expr (e=0x2aefc80) at
../../gcc/fortran/resolve.cc:7231
#5  resolve_structure_cons (expr=, init=1) at
../../gcc/fortran/resolve.cc:1341
#6  0x0185896a in resolve_values (sym=0x2ad30c0) at
../../gcc/fortran/resolve.cc:12771
#7  0x01869493 in do_traverse_symtree (st=, st_func=0x0,
sym_func=0x1858900 )
at ../../gcc/fortran/symbol.cc:4190
#8  0x0185b030 in gfc_traverse_ns (sym_func=0x1858900
, ns=0x3ae65e0) at
../../gcc/fortran/symbol.cc:4215
#9  resolve_types (ns=0x3ae65e0) at ../../gcc/fortran/resolve.cc:17899
#10 0x0184cf94 in gfc_resolve (ns=0x3ae65e0) at
../../gcc/fortran/resolve.cc:17996
#11 0x0184d022 in gfc_resolve (ns=) at
../../gcc/fortran/resolve.cc:17983
#12 0x0184fb48 in resolve_symbol (sym=) at
../../gcc/fortran/resolve.cc:16567
#13 0x01869493 in do_traverse_symtree (st=, st_func=0x0,
sym_func=0x184d030 )
at ../../gcc/fortran/symbol.cc:4190
#14 0x0185aee1 in gfc_traverse_ns (sym_func=0x184d030
, ns=0x3697bb0) at
../../gcc/fortran/symbol.cc:4215
#15 resolve_types (ns=0x3697bb0) at ../../gcc/fortran/resolve.cc:17880
#16 0x0184cf94 in gfc_resolve (ns=0x3697bb0) at
../../gcc/fortran/resolve.cc:17996
#17 0x0184d022 in gfc_resolve (ns=) at
../../gcc/fortran/resolve.cc:17983
#18 0x0184fb48 in resolve_symbol (sym=) at
../../gcc/fortran/resolve.cc:16567
#19 0x01869493 in do_traverse_symtree (st=, st_func=0x0,
sym_func=0x184d030 )
at ../../gcc/fortran/symbol.cc:4190
#20 0x0185aee1 in gfc_traverse_ns (sym_func=0x184d030
, ns=0x3238a50) at
../../gcc/fortran/symbol.cc:4215
#21 resolve_types (ns=0x3238a50) at ../../gcc/fortran/resolve.cc:17880
#22 0x0184cf94 in gfc_resolve (ns=0x3238a50) at
../../gcc/fortran/resolve.cc:17996
#23 0x0184d022 in gfc_resolve (ns=) at
../../gcc/fortran/resolve.cc:17983
#24 0x0184fb48 in 

[Bug tree-optimization/109945] Escape analysis hates copy elision: different result with -O1 vs -O2

2023-05-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109945

--- Comment #19 from Richard Biener  ---
(In reply to Richard Biener from comment #13)
> (In reply to Richard Biener from comment #12)
> > For the fun of it I'm testing
> > 
> > diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
> > index 56021c59cb9..1e7f0383371 100644
> > --- a/gcc/tree-ssa-structalias.cc
> > +++ b/gcc/tree-ssa-structalias.cc
> > @@ -4366,7 +4366,7 @@ handle_rhs_call (gcall *stmt, vec *results,
> >/* And if we applied NRV the address of the return slot escapes as well. 
> > */
> >if (gimple_call_return_slot_opt_p (stmt)
> >&& gimple_call_lhs (stmt) != NULL_TREE
> > -  && TREE_ADDRESSABLE (TREE_TYPE (gimple_call_lhs (stmt))))
> > +  && aggregate_value_p (gimple_call_lhs (stmt), gimple_call_fntype
> > (stmt)))
> >  {
> >int flags = gimple_call_retslot_flags (stmt);
> >const int relevant_flags = EAF_NO_DIRECT_ESCAPE
> 
> FAIL: g++.dg/opt/pr91838.C  -std=c++14  scan-assembler
> pxor\s+%xmm0,\s+%xmm0

The above is present even without the patch so it's only the tailcall
testcase that's fallout.

> FAIL: gcc.dg/tree-ssa/tailcall-7.c scan-tree-dump-times tailc "Found tail
> call" 5
> 
> with the former for -m64 and the latter for -m32 only seems to be the
> only fallout here.
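
For readers unfamiliar with why the return slot's address can escape, here is
a small hypothetical illustration (it is not the testcase from this PR). The
constructor records `this`; with copy elision the object is built directly in
the caller's variable, so a later call through the recorded pointer modifies
that variable. The user-provided destructor is an assumption made to keep the
type non-trivial, so that it is certainly returned in memory.

```cpp
struct Tracked {
    int value = 1;
    Tracked();
    ~Tracked() {}  // user-provided: type is non-trivial, returned in memory
};

// Address of the most recently constructed object.
Tracked* last_constructed = nullptr;

Tracked::Tracked() { last_constructed = this; }

// With copy elision the temporary is constructed directly in the caller's
// return slot, so `this` inside the constructor is the caller's object.
Tracked make_tracked() { return Tracked(); }

// Writes through the escaped pointer.
void poke() { last_constructed->value = 42; }

bool return_slot_escaped() {
    Tracked t = make_tracked();
    int before = t.value;  // 1, set by the default member initializer
    poke();                // must be treated as possibly modifying t
    return last_constructed == &t && before == 1 && t.value == 42;
}
```

An alias analysis that assumes `t` cannot be reached by poke() would wrongly
keep using the value read before the call.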

[Bug target/109954] x86-64's -m32 does not conform to documentation

2023-05-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109954

--- Comment #8 from Jonathan Wakely  ---
Yeah, my suggestion doesn't try to explain the full details that you pointed
out, just adds a brief note to avoid the pitfall of not overriding the default
arch, for a probably quite common case.

I chose i486 to avoid any confusion that could arise from -march=i386 being
interpreted as "any generic x86" system. IMHO i486 seems more obviously "a
specific CPU family" and not just a generic term for x86, e.g., Debian still
uses "i386" even though their x86 packages are actually built for i686.
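
One way to see which baseline a given set of flags actually selects is to
look at the architecture macros the compiler predefines. The sketch below
assumes GCC's usual macro names (__i486__, __i586__, __i686__, __i386__,
__x86_64__); x86_arch_baseline is a hypothetical helper, and other -march
values define other macros that this sketch does not distinguish.

```cpp
#include <string>

// Report the x86 baseline implied by the compiler's predefined macros.
std::string x86_arch_baseline() {
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__i686__)
    return "i686";
#elif defined(__i586__)
    return "i586";
#elif defined(__i486__)
    return "i486";
#elif defined(__i386__)
    return "32-bit x86 (other or default -march)";
#else
    return "not x86";
#endif
}
```

Building this with -m32 alone and again with -m32 -march=i486 should make the
difference discussed above visible.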
