[Bug target/27619] wrong code for mixed-mode division with -mpowerpc64 -O1

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=27619

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|INVALID |MOVED
   See Also||https://sourceware.org/bugz
   ||illa/show_bug.cgi?id=14758

[Bug target/27619] wrong code for mixed-mode division with -mpowerpc64 -O1

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=27619

Andrew Pinski  changed:

   What|Removed |Added

 CC||vincent-gcc at vinc17 dot net

--- Comment #19 from Andrew Pinski  ---
*** Bug 58429 has been marked as a duplicate of this bug. ***

[Bug target/58429] _Decimal64 support is broken on powerpc64 with the mode32 ABI (-m32 -mpowerpc64)

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58429

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
This is most likely a dup of bug 27619 which was a bug in binutils.

*** This bug has been marked as a duplicate of bug 27619 ***

[Bug testsuite/101902] [12 regression] g++.dg/warn/uninit-1.C has excess errors after r12-2898

2021-08-15 Thread hubicka at ucw dot cz via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101902

--- Comment #1 from Jan Hubicka  ---
Hi,
i am testing

diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index 5d7bc800419..d89ab5423cd 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -641,7 +641,7 @@ maybe_warn_pass_by_reference (gcall *stmt, wlimits )
wlims.always_executed = false;

   /* Ignore args we are not going to read from.  */
-  if (gimple_call_arg_flags (stmt, argno - 1) & EAF_UNUSED)
+  if (gimple_call_arg_flags (stmt, argno - 1) & (EAF_UNUSED | EAF_NOREAD))
continue;

   tree arg = gimple_call_arg (stmt, argno - 1);

[Bug bootstrap/53468] debian/ubuntu changed the location of libraries on the disk which broke bootstrap

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53468

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |4.8.0
   Keywords||build

--- Comment #4 from Andrew Pinski  ---
Fixed a long time ago by r0-120287.

[Bug target/101929] r12-2549 regress x264_r by 4% on CLX.

2021-08-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929

--- Comment #2 from Hongtao.liu  ---
Without accurate info provided by the vectorizer, the backend can do nothing about
this regression except revert the patch; that's why I marked the bug as
tree-optimization component.

[Bug tree-optimization/101929] r12-2549 regress x264_r by 4% on CLX.

2021-08-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929

--- Comment #1 from Hongtao.liu  ---
Considering this, I'm debating whether to revert my patch.

[Bug tree-optimization/101929] New: r12-2549 regress x264_r by 4% on CLX.

2021-08-15 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929

Bug ID: 101929
   Summary: r12-2549 regress x264_r by 4% on CLX.
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: crazylht at gmail dot com
CC: hjl.tools at gmail dot com, wwwhhhyyy333 at gmail dot com
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: x86_64-*-* i?86-*-*

The regression is in x264_pixel_satd_8x4

typedef unsigned char uint8_t;
typedef unsigned int uint32_t;
typedef unsigned short uint16_t;

// in: a pseudo-simd number of the form x+(y<<16)
// return: abs(x)+(abs(y)<<16)
static inline
uint32_t abs2( uint32_t a )
{
uint32_t s = ((a>>15)&0x10001)*0xffff;
return (a+s)^s;
}

#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
int t0 = s0 + s1;\
int t1 = s0 - s1;\
int t2 = s2 + s3;\
int t3 = s2 - s3;\
d0 = t0 + t2;\
d2 = t0 - t2;\
d1 = t1 + t3;\
d3 = t1 - t3;\
}

int
x264_pixel_satd_8x4( uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2 )
{
uint32_t tmp[4][4];
uint32_t a0, a1, a2, a3;
int sum = 0;
for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 )
{
a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16);
a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16);
a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16);
a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16);
HADAMARD4( tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0,a1,a2,a3 );
}
for( int i = 0; i < 4; i++ )
{
HADAMARD4( a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i]
);
sum += abs2(a0) + abs2(a1) + abs2(a2) + abs2(a3);
}
return (((uint16_t)sum) + ((uint32_t)sum>>16)) >> 1;
}
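
The abs2 helper in the test case treats one 32-bit word as two 16-bit lanes. A minimal standalone sketch of that trick follows. It assumes the multiplier in abs2 is 0xffff, which is what the comment "abs(x)+(abs(y)<<16)" requires; the helper names pack2, lo16 and hi16 are hypothetical and only here for illustration.

```c
#include <stdint.h>

/* Pack two small signed values into one word as x + (y << 16).
   A negative x borrows from the upper lane, as in the test case. */
uint32_t pack2(int x, int y)
{
    return (uint32_t)x + ((uint32_t)y << 16);
}

/* Per-lane absolute value.  Assumption: the multiplier is 0xffff, so s is
   0xffff in every lane whose sign bit (bit 15 or bit 31) is set, else 0.
   (a + s) ^ s then negates exactly those lanes. */
uint32_t abs2(uint32_t a)
{
    uint32_t s = ((a >> 15) & 0x10001u) * 0xffffu;
    return (a + s) ^ s;
}

/* Extract the low and high 16-bit lanes. */
unsigned lo16(uint32_t v) { return v & 0xffffu; }
unsigned hi16(uint32_t v) { return v >> 16; }
```

For example, abs2(pack2(-3, 5)) should unpack to 3 in the low lane and 5 in the high lane.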

After increasing the cost of vector CTOR, slp1 won't vectorize the code below:
git diff my.slp1 original.slp1

-  _820 = {_187, _189, _187, _189};
-  vect_t2_188.65_821 = VIEW_CONVERT_EXPR(_820);
-  vect__200.67_823 = vect_t0_184.64_819 - vect_t2_188.65_821;
-  vect__191.66_822 = vect_t0_184.64_819 + vect_t2_188.65_821;
-  _824 = VEC_PERM_EXPR ;
-  vect__192.68_825 = VIEW_CONVERT_EXPR(_824);
   t3_190 = (int) _189;
   _191 = t0_184 + t2_188;
   _192 = (unsigned int) _191;
+  tmp[0][0] = _192;
   _194 = t0_184 - t2_188;
   _195 = (unsigned int) _194;
+  tmp[0][2] = _195;
   _197 = t1_186 + t3_190;
   _198 = (unsigned int) _197;
+  tmp[0][1] = _198;
   _200 = t1_186 - t3_190;
   _201 = (unsigned int) _200;
-  MEM  [(unsigned int *)] = vect__192.68_825;
+  tmp[0][3] = _201;

But the vectorized version can somehow help fre eliminate the redundant vector
loads, which then gives even better performance.

git diff dump.veclower21 dump.fre5

   MEM  [(unsigned int *) + 48B] = vect__54.89_852;
-  vect__63.9_482 = MEM  [(unsigned int *)];
-  vect__64.12_478 = MEM  [(unsigned int *) + 16B];
-  vect__65.13_477 = vect__63.9_482 + vect__64.12_478;
+  vect__65.13_477 = vect__192.68_825 + vect__273.75_834;
   vect_t0_100.14_476 = VIEW_CONVERT_EXPR(vect__65.13_477);
-  vect__67.15_475 = vect__63.9_482 - vect__64.12_478;
+  vect__67.15_475 = vect__192.68_825 - vect__273.75_834;
   vect_t1_101.16_474 = VIEW_CONVERT_EXPR(vect__67.15_475);
-  vect__68.19_470 = MEM  [(unsigned int *) + 32B];
-  vect__69.22_466 = MEM  [(unsigned int *) + 48B];
-  vect__70.23_465 = vect__68.19_470 + vect__69.22_466;
+  vect__70.23_465 = vect__354.82_843 + vect__54.89_852;

If slp1 can realize this and add the upper part to comparison of scalar cost vs
vector cost, gcc should do vectorization, but currently it doesn't.

[Bug debug/101928] New: Incorrect argument list for varardic functions

2021-08-15 Thread liyd2021 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101928

Bug ID: 101928
   Summary: Incorrect argument list for varardic functions
   Product: gcc
   Version: 11.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: liyd2021 at gmail dot com
  Target Milestone: ---

Affected versions: gcc 11.1.0 with gdb (Ubuntu 20.04.2)

(terminal) $ cat simple.c && gcc -g -O2 simple.c
#include <stdarg.h>
static void varargs(int q0, int q1, ...) {
  va_list ap;
  va_start(ap, q1);
}
int main() {
  varargs(0, 1, 2);
}



(terminal) $ cat run.gdb
b varargs
r
ptype varargs
q

(terminal) $ gdb -x run.gdb a.out
Breakpoint 1, varargs (q0=0, q1=1, q1=1, q0=0) at simple.c:2
2   static void varargs(int q0, int q1, ...) {
type = void (int, int, int, int)  <-- BUG, duplicated arguments



Compiling with -O0/-Og does not trigger this behavior. The `static` on `varargs` is
also required. LLDB rejected this debug info:

(terminal) $ lldb a.out
(lldb) b varargs 
error: simple {0x000c}: DIE has DW_AT_ranges(0xc) attribute, but range
extraction failed (missing or invalid range list table), please file a bug and
attach the file at the start of this error message

[Bug target/101927] New: There is no vector mode popcount for aarch64

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101927

Bug ID: 101927
   Summary: There is no vector mode popcount for aarch64
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

Take:

#include <stddef.h>
#include <stdint.h>

size_t hd (const uint8_t *restrict a, const uint8_t *restrict b, size_t l) {
  size_t r = 0, x;
  for (x = 0; x < l; x++)
    r += __builtin_popcount (a[x] ^ b[x]);

  return r;
}

at -O3 we don't vectorize this.
Clang/LLVM does:
.LBB0_5:// =>This Inner Loop Header: Depth=1
ld1 { v3.b }[0], [x8]
sub x12, x8, #2
ld1 { v5.b }[0], [x10]
ld1 { v4.b }[0], [x12]
sub x12, x10, #2
ld1 { v6.b }[0], [x12]
add x12, x8, #1
ld1 { v3.b }[4], [x12]
add x12, x10, #1
ld1 { v5.b }[4], [x12]
sub x12, x8, #1
ld1 { v4.b }[4], [x12]
sub x12, x10, #1
ld1 { v6.b }[4], [x12]
eor v3.8b, v5.8b, v3.8b
ushll   v3.2d, v3.2s, #0
and v3.16b, v3.16b, v1.16b
eor v4.8b, v6.8b, v4.8b
ushll   v4.2d, v4.2s, #0
and v4.16b, v4.16b, v1.16b
cnt v3.16b, v3.16b
cnt v4.16b, v4.16b
uaddlp  v3.8h, v3.16b
uaddlp  v4.8h, v4.16b
uaddlp  v3.4s, v3.8h
uaddlp  v4.4s, v4.8h
add x8, x8, #4
subs x11, x11, #4
uadalp  v2.2d, v3.4s
uadalp  v0.2d, v4.4s
add x10, x10, #4
b.ne .LBB0_5

-- CUT 
Note I think we could be better.
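
As a point of comparison for the loop in this report, here is a hedged scalar sketch that handles eight bytes per iteration with __builtin_popcountll (a GCC/Clang builtin). It is not what a vectorizer would emit, only an illustration of counting several bytes at once; the function names hd_bytewise and hd_wordwise are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference: one byte at a time, same shape as the loop in the report. */
size_t hd_bytewise(const uint8_t *a, const uint8_t *b, size_t l)
{
    size_t r = 0;
    for (size_t x = 0; x < l; x++)
        r += (size_t)__builtin_popcount((unsigned)(a[x] ^ b[x]));
    return r;
}

/* Eight bytes per iteration.  memcpy keeps the loads free of alignment
   and aliasing problems; compilers typically turn it into a plain load. */
size_t hd_wordwise(const uint8_t *a, const uint8_t *b, size_t l)
{
    size_t r = 0, x = 0;
    for (; x + 8 <= l; x += 8) {
        uint64_t wa, wb;
        memcpy(&wa, a + x, 8);
        memcpy(&wb, b + x, 8);
        r += (size_t)__builtin_popcountll(wa ^ wb);
    }
    for (; x < l; x++)  /* remaining tail bytes */
        r += (size_t)__builtin_popcount((unsigned)(a[x] ^ b[x]));
    return r;
}
```

Both functions should return the same count for any input, so the bytewise version can serve as an oracle when experimenting with wider variants.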

[Bug tree-optimization/68109] GCC fails to vectorize popcount on x86_64

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68109

Andrew Pinski  changed:

   What|Removed |Added

  Component|target  |tree-optimization

--- Comment #2 from Andrew Pinski  ---
Could there be generic support for popcount added?

[Bug tree-optimization/54978] Add ability to provide vectorized functions

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54978

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |6.0
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Andrew Pinski  ---
Fixed in GCC 6 with r6-4931.

[Bug tree-optimization/47860] is vectorization of "condition in nested loop" supported

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47860

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2021-08-16

--- Comment #6 from Andrew Pinski  ---
Confirmed, ICC is able to vectorize this loop even without AVX (GCC can currently
vectorize the loop with AVX).


movdqa    %xmm0, %xmm11             #10.11
lea       1(%r14), %r15d            #9.31
movups    (%rdx,%r15,8), %xmm9      #9.27
movups    (%rcx,%r14,8), %xmm10     #10.24
cmpltpd   %xmm1, %xmm10             #10.24
pxor      %xmm2, %xmm10             #10.24
movmskpd  %xmm10, %r15d             #10.24
testl     %r15d, %r15d              #10.24
je        ..B1.14                   # Prob 50%  #10.24
# LOE rax rdx rcx rbx rsi rdi ebp r8d r9d r10d r11d r12d r13d r14d xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11
..B1.13:                            # Preds ..B1.12
# Execution count [1.25e+01]
pshufd    $8, %xmm10, %xmm11        #10.24
movaps    %xmm9, %xmm8              #5.21
pand      %xmm6, %xmm11             #10.24
# LOE rax rdx rcx rbx rsi rdi ebp r8d r9d r10d r11d r12d r13d r14d xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7 xmm8 xmm11
..B1.14:# Pre

[Bug rtl-optimization/46391] false dependencies are computed after vectorization (#2)

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46391

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||alias, missed-optimization
 Blocks||53947

--- Comment #4 from Andrew Pinski  ---
I suspect this has been long fixed.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

[Bug other/36395] TARGET_VECTOR_ALIGNMENT_REACHABLE isn't documented

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36395

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |4.5.0
 Status|NEW |RESOLVED

--- Comment #3 from Andrew Pinski  ---
It was renamed to TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE and documented
in r0-98468.

[Bug middle-end/101926] [meta-bug] struct/complex argument passing and return should be improved

2021-08-15 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101926
Bug 101926 depends on bug 88496, which changed state.

Bug 88496 Summary: Unnecessary stack adjustment with -mavx512f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88496

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

[Bug target/88483] Unnecessary stack alignment

2021-08-15 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88483

--- Comment #7 from H.J. Lu  ---
*** Bug 88496 has been marked as a duplicate of this bug. ***

[Bug target/88496] Unnecessary stack adjustment with -mavx512f

2021-08-15 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88496

H.J. Lu  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from H.J. Lu  ---
Dup.

*** This bug has been marked as a duplicate of bug 88483 ***

[Bug middle-end/31271] Missing simple optimization

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31271

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||missed-optimization
  Known to work|4.7.1   |

--- Comment #2 from Andrew Pinski  ---
We produce in 4.7.0+

in_canforward(unsigned int):
.LFB0:
.cfi_startproc
andl    $224, %edi
xorl    %eax, %eax
cmpl    $224, %edi
setne   %al
ret


That is:
  D.2201_1 = in_2(D) & 224;
  D.2199_10 = D.2201_1 != 224;

I think we could do slightly better
((~in_2(D)) & 224) == 0

But only at expand time.
This gives:
notl    %edi
xorl    %eax, %eax
testb   $-32, %dil
setne   %al

Or for aarch64:
mov     w8, #224
bics    wzr, w8, w0
cset    w0, ne
ret
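
The two instruction sequences above correspond to two source-level forms of the same test. A small sketch, with the sense of the comparison chosen to match the setne in the assembly, checks that they agree; the function names are hypothetical, and 224 is the mask from the GIMPLE dump.

```c
/* Form currently generated: mask, then compare against the mask. */
int canforward_cmp(unsigned in)
{
    return (in & 224u) != 224u;
}

/* Proposed form: invert first, then test the masked bits against zero. */
int canforward_not(unsigned in)
{
    return (~in & 224u) != 0;
}

/* Returns 1 if both forms agree on every input below 0x10000.
   Only bits 5..7 matter, so this range covers all distinct cases. */
int forms_agree(void)
{
    for (unsigned in = 0; in < 0x10000u; in++)
        if (canforward_cmp(in) != canforward_not(in))
            return 0;
    return 1;
}
```

The point of the second form is that the final comparison is against zero, which maps onto a flag-setting test (testb on x86, bics on aarch64) without a separate cmp.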

[Bug middle-end/90216] Stack Pointer decrementing even when not loading extra data to stack.

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90216

--- Comment #3 from Andrew Pinski  ---
Testcase:
#include 

template
struct Neighbourhood {
using datatype = DataType;
};


template
struct Building {
using datatype = typename N::datatype;

operator datatype() const{
return (static_cast(this)->contam_level);
}
void operator=(datatype const x) volatile {
static_cast(this)->contam_level = x;
}
};

struct Cincin : public Neighbourhood
{
struct Apartment : Building{ 

using Building::operator=;
Apartment(Apartment volatile & a) : contam_level(a.contam_level) {}
Apartment(uint32_t const c = 0) : contam_level (c) {}

union {
struct
{
uint32_t laundry   :3;
uint32_t   :5;
uint32_t lobby :8;
uint32_t   :16;
};
uint32_t contam_level;
};

Apartment& Laundry(uint32_t val){
this->laundry = val;
return *this;
}
};
Apartment volatile apartment;
};

Cincin cincin[2];


int main(){

  cincin[0].apartment = Cincin::Apartment().Laundry(0x7);
  //cincin[0].apartment = Cincin::Apartment(0x7);
  //cincin[0].apartment = 9;
}

[Bug middle-end/101926] New: [meta-bug] struct/complex argument passing and return should be improved

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101926

Bug ID: 101926
   Summary: [meta-bug] struct/complex argument passing and return
should be improved
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

There are many of these bugs, for x86_64, aarch64, powerpc, etc.

[Bug middle-end/95756] Failure to optimize memory operations with _Complex

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95756

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed|2020-06-19 00:00:00 |2021-8-15
  Component|rtl-optimization|middle-end
   Severity|normal  |enhancement

[Bug fortran/101871] Array of strings of different length passed as an argument produces invalid result.

2021-08-15 Thread sgk at troutmask dot apl.washington.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101871

--- Comment #6 from Steve Kargl  ---
On Sun, Aug 15, 2021 at 07:21:42PM +, anlauf at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101871
> 
> --- Comment #5 from anlauf at gcc dot gnu.org ---
> In array.c:gfc_match_array_constructor there's the following code:
> 
> 1335  /* Walk the constructor, and if possible, do type conversion for
> 1336 numeric types.  */
> 1337  if (gfc_numeric_ts ())
> 1338{
> 1339  m = walk_array_constructor (, head);
> 1340  if (m == MATCH_ERROR)
> 1341return m;
> 1342}
> 
> Steve, you were the last one to work on this block.
> It appears that non-numeric ts are not handled (here).
> Can you give some insight?
> 

Unfortunately, I can't remember why it's confined to numeric
types.  I did the simple thing of commenting out the if-stmt
and got an ICE.  I also tried explicitly setting the typespec
of each array element to the typespec of array constructor
and that also ICE'd.

I haven't had time to poke further.  I think at some point the
actual arg list is reduced to a formal argument list.  This
might lose the array constructor typespec when reducing/resolving
the arg list.

[Bug middle-end/87650] suboptimal codegen for testing low bit

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87650

--- Comment #2 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #1)
> If I saw these two statements:
s/saw/swap/
I don't know why I wrote the wrong word there. I was thinking swap and still
wrote saw.

[Bug middle-end/87650] suboptimal codegen for testing low bit

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87650

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
  Component|target  |middle-end
 Ever confirmed|0   |1
   Last reconfirmed||2021-08-16
   Severity|normal  |enhancement

--- Comment #1 from Andrew Pinski  ---
If I saw these two statements:
auto m = n%2;
n = n/2;

GCC is able to produce the testb.  This is due to the shift clobbering the
flags.
I wonder if we could produce better code during expand.

[Bug middle-end/78947] sub-optimal code for (bool)(int ? int : int)

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78947

Andrew Pinski  changed:

   What|Removed |Added

  Known to fail||8.5.0
  Known to work||9.1.0

--- Comment #2 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #1)
> Confirmed, this is fold-const folding:
> (bool)(a?b:c)
> into a ? (bool) b : (bool)c;
> early.

This was removed in GCC 9.
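
For reference, the two shapes discussed in the quoted comment can be written out as below. This is only a sketch of the source-level equivalence, not of the folder itself; the function names are hypothetical.

```c
#include <stdbool.h>

/* Original shape: select first, convert to bool once. */
bool cast_outside(int a, int b, int c)
{
    return (bool)(a ? b : c);
}

/* Shape after the early fold: convert each arm separately. */
bool cast_inside(int a, int b, int c)
{
    return a ? (bool)b : (bool)c;
}
```

Both return the same value for all inputs; the difference is only in how many conversions the later passes have to clean up.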

[Bug tree-optimization/101925] reversed storage order when compiling with -O3 only

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101925

Andrew Pinski  changed:

   What|Removed |Added

  Component|c   |tree-optimization
   Last reconfirmed||2021-08-15
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Keywords||wrong-code

--- Comment #1 from Andrew Pinski  ---
Looks like the SLP vectorizer causes this.
-O3 -fno-tree-slp-vectorize works.

[Bug c/101925] New: reversed storage order when compiling with -O3 only

2021-08-15 Thread george.thopas at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101925

Bug ID: 101925
   Summary: reversed storage order when compiling with -O3 only
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: george.thopas at gmail dot com
  Target Milestone: ---

/*
 reversed storage order when compiling with -O3 only. 
 this time sets up an all big-endian struct 
 from an all little-endian one
 no warnings 

 Target: x86_64-pc-linux-gnu
 gcc version 11.1.0 (Gentoo 11.1.0-r2 p3) 
 gcc-trunk too

 $ gcc -Wall -Wextra -O3 test.c 
 $ ./a.out 
 Abort 

 */


#define BIG_ENDIAN   __attribute__((scalar_storage_order("big-endian")))

/* host order version (little endian)*/
struct _ip6_addr {
union {
char addr8[16];
int  addr32[4];
} u;
};

typedef struct _ip6_addr t_ip6_addr;

struct _net_addr {
char is_v4;
union {
int        addr;
t_ip6_addr addr6;
} u;
};

typedef struct _net_addr t_net_addr;

/* big endian version */
struct _be_ip6_addr {
union {
char addr8[16];
} BIG_ENDIAN u;
} BIG_ENDIAN;

typedef struct _be_ip6_addr t_be_ip6_addr;

struct _be_net_addr {
char is_v4;
union {
t_be_ip6_addr addr6;
int   addr;
} BIG_ENDIAN u;
} BIG_ENDIAN;

typedef struct _be_net_addr t_be_net_addr;

/* convert */
t_be_ip6_addr be_ip6_addr(const t_ip6_addr ip6)
{
t_be_ip6_addr rc = {
.u.addr8[0] = ip6.u.addr8[0],
.u.addr8[1] = ip6.u.addr8[1],
.u.addr8[2] = ip6.u.addr8[2],
.u.addr8[3] = ip6.u.addr8[3],
.u.addr8[4] = ip6.u.addr8[4],
.u.addr8[5] = ip6.u.addr8[5],
.u.addr8[6] = ip6.u.addr8[6],
.u.addr8[7] = ip6.u.addr8[7],
.u.addr8[8] = ip6.u.addr8[8],
.u.addr8[9] = ip6.u.addr8[9],
.u.addr8[10] = ip6.u.addr8[10],
.u.addr8[11] = ip6.u.addr8[11],
.u.addr8[12] = ip6.u.addr8[12],
.u.addr8[13] = ip6.u.addr8[13],
.u.addr8[14] = ip6.u.addr8[14],
.u.addr8[15] = ip6.u.addr8[15],
};
return rc;
}

t_be_net_addr be_net_addr(const t_net_addr ip)
{
t_be_net_addr rc = {.is_v4 = ip.is_v4 };
if (ip.is_v4) {
rc.u.addr = ip.u.addr;
} else {
rc.u.addr6 = be_ip6_addr(ip.u.addr6);
}
return rc;
}

int main(void)
{
t_be_net_addr out = { };

t_net_addr in = {
.is_v4 = 0,
.u.addr6.u.addr8 =
{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }
};

out = be_net_addr(in);

// actually first 4 bytes are swapped
if (in.u.addr6.u.addr8[0] != out.u.addr6.u.addr8[0])
__builtin_abort();

return 0;
}

[Bug middle-end/80261] Worse code generated compared to clang with modulus operation

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80261

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
 Status|UNCONFIRMED |NEW
  Component|target  |middle-end
   Last reconfirmed||2021-08-15
 Ever confirmed|0   |1

--- Comment #1 from Andrew Pinski  ---
Confirmed:
  ptr.0_1 = (long int) ptr_5(D);
  _4 = ptr.0_1 & 4294967295;
  _2 = (long unsigned int) _4;
  _3 = _2 % 131;

We could do the %131 in 32 bits, but we currently do it in 64 bits.
The second issue is we expand out *131 which might be ok.
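
A small sketch of the narrowing argument: once the value has been masked to its low 32 bits, a 32-bit remainder gives the same result as the 64-bit one. The function names are hypothetical, and the argument is taken as a plain uint64_t rather than a pointer to keep the example self-contained.

```c
#include <stdint.h>

/* Shape of the GIMPLE above: mask to 32 bits, then a 64-bit remainder. */
uint64_t mod131_wide(uint64_t ptr)
{
    uint64_t low = ptr & 4294967295u;
    return low % 131u;
}

/* Same value computed with a 32-bit remainder: the masked operand
   fits in 32 bits, and so does the divisor. */
uint64_t mod131_narrow(uint64_t ptr)
{
    uint32_t low = (uint32_t)ptr;
    return low % 131u;
}
```

On targets where 32-bit division is cheaper than 64-bit division, the second form is the one you would want the compiler to pick.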

[Bug middle-end/80006] loss of range information due to spurious widening conversion

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80006

--- Comment #6 from Andrew Pinski  ---
> On x86_64, this conversion from signed char to int is for some reason 
> performed even in function f, so the test program triggers no warnings.

Oh yes the promotion happens because of a target hook.

[Bug target/31667] Integer extensions vectorization could be improved

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667

--- Comment #5 from Andrew Pinski  ---
We produce this now:

movdqa  x(%rip), %xmm1
pxor    %xmm0, %xmm0
movdqa  %xmm1, %xmm2
punpckhbw   %xmm0, %xmm1
movaps  %xmm1, y+16(%rip)
movdqa  x+16(%rip), %xmm1
punpcklbw   %xmm0, %xmm2
movaps  %xmm2, y(%rip)
movdqa  %xmm1, %xmm2
punpckhbw   %xmm0, %xmm1
movaps  %xmm1, y+48(%rip)
movdqa  x+32(%rip), %xmm1
punpcklbw   %xmm0, %xmm2
movaps  %xmm2, y+32(%rip)
movdqa  %xmm1, %xmm2
punpckhbw   %xmm0, %xmm1
movaps  %xmm1, y+80(%rip)
movdqa  x+48(%rip), %xmm1
punpcklbw   %xmm0, %xmm2
movaps  %xmm2, y+64(%rip)
movdqa  %xmm1, %xmm2
punpckhbw   %xmm0, %xmm1
punpcklbw   %xmm0, %xmm2
movaps  %xmm1, y+112(%rip)
movaps  %xmm2, y+96(%rip)

And even ICC produces a similar thing except scheduled differently.

[Bug tree-optimization/78327] Improve VRP for ranges for compares which do ranges of [-TYPE_MAX + N, N]

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78327

Andrew Pinski  changed:

   What|Removed |Added

  Known to fail||10.3.0
  Known to work||11.1.0

--- Comment #8 from Andrew Pinski  ---
I suspect r11-4134 fixed the issue for GCC 11.

[Bug tree-optimization/64567] missed optimization: redundant test before clearing bit(s)

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64567

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=96237

--- Comment #4 from Andrew Pinski  ---
related to PR 96237, maybe the same.

[Bug tree-optimization/60575] inefficient vectorization of compare into bytes on amd64

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60575

--- Comment #1 from Andrew Pinski  ---
We produce now since GCC 5+:
.L4:
movdqu  (%rsi,%rax,2), %xmm0
movdqu  16(%rsi,%rax,2), %xmm1
pcmpgtw %xmm4, %xmm0
pcmpgtw %xmm4, %xmm1
pand    %xmm3, %xmm0
pand    %xmm3, %xmm1
pand    %xmm2, %xmm0
pand    %xmm2, %xmm1
packuswb        %xmm1, %xmm0
movups  %xmm0, (%rdi,%rax)
addq    $16, %rax
cmpq    $1024, %rax
jne .L4

Note I removed __builtin_assume_aligned.

Also I note there are two extra pand's.  The second pand is not needed.

[Bug target/91569] Optimisation test case and unnecessary XOR-OR pair instead of MOV.

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91569

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |11.0
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Andrew Pinski  ---
Fixed.

[Bug tree-optimization/63271] Should commute arithmetic with vector load

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63271

--- Comment #3 from Andrew Pinski  ---
So the two functions are not the same (because __m128i is Vector of 2 long long
[at least now]).
Here is a better testcase:

#define vector __attribute__((vector_size(16)))
typedef vector  char __m128i ;

static inline __m128i _mm_set_epi8(char a, char b, char c, char d, char e,
 char f, char g, char h, char i, char j, char k, char l,
 char m, char n, char o, char p)
{
  return (__m128i){a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p};
}


__m128i foo(char C)
{
  return _mm_set_epi8(   0,C,  2*C,  3*C,
   4*C,  5*C,  6*C,  7*C,
   8*C,  9*C, 10*C, 11*C,
  12*C, 13*C, 14*C, 15*C);
}

__m128i bar(char C)
{
  __m128i v = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7,
   8, 9,10,11,12,13,14,15);
  vector unsigned char d = (vector unsigned char)v;
  d *= C;
  return (__m128i)d;
}
-CUT 

So take the above: on aarch64, SLP does not do it because it does not recognize
0 and C as being able to be SLPed.  If I change both of them to 2*C, then SLP
will do the right thing.

[Bug target/91569] Optimisation test case and unnecessary XOR-OR pair instead of MOV.

2021-08-15 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91569

--- Comment #3 from H.J. Lu  ---
It is fixed by r11-165.

[Bug tree-optimization/51780] Missed optimization for ==/!= comparison

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51780

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |8.0
 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #3 from Andrew Pinski  ---
Fixed with r8-3771.

[Bug tree-optimization/51780] Missed optimization for ==/!= comparison

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51780

Andrew Pinski  changed:

   What|Removed |Added

  Known to work||8.1.0
  Known to fail||7.5.0

--- Comment #2 from Andrew Pinski  ---

  _1 = ar[a_4(D)];
  _2 = _1 != 0;
  _6 = (int) _2;

[Bug target/91569] Optimisation test case and unnecessary XOR-OR pair instead of MOV.

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91569

Andrew Pinski  changed:

   What|Removed |Added

  Known to work||11.1.0

--- Comment #2 from Andrew Pinski  ---
For test3 GCC 11+ produces:

opt_test3(int):
.LFB2:
.cfi_startproc
movslq  %edi, %rax
movb    $4, %ah
ret

Which is exactly what you should see.

Gimple level looks the same between GCC 10 and GCC 11:
  _1 = (long int) num_2(D);
  a = _1;
  MEM[(char *) + 1B] = 4;
  _6 = a;

I suspect a decl no longer has address taken so it is not going to the stack
right away.

[Bug target/94871] Failure to convert cmpeqpd+pxor with -1 into cmpneqpd

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94871

--- Comment #2 from Andrew Pinski  ---

v2di cmpneq_pd1(v2df a, v2df b)
{
return ((v2di)(a==b) ^ set1_epi8(0xFF));
}
Produces the correct thing on gimple level:
  _5 = .VCOND (a_2(D), b_3(D), { 0, 0 }, { -1, -1 }, 113);

But the RTL during combine (even with -ffast-math) produces:
(set (reg:V2DI 82 [  ])
(not:V2DI (eq:V2DI (reg:V2DF 89)
(reg:V2DF 90
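
To make the intended equivalence concrete, here is a hedged sketch using GCC/Clang generic vector extensions instead of the intrinsics in the report. The typedefs and function names are assumptions chosen for illustration (set1_epi8 from the report is replaced by an explicit all-ones vector). XORing the equality mask with all-ones should match a direct not-equal comparison lane by lane, including for NaN operands.

```c
#include <math.h>

typedef double    v2df __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));

/* Equality mask XORed with all-ones, as in the report. */
v2di cmpneq_xor(v2df a, v2df b)
{
    v2di all_ones = { -1, -1 };
    return (v2di)(a == b) ^ all_ones;
}

/* Direct not-equal comparison; each lane is -1 where a != b, else 0. */
v2di cmpneq_direct(v2df a, v2df b)
{
    return (v2di)(a != b);
}
```

A NaN lane compares unequal to itself, so both forms give -1 there, which is the behaviour of an unordered not-equal compare.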

[Bug target/88712] Optimization: mov edx, 0 not replaced with xor edx, edx in this case

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88712

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
  Known to work||11.1.0
   Target Milestone|--- |11.0
  Known to fail||10.3.0

--- Comment #3 from Andrew Pinski  ---
(In reply to Jakub Jelinek from comment #2)
> so the clearing of %edx is sandwiched in between the cmp and cmov.  Later on
> in this case sched2 reorders those and so we at that point could replace it,
> but we don't have another peephole2 pass and passes after sched2 don't have
> the needed infrastructure to check if the flags are dead (because movl $0,
> reg doesn't clobber flags, but xorl reg, reg does).

Right and the peephole2 that implemented that was done in r11-2588 so closing
as fixed in GCC 11+.

[Bug tree-optimization/96697] Failure to optimize mod+div to 0

2021-08-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96697

--- Comment #6 from Jakub Jelinek  ---
For signed x and y, x % y == x % -y, and x % y has the sign of x.  So for
non-negative x you can use x % y < abs(y), and generally -abs(y) < x % y < abs(y).
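
These bounds are what make (x % y) / y fold to 0. A brute-force sketch that checks them over a small range, relying on C99 truncating division; the function name check_rem_bounds and the choice of range are hypothetical.

```c
#include <stdlib.h>

/* Returns 1 if, for all x and y in [-limit, limit] with y != 0:
     x % y == x % -y,
     x % y has the sign of x (or is zero),
     -abs(y) < x % y < abs(y),
   and therefore (x % y) / y == 0.  Returns 0 on the first violation. */
int check_rem_bounds(int limit)
{
    for (int x = -limit; x <= limit; x++) {
        for (int y = -limit; y <= limit; y++) {
            if (y == 0)
                continue;
            int r = x % y;
            if (r != x % -y)
                return 0;
            if ((x >= 0 && r < 0) || (x <= 0 && r > 0))
                return 0;
            if (!(-abs(y) < r && r < abs(y)))
                return 0;
            if (r / y != 0)
                return 0;
        }
    }
    return 1;
}
```

Calling check_rem_bounds(50) exercises all sign combinations; a real proof would of course also have to consider INT_MIN, which this sketch deliberately stays away from.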

[Bug modula2/101387] Unconditional use of

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101387

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Andrew Pinski  ---
.

[Bug modula2/101388] Unconditional use of __MAX_BAUD

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101388

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Andrew Pinski  ---
.

[Bug tree-optimization/85366] Failure to use both div and mod results of one IDIV in a prime-factor loop while(n%i==0) { n/=i; }

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85366

Andrew Pinski  changed:

   What|Removed |Added

 Depends on||96697

--- Comment #4 from Andrew Pinski  ---

  n_17 = n_24 / i_30;
  _3 = n_17 % i_30;

So basically PR 96697.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96697
[Bug 96697] Failure to optimize mod+div to 0

[Bug tree-optimization/96697] Failure to optimize mod+div to 0

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96697

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Keywords||missed-optimization
   Last reconfirmed|2020-08-25 00:00:00 |2021-8-15

[Bug sanitizer/95244] [10 Regression] GCC 10 no longer builds on RHEL5 [trivial patch]

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95244

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |10.4
Summary|GCC 10 no longer builds on  |[10 Regression] GCC 10 no
   |RHEL5 [trivial patch]   |longer builds on RHEL5
   ||[trivial patch]

--- Comment #4 from Andrew Pinski  ---
Fixed in GCC 11 by a merge from upstream; r11-781.

[Bug c++/101904] Wrong result of decltype during instantiation of std::result_of

2021-08-15 Thread officesamurai at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101904

--- Comment #2 from Mikhail Kremniov  ---
I see, thanks.
But I must mention that Clang is able to compile this code somehow.

[Bug sanitizer/95244] GCC 10 no longer builds on RHEL5 [trivial patch]

2021-08-15 Thread nightstrike at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95244

nightstrike  changed:

   What|Removed |Added

 CC||nightstrike at gmail dot com

--- Comment #3 from nightstrike  ---
That link leads to:

https://reviews.llvm.org/D80648

Which was approved, and eventually merged into gcc here at 3c6331c2 and later
improved with 0b997f6e.

This can be marked as a regression fixed in 11.1 but still in 10 as of 10.3.

[Bug fortran/101871] Array of strings of different length passed as an argument produces invalid result.

2021-08-15 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101871

--- Comment #5 from anlauf at gcc dot gnu.org ---
In array.c:gfc_match_array_constructor there's the following code:

  /* Walk the constructor, and if possible, do type conversion for
     numeric types.  */
  if (gfc_numeric_ts (&ts))
    {
      m = walk_array_constructor (&ts, head);
      if (m == MATCH_ERROR)
        return m;
    }

Steve, you were the last one to work on this block.
It appears that non-numeric ts are not handled (here).
Can you give some insight?

[Bug target/100293] MinGW-w64 of nvptx offload engine fails

2021-08-15 Thread brechtsanders at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100293

--- Comment #9 from Brecht Sanders ---
Any update on this?
Issue still exists today (in GCC 11.2.0 and in latest snapshot
11.2.1-20210814).

Both when building gcc on Windows for nvptx as well as the offload engine for
nvptx there is an error like this in nvptx-none/libatomic/config.log:

configure:3736: $? = 1
configure:3756: checking whether the C compiler works
configure:3778:
/R/winlibs64_stage/nvptx-gcc-11-20210814/gcc-11-20210814/build_nvptx_gcc/./gcc/xgcc
-B/R/winlibs64_stage/nvptx-gcc-11-20210814/gcc-11-20210814/build_nvptx_gcc/./gcc/
-nostdinc
-B/R/winlibs64_stage/nvptx-gcc-11-20210814/gcc-11-20210814/build_nvptx_gcc/nvptx-none/newlib/
-isystem
/R/winlibs64_stage/nvptx-gcc-11-20210814/gcc-11-20210814/build_nvptx_gcc/nvptx-none/newlib/targ-include
-isystem
/R/winlibs64_stage/nvptx-gcc-11-20210814/gcc-11-20210814/newlib/libc/include
-B/R/winlibs64_stage/inst_nvptx-gcc-11-20210814/share/nvptx-gcc/nvptx-none/bin/
-B/R/winlibs64_stage/inst_nvptx-gcc-11-20210814/share/nvptx-gcc/nvptx-none/lib/
-isystem
/R/winlibs64_stage/inst_nvptx-gcc-11-20210814/share/nvptx-gcc/nvptx-none/include
-isystem
/R/winlibs64_stage/inst_nvptx-gcc-11-20210814/share/nvptx-gcc/nvptx-none/sys-include
   -g -O2   conftest.c  >&5
error reading C:\Temp\ccqpNwjZ.o
collect2.exe: error: ld returned 1 exit status

[Bug fortran/99351] ICE in gfc_finish_var_decl, at fortran/trans-decl.c:695

2021-08-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99351

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Harald Anlauf :

https://gcc.gnu.org/g:bbf19f9c20515da9fcd23f08c8139427374e8d77

commit r12-2915-gbbf19f9c20515da9fcd23f08c8139427374e8d77
Author: Harald Anlauf 
Date:   Sun Aug 15 20:13:11 2021 +0200

Fortran: fix checks for STAT= and ERRMSG= arguments of SYNC ALL/SYNC IMAGES

gcc/fortran/ChangeLog:

PR fortran/99351
* match.c (sync_statement): Replace %v code by %e in gfc_match to
allow for function references as STAT and ERRMSG arguments.
* resolve.c (resolve_sync): Adjust checks of STAT= and ERRMSG= to
being definable arguments.  Function references with a data
pointer result are accepted.
* trans-stmt.c (gfc_trans_sync): Adjust assertion.

gcc/testsuite/ChangeLog:

PR fortran/99351
* gfortran.dg/coarray_sync.f90: New test.
* gfortran.dg/coarray_3.f90: Adjust error messages.

[Bug libstdc++/101923] std::function's move ctor is slower than the copy one for empty source objects

2021-08-15 Thread nok.raven at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

Nikita Kniazev  changed:

   What|Removed |Added

 CC||nok.raven at gmail dot com

--- Comment #2 from Nikita Kniazev  ---
There is no difference in the produced code on trunk (except move ops order)
https://godbolt.org/z/esfjhr9ae

[Bug ada/101924] /usr/ccs/bin/ld: Unsatisfied symbols: U_get_unwind_entry, U_IS_STUB_OR_CALLX, U_get_shLib_text_addr, U_is_shared_pc, U_init_frame_record, U_prep_frame_rec_for_unwind, U_get_shLib_unw_

2021-08-15 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101924

John David Anglin  changed:

   What|Removed |Added

Summary|/usr/ccs/bin/ld:|/usr/ccs/bin/ld:
   |Unsatisfied symbols |Unsatisfied symbols:
   |referenced  |U_get_unwind_entry,
   ||U_IS_STUB_OR_CALLX,
   ||U_get_shLib_text_addr,
   ||U_is_shared_pc,
   ||U_init_frame_record,
   ||U_prep_frame_rec_for_unwind
   ||, U_get_shLib_unw_tbl,
   ||U_get_previous_frame_x and
   ||U_get_unwind_table

--- Comment #1 from John David Anglin  ---
g++ -std=c++11 -no-pie -g -DIN_GCC -fno-exceptions -fno-rtti
-fasynchronous-unwi
nd-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format
-Wmiss
ing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long
-Wno-variadic
-macros -Wno-overlength-strings -DHAVE_CONFIG_H -static-libstdc++
-static-libgcc
  -o gnatbind -g ada/b_gnatb.o ada/ali-util.o ada/ali.o ada/alloc.o
ada/aspects.
o ada/atree.o ada/bcheck.o ada/binde.o ada/binderr.o ada/bindgen.o ada/bindo.o
a
da/bindo-augmentors.o ada/bindo-builders.o ada/bindo-diagnostics.o
ada/bindo-ela
borators.o ada/bindo-graphs.o ada/bindo-units.o ada/bindo-validators.o
ada/bindo
-writers.o ada/bindusg.o ada/butil.o ada/casing.o ada/csets.o ada/debug.o
ada/ei
nfo-entities.o ada/einfo-utils.o ada/einfo.o ada/elists.o ada/err_vars.o
ada/err
out.o ada/erroutc.o ada/exit.o ada/final.o ada/fmap.o ada/fname-uf.o
ada/fname.o
 ada/gnatbind.o ada/gnatvsn.o ada/hostparm.o ada/krunch.o ada/lib.o ada/link.o
a
da/namet.o ada/nlists.o ada/opt.o ada/osint-b.o ada/osint.o ada/output.o
ada/res
trict.o ada/rident.o ada/scans.o ada/scil_ll.o ada/scng.o ada/sdefault.o
ada/sei
nfo.o ada/sem_aux.o ada/sinfo.o ada/sinfo-nodes.o ada/sinfo-utils.o
ada/sinput-c
.o ada/sinput.o ada/snames.o ada/stand.o ada/stringt.o ada/style.o ada/styleg.o
ada/stylesw.o ada/switch-b.o ada/switch.o ada/table.o ada/targparm.o
ada/types.o
 ada/uintp.o ada/uname.o ada/urealp.o ada/widechar.o ada/gnat.o ada/g-dynhta.o
a
da/g-lists.o ada/g-graphs.o ada/g-sets.o ada/s-casuti.o ada/s-os_lib.o
ada/s-res
fil.o ada/s-utf_32.o ada/adaint.o ada/argv.o ada/cio.o ada/cstreams.o ada/env.o
ada/errno.o ada/targext.o ada/version.o ggc-none.o libcommon-target.a
libcommon.
a ../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a
../libiberty/libiber
ty.a ../libdecnumber/libdecnumber.a  
/home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-h
p-hpux11.11/8.5.0/adalib/libgnat.a
/usr/ccs/bin/ld: Unsatisfied symbols:
   U_get_unwind_entry (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_IS_STUB_OR_CALLX (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_get_shLib_text_addr (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_is_shared_pc (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_init_frame_record (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_prep_frame_rec_for_unwind (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_get_shLib_unw_tbl (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_get_previous_frame_x (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_get_unwind_table (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
collect2: error: ld returned 1 exit status
make[3]: *** [../../gcc/gcc/ada/gcc-interface/Make-lang.in:746: gnatbind] Error 1
make[3]: *** Waiting for unfinished jobs

This was introduced by the following change:

commit abcf5174979bcb91ac4c921eaa19a5b37f231ae4 (HEAD, refs/bisect/bad)
Author: Arnaud Charlet 
Date:   Wed Jan 13 08:49:15 2021 -0500

[Ada] Use runtime from base compiler during stage1

gcc/ada/

* Make-generated.in: Add rule to copy runtime files needed
during stage1.
* raise.c: Remove obsolete symbols used during bootstrap.
* gcc-interface/Make-lang.in: Do not use libgnat sources during
stage1.
(GNAT_ADA_OBJS, GNATBIND_OBJS): Split in two 

[Bug ada/101924] New: /usr/ccs/bin/ld: Unsatisfied symbols referenced

2021-08-15 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101924

Bug ID: 101924
   Summary: /usr/ccs/bin/ld: Unsatisfied symbols referenced
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ada
  Assignee: unassigned at gcc dot gnu.org
  Reporter: danglin at gcc dot gnu.org
  Target Milestone: ---
  Host: hppa2.0w-hp-hpux11.11
Target: hppa2.0w-hp-hpux11.11
 Build: hppa2.0w-hp-hpux11.11

Re: [PATCH] lib: bitmap: Mute some odd section mismatch warning in xtensa kernel build

2021-08-15 Thread Yury Norov via Gcc-bugs
On Sun, Aug 15, 2021 at 03:21:32PM +1200, Barry Song wrote:
> From: Barry Song 
> 
> Constantly there are some section mismatch issues reported in test_bitmap
> for the xtensa platform, such as:
> 
>   Section mismatch in reference from the function bitmap_equal() to the
>   variable .init.data:initcall_level_names
>   The function bitmap_equal() references the variable __initconst
>   __setup_str_initcall_blacklist. This is often because bitmap_equal
>   lacks a __initconst annotation or the annotation of
>   __setup_str_initcall_blacklist is wrong.
> 
>   Section mismatch in reference from the function bitmap_copy_clear_tail()
>   to the variable .init.rodata:__setup_str_initcall_blacklist
>   The function bitmap_copy_clear_tail() references the variable __initconst
>   __setup_str_initcall_blacklist.
>   This is often because bitmap_copy_clear_tail lacks a __initconst
>   annotation or the annotation of __setup_str_initcall_blacklist is wrong.
> 
> To be honest, it is hard to believe the kernel code is wrong, since bitmap_equal
> is always called from an __init function in test_bitmap.c, just like
> __bitmap_equal. But gcc doesn't report any issue for __bitmap_equal, even when
> bitmap_equal and __bitmap_equal appear in the same function, such as:
> 
>   static void noinline __init test_mem_optimisations(void)
>   {
>           ...
>           for (start = 0; start < 1024; start += 8) {
>                   for (nbits = 0; nbits < 1024 - start; nbits += 8) {
>                           if (!bitmap_equal(bmap1, bmap2, 1024)) {
>                                   failed_tests++;
>                           }
>                           if (!__bitmap_equal(bmap1, bmap2, 1024)) {
>                                   failed_tests++;
>                           }
>                           ...
>                   }
>           }
>   }
> 
> The difference between __bitmap_equal() and bitmap_equal() is that the
> former is extern and an EXPORT_SYMBOL, so it is noinline and probably in fact
> noclone. The latter is static and unfortunately not inlined at this
> time, though it has an "inline" flag.
> 
> bitmap_copy_clear_tail(), on the other hand, seems more innocent, as it
> accesses only the stack, via its wrapper bitmap_from_arr32() in function
> test_bitmap_arr32():
> static void __init test_bitmap_arr32(void)
> {
>         unsigned int nbits, next_bit;
>         u32 arr[EXP1_IN_BITS / 32];
>         DECLARE_BITMAP(bmap2, EXP1_IN_BITS);
> 
>         memset(arr, 0xa5, sizeof(arr));
> 
>         for (nbits = 0; nbits < EXP1_IN_BITS; ++nbits) {
>                 bitmap_to_arr32(arr, exp1, nbits);
>                 bitmap_from_arr32(bmap2, arr, nbits);
>                 expect_eq_bitmap(bmap2, exp1, nbits);
>                 ...
>         }
> }
> It looks like gcc optimized arr and bmap2 into .init.data, but nothing seems
> to be wrong in the kernel, since test_bitmap_arr32() is __init.
> 
> Max Filippov reported a bug to gcc, but the gcc people have not acked it. So
> this patch removes the involved symbols by forcing inline. It might
> not be that elegant but I don't see any harm as bitmap_equal() and
> bitmap_copy_clear_tail() are both quite small. In addition, kernel
> doc also backs this modification "We don't use the 'inline' keyword
> because it's broken": www.kernel.org/doc/local/inline.html

This is a 2006 article. Are you sure nothing has been changed over the
last 15 years?
 
> Another possible way to "fix" the warning is moving the involved
> symbols to lib/bitmap.c:

So, it's a GCC issue already reported to GCC? To me it sounds like there is
nothing to fix in the kernel. If I were a GCC developer, I'd prefer to have
all bugs clearly reproducible.

Let's wait for GCC and xtensa people comments. (CC xtensa and GCC
lists)

Yury

>   +int bitmap_equal(const unsigned long *src1,
>   +   const unsigned long *src2, unsigned int nbits)
>   +{
>   +   if (small_const_nbits(nbits))
>   +   return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
>   +   if (__builtin_constant_p(nbits & BITMAP_MEM_MASK) &&
>   +   IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT))
>   +   return !memcmp(src1, src2, nbits / 8);
>   +   return __bitmap_equal(src1, src2, nbits);
>   +}
>   +EXPORT_SYMBOL(bitmap_equal);
> 
> This is harmful to the performance.
> 
> Reported-by: kernel test robot 
> Cc: Andy Shevchenko 
> Cc: Max Filippov 
> Cc: Andrew Pinski 
> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92938
> Signed-off-by: Barry Song 
> ---
>  include/linux/bitmap.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 37f36dad18bd..3eec9f68a0b6 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -258,7 +258,7 @@ static inline void bitmap_copy(unsigned long *dst, const unsigned long *src,
>  /*
>   * Copy bitmap and clear tail bits in last word.
>   */
> -static inline void bitmap_copy_clear_tail(unsigned long *dst,
> +static 

[Bug fortran/101918] LTO type mismatches for runtime library functions in mixed -fdefault-real-8 projects

2021-08-15 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101918

kargl at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kargl at gcc dot gnu.org

--- Comment #2 from kargl at gcc dot gnu.org ---
(In reply to Rimvydas (RJ) from comment #0)

> $ gfortran -Wall -Wextra -flto -fdefault-real-8 -c foo.f90
> $ gfortran -flto -Wall -Wextra foo.o bar.f90

This should be closed as WONTFIX.  If you compile foo.f90
with -fdefault-real-8, then you must compile bar.f90 with
-fdefault-real-8.  You're changing the ABI for foo.f90, 
but not bar.f90.

> Does this mean -flto cannot be used in mixed -fdefault-real-8
> and usual modes?

It means "Don't use -fdefault-real-8".  It is a broken
unfixable option that I tried to remove years ago, but
that was voted down.

If you have code that requires this option, then the
code should be properly ported to REAL(8).

[Bug modula2/101387] Unconditional use of

2021-08-15 Thread gaiusmod2 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101387

--- Comment #1 from Gaius Mulley  ---
Many thanks for the bug report; it is now fixed in the git repo.
The bugfix emits a prototype for throw (if required) rather than using a
non-portable header file.

[Bug target/82883] eax register unnecessary consumed

2021-08-15 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82883

H.J. Lu  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #2 from H.J. Lu  ---
It is done on purpose by r0-116075:

[hjl@gnu-cfl-2 tmp]$ gcc -S -O2 x.c -mtune-ctrl=^lcp_stall
[hjl@gnu-cfl-2 tmp]$ cat x.s
        .file   "x.c"
        .text
        .p2align 4
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        movl    $1819043144, (%rdi)
        movw    $8303, 4(%rdi)
        ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .ident  "GCC: (GNU) 11.2.1 20210728 (Red Hat 11.2.1-1)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-2 tmp]$

[Bug libstdc++/101923] std::function's move ctor is slower than the copy one for empty source objects

2021-08-15 Thread dartdart26 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

--- Comment #1 from Petar Ivanov  ---
Benchmark code (using Google Benchmark):

#include 

#include 
#include 

struct Car {};

static void copy(benchmark::State& state) {
  for (auto _ : state) {
const auto f = std::function{};
const auto copied = f;
benchmark::DoNotOptimize(copied);
  }
}

static void move(benchmark::State& state) {
  for (auto _ : state) {
auto f = std::function{};
const auto moved = std::move(f);
benchmark::DoNotOptimize(moved);
  }
}

BENCHMARK(copy);
BENCHMARK(move);

BENCHMARK_MAIN();

[Bug modula2/101388] Unconditional use of __MAX_BAUD

2021-08-15 Thread gaiusmod2 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101388

--- Comment #1 from Gaius Mulley  ---
"ro at gcc dot gnu.org"  writes:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101388
>
> Bug ID: 101388
>Summary: Unconditional use of __MAX_BAUD
>Product: gcc
>Version: 12.0
> Status: UNCONFIRMED
>   Severity: normal
>   Priority: P3
>  Component: modula2
>   Assignee: unassigned at gcc dot gnu.org
>   Reporter: ro at gcc dot gnu.org
> CC: gaiusmod2 at gmail dot com
>   Target Milestone: ---
> Target: *-*-solaris2.11
>
> Building the devel/modula-2 branch on Solaris 11 fails with undefined
> references
> to __MAX_BAUD in two places:
>
> /vol/gcc/src/git/modula-2/gcc/m2/mc-boot-ch/Gtermios.c: In function
> 'termios_GetFlag':
> /vol/gcc/src/git/modula-2/gcc/m2/mc-boot-ch/Gtermios.c:872:27: error:
> '__MAX_BAUD' undeclared (first use in this function)
>*b = ((t->c_cflag & __MAX_BAUD) == __MAX_BAUD);
>^~
>
> /vol/gcc/src/git/modula-2/gcc/m2/gm2-libs-ch/termios.c: In function
> 'termios_GetFlag':
> /vol/gcc/src/git/modula-2/gcc/m2/gm2-libs-ch/termios.c:877:27: error:
> '__MAX_BAUD' undeclared (first use in this function)
>   877 |   *b = ((t->c_cflag & __MAX_BAUD) == __MAX_BAUD);
>   |   ^~
> __MAX_BAUD seems to be Linux/glibc specific, but the current problem is
> obviously
> cause by a wrong guard which checks for defined(MAX) instead of
> defined(__MAX_BAUD).
>
> Correcting this lets the build continue.

Many thanks for the report; it is now fixed in the git repo.


regards,
Gaius

[Bug libstdc++/101923] New: std::function's move ctor is slower than the copy one for empty source objects

2021-08-15 Thread dartdart26 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923

Bug ID: 101923
   Summary: std::function's move ctor is slower than the copy one
for empty source objects
   Product: gcc
   Version: 9.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dartdart26 at gmail dot com
  Target Milestone: ---

std::function's move constructor calls swap() irrespective of whether the
source object is empty or not. In contrast, the copy constructor first checks
if the source object is empty and if it is, nothing is being done as the `this`
object is constructed in an empty state by _Function_base().

Calling swap() on an empty source requires more work, because some data needs
to be copied - for example, the POD data cannot be moved.

Could the move constructor check if the source is empty too, as the copy one
does? Please let me know if I am missing a rule that prevents that.
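
A rough sketch of the proposed check, using a made-up `Callable` type rather than libstdc++'s actual std::function internals. The names `Callable`, `move_always_swap`, `move_checked` and `noop`, the 16-byte buffer, and the null-manager-means-empty convention are all assumptions chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Hypothetical stand-in for a type-erased callable wrapper; this is NOT
// libstdc++'s std::function.  A null manager pointer means "empty".
struct Callable {
    std::array<unsigned char, 16> storage{};  // small POD buffer
    void (*manager)() = nullptr;

    bool empty() const { return manager == nullptr; }

    void swap(Callable& other) {
        std::swap(storage, other.storage);  // POD data is copied, not moved
        std::swap(manager, other.manager);
    }
};

// Move that always swaps, even when the source is empty.
Callable move_always_swap(Callable& src) {
    Callable dst;
    dst.swap(src);
    return dst;
}

// Move that skips the swap for an empty source, mirroring the check the
// report says the copy constructor performs.
Callable move_checked(Callable& src) {
    Callable dst;
    if (!src.empty())
        dst.swap(src);
    return dst;
}

void noop() {}

// Usage: both strategies leave source and destination in the same state.
void check_moves_agree() {
    Callable empty_a, empty_b;
    assert(move_always_swap(empty_a).empty());
    assert(move_checked(empty_b).empty());

    Callable full_a, full_b;
    full_a.manager = &noop;
    full_b.manager = &noop;
    assert(!move_always_swap(full_a).empty() && full_a.empty());
    assert(!move_checked(full_b).empty() && full_b.empty());
}
```

In this sketch the checked variant trades one extra branch for skipping the buffer exchange when the source is empty; whether that is a net win in the real implementation is exactly what the benchmark in this report is probing.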

I have noticed that on version 9.3.0, but I see the code is the same in current
master at:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/bits/std_function.h;hb=c22bcfd2f7dc9bb5ad394720f4a612327dc898ba#l391

I have tested on a MacBook M1 and the copy ctor for empty sources is almost 2x
faster than the move ctor:

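-----------------------------------------------------
Benchmark           Time           CPU     Iterations
-----------------------------------------------------
copy            0.945 ns      0.945 ns      555789159
move             1.83 ns       1.83 ns      382183169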

I have made a YouTube video describing my findings and the benchmark
results:
https://www.youtube.com/watch?v=WA3mKab-tn8

[Bug target/91591] Arc: ICE in trunc_int_for_mode, at explow.c:60

2021-08-15 Thread giulio.benetti at benettiengineering dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91591

--- Comment #4 from Giulio Benetti  ---
This bug is pretty old and needs to be retested to see if it still shows up.
Maybe it has been fixed in later GCC minor versions. I will let you know.

[Bug target/101922] mips: illegal instruction at -O3 with -mmsa -mloongson-mmi

2021-08-15 Thread xry111 at mengyan1223 dot wang via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101922

--- Comment #1 from Xi Ruoyao  ---
Technically the testcase above invokes UB, but this is reduced from a file in
openssl-1.1.1k.

[Bug target/101922] New: mips: illegal instruction at -O3 with -mmsa -mloongson-mmi

2021-08-15 Thread xry111 at mengyan1223 dot wang via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101922

Bug ID: 101922
   Summary: mips: illegal instruction at -O3 with -mmsa
-mloongson-mmi
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: xry111 at mengyan1223 dot wang
  Target Milestone: ---

$ cat test.c
int x = 0x;
char d[16];

void f() {
  int i;
  for (i = 0; i < 16; i++) {
int t = d[i] >> 8;
x &= t;
  }
}
$ ~/git-repos/gcc-test-mips/gcc/cc1 test.c -O3 -mmsa -mloongson-mmi -nostdinc
$ mips64el-unknown-linux-gnu-as test.s -mmsa -mloongson-mmi -mips64r2
test.s: Assembler messages:
test.s:29: Error: operand 3 out of range `srai.b $w0,$w0,8'

[Bug middle-end/17958] expand_divmod fails to optimize division of 64-bit quantity by small constant when BITS_PER_WORD is 32

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=17958

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |11.0
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Andrew Pinski  ---
Implemented by r11-5533, r11-5614 (PPC improvement), and r11-5648.

[Bug target/61030] PowerPC 128 bit integer divide

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61030

Andrew Pinski  changed:

   What|Removed |Added

 Depends on||100809

--- Comment #3 from Andrew Pinski  ---
PR 100809 implemented udivti3, divti3, umodti3, and modti3.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100809
[Bug 100809] PPC: __int128 divide/modulo does not use P10 instructions
vdivsq/vdivuq

[Bug middle-end/101521] -ftrapv should become something like -fsanitize=undefined -fsanitize-undefined-trap-on-error

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101521

Andrew Pinski  changed:

   What|Removed |Added

 Depends on||78473

--- Comment #4 from Andrew Pinski  ---
the request for a division overflow function is PR 78473.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78473
[Bug 78473] Enhancement request:  __builtin_div_overflow
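
For context, a checked division of the kind that request describes might look like the sketch below. `div_overflows` is a hypothetical helper, not an existing GCC builtin. Treating division by zero as a failure alongside INT_MIN / -1 is an assumption of this sketch, and the return convention (true on failure) is borrowed from the existing `__builtin_*_overflow` family.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper, not an existing GCC builtin.  Returns true when the
// division cannot be performed safely and false on success, in which case
// the quotient is stored through the pointer.
bool div_overflows(int x, int y, int* quotient) {
    assert(quotient != nullptr);
    // The two cases where signed int division is undefined.
    if (y == 0 || (x == INT_MIN && y == -1))
        return true;
    *quotient = x / y;
    return false;
}
```

For example, `div_overflows(7, 2, &q)` stores 3 in `q` and returns false, while `div_overflows(INT_MIN, -1, &q)` returns true and leaves `q` untouched.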

[Bug c++/51178] FAIL: g++.dg/lookup/builtin5.C scan-assembler _ZSt5atanhd

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51178

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |4.8.0
  Component|target  |c++

--- Comment #2 from Andrew Pinski  ---
Fixed by r0-120302.

[Bug c++/101873] Compilation error of valid code with return local variable in C++20 mode

2021-08-15 Thread fchelnokov at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101873

--- Comment #4 from Fedor Chelnokov  ---
If this question is to me, then actually I am not absolutely sure. I initially
thought that GCC was right in this code example. But later a high-reputation C++
expert on Stack Overflow dissuaded me. According to him, the code example was
valid in C++17 and remains valid in C++20. The C++ standard is so complex nowadays.

[Bug libstdc++/57691] freestanding libstdc++ has compile error

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57691
Bug 57691 depends on bug 57699, which changed state.

Bug 57699 Summary: Disable empty parameter list misinterpretation in libstdc++ 
headers when !defined(NO_IMPLICIT_EXTERN_C)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57699

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug c++/57699] Disable empty parameter list misinterpretation in libstdc++ headers when !defined(NO_IMPLICIT_EXTERN_C)

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57699

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |9.0
 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #9 from Andrew Pinski  ---
This was removed in GCC 9 by r9-2724.

[Bug target/37727] NO_IMPLICIT_EXTERN_C for newlib

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37727

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |9.0
 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #3 from Andrew Pinski  ---
Fixed in GCC 9 by r9-1648.

[Bug middle-end/48580] missed optimization: integer overflow checks

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=59708
 CC||pinskia at gcc dot gnu.org

--- Comment #23 from Andrew Pinski  ---
Also the builtins have been in GCC since GCC 5; r5-4844, PR 59708.
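
A minimal sketch of using one of those builtins for an overflow-checked multiply. The wrapper name `checked_mul` is made up for illustration; `__builtin_mul_overflow` is the GCC/Clang extension and is not standard C++.

```cpp
#include <cassert>
#include <climits>

// Returns true and stores x * y in *product when the result fits in int.
// __builtin_mul_overflow itself returns true on overflow, hence the negation.
bool checked_mul(int x, int y, int* product) {
    assert(product != nullptr);
    return !__builtin_mul_overflow(x, y, product);
}
```

For example, `checked_mul(6, 7, &p)` returns true with `p == 42`, while `checked_mul(INT_MAX, 2, &p)` returns false.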

[Bug middle-end/91072] does not reduce the size of a division by a constant on non-negative int / small unsigned long constant

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91072

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug middle-end/48580] missed optimization: integer overflow checks

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580

--- Comment #22 from Andrew Pinski  ---
For the original testcase in comment #0 we produce (in GCC 11+):
        movl    %edi, %eax
        mull    %esi
        seto    %dl
        xorl    %r8d, %r8d
        movzbl  %dl, %edx
        testl   %eax, %eax
        jle     .L1
        testl   %edx, %edx
        sete    %r8b
.L1:
        movl    %r8d, %eax
        ret

--- CUT 
I have a patch which I think improves the code even more.

The gimple level correctly looks like this:
  x.0_1 = (unsigned int) x_6(D);
  y.1_2 = (unsigned int) y_7(D);
  _11 = .MUL_OVERFLOW (x.0_1, y.1_2);
  tmp_8 = REALPART_EXPR <_11>;
  tmp.3_3 = (int) tmp_8;
  if (tmp.3_3 > 0)
    goto <bb 3>; [59.00%]
  else
    goto <bb 4>; [41.00%]

  <bb 3> [local count: 633507680]:
  _12 = IMAGPART_EXPR <_11>;
  _10 = _12 == 0;

  <bb 4> [local count: 1073741824]:
  # iftmp.2_5 = PHI <_10(3), 0(2)>

Notice no divide.  The _12 == 0 part really should just be _12 ^ 1.

After my patch (which I need to finish up) we get:
        movl    %edi, %eax
        mull    %esi
        seto    %dl
        xorl    %r8d, %r8d
        movzbl  %dl, %edx
        xorl    $1, %edx
        testl   %eax, %eax
        cmovg   %edx, %r8d
        movl    %r8d, %eax
        ret
This should be exactly what you wanted, or very close.
There still look to be a few micro-optimizations needed.

[Bug c++/101921] G++ cannot find a template function with lambda as default template argument inside a template

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101921

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2021-08-15
Summary|G++ cannot find a template  |G++ cannot find a template
   |function with lambda as |function with lambda as
   |default template argument   |default template argument
   ||inside a template
 Status|UNCONFIRMED |NEW
   Keywords||rejects-valid

--- Comment #1 from Andrew Pinski  ---
if foo was not a template function, then bar would work.

Confirmed.

[Bug c++/101921] New: G++ cannot find a template function with lambda as default template argument

2021-08-15 Thread fchelnokov at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101921

Bug ID: 101921
   Summary: G++ cannot find a template function with lambda as
default template argument
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fchelnokov at gmail dot com
  Target Milestone: ---

Compilation of this valid code:
```
template void bar() {}
void foo(auto) { bar(); }
```

results in error:
```
error: no matching function for call to 'bar()'
2 | void foo(auto) { bar(); }
  |  ~~~^~
note: candidate: 'template void bar()'
1 | template void bar() {}
  | ^~~
note:   template argument deduction/substitution failed:
```

Other compilers accept it: https://gcc.godbolt.org/z/9GsPo8Pnb

[Bug middle-end/37443] fast 64-bit divide by constant on 32-bit platform

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37443

Andrew Pinski  changed:

   What|Removed |Added

  Build|i686-pc-cygwin  |
   Host|i686-pc-cygwin  |

--- Comment #5 from Andrew Pinski  ---
Must be a cost model issue because I can get /10u working but not /1220703125u
(in GCC 11+).

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug rtl-optimization/97282] division done twice for modulo and divsion for 128-bit integers

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97282

Andrew Pinski  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |11.0

--- Comment #5 from Andrew Pinski  ---
All fixed for GCC 11 by the patches for PR 97459.

[Bug middle-end/89256] No optimized division by constant for __int128

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89256

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |11.0
 Resolution|--- |FIXED

--- Comment #3 from Andrew Pinski  ---
This is implemented in GCC 11+.
Note that the cost of doing /1000 is too high, so a call still happens. If you do
/100, you get the inlined expansion.

[Bug target/84759] Calculation of quotient and remainder with constant denominator uses __umoddi3+__udivdi3 instead of __udivmoddi4

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84759

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |11.0
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459

--- Comment #3 from Andrew Pinski  ---
In GCC 11+, we expand the divide and mod by a constant, which was implemented
by r11-5533.

I think we can close this as fixed for GCC 11+ with the expansion happening
inline.
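
A minimal sketch of the kind of source this report is about: a 64-bit quotient and remainder taken against the same constant denominator. The struct `qr64`, the function `divmod_1000` and the constant 1000 are hypothetical choices for illustration, and whether the compiler expands this inline or emits a libcall depends on the target and the cost model.

```c
#include <stdint.h>

struct qr64 {
    uint64_t quot;
    uint64_t rem;
};

/* Quotient and remainder by the same constant. Only one division is
   really needed: the remainder follows from the quotient with a
   multiply and a subtract. */
struct qr64 divmod_1000(uint64_t n)
{
    struct qr64 r;
    r.quot = n / 1000u;
    r.rem = n - r.quot * 1000u; /* same value as n % 1000u */
    return r;
}
```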

[Bug target/60900] ICE: in emit_library_call_value_1, at calls.c:4187 with -mabi=ms -mlong-double-128

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60900

Andrew Pinski  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org

--- Comment #2 from Andrew Pinski  ---
*** Bug 82727 has been marked as a duplicate of this bug. ***

[Bug target/82727] ICE with -mabi=ms -mlong-double-128 and conversion from long double to double inside a sysv_abi marked function

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82727

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from Andrew Pinski  ---
Dup of bug 60900.

*** This bug has been marked as a duplicate of bug 60900 ***

[Bug target/82883] eax register unnecessary consumed

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82883

--- Comment #1 from Andrew Pinski  ---
With -mtune=intel -O3, we produce:

movl    $1819043144, (%rdi)
movw    $8303, 4(%rdi)
ret

So it looks like a target tuning issue.
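
For readers decoding the immediates: 1819043144 is 0x6C6C6548 and 8303 is 0x206F, which are the six ASCII bytes of "Hello " stored least significant byte first. The sketch below reproduces the two stores byte by byte so it does not depend on host endianness. The function name `store_like_asm` is hypothetical, and the original test case is not shown in this message.

```c
#include <stdint.h>

/* Mimic the two stores: a 32-bit immediate at offset 0 and a 16-bit
   immediate at offset 4, each written least significant byte first
   (x86 is little-endian). dst must have room for 6 bytes. */
void store_like_asm(unsigned char *dst)
{
    uint32_t word = UINT32_C(1819043144); /* 0x6C6C6548 */
    uint16_t half = 8303;                 /* 0x206F */
    int i;

    for (i = 0; i < 4; i++)
        dst[i] = (unsigned char)(word >> (8 * i));
    for (i = 0; i < 2; i++)
        dst[4 + i] = (unsigned char)(half >> (8 * i));
}
```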

[Bug target/82730] extra store/reload of an XMM for every byte extracted

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82730

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Severity|normal  |enhancement
 Ever confirmed|0   |1
   Last reconfirmed||2021-08-15

--- Comment #1 from Andrew Pinski  ---
Note the gimple level looks good:

  _20 = BIT_FIELD_REF ;
  _1 = (int) _20;
  _21 = BIT_FIELD_REF ;
  _2 = (int) _21;
  _22 = BIT_FIELD_REF ;
  _3 = (int) _22;
  _23 = BIT_FIELD_REF ;
  _4 = (int) _23;
  _24 = BIT_FIELD_REF ;
  _5 = (int) _24;
  _25 = BIT_FIELD_REF ;
  _6 = (int) _25;
  _26 = BIT_FIELD_REF ;
  _7 = (int) _26;
  _27 = BIT_FIELD_REF ;
  _8 = (int) _27;
  _28 = BIT_FIELD_REF ;
  _9 = (int) _28;
  _29 = BIT_FIELD_REF ;
  _10 = (int) _29;
  _30 = BIT_FIELD_REF ;
  _11 = (int) _30;
  _31 = BIT_FIELD_REF ;
  _12 = (int) _31;
  _32 = BIT_FIELD_REF ;
  _13 = (int) _32;
  _33 = BIT_FIELD_REF ;
  _14 = (int) _33;
  _34 = BIT_FIELD_REF ;
  _15 = (int) _34;
  _35 = BIT_FIELD_REF ;
  _16 = (int) _35;
---- CUT ----

It is the way extractions are done for bytes that is not good.
Note MSVC is the only one which does the extractions in a register only and
does not do a store to the stack.
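
A minimal sketch of source that yields this shape of gimple (one byte extraction plus a widening cast per element), written with the GCC/clang vector extension. The type name `v16qu` and the function `sum_bytes` are hypothetical, and this is not the test case from the report.

```c
/* 16 unsigned bytes in one 128-bit vector (GCC/clang extension). */
typedef unsigned char v16qu __attribute__((vector_size(16)));

/* Each v[i] is one byte extraction from the vector, widened to int. */
int sum_bytes(v16qu v)
{
    int sum = 0;
    int i;

    for (i = 0; i < 16; i++)
        sum += (int)v[i];
    return sum;
}
```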

[Bug target/82727] ICE in emit_library_call_value_1, at calls.c:4975

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82727

--- Comment #2 from Andrew Pinski  ---
Note -mabi=ms -mlong-double-128 is enough to reproduce the ICE.

[Bug target/82727] ICE in emit_library_call_value_1, at calls.c:4975

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82727

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||ice-on-valid-code
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2021-08-15

--- Comment #1 from Andrew Pinski  ---
Reduced testcase:
double __attribute__ ((sysv_abi))
func_native (long double a)
{
  return a;
}
---- CUT ----
Compile with -mabi=ms -mbionic

[Bug target/46357] Unnecessary movzx instruction

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46357

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #4 from Andrew Pinski  ---
I had meant to close this.  Basically there is a new pass added for GCC 4.7.0
which removes the redundant zero/sign extends.

[Bug target/46357] Unnecessary movzx instruction

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46357

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Target Milestone|--- |4.7.0
   Last reconfirmed||2021-08-15

--- Comment #3 from Andrew Pinski  ---
Fixed for GCC 4.7.0+ by r0-114134.

[Bug target/81813] Inefficient stack pointer adjustment

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81813

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2021-08-15

--- Comment #4 from Andrew Pinski  ---
So what is happening is reload produces:
(insn 136 135 187 11 (set (reg:DI 0 ax [orig:102 _38 ] [102])
(mem/v/c:DI (plus:DI (reg/f:DI 7 sp)
(const_int 32 [0x20])) [2 MEM[(volatile __u64 *) + 24B]+0
S8 A64])) "./include/linux/compiler.h":276 85 {*movdi_internal}
 (nil))
(insn 187 136 138 11 (set (reg:DI 2 cx [124])
(plus:DI (reg/f:DI 7 sp)
(const_int 8 [0x8]))) "fs/fs_pin.c":64 218 {*leadi}
 (nil))
(insn 138 187 139 11 (parallel [
(set (reg/f:DI 1 dx [122])
(plus:DI (reg:DI 2 cx [124])
(const_int 24 [0x18])))
(clobber (reg:CC 17 flags))
]) "fs/fs_pin.c":64 222 {*adddi_1}
 (nil))
(insn 139 138 140 11 (parallel [
(set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int 8 [0x8])))
(clobber (reg:CC 17 flags))
]) "fs/fs_pin.c":64 222 {*adddi_1}
 (expr_list:REG_ARGS_SIZE (const_int 0 [0])
(nil)))

Notice how cx is being used.

And then post_reload produces:
(insn 187 136 138 11 (set (reg:DI 2 cx [124])
(plus:DI (reg/f:DI 7 sp)
(const_int 8 [0x8]))) "fs/fs_pin.c":64 218 {*leadi}
 (nil))
(insn 138 187 139 11 (parallel [
(set (reg/f:DI 1 dx [122])
(plus:DI (reg/f:DI 7 sp)
(const_int 32 [0x20])))
(clobber (reg:CC 17 flags))
]) "fs/fs_pin.c":64 222 {*adddi_1}
 (nil))
(insn 139 138 140 11 (set (reg/f:DI 7 sp)
(reg:DI 2 cx [124])) "fs/fs_pin.c":64 85 {*movdi_internal}
 (expr_list:REG_ARGS_SIZE (const_int 0 [0])
(nil)))

But I don't understand why it did not propagate (plus (reg/f:DI 7 sp)
(const_int 8 [0x8])) into insn 139 and remove insn 187.

I think this is really an issue for LRA/IRA in the first place. We did not need
to push the variable to the stack, as we are going to rematerialize the value
after the pop anyway.


So Vlad might want to debug this to make sure this is not a latent bug.

[Bug target/81813] Inefficient stack pointer adjustment

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81813

--- Comment #3 from Andrew Pinski  ---
There are 3 places where the pattern

call    lock_acquire
call    debug_lockdep_rcu_enabled
movq    32(%rsp), %rax
popq    %rdx

exists, and in GCC 7-8 only one of the 3 has the expanded pop for some
reason.

[Bug target/81813] Inefficient stack pointer adjustment

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81813

Andrew Pinski  changed:

   What|Removed |Added

  Known to fail||8.1.0

--- Comment #2 from Andrew Pinski  ---
In GCC 8.5 we had:

pushq   %r12
.cfi_def_cfa_offset 88
movl    $2, %ecx
xorl    %edx, %edx
xorl    %r9d, %r9d
xorl    %r8d, %r8d
xorl    %esi, %esi
movl    $rcu_lock_map, %edi
call    lock_acquire
call    debug_lockdep_rcu_enabled
movq    32(%rsp), %rax
leaq    8(%rsp), %rcx
leaq    32(%rsp), %rdx
movq    %rcx, %rsp
.cfi_def_cfa_offset 80
cmpq    %rax, %rdx
jne     .L58


In GCC 9.1 (and the trunk) we have:

pushq   %r13
.cfi_def_cfa_offset 96
xorl    %edx, %edx
xorl    %r9d, %r9d
xorl    %r8d, %r8d
movl    $2, %ecx
xorl    %esi, %esi
movl    $rcu_lock_map, %edi
call    lock_acquire
call    debug_lockdep_rcu_enabled
movq    32(%rsp), %rax
popq    %rdx
.cfi_def_cfa_offset 88
cmpq    %rax, %rbx
jne     .L58

[Bug c++/81760] attribute target uses the wrong default function argument

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81760

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2021-08-15
  Component|target  |c++
 Status|UNCONFIRMED |NEW
 CC||pinskia at gcc dot gnu.org

--- Comment #1 from Andrew Pinski  ---
Confirmed, I think this should have been rejected.

[Bug tree-optimization/50417] [9/10/11/12 regression]: memcpy with known alignment

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417

Andrew Pinski  changed:

   What|Removed |Added

 CC||dragan.mladjenovic at syrmia dot com

--- Comment #34 from Andrew Pinski  ---
*** Bug 101920 has been marked as a duplicate of this bug. ***

[Bug middle-end/101920] memcpy expansion treats unknown pointers as unaligned

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101920

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
Dup of bug 50417.

*** This bug has been marked as a duplicate of bug 50417 ***

[Bug target/81496] AVX load from adjacent memory location followed by concatenation

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81496

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2021-08-15
 Ever confirmed|0   |1
   Severity|normal  |enhancement
 Status|UNCONFIRMED |NEW

--- Comment #6 from Andrew Pinski  ---
The first 2 examples (the __int128 ones) are due to:
(insn 4 3 5 2 (set (reg:TI 85)
(subreg:TI (reg:DI 86) 0)) "/app/example.cpp":8:31 -1
 (nil))
(insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8)
(reg:DI 87)) "/app/example.cpp":8:31 -1
 (nil))
Or rather:
(insn 11 8 12 2 (set (reg:V4DI 91)
(vec_concat:V4DI (subreg:V2DI (reg/v:TI 84 [ x ]) 0)
(subreg:V2DI (reg/v:TI 88 [ y ]) 0))) "/app/example.cpp":8:40 -1
 (nil))

clang produces interesting results too.
They sometimes do vpunpcklqdq and other times do vpinsrd

[Bug middle-end/101920] New: memcpy expansion treats unknown pointers as unaligned

2021-08-15 Thread dragan.mladjenovic at syrmia dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101920

Bug ID: 101920
   Summary: memcpy expansion treats unknown pointers as unaligned
   Product: gcc
   Version: 11.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dragan.mladjenovic at syrmia dot com
  Target Milestone: ---

I guess it is easiest to observe this on Aarch64 with the following code:

#include <string.h>

void test (int *dst, int *src)
{
(void)memcpy (dst, src, sizeof *src);
}


With -O1 -mno-strict-align we get:

ldr w1, [x1]
str w1, [x0]
ret

With -O1 -mstrict-align we get:

ldrb w2, [x1]
strb w2, [x0]
ldrb w2, [x1, 1]
strb w2, [x0, 1]
ldrb w2, [x1, 2]
strb w2, [x0, 2]
ldrb w1, [x1, 3]
strb w1, [x0, 3]
ret
ret

Or with -Os:

mov x2, 4
b   memcpy

It seems that builtins.c:get_pointer_alignment finds empty SSA_NAME_PTR_INFO
for both pointers and defaults to 8-bit alignment. This can be worked around by
applying __builtin_assume_aligned to both src and dst.
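
A minimal sketch of that workaround, assuming both pointers really are aligned for `int` (the behavior is undefined otherwise). The function name `test_aligned` is hypothetical, and the resulting code generation has not been checked here.

```c
#include <string.h>

void test_aligned(int *dst, int *src)
{
    /* Tell the compiler both pointers have at least int alignment.
       __builtin_assume_aligned returns void *, which converts
       implicitly to int * in C. */
    int *d = __builtin_assume_aligned(dst, __alignof__(int));
    int *s = __builtin_assume_aligned(src, __alignof__(int));

    (void)memcpy(d, s, sizeof *s);
}
```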
