[Bug target/27619] wrong code for mixed-mode division with -mpowerpc64 -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=27619 Andrew Pinski changed: What|Removed |Added Resolution|INVALID |MOVED See Also||https://sourceware.org/bugz ||illa/show_bug.cgi?id=14758
[Bug target/27619] wrong code for mixed-mode division with -mpowerpc64 -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=27619 Andrew Pinski changed: What|Removed |Added CC||vincent-gcc at vinc17 dot net --- Comment #19 from Andrew Pinski --- *** Bug 58429 has been marked as a duplicate of this bug. ***
[Bug target/58429] _Decimal64 support is broken on powerpc64 with the mode32 ABI (-m32 -mpowerpc64)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58429 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #1 from Andrew Pinski --- This is most likely a dup of bug 27619 which was a bug in binutils. *** This bug has been marked as a duplicate of bug 27619 ***
[Bug testsuite/101902] [12 regression] g++.dg/warn/uninit-1.C has excess errors after r12-2898
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101902 --- Comment #1 from Jan Hubicka --- Hi, i am testing diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c index 5d7bc800419..d89ab5423cd 100644 --- a/gcc/tree-ssa-uninit.c +++ b/gcc/tree-ssa-uninit.c @@ -641,7 +641,7 @@ maybe_warn_pass_by_reference (gcall *stmt, wlimits ) wlims.always_executed = false; /* Ignore args we are not going to read from. */ - if (gimple_call_arg_flags (stmt, argno - 1) & EAF_UNUSED) + if (gimple_call_arg_flags (stmt, argno - 1) & (EAF_UNUSED | EAF_NOREAD)) continue; tree arg = gimple_call_arg (stmt, argno - 1);
[Bug bootstrap/53468] debian/ubuntu changed the location of libraries on the disk which broke bootstrap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53468 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED Target Milestone|--- |4.8.0 Keywords||build --- Comment #4 from Andrew Pinski --- Fixed a long time ago by r0-120287.
[Bug target/101929] r12-2549 regress x264_r by 4% on CLX.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929 --- Comment #2 from Hongtao.liu --- Without accurate info provided by the vectorizer, the backend can do nothing about this regression except revert the patch; that's why I marked the bug as a tree-optimization component.
[Bug tree-optimization/101929] r12-2549 regress x264_r by 4% on CLX.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929 --- Comment #1 from Hongtao.liu --- Considering this, I'm debating whether to revert my patch.
[Bug tree-optimization/101929] New: r12-2549 regress x264_r by 4% on CLX.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929 Bug ID: 101929 Summary: r12-2549 regress x264_r by 4% on CLX. Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: crazylht at gmail dot com CC: hjl.tools at gmail dot com, wwwhhhyyy333 at gmail dot com Target Milestone: --- Host: x86_64-pc-linux-gnu Target: x86_64-*-* i?86-*-* The regression is in x264_pixel_satd_8x4 typedef unsigned char uint8_t; typedef unsigned int uint32_t; typedef unsigned short uint16_t; // in: a pseudo-simd number of the form x+(y<<16) // return: abs(x)+(abs(y)<<16) static inline uint32_t abs2( uint32_t a ) { uint32_t s = ((a>>15)&0x10001)*0x; return (a+s)^s; } #define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\ int t0 = s0 + s1;\ int t1 = s0 - s1;\ int t2 = s2 + s3;\ int t3 = s2 - s3;\ d0 = t0 + t2;\ d2 = t0 - t2;\ d1 = t1 + t3;\ d3 = t1 - t3;\ } int x264_pixel_satd_8x4( uint8_t *pix1, int i_pix1, uint8_t *pix2, int i_pix2 ) { uint32_t tmp[4][4]; uint32_t a0, a1, a2, a3; int sum = 0; for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 ) { a0 = (pix1[0] - pix2[0]) + ((pix1[4] - pix2[4]) << 16); a1 = (pix1[1] - pix2[1]) + ((pix1[5] - pix2[5]) << 16); a2 = (pix1[2] - pix2[2]) + ((pix1[6] - pix2[6]) << 16); a3 = (pix1[3] - pix2[3]) + ((pix1[7] - pix2[7]) << 16); HADAMARD4( tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3], a0,a1,a2,a3 ); } for( int i = 0; i < 4; i++ ) { HADAMARD4( a0, a1, a2, a3, tmp[0][i], tmp[1][i], tmp[2][i], tmp[3][i] ); sum += abs2(a0) + abs2(a1) + abs2(a2) + abs2(a3); } return (((uint16_t)sum) + ((uint32_t)sum>>16)) >> 1; } After increasing the cost of vector CTOR, slp1 no longer vectorizes the code below: git diff my.slp1 original.slp1 - _820 = {_187, _189, _187, _189}; - vect_t2_188.65_821 = VIEW_CONVERT_EXPR(_820); - vect__200.67_823 = vect_t0_184.64_819 - vect_t2_188.65_821; - vect__191.66_822 = vect_t0_184.64_819 + vect_t2_188.65_821; - _824 = VEC_PERM_EXPR ; - vect__192.68_825 =
VIEW_CONVERT_EXPR(_824); t3_190 = (int) _189; _191 = t0_184 + t2_188; _192 = (unsigned int) _191; + tmp[0][0] = _192; _194 = t0_184 - t2_188; _195 = (unsigned int) _194; + tmp[0][2] = _195; _197 = t1_186 + t3_190; _198 = (unsigned int) _197; + tmp[0][1] = _198; _200 = t1_186 - t3_190; _201 = (unsigned int) _200; - MEM [(unsigned int *)] = vect__192.68_825; + tmp[0][3] = _201; But the vectorized version can somehow help FRE eliminate redundant vector loads, which gives even better performance. git diff dump.veclower21 dump.fre5 MEM [(unsigned int *) + 48B] = vect__54.89_852; - vect__63.9_482 = MEM [(unsigned int *)]; - vect__64.12_478 = MEM [(unsigned int *) + 16B]; - vect__65.13_477 = vect__63.9_482 + vect__64.12_478; + vect__65.13_477 = vect__192.68_825 + vect__273.75_834; vect_t0_100.14_476 = VIEW_CONVERT_EXPR(vect__65.13_477); - vect__67.15_475 = vect__63.9_482 - vect__64.12_478; + vect__67.15_475 = vect__192.68_825 - vect__273.75_834; vect_t1_101.16_474 = VIEW_CONVERT_EXPR(vect__67.15_475); - vect__68.19_470 = MEM [(unsigned int *) + 32B]; - vect__69.22_466 = MEM [(unsigned int *) + 48B]; - vect__70.23_465 = vect__68.19_470 + vect__69.22_466; + vect__70.23_465 = vect__354.82_843 + vect__54.89_852; If slp1 could realize this and add the upper part to the comparison of scalar cost vs vector cost, GCC should vectorize, but currently it doesn't.
[Bug debug/101928] New: Incorrect argument list for varardic functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101928 Bug ID: 101928 Summary: Incorrect argument list for varardic functions Product: gcc Version: 11.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: liyd2021 at gmail dot com Target Milestone: --- Affected versions: gcc 11.1.0 with gdb (Ubuntu 20.04.2) (terminal) $ cat simple.c && gcc -g -O2 simple.c static void varargs(int q0, int q1, ...) { va_list ap; va_start(ap, q1); } int main() { varargs(0, 1, 2); } (terminal) $ cat run.gdb b varargs r ptype varargs q (terminal) $ gdb -x run.gdb a.out Breakpoint 1, varargs (q0=0, q1=1, q1=1, q0=0) at simple.c:2 2 static void varargs(int q0, int q1, ...) { type = void (int, int, int, int) <-- BUG, duplicated arguments Compiling with -O0/-Og does not trigger this behavior. The `static` on `varargs` is also required. LLDB rejects this debug info: (terminal) $ lldb a.out (lldb) b varargs error: simple {0x000c}: DIE has DW_AT_ranges(0xc) attribute, but range extraction failed (missing or invalid range list table), please file a bug and attach the file at the start of this error message
[Bug target/101927] New: There is no vector mode popcount for aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101927 Bug ID: 101927 Summary: There is no vector mode popcount for aarch64 Product: gcc Version: 12.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: pinskia at gcc dot gnu.org Target Milestone: --- Target: aarch64 Take: #include #include size_t hd (const uint8_t *restrict a, const uint8_t *restrict b, size_t l) { size_t r = 0, x; for (x = 0; x < l; x++) r += __builtin_popcount (a[x] ^ b[x]); return r; } At -O3 we don't vectorize this. Clang/LLVM does: .LBB0_5: // =>This Inner Loop Header: Depth=1 ld1 { v3.b }[0], [x8] sub x12, x8, #2 ld1 { v5.b }[0], [x10] ld1 { v4.b }[0], [x12] sub x12, x10, #2 ld1 { v6.b }[0], [x12] add x12, x8, #1 ld1 { v3.b }[4], [x12] add x12, x10, #1 ld1 { v5.b }[4], [x12] sub x12, x8, #1 ld1 { v4.b }[4], [x12] sub x12, x10, #1 ld1 { v6.b }[4], [x12] eor v3.8b, v5.8b, v3.8b ushll v3.2d, v3.2s, #0 and v3.16b, v3.16b, v1.16b eor v4.8b, v6.8b, v4.8b ushll v4.2d, v4.2s, #0 and v4.16b, v4.16b, v1.16b cnt v3.16b, v3.16b cnt v4.16b, v4.16b uaddlp v3.8h, v3.16b uaddlp v4.8h, v4.16b uaddlp v3.4s, v3.8h uaddlp v4.4s, v4.8h add x8, x8, #4 subs x11, x11, #4 uadalp v2.2d, v3.4s uadalp v0.2d, v4.4s add x10, x10, #4 b.ne .LBB0_5 -- CUT Note that I think we could do even better than this.
[Bug tree-optimization/68109] GCC fails to vectorize popcount on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68109 Andrew Pinski changed: What|Removed |Added Component|target |tree-optimization --- Comment #2 from Andrew Pinski --- Could there be generic support for popcount added?
[Bug tree-optimization/54978] Add ability to provide vectorized functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54978 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |6.0 Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #4 from Andrew Pinski --- Fixed in GCC 6 with r6-4931.
[Bug tree-optimization/47860] is vectorization of "condition in nested loop" supported
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47860 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2021-08-16 --- Comment #6 from Andrew Pinski --- Confirmed, ICC is able to vectorize this loop even without AVX (GCC can currently vectorize the loop with AVX). movdqa %xmm0, %xmm11 #10.11 lea 1(%r14), %r15d #9.31 movups (%rdx,%r15,8), %xmm9 #9.27 movups (%rcx,%r14,8), %xmm10 #10.24 cmpltpd %xmm1, %xmm10 #10.24 pxor %xmm2, %xmm10 #10.24 movmskpd %xmm10, %r15d #10.24 testl %r15d, %r15d #10.24 je ..B1.14 # Prob 50% #10.24 # LOE rax rdx rcx rbx rsi rdi ebp r8d r9d r10d r11d r12d r13d r14d xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 ..B1.13: # Preds ..B1.12 # Execution count [1.25e+01] pshufd $8, %xmm10, %xmm11 #10.24 movaps %xmm9, %xmm8 #5.21 pand %xmm6, %xmm11 #10.24 # LOE rax rdx rcx rbx rsi rdi ebp r8d r9d r10d r11d r12d r13d r14d xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7 xmm8 xmm11 ..B1.14:# Pre
[Bug rtl-optimization/46391] false dependencies are computed after vectorization (#2)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46391 Andrew Pinski changed: What|Removed |Added Keywords||alias, missed-optimization Blocks||53947 --- Comment #4 from Andrew Pinski --- I suspect this has been long fixed. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 [Bug 53947] [meta-bug] vectorizer missed-optimizations
[Bug other/36395] TARGET_VECTOR_ALIGNMENT_REACHABLE isn't documented
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36395 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|--- |4.5.0 Status|NEW |RESOLVED --- Comment #3 from Andrew Pinski --- It was renamed to TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE and documented r0-98468.
[Bug middle-end/101926] [meta-bug] struct/complex argument passing and return should be improved
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101926 Bug 101926 depends on bug 88496, which changed state. Bug 88496 Summary: Unnecessary stack adjustment with -mavx512f https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88496 What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE
[Bug target/88483] Unnecessary stack alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88483 --- Comment #7 from H.J. Lu --- *** Bug 88496 has been marked as a duplicate of this bug. ***
[Bug target/88496] Unnecessary stack adjustment with -mavx512f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88496 H.J. Lu changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #3 from H.J. Lu --- Dup. *** This bug has been marked as a duplicate of bug 88483 ***
[Bug middle-end/31271] Missing simple optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31271 Andrew Pinski changed: What|Removed |Added Keywords||missed-optimization Known to work|4.7.1 | --- Comment #2 from Andrew Pinski --- In 4.7.0+ we produce: in_canforward(unsigned int): .LFB0: .cfi_startproc andl $224, %edi xorl %eax, %eax cmpl $224, %edi setne %al ret That is: D.2201_1 = in_2(D) & 224; D.2199_10 = D.2201_1 != 224; I think we could do slightly better with ((~in_2(D)) & 224) != 0, but only at expand time. This gives: notl %edi xorl %eax, %eax testb $-32, %dil setne %al Or for aarch64: mov w8, #224 bics wzr, w8, w0 cset w0, ne ret
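The rewrite discussed in the comment above can be checked at the source level. The following is only a sketch under stated assumptions: the function names are hypothetical (they are not the reporter's `in_canforward`), and the second form uses `!= 0`, which is what the `setne` in the quoted assembly computes.

```c
#include <stdint.h>

/* Form matching the quoted GIMPLE: (in & 224) != 224.
   Hypothetical name, for illustration only. */
int high_bits_not_all_set_and(uint32_t in)
{
    return (in & 224u) != 224u;
}

/* Rewritten form: invert first, then test the same mask against zero.
   This maps onto not+test on x86 or a single bics on aarch64. */
int high_bits_not_all_set_andn(uint32_t in)
{
    return (~in & 224u) != 0;
}

/* Compare both forms over all 16-bit inputs; only bits 5..7 matter,
   so this covers every combination of the masked bits.
   Returns the number of disagreements. */
int count_mask_mismatches(void)
{
    int mismatches = 0;
    for (uint32_t in = 0; in <= 0xFFFFu; in++)
        if (high_bits_not_all_set_and(in) != high_bits_not_all_set_andn(in))
            mismatches++;
    return mismatches;
}
```

Calling `count_mask_mismatches()` should report no disagreements, which is why the two tests are interchangeable and the choice can be left to expand time.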
[Bug middle-end/90216] Stack Pointer decrementing even when not loading extra data to stack.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90216 --- Comment #3 from Andrew Pinski --- Testcase: #include template struct Neighbourhood { using datatype = DataType; }; template struct Building { using datatype = typename N::datatype; operator datatype() const{ return (static_cast(this)->contam_level); } void operator=(datatype const x) volatile { static_cast(this)->contam_level = x; } }; struct Cincin : public Neighbourhood { struct Apartment : Building{ using Building::operator=; Apartment(Apartment volatile & a) : contam_level(a.contam_level) {} Apartment(uint32_t const c = 0) : contam_level (c) {} union { struct { uint32_t laundry :3; uint32_t :5; uint32_t lobby :8; uint32_t :16; }; uint32_t contam_level; }; Apartment& Laundry(uint32_t val){ this->laundry = val; return *this; } }; Apartment volatile apartment; }; Cincin cincin[2]; int main(){ cincin[0].apartment = Cincin::Apartment().Laundry(0x7); //cincin[0].apartment = Cincin::Apartment(0x7); //cincin[0].apartment = 9; }
[Bug middle-end/101926] New: [meta-bug] struct/complex argument passing and return should be improved
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101926 Bug ID: 101926 Summary: [meta-bug] struct/complex argument passing and return should be improved Product: gcc Version: 12.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: pinskia at gcc dot gnu.org Target Milestone: --- There are many of these bugs, for x86_64, aarch64, powerpc, etc.
[Bug middle-end/95756] Failure to optimize memory operations with _Complex
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95756 Andrew Pinski changed: What|Removed |Added Last reconfirmed|2020-06-19 00:00:00 |2021-8-15 Component|rtl-optimization|middle-end Severity|normal |enhancement
[Bug fortran/101871] Array of strings of different length passed as an argument produces invalid result.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101871 --- Comment #6 from Steve Kargl --- On Sun, Aug 15, 2021 at 07:21:42PM +, anlauf at gcc dot gnu.org wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101871 > > --- Comment #5 from anlauf at gcc dot gnu.org --- > In array.c:gfc_match_array_constructor there's the following code: > > 1335 /* Walk the constructor, and if possible, do type conversion for > 1336 numeric types. */ > 1337 if (gfc_numeric_ts ()) > 1338{ > 1339 m = walk_array_constructor (, head); > 1340 if (m == MATCH_ERROR) > 1341return m; > 1342} > > Steve, you were the last one to work on this block. > It appears that non-numeric ts are not handled (here). > Can you give some insight? > Unfortunately, I can't remember why it's confined to numeric types. I did the simple thing of commenting out the if-stmt and got an ICE. I also tried explicitly setting the typespec of each array element to the typespec of the array constructor, and that also ICE'd. I haven't had time to poke further. I think at some point the actual arg list is reduced to a formal argument list. This might lose the array constructor typespec when reducing/resolving the arg list.
[Bug middle-end/87650] suboptimal codegen for testing low bit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87650 --- Comment #2 from Andrew Pinski --- (In reply to Andrew Pinski from comment #1) > If I saw these two statements: s/saw/swap/ I don't know why I wrote the wrong word there. I was thinking swap and still wrote saw.
[Bug middle-end/87650] suboptimal codegen for testing low bit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87650 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Component|target |middle-end Ever confirmed|0 |1 Last reconfirmed||2021-08-16 Severity|normal |enhancement --- Comment #1 from Andrew Pinski --- If I saw these two statements: auto m = n%2; n = n/2; GCC is able to produce the testb. This is due to the shift clobbering the flags. I wonder if we could produce better code during expand.
[Bug middle-end/78947] sub-optimal code for (bool)(int ? int : int)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78947 Andrew Pinski changed: What|Removed |Added Known to fail||8.5.0 Known to work||9.1.0 --- Comment #2 from Andrew Pinski --- (In reply to Andrew Pinski from comment #1) > Confirmed, this is fold-const folding: > (bool)(a?b:c) > into a ? (bool) b : (bool)c; > early. This was removed in GCC 9.
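For readers unfamiliar with the folding mentioned above, the two source forms can be written out as follows. This is a minimal sketch with hypothetical function names; it only shows that the forms compute the same value and says nothing about which code GCC generates for either.

```c
#include <stdbool.h>

/* Select first, then convert the chosen int to bool. */
bool select_then_convert(int a, int b, int c)
{
    return (bool)(a ? b : c);
}

/* Convert each arm to bool first, then select: the form the early
   folding used to produce, per the comment above. */
bool convert_then_select(int a, int b, int c)
{
    return a ? (bool)b : (bool)c;
}
```

Both functions return the same result for every input; the difference is only in where the conversion sits, which is what affected the generated code before GCC 9.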
[Bug tree-optimization/101925] reversed storage order when compiling with -O3 only
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101925 Andrew Pinski changed: What|Removed |Added Component|c |tree-optimization Last reconfirmed||2021-08-15 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Keywords||wrong-code --- Comment #1 from Andrew Pinski --- Looks like the SLP vectorizer causes this. -O3 -fno-tree-slp-vectorize works.
[Bug c/101925] New: reversed storage order when compiling with -O3 only
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101925 Bug ID: 101925 Summary: reversed storage order when compiling with -O3 only Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: george.thopas at gmail dot com Target Milestone: --- /* reversed storage order when compiling with -O3 only. this time sets up an all big-endian struct from an all little-endian one no warnings Target: x86_64-pc-linux-gnu gcc version 11.1.0 (Gentoo 11.1.0-r2 p3) gcc-trunk too $ gcc -Wall -Wextra -O3 test.c $ ./a.out Abort */ #define BIG_ENDIAN __attribute__((scalar_storage_order("big-endian"))) /* host order version (little endian)*/ struct _ip6_addr { union { char addr8[16]; int addr32[4]; } u; }; typedef struct _ip6_addr t_ip6_addr; struct _net_addr { char is_v4; union { int addr; t_ip6_addr addr6; } u; }; typedef struct _net_addr t_net_addr; /* big endian version */ struct _be_ip6_addr { union { char addr8[16]; } BIG_ENDIAN u; } BIG_ENDIAN; typedef struct _be_ip6_addr t_be_ip6_addr; struct _be_net_addr { char is_v4; union { t_be_ip6_addr addr6; int addr; } BIG_ENDIAN u; } BIG_ENDIAN; typedef struct _be_net_addr t_be_net_addr; /* convert */ t_be_ip6_addr be_ip6_addr(const t_ip6_addr ip6) { t_be_ip6_addr rc = { .u.addr8[0] = ip6.u.addr8[0], .u.addr8[1] = ip6.u.addr8[1], .u.addr8[2] = ip6.u.addr8[2], .u.addr8[3] = ip6.u.addr8[3], .u.addr8[4] = ip6.u.addr8[4], .u.addr8[5] = ip6.u.addr8[5], .u.addr8[6] = ip6.u.addr8[6], .u.addr8[7] = ip6.u.addr8[7], .u.addr8[8] = ip6.u.addr8[8], .u.addr8[9] = ip6.u.addr8[9], .u.addr8[10] = ip6.u.addr8[10], .u.addr8[11] = ip6.u.addr8[11], .u.addr8[12] = ip6.u.addr8[12], .u.addr8[13] = ip6.u.addr8[13], .u.addr8[14] = ip6.u.addr8[14], .u.addr8[15] = ip6.u.addr8[15], }; return rc; } t_be_net_addr be_net_addr(const t_net_addr ip) { t_be_net_addr rc = {.is_v4 = ip.is_v4 }; if (ip.is_v4) { rc.u.addr = ip.u.addr; } else { rc.u.addr6 = be_ip6_addr(ip.u.addr6); } return rc; } int
main(void) { t_be_net_addr out = { }; t_net_addr in = { .is_v4 = 0, .u.addr6.u.addr8 = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 } }; out = be_net_addr(in); // actually first 4 bytes are swapped if (in.u.addr6.u.addr8[0] != out.u.addr6.u.addr8[0]) __builtin_abort(); return 0; }
[Bug middle-end/80261] Worse code generated compared to clang with modulus operation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80261 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Status|UNCONFIRMED |NEW Component|target |middle-end Last reconfirmed||2021-08-15 Ever confirmed|0 |1 --- Comment #1 from Andrew Pinski --- Confirmed: ptr.0_1 = (long int) ptr_5(D); _4 = ptr.0_1 & 4294967295; _2 = (long unsigned int) _4; _3 = _2 % 131; We could do the % 131 in 32 bits, but we currently do it in 64 bits. The second issue is that we expand out the * 131, which might be OK.
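The point about operand width can be illustrated at the source level. The sketch below uses hypothetical function names and takes the pointer value as a plain `uint64_t`, which is an assumption made for illustration; it is not the reporter's test case. Because the value is masked to its low 32 bits before the modulus, computing the remainder in 32 bits gives the same result as computing it in 64 bits.

```c
#include <stdint.h>

/* Mirrors the quoted GIMPLE: mask to 32 bits, widen, then do the
   remainder as a 64-bit operation. */
uint64_t mod131_wide(uint64_t ptr)
{
    uint64_t low = ptr & UINT64_C(0xFFFFFFFF);
    return low % 131u;
}

/* Same value, but the remainder is computed on a 32-bit operand,
   which is what the comment suggests would be sufficient. */
uint64_t mod131_narrow(uint64_t ptr)
{
    uint32_t low = (uint32_t)ptr;   /* truncation keeps the low 32 bits */
    return low % 131u;
}
```

The two functions agree for every input, since the masked value always fits in 32 bits.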
[Bug middle-end/80006] loss of range information due to spurious widening conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80006 --- Comment #6 from Andrew Pinski --- > On x86_64, this conversion from signed char to int is for some reason > performed even in function f, so the test program triggers no warnings. Oh yes the promotion happens because of a target hook.
[Bug target/31667] Integer extensions vectorization could be improved
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31667 --- Comment #5 from Andrew Pinski --- We produce this now: movdqa x(%rip), %xmm1 pxor %xmm0, %xmm0 movdqa %xmm1, %xmm2 punpckhbw %xmm0, %xmm1 movaps %xmm1, y+16(%rip) movdqa x+16(%rip), %xmm1 punpcklbw %xmm0, %xmm2 movaps %xmm2, y(%rip) movdqa %xmm1, %xmm2 punpckhbw %xmm0, %xmm1 movaps %xmm1, y+48(%rip) movdqa x+32(%rip), %xmm1 punpcklbw %xmm0, %xmm2 movaps %xmm2, y+32(%rip) movdqa %xmm1, %xmm2 punpckhbw %xmm0, %xmm1 movaps %xmm1, y+80(%rip) movdqa x+48(%rip), %xmm1 punpcklbw %xmm0, %xmm2 movaps %xmm2, y+64(%rip) movdqa %xmm1, %xmm2 punpckhbw %xmm0, %xmm1 punpcklbw %xmm0, %xmm2 movaps %xmm1, y+112(%rip) movaps %xmm2, y+96(%rip) Even ICC produces a similar sequence, just scheduled differently.
[Bug tree-optimization/78327] Improve VRP for ranges for compares which do ranges of [-TYPE_MAX + N, N]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78327 Andrew Pinski changed: What|Removed |Added Known to fail||10.3.0 Known to work||11.1.0 --- Comment #8 from Andrew Pinski --- I suspect r11-4134 fixed the issue for GCC 11.
[Bug tree-optimization/64567] missed optimization: redundant test before clearing bit(s)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64567 Andrew Pinski changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=96237 --- Comment #4 from Andrew Pinski --- related to PR 96237, maybe the same.
[Bug tree-optimization/60575] inefficient vectorization of compare into bytes on amd64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60575 --- Comment #1 from Andrew Pinski --- Since GCC 5 we produce: .L4: movdqu (%rsi,%rax,2), %xmm0 movdqu 16(%rsi,%rax,2), %xmm1 pcmpgtw %xmm4, %xmm0 pcmpgtw %xmm4, %xmm1 pand %xmm3, %xmm0 pand %xmm3, %xmm1 pand %xmm2, %xmm0 pand %xmm2, %xmm1 packuswb %xmm1, %xmm0 movups %xmm0, (%rdi,%rax) addq $16, %rax cmpq $1024, %rax jne .L4 Note that I removed __builtin_assume_aligned. Also note that there are two extra pand's. The second pand is not needed.
[Bug target/91569] Optimisation test case and unnecessary XOR-OR pair instead of MOV.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91569 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |11.0 Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #4 from Andrew Pinski --- Fixed.
[Bug tree-optimization/63271] Should commute arithmetic with vector load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63271 --- Comment #3 from Andrew Pinski --- So the two functions are not the same (because __m128i is a vector of 2 long long [at least now]). Here is a better testcase: #define vector __attribute__((vector_size(16))) typedef vector char __m128i ; static inline __m128i _mm_set_epi8(char a, char b, char c, char d, char e, char f, char g, char h, char i, char j, char k, char l, char m, char n, char o, char p) { return (__m128i){a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p}; } __m128i foo(char C) { return _mm_set_epi8( 0,C, 2*C, 3*C, 4*C, 5*C, 6*C, 7*C, 8*C, 9*C, 10*C, 11*C, 12*C, 13*C, 14*C, 15*C); } __m128i bar(char C) { __m128i v = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15); vector unsigned char d = (vector unsigned char)v; d *= C; return (__m128i)d; } -CUT Taking the above, on aarch64 SLP does not handle it because it does not recognize that 0 and C can be SLPed. If I change both of them to 2*C, then SLP does the right thing.
[Bug target/91569] Optimisation test case and unnecessary XOR-OR pair instead of MOV.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91569 --- Comment #3 from H.J. Lu --- It is fixed by r11-165.
[Bug tree-optimization/51780] Missed optimization for ==/!= comparison
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51780 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |8.0 Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #3 from Andrew Pinski --- Fixed with r8-3771.
[Bug tree-optimization/51780] Missed optimization for ==/!= comparison
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51780 Andrew Pinski changed: What|Removed |Added Known to work||8.1.0 Known to fail||7.5.0 --- Comment #2 from Andrew Pinski --- _1 = ar[a_4(D)]; _2 = _1 != 0; _6 = (int) _2;
[Bug target/91569] Optimisation test case and unnecessary XOR-OR pair instead of MOV.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91569 Andrew Pinski changed: What|Removed |Added Known to work||11.1.0 --- Comment #2 from Andrew Pinski --- For test3 GCC 11+ produces: opt_test3(int): .LFB2: .cfi_startproc movslq %edi, %rax movb $4, %ah ret Which is exactly what you should see. Gimple level looks the same between GCC 10 and GCC 11: _1 = (long int) num_2(D); a = _1; MEM[(char *) + 1B] = 4; _6 = a; I suspect a decl no longer has address taken so it is not going to the stack right away.
[Bug target/94871] Failure to convert cmpeqpd+pxor with -1 into cmpneqpd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94871 --- Comment #2 from Andrew Pinski --- v2di cmpneq_pd1(v2df a, v2df b) { return ((v2di)(a==b) ^ set1_epi8(0xFF)); } Produces the correct thing on gimple level: _5 = .VCOND (a_2(D), b_3(D), { 0, 0 }, { -1, -1 }, 113); But the RTL during combine (even with -ffast-math) produces: (set (reg:V2DI 82 [ ]) (not:V2DI (eq:V2DI (reg:V2DF 89) (reg:V2DF 90
[Bug target/88712] Optimization: mov edx, 0 not replaced with xor edx, edx in this case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88712 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED Known to work||11.1.0 Target Milestone|--- |11.0 Known to fail||10.3.0 --- Comment #3 from Andrew Pinski --- (In reply to Jakub Jelinek from comment #2) > so the clearing of %edx is sandwiched in between the cmp and cmov. Later on > in this case sched2 reorders those and so we at that point could replace it, > but we don't have another peephole2 pass and passes after sched2 don't have > the needed infrastructure to check if the flags are dead (because movl $0, > reg doesn't clobber flags, but xorl reg, reg does). Right and the peephole2 that implemented that was done in r11-2588 so closing as fixed in GCC 11+.
[Bug tree-optimization/96697] Failure to optimize mod+div to 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96697 --- Comment #6 from Jakub Jelinek --- For signed x and y, x % y == x % -y, and x % y has the sign of x. So for non-negative x you can use x % y < abs(y), and generally -abs(y) < x % y < abs(y).
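The remainder properties stated above can be spot-checked with a small brute-force test. This is a sketch only: the function names are hypothetical, and the preconditions exclude zero divisors and `INT_MIN`, where negation or `abs` would overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Returns 1 if the properties quoted above hold for this (x, y) pair.
   Preconditions (assumed here): y != 0, and neither operand is INT_MIN. */
int remainder_identities_hold(int x, int y)
{
    assert(y != 0 && y != INT_MIN && x != INT_MIN);
    int r = x % y;
    if (r != x % -y)                      /* x % y == x % -y */
        return 0;
    if (!(-abs(y) < r && r < abs(y)))     /* -abs(y) < x % y < abs(y) */
        return 0;
    if (x >= 0 && r < 0)                  /* result has the sign of x ... */
        return 0;
    if (x <= 0 && r > 0)                  /* ... or is zero */
        return 0;
    return 1;
}

/* Checks every pair in [-limit, limit] x [-limit, limit] with y != 0
   and returns how many pairs violate the properties. */
int count_remainder_failures(int limit)
{
    int failures = 0;
    for (int x = -limit; x <= limit; x++)
        for (int y = -limit; y <= limit; y++)
            if (y != 0 && !remainder_identities_hold(x, y))
                failures++;
    return failures;
}
```

These bounds are what a range-based optimization would rely on when reasoning about `x % y`; the check above is exhaustive only over the small range given, not a proof.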
[Bug modula2/101387] Unconditional use of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101387 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #2 from Andrew Pinski --- .
[Bug modula2/101388] Unconditional use of __MAX_BAUD
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101388 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #2 from Andrew Pinski --- .
[Bug tree-optimization/85366] Failure to use both div and mod results of one IDIV in a prime-factor loop while(n%i==0) { n/=i; }
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85366 Andrew Pinski changed: What|Removed |Added Depends on||96697 --- Comment #4 from Andrew Pinski --- n_17 = n_24 / i_30; _3 = n_17 % i_30; So basically PR 96697. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96697 [Bug 96697] Failure to optimize mod+div to 0
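The loop from the bug title, `while(n%i==0) { n/=i; }`, can be written by hand so that each iteration obtains quotient and remainder from one library call. This is a source-level sketch with a hypothetical function name, using the standard `div` from `<stdlib.h>`; it does not describe what GCC does internally.

```c
#include <assert.h>
#include <stdlib.h>

/* Removes every factor i from *n and returns how many were removed.
   Preconditions (assumed here): *n != 0 and i is not -1, 0 or 1,
   otherwise the loop would not terminate or the division is undefined. */
unsigned strip_factor(int *n, int i)
{
    assert(*n != 0 && i != 0 && i != 1 && i != -1);
    unsigned count = 0;
    for (;;) {
        div_t qr = div(*n, i);   /* quotient and remainder together */
        if (qr.rem != 0)
            break;
        *n = qr.quot;            /* reuse the quotient instead of dividing again */
        count++;
    }
    return count;
}
```

For example, starting from `n = 360`, stripping the factor 2 leaves 45 after three divisions. The quotient computed while testing divisibility is the value the next iteration needs, which is the redundancy the bug report is about.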
[Bug tree-optimization/96697] Failure to optimize mod+div to 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96697 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Keywords||missed-optimization Last reconfirmed|2020-08-25 00:00:00 |2021-8-15
[Bug sanitizer/95244] [10 Regression] GCC 10 no longer builds on RHEL5 [trivial patch]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95244 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |10.4 Summary|GCC 10 no longer builds on |[10 Regression] GCC 10 no |RHEL5 [trivial patch] |longer builds on RHEL5 ||[trivial patch] --- Comment #4 from Andrew Pinski --- Fixed in GCC 11 by a merge from upstream; r11-781.
[Bug c++/101904] Wrong result of decltype during instantiation of std::result_of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101904 --- Comment #2 from Mikhail Kremniov --- I see, thanks. But I must mention that Clang is able to compile this code somehow.
[Bug sanitizer/95244] GCC 10 no longer builds on RHEL5 [trivial patch]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95244 nightstrike changed: What|Removed |Added CC||nightstrike at gmail dot com --- Comment #3 from nightstrike --- That link leads to: https://reviews.llvm.org/D80648 Which was approved, and eventually merged into gcc here at 3c6331c2 and later improved with 0b997f6e. This can be marked as a regression fixed in 11.1 but still in 10 as of 10.3.
[Bug fortran/101871] Array of strings of different length passed as an argument produces invalid result.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101871 --- Comment #5 from anlauf at gcc dot gnu.org --- In array.c:gfc_match_array_constructor there's the following code: 1335 /* Walk the constructor, and if possible, do type conversion for 1336 numeric types. */ 1337 if (gfc_numeric_ts ()) 1338{ 1339 m = walk_array_constructor (, head); 1340 if (m == MATCH_ERROR) 1341return m; 1342} Steve, you were the last one to work on this block. It appears that non-numeric ts are not handled (here). Can you give some insight?
[Bug target/100293] MinGW-w64 of nvptx offload engine fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100293 --- Comment #9 from Brecht Sanders --- Any update on this? Issue still exists today (in GCC 11.2.0 and in latest snapshot 11.2.1-20210814). Both when building gcc on Windows for nvptx as well as the offload engine for nvptx there is an error like this in nvptx-none/libatomic/config.log: configure:3736: $? = 1 configure:3756: checking whether the C compiler works configure:3778: /R/winlibs64_stage/nvptx-gcc-11-20210814/gcc-11-20210814/build_nvptx_gcc/./gcc/xgcc -B/R/winlibs64_stage/nvptx-gcc-11-20210814/gcc-11-20210814/build_nvptx_gcc/./gcc/ -nostdinc -B/R/winlibs64_stage/nvptx-gcc-11-20210814/gcc-11-20210814/build_nvptx_gcc/nvptx-none/newlib/ -isystem /R/winlibs64_stage/nvptx-gcc-11-20210814/gcc-11-20210814/build_nvptx_gcc/nvptx-none/newlib/targ-include -isystem /R/winlibs64_stage/nvptx-gcc-11-20210814/gcc-11-20210814/newlib/libc/include -B/R/winlibs64_stage/inst_nvptx-gcc-11-20210814/share/nvptx-gcc/nvptx-none/bin/ -B/R/winlibs64_stage/inst_nvptx-gcc-11-20210814/share/nvptx-gcc/nvptx-none/lib/ -isystem /R/winlibs64_stage/inst_nvptx-gcc-11-20210814/share/nvptx-gcc/nvptx-none/include -isystem /R/winlibs64_stage/inst_nvptx-gcc-11-20210814/share/nvptx-gcc/nvptx-none/sys-include -g -O2 conftest.c >&5 error reading C:\Temp\ccqpNwjZ.o collect2.exe: error: ld returned 1 exit status
[Bug fortran/99351] ICE in gfc_finish_var_decl, at fortran/trans-decl.c:695
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99351 --- Comment #3 from CVS Commits --- The master branch has been updated by Harald Anlauf : https://gcc.gnu.org/g:bbf19f9c20515da9fcd23f08c8139427374e8d77 commit r12-2915-gbbf19f9c20515da9fcd23f08c8139427374e8d77 Author: Harald Anlauf Date: Sun Aug 15 20:13:11 2021 +0200 Fortran: fix checks for STAT= and ERRMSG= arguments of SYNC ALL/SYNC IMAGES gcc/fortran/ChangeLog: PR fortran/99351 * match.c (sync_statement): Replace %v code by %e in gfc_match to allow for function references as STAT and ERRMSG arguments. * resolve.c (resolve_sync): Adjust checks of STAT= and ERRMSG= to being definable arguments. Function references with a data pointer result are accepted. * trans-stmt.c (gfc_trans_sync): Adjust assertion. gcc/testsuite/ChangeLog: PR fortran/99351 * gfortran.dg/coarray_sync.f90: New test. * gfortran.dg/coarray_3.f90: Adjust error messages.
[Bug libstdc++/101923] std::function's move ctor is slower than the copy one for empty source objects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923 Nikita Kniazev changed: What|Removed |Added CC||nok.raven at gmail dot com --- Comment #2 from Nikita Kniazev --- There is no difference in the produced code on trunk (except move ops order) https://godbolt.org/z/esfjhr9ae
[Bug ada/101924] /usr/ccs/bin/ld: Unsatisfied symbols: U_get_unwind_entry, U_IS_STUB_OR_CALLX, U_get_shLib_text_addr, U_is_shared_pc, U_init_frame_record, U_prep_frame_rec_for_unwind, U_get_shLib_unw_
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101924 John David Anglin changed: What|Removed |Added Summary|/usr/ccs/bin/ld:|/usr/ccs/bin/ld: |Unsatisfied symbols |Unsatisfied symbols: |referenced |U_get_unwind_entry, ||U_IS_STUB_OR_CALLX, ||U_get_shLib_text_addr, ||U_is_shared_pc, ||U_init_frame_record, ||U_prep_frame_rec_for_unwind ||, U_get_shLib_unw_tbl, ||U_get_previous_frame_x and ||U_get_unwind_table --- Comment #1 from John David Anglin ---

g++ -std=c++11 -no-pie -g -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -DHAVE_CONFIG_H -static-libstdc++ -static-libgcc -o gnatbind -g ada/b_gnatb.o ada/ali-util.o ada/ali.o ada/alloc.o ada/aspects.o ada/atree.o ada/bcheck.o ada/binde.o ada/binderr.o ada/bindgen.o ada/bindo.o ada/bindo-augmentors.o ada/bindo-builders.o ada/bindo-diagnostics.o ada/bindo-elaborators.o ada/bindo-graphs.o ada/bindo-units.o ada/bindo-validators.o ada/bindo-writers.o ada/bindusg.o ada/butil.o ada/casing.o ada/csets.o ada/debug.o ada/einfo-entities.o ada/einfo-utils.o ada/einfo.o ada/elists.o ada/err_vars.o ada/errout.o ada/erroutc.o ada/exit.o ada/final.o ada/fmap.o ada/fname-uf.o ada/fname.o ada/gnatbind.o ada/gnatvsn.o ada/hostparm.o ada/krunch.o ada/lib.o ada/link.o ada/namet.o ada/nlists.o ada/opt.o ada/osint-b.o ada/osint.o ada/output.o ada/restrict.o ada/rident.o ada/scans.o ada/scil_ll.o ada/scng.o ada/sdefault.o ada/seinfo.o ada/sem_aux.o ada/sinfo.o ada/sinfo-nodes.o ada/sinfo-utils.o ada/sinput-c.o ada/sinput.o ada/snames.o ada/stand.o ada/stringt.o ada/style.o ada/styleg.o ada/stylesw.o ada/switch-b.o ada/switch.o ada/table.o ada/targparm.o ada/types.o ada/uintp.o ada/uname.o ada/urealp.o ada/widechar.o ada/gnat.o ada/g-dynhta.o ada/g-lists.o ada/g-graphs.o ada/g-sets.o ada/s-casuti.o ada/s-os_lib.o ada/s-resfil.o ada/s-utf_32.o ada/adaint.o ada/argv.o ada/cio.o ada/cstreams.o ada/env.o ada/errno.o ada/targext.o ada/version.o ggc-none.o libcommon-target.a libcommon.a ../libcpp/libcpp.a ../libbacktrace/.libs/libbacktrace.a ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a

/usr/ccs/bin/ld: Unsatisfied symbols:
   U_get_unwind_entry (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_IS_STUB_OR_CALLX (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_get_shLib_text_addr (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_is_shared_pc (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_init_frame_record (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_prep_frame_rec_for_unwind (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_get_shLib_unw_tbl (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_get_previous_frame_x (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
   U_get_unwind_table (first referenced in /home/opt/gnu/gcc/gcc-8/lib/gcc/hppa2.0w-hp-hpux11.11/8.5.0/adalib/libgnat.a(s-traceb.o)) (code)
collect2: error: ld returned 1 exit status
make[3]: *** [../../gcc/gcc/ada/gcc-interface/Make-lang.in:746: gnatbind] Error 1
make[3]: *** Waiting for unfinished jobs

This was introduced by the following change:

commit abcf5174979bcb91ac4c921eaa19a5b37f231ae4 (HEAD, refs/bisect/bad)
Author: Arnaud Charlet
Date: Wed Jan 13 08:49:15 2021 -0500

    [Ada] Use runtime from base compiler during stage1

    gcc/ada/
    * Make-generated.in: Add rule to copy runtime files needed during stage1.
    * raise.c: Remove obsolete symbols used during bootstrap.
    * gcc-interface/Make-lang.in: Do not use libgnat sources during stage1. (GNAT_ADA_OBJS, GNATBIND_OBJS): Split in two
[Bug ada/101924] New: /usr/ccs/bin/ld: Unsatisfied symbols referenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101924 Bug ID: 101924 Summary: /usr/ccs/bin/ld: Unsatisfied symbols referenced Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: ada Assignee: unassigned at gcc dot gnu.org Reporter: danglin at gcc dot gnu.org Target Milestone: --- Host: hppa2.0w-hp-hpux11.11 Target: hppa2.0w-hp-hpux11.11 Build: hppa2.0w-hp-hpux11.11
Re: [PATCH] lib: bitmap: Mute some odd section mismatch warning in xtensa kernel build
On Sun, Aug 15, 2021 at 03:21:32PM +1200, Barry Song wrote: > From: Barry Song > > Constanly there are some section mismatch issues reported in test_bitmap > for xtensa platform such as: > > Section mismatch in reference from the function bitmap_equal() to the > variable .init.data:initcall_level_names > The function bitmap_equal() references the variable __initconst > __setup_str_initcall_blacklist. This is often because bitmap_equal > lacks a __initconst annotation or the annotation of > __setup_str_initcall_blacklist is wrong. > > Section mismatch in reference from the function bitmap_copy_clear_tail() > to the variable .init.rodata:__setup_str_initcall_blacklist > The function bitmap_copy_clear_tail() references the variable __initconst > __setup_str_initcall_blacklist. > This is often because bitmap_copy_clear_tail lacks a __initconst > annotation or the annotation of __setup_str_initcall_blacklist is wrong. > > To be honest, hardly to believe kernel code is wrong since bitmap_equal is > always called in __init function in test_bitmap.c just like __bitmap_equal. > But gcc doesn't report any issue for __bitmap_equal even when bitmap_equal > and __bitmap_equal show in the same function such as: > > static void noinline __init test_mem_optimisations(void) > { > ... > for (start = 0; start < 1024; start += 8) { > for (nbits = 0; nbits < 1024 - start; nbits += 8) { > if (!bitmap_equal(bmap1, bmap2, 1024)) { > failed_tests++; > } > if (!__bitmap_equal(bmap1, bmap2, 1024)) { > failed_tests++; > } > ... > } > } > } > > The different between __bitmap_equal() and bitmap_equal() is that the > former is extern and a EXPORT_SYMBOL. So noinline, and probably in fact > noclone. But the later is static and unfortunately not inlined at this > time though it has a "inline" flag. 
> > bitmap_copy_clear_tail(), on the other hand, seems more innocent as it is > accessing stack only by its wrapper bitmap_from_arr32() in function > test_bitmap_arr32(): > static void __init test_bitmap_arr32(void) > { > unsigned int nbits, next_bit; > u32 arr[EXP1_IN_BITS / 32]; > DECLARE_BITMAP(bmap2, EXP1_IN_BITS); > > memset(arr, 0xa5, sizeof(arr)); > > for (nbits = 0; nbits < EXP1_IN_BITS; ++nbits) { > bitmap_to_arr32(arr, exp1, nbits); > bitmap_from_arr32(bmap2, arr, nbits); > expect_eq_bitmap(bmap2, exp1, nbits); > ... > } > } > Looks like gcc optimized arr, bmap2 things to .init.data but it seems > nothing is wrong in kernel since test_bitmap_arr32() is __init. > > Max Filippov reported a bug to gcc but gcc people don't ack. So here > this patch removes the involved symbols by forcing inline. It might > not be that elegant but I don't see any harm as bitmap_equal() and > bitmap_copy_clear_tail() are both quite small. In addition, kernel > doc also backs this modification "We don't use the 'inline' keyword > because it's broken": www.kernel.org/doc/local/inline.html This is a 2006 article. Are you sure nothing has been changed over the last 15 years? > Another possible way to "fix" the warning is moving the involved > symboms to lib/bitmap.c: So, it's a GCC issue already reported to GCC? For me it sounds like nothing to fix in kernel. If I was a GCC developer, I'd prefer to have all bugs clearly reproducible. Let's wait for GCC and xtensa people comments. (CC xtensa and GCC lists) Yury > +int bitmap_equal(const unsigned long *src1, > + const unsigned long *src2, unsigned int nbits) > +{ > + if (small_const_nbits(nbits)) > + return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits)); > + if (__builtin_constant_p(nbits & BITMAP_MEM_MASK) && > + IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT)) > + return !memcmp(src1, src2, nbits / 8); > + return __bitmap_equal(src1, src2, nbits); > +} > +EXPORT_SYMBOL(bitmap_equal); > > This is harmful to the performance. 
> > Reported-by: kernel test robot > Cc: Andy Shevchenko > Cc: Max Filippov > Cc: Andrew Pinski > Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92938 > Signed-off-by: Barry Song > --- > include/linux/bitmap.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h > index 37f36dad18bd..3eec9f68a0b6 100644 > --- a/include/linux/bitmap.h > +++ b/include/linux/bitmap.h > @@ -258,7 +258,7 @@ static inline void bitmap_copy(unsigned long *dst, const > unsigned long *src, > /* > * Copy bitmap and clear tail bits in last word. > */ > -static inline void bitmap_copy_clear_tail(unsigned long *dst, > +static
[Bug fortran/101918] LTO type mismatches for runtime library functions in mixed -fdefault-real-8 projects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101918 kargl at gcc dot gnu.org changed: What|Removed |Added CC||kargl at gcc dot gnu.org --- Comment #2 from kargl at gcc dot gnu.org --- (In reply to Rimvydas (RJ) from comment #0) > $ gfortran -Wall -Wextra -flto -fdefault-real-8 -c foo.f90 > $ gfortran -flto -Wall -Wextra foo.o bar.f90 This should be closed as WONTFIX. If you compile foo.f90 with -fdefault-real-8, then you must compile bar.f90 with -fdefault-real-8. You're changing the ABI for foo.f90, but not bar.f90. > Does this mean -flto cannot be used in mixed -fdefault-real-8 > and usual modes? It means "Don't use -fdefault-real-8". It is a broken unfixable option that I tried to remove years ago, but that was voted down. If you have code that requires this option, then the code should be properly ported to REAL(8).
[Bug modula2/101387] Unconditional use of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101387 --- Comment #1 from Gaius Mulley --- Many thanks for the bug report; this is now fixed in the git repo. The bugfix emits a prototype for throw (if required) rather than using a non-portable header file.
[Bug target/82883] eax register unnecessary consumed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82883 H.J. Lu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #2 from H.J. Lu --- It is done on purpose by r0-116075:

[hjl@gnu-cfl-2 tmp]$ gcc -S -O2 x.c -mtune-ctrl=^lcp_stall
[hjl@gnu-cfl-2 tmp]$ cat x.s
        .file   "x.c"
        .text
        .p2align 4
        .globl  foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        movl    $1819043144, (%rdi)
        movw    $8303, 4(%rdi)
        ret
        .cfi_endproc
.LFE0:
        .size   foo, .-foo
        .ident  "GCC: (GNU) 11.2.1 20210728 (Red Hat 11.2.1-1)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-2 tmp]$
[Bug libstdc++/101923] std::function's move ctor is slower than the copy one for empty source objects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923 --- Comment #1 from Petar Ivanov --- Benchmark code (using Google Benchmark): #include #include #include struct Car {}; static void copy(benchmark::State& state) { for (auto _ : state) { const auto f = std::function{}; const auto copied = f; benchmark::DoNotOptimize(copied); } } static void move(benchmark::State& state) { for (auto _ : state) { auto f = std::function{}; const auto moved = std::move(f); benchmark::DoNotOptimize(moved); } } BENCHMARK(copy); BENCHMARK(move); BENCHMARK_MAIN();
[Bug modula2/101388] Unconditional use of __MAX_BAUD
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101388 --- Comment #1 from Gaius Mulley --- "ro at gcc dot gnu.org" writes: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101388 > > Bug ID: 101388 >Summary: Unconditional use of __MAX_BAUD >Product: gcc >Version: 12.0 > Status: UNCONFIRMED > Severity: normal > Priority: P3 > Component: modula2 > Assignee: unassigned at gcc dot gnu.org > Reporter: ro at gcc dot gnu.org > CC: gaiusmod2 at gmail dot com > Target Milestone: --- > Target: *-*-solaris2.11 > > Building the devel/modula-2 branch on Solaris 11 fails with undefined > references > to __MAX_BAUD in two places: > > /vol/gcc/src/git/modula-2/gcc/m2/mc-boot-ch/Gtermios.c: In function > 'termios_GetFlag': > /vol/gcc/src/git/modula-2/gcc/m2/mc-boot-ch/Gtermios.c:872:27: error: > '__MAX_BAUD' undeclared (first use in this function) >*b = ((t->c_cflag & __MAX_BAUD) == __MAX_BAUD); >^~ > > /vol/gcc/src/git/modula-2/gcc/m2/gm2-libs-ch/termios.c: In function > 'termios_GetFlag': > /vol/gcc/src/git/modula-2/gcc/m2/gm2-libs-ch/termios.c:877:27: error: > '__MAX_BAUD' undeclared (first use in this function) > 877 | *b = ((t->c_cflag & __MAX_BAUD) == __MAX_BAUD); > | ^~ > __MAX_BAUD seems to be Linux/glibc specific, but the current problem is > obviously > cause by a wrong guard which checks for defined(MAX) instead of > defined(__MAX_BAUD). > > Correcting this lets the build continue.

Many thanks for the report - now fixed in the git repo. Regards, Gaius
[Bug libstdc++/101923] New: std::function's move ctor is slower than the copy one for empty source objects
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101923 Bug ID: 101923 Summary: std::function's move ctor is slower than the copy one for empty source objects Product: gcc Version: 9.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: dartdart26 at gmail dot com Target Milestone: ---

std::function's move constructor calls swap() irrespective of whether the source object is empty or not. In contrast, the copy constructor first checks whether the source object is empty and, if it is, does nothing, because the `this` object is already constructed in an empty state by _Function_base(). Calling swap() on an empty source requires more work, because some data needs to be copied - for example, the POD data cannot be moved. Could the move constructor check if the source is empty too, as the copy one does? Please let me know if I am missing a rule that prevents that.

I noticed this on version 9.3.0, but I see the code is the same in current master at: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/bits/std_function.h;hb=c22bcfd2f7dc9bb5ad394720f4a612327dc898ba#l391

I have tested on a MacBook M1 and the copy ctor for empty sources is almost 2x faster than the move ctor:

Benchmark         Time         CPU   Iterations
copy          0.945 ns    0.945 ns    555789159
move           1.83 ns     1.83 ns    382183169

I have made a YouTube video describing my findings and the benchmark results: https://www.youtube.com/watch?v=WA3mKab-tn8
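As a rough illustration of the case described in this report (it is not the libstdc++ implementation), the sketch below copy-constructs and move-constructs from an empty std::function and checks that both results are empty. The void() signature, the Fn alias and the helper names are assumptions chosen for illustration; the report does not say which signature was benchmarked.

```cpp
#include <functional>
#include <utility>

// Hypothetical signature for illustration only.
using Fn = std::function<void()>;

// Copy-construct from an empty source; the result must hold no target.
bool copy_of_empty_is_empty() {
    const Fn source;            // default-constructed: no target
    const Fn copied = source;   // copy constructor
    return !copied;
}

// Move-construct from an empty source; the result must hold no target.
bool move_of_empty_is_empty() {
    Fn source;                          // default-constructed: no target
    const Fn moved = std::move(source); // move constructor
    return !moved;
}
```

Both helpers observe the same end state, which is why a difference in cost between the two constructors for empty sources is a quality-of-implementation matter rather than a semantic one.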
[Bug target/91591] Arc: ICE in trunc_int_for_mode, at explow.c:60
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91591 --- Comment #4 from Giulio Benetti --- This bug is pretty old and needs to be retested to see whether it still shows up. Maybe it's been fixed in a later GCC minor version. I will let you know.
[Bug target/101922] mips: illegal instruction at -O3 with -mmsa -mloongson-mmi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101922 --- Comment #1 from Xi Ruoyao --- Technically the testcase above invokes UB, but this is reduced from a file in openssl-1.1.1k.
[Bug target/101922] New: mips: illegal instruction at -O3 with -mmsa -mloongson-mmi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101922 Bug ID: 101922 Summary: mips: illegal instruction at -O3 with -mmsa -mloongson-mmi Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: xry111 at mengyan1223 dot wang Target Milestone: --- $ cat test.c int x = 0x; char d[16]; void f() { int i; for (i = 0; i < 16; i++) { int t = d[i] >> 8; x &= t; } } $ ~/git-repos/gcc-test-mips/gcc/cc1 test.c -O3 -mmsa -mloongson-mmi -nostdinc $ mips64el-unknown-linux-gnu-as test.s -mmsa -mloongson-mmi -mips64r2 test.s: Assembler messages: test.s:29: Error: operand 3 out of range `srai.b $w0,$w0,8'
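For context, the loop body shifts a char that has first been promoted to int, so a shift count of 8 is fine in scalar code, whereas the assembler message shows that 8 is out of range for an 8-bit element shift. The sketch below only checks the arithmetic: for a signed 8-bit value, shifting the promoted int by 8 gives the same result as shifting by 7 within 8 bits. It assumes an arithmetic right shift of negative values (which GCC provides), uses signed char explicitly, and the function names are hypothetical. Whether the eventual GCC fix clamps the count this way is not stated in the report.

```cpp
// Scalar form of the expression in the report: the operand is promoted
// to int before the shift, so the result is 0 or -1.
int shift_promoted(signed char c) {
    return c >> 8;
}

// An in-range shift for an 8-bit lane: a count of 7 replicates the sign bit.
int shift_in_lane(signed char c) {
    return static_cast<signed char>(c >> 7);
}

// Compare the two forms for every 8-bit value.
bool shifts_agree_for_all_bytes() {
    for (int v = -128; v <= 127; ++v) {
        const signed char c = static_cast<signed char>(v);
        if (shift_promoted(c) != shift_in_lane(c)) {
            return false;
        }
    }
    return true;
}
```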
[Bug middle-end/17958] expand_divmod fails to optimize division of 64-bit quantity by small constant when BITS_PER_WORD is 32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=17958 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|--- |11.0 Status|ASSIGNED|RESOLVED --- Comment #4 from Andrew Pinski --- Implemented by r11-5533, r11-5614 (PPC improvement), and r11-5648.
[Bug target/61030] PowerPC 128 bit integer divide
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61030 Andrew Pinski changed: What|Removed |Added Depends on||100809 --- Comment #3 from Andrew Pinski --- PR 100809 implemented udivti3, divti3, umodti3, and modti3. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100809 [Bug 100809] PPC: __int128 divide/modulo does not use P10 instructions vdivsq/vdivuq
[Bug middle-end/101521] -ftrapv should become something like -fsanitize=undefined -fsanitize-undefined-trap-on-error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101521 Andrew Pinski changed: What|Removed |Added Depends on||78473 --- Comment #4 from Andrew Pinski --- the request for a division overflow function is PR 78473. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78473 [Bug 78473] Enhancement request: __builtin_div_overflow
[Bug c++/51178] FAIL: g++.dg/lookup/builtin5.C scan-assembler _ZSt5atanhd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51178 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED Target Milestone|--- |4.8.0 Component|target |c++ --- Comment #2 from Andrew Pinski --- Fixed by r0-120302.
[Bug c++/101873] Compilation error of valid code with return local variable in C++20 mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101873 --- Comment #4 from Fedor Chelnokov --- If this question is to me, then actually I am not absolutely sure. I initially thought that GCC was right in this code example. But later a high-reputation C++ expert from Stack Overflow dissuaded me. According to him, the code example was valid in C++17 and remains valid in C++20. The C++ standard is so complex nowadays.
[Bug libstdc++/57691] freestanding libstdc++ has compile error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57691 Bug 57691 depends on bug 57699, which changed state. Bug 57699 Summary: Disable empty parameter list misinterpretation in libstdc++ headers when !defined(NO_IMPLICIT_EXTERN_C) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57699 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug c++/57699] Disable empty parameter list misinterpretation in libstdc++ headers when !defined(NO_IMPLICIT_EXTERN_C)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57699 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |9.0 Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #9 from Andrew Pinski --- This was removed in GCC 9 by r9-2724.
[Bug target/37727] NO_IMPLICIT_EXTERN_C for newlib
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37727 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |9.0 Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #3 from Andrew Pinski --- Fixed in GCC 9 by r9-1648.
[Bug middle-end/48580] missed optimization: integer overflow checks
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580 Andrew Pinski changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=59708 CC||pinskia at gcc dot gnu.org --- Comment #23 from Andrew Pinski --- Also the builtins have been in GCC since GCC 5; r5-4844, PR 59708.
[Bug middle-end/91072] does not reduce the size of a division by a constant on non-negative int / small unsigned long constant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91072 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug middle-end/48580] missed optimization: integer overflow checks
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48580 --- Comment #22 from Andrew Pinski --- For the original testcase in comment #0 we produce (in GCC 11+):

        movl    %edi, %eax
        mull    %esi
        seto    %dl
        xorl    %r8d, %r8d
        movzbl  %dl, %edx
        testl   %eax, %eax
        jle     .L1
        testl   %edx, %edx
        sete    %r8b
.L1:
        movl    %r8d, %eax
        ret

--- CUT

I have a patch which I think improves the code even more. The gimple level looks like this correctly:

  x.0_1 = (unsigned int) x_6(D);
  y.1_2 = (unsigned int) y_7(D);
  _11 = .MUL_OVERFLOW (x.0_1, y.1_2);
  tmp_8 = REALPART_EXPR <_11>;
  tmp.3_3 = (int) tmp_8;
  if (tmp.3_3 > 0) goto ; [59.00%] else goto ; [41.00%]
  [local count: 633507680]:
  _12 = IMAGPART_EXPR <_11>;
  _10 = _12 == 0;
  [local count: 1073741824]:
  # iftmp.2_5 = PHI <_10(3), 0(2)>

Notice no divide. The _12 == 0 part really should just be _12 ^ 1. After my patch (which I need to finish up) we get:

        movl    %edi, %eax
        mull    %esi
        seto    %dl
        xorl    %r8d, %r8d
        movzbl  %dl, %edx
        xorl    $1, %edx
        testl   %eax, %eax
        cmovg   %edx, %r8d
        movl    %r8d, %eax
        ret

Which should be exactly what you wanted or very close. There still look to be a few micro-optimizations needed.
[Bug c++/101921] G++ cannot find a template function with lambda as default template argument inside a template
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101921 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2021-08-15 Summary|G++ cannot find a template |G++ cannot find a template |function with lambda as |function with lambda as |default template argument |default template argument ||inside a template Status|UNCONFIRMED |NEW Keywords||rejects-valid --- Comment #1 from Andrew Pinski --- if foo was not a template function, then bar would work. Confirmed.
[Bug c++/101921] New: G++ cannot find a template function with lambda as default template argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101921 Bug ID: 101921 Summary: G++ cannot find a template function with lambda as default template argument Product: gcc Version: 11.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: fchelnokov at gmail dot com Target Milestone: --- Compilation of this valid code: ``` template void bar() {} void foo(auto) { bar(); } ``` results in error: ``` error: no matching function for call to 'bar()' 2 | void foo(auto) { bar(); } | ~~~^~ note: candidate: 'template void bar()' 1 | template void bar() {} | ^~~ note: template argument deduction/substitution failed: ``` Other compilers accept it: https://gcc.godbolt.org/z/9GsPo8Pnb
[Bug middle-end/37443] fast 64-bit divide by constant on 32-bit platform
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37443 Andrew Pinski changed: What|Removed |Added Build|i686-pc-cygwin | Host|i686-pc-cygwin | --- Comment #5 from Andrew Pinski --- Must be a cost model issue because I can get /10u working but not /1220703125u (in GCC 11+).
[Bug rtl-optimization/97459] __uint128_t remainder for division by 3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |11.0
[Bug rtl-optimization/97282] division done twice for modulo and divsion for 128-bit integers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97282 Andrew Pinski changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED Target Milestone|--- |11.0 --- Comment #5 from Andrew Pinski --- All fixed for GCC 11 by the patches for PR 97459 .
[Bug middle-end/89256] No optimized division by constant for __int128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89256 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Target Milestone|--- |11.0 Resolution|--- |FIXED --- Comment #3 from Andrew Pinski --- This is implemented in GCC 11+. Note that the cost of doing /1000 is too high, so a call still happens. If you do /100, you get the inlined expansion.
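The two divisors named in the comment can be compared with a small sketch like the one below, assuming a 64-bit target where unsigned __int128 is available. The function names are hypothetical. Whether each function compiles to an inline sequence or to a library call depends on the GCC version and the target cost model, so the generated assembly (for example from -O2 -S) has to be inspected to confirm it.

```cpp
// Division of a 128-bit value by a small constant. According to the comment,
// GCC 11+ expands /100 inline, while /1000 is still costed as a call.
unsigned __int128 div_by_100(unsigned __int128 x) {
    return x / 100;
}

unsigned __int128 div_by_1000(unsigned __int128 x) {
    return x / 1000;
}
```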
[Bug target/84759] Calculation of quotient and remainder with constant denominator uses __umoddi3+__udivdi3 instead of __udivmoddi4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84759 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |11.0 Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=97459 --- Comment #3 from Andrew Pinski --- In GCC 11+, we expand the divide and mod by a constant; this was implemented by r11-5533. I think we can close this as fixed for GCC 11+ with the expansion happening inline.
[Bug target/60900] ICE: in emit_library_call_value_1, at calls.c:4187 with -mabi=ms -mlong-double-128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60900 Andrew Pinski changed: What|Removed |Added CC||marxin at gcc dot gnu.org --- Comment #2 from Andrew Pinski --- *** Bug 82727 has been marked as a duplicate of this bug. ***
[Bug target/82727] ICE with -mabi=ms -mlong-double-128 and conversion from long double to double inside a sysv_abi marked function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82727 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #3 from Andrew Pinski --- Dup of bug 60900. *** This bug has been marked as a duplicate of bug 60900 ***
[Bug target/82883] eax register unnecessary consumed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82883 --- Comment #1 from Andrew Pinski --- With -mtune=intel -O3, we produce:

        movl    $1819043144, (%rdi)
        movw    $8303, 4(%rdi)
        ret

So it looks like a target tuning issue.
[Bug target/82730] extra store/reload of an XMM for every byte extracted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82730 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Severity|normal |enhancement Ever confirmed|0 |1 Last reconfirmed||2021-08-15 --- Comment #1 from Andrew Pinski --- Note the gimple level looks good:

  _20 = BIT_FIELD_REF ; _1 = (int) _20;
  _21 = BIT_FIELD_REF ; _2 = (int) _21;
  _22 = BIT_FIELD_REF ; _3 = (int) _22;
  _23 = BIT_FIELD_REF ; _4 = (int) _23;
  _24 = BIT_FIELD_REF ; _5 = (int) _24;
  _25 = BIT_FIELD_REF ; _6 = (int) _25;
  _26 = BIT_FIELD_REF ; _7 = (int) _26;
  _27 = BIT_FIELD_REF ; _8 = (int) _27;
  _28 = BIT_FIELD_REF ; _9 = (int) _28;
  _29 = BIT_FIELD_REF ; _10 = (int) _29;
  _30 = BIT_FIELD_REF ; _11 = (int) _30;
  _31 = BIT_FIELD_REF ; _12 = (int) _31;
  _32 = BIT_FIELD_REF ; _13 = (int) _32;
  _33 = BIT_FIELD_REF ; _14 = (int) _33;
  _34 = BIT_FIELD_REF ; _15 = (int) _34;
  _35 = BIT_FIELD_REF ; _16 = (int) _35;

- CUT

The problem is the way byte extractions are done. Note that MSVC is the only compiler that does the extractions entirely in registers and does not store to the stack.
[Bug target/82727] ICE in emit_library_call_value_1, at calls.c:4975
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82727 --- Comment #2 from Andrew Pinski --- Note -mabi=ms -mlong-double-128 is enough to reproduce the ICE.
[Bug target/82727] ICE in emit_library_call_value_1, at calls.c:4975
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82727 Andrew Pinski changed: What|Removed |Added Keywords||ice-on-valid-code Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2021-08-15 --- Comment #1 from Andrew Pinski --- Reduced testcase: double __attribute__ ((sysv_abi)) func_native (long double a) { return a; } CUT -- Compile with -mabi=ms -mbionic
[Bug target/46357] Unnecessary movzx instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46357 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #4 from Andrew Pinski --- I had meant to close this. Basically there is a new pass added for GCC 4.7.0 which removes the redundant zero/sign extends.
[Bug target/46357] Unnecessary movzx instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46357 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Target Milestone|--- |4.7.0 Last reconfirmed||2021-08-15 --- Comment #3 from Andrew Pinski --- Fixed for GCC 4.7.0+ by r0-114134.
[Bug target/81813] Inefficient stack pointer adjustment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81813 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2021-08-15 --- Comment #4 from Andrew Pinski --- So what is happening is reload produces: (insn 136 135 187 11 (set (reg:DI 0 ax [orig:102 _38 ] [102]) (mem/v/c:DI (plus:DI (reg/f:DI 7 sp) (const_int 32 [0x20])) [2 MEM[(volatile __u64 *) + 24B]+0 S8 A64])) "./include/linux/compiler.h":276 85 {*movdi_internal} (nil)) (insn 187 136 138 11 (set (reg:DI 2 cx [124]) (plus:DI (reg/f:DI 7 sp) (const_int 8 [0x8]))) "fs/fs_pin.c":64 218 {*leadi} (nil)) (insn 138 187 139 11 (parallel [ (set (reg/f:DI 1 dx [122]) (plus:DI (reg:DI 2 cx [124]) (const_int 24 [0x18]))) (clobber (reg:CC 17 flags)) ]) "fs/fs_pin.c":64 222 {*adddi_1} (nil)) (insn 139 138 140 11 (parallel [ (set (reg/f:DI 7 sp) (plus:DI (reg/f:DI 7 sp) (const_int 8 [0x8]))) (clobber (reg:CC 17 flags)) ]) "fs/fs_pin.c":64 222 {*adddi_1} (expr_list:REG_ARGS_SIZE (const_int 0 [0]) (nil))) Notice how cx is being used. And then post_reload produces: (insn 187 136 138 11 (set (reg:DI 2 cx [124]) (plus:DI (reg/f:DI 7 sp) (const_int 8 [0x8]))) "fs/fs_pin.c":64 218 {*leadi} (nil)) (insn 138 187 139 11 (parallel [ (set (reg/f:DI 1 dx [122]) (plus:DI (reg/f:DI 7 sp) (const_int 32 [0x20]))) (clobber (reg:CC 17 flags)) ]) "fs/fs_pin.c":64 222 {*adddi_1} (nil)) (insn 139 138 140 11 (set (reg/f:DI 7 sp) (reg:DI 2 cx [124])) "fs/fs_pin.c":64 85 {*movdi_internal} (expr_list:REG_ARGS_SIZE (const_int 0 [0]) (nil))) But I don't understand why it did not prop (plus (reg/f:DI 7 sp) (const_int 8 [0x8])) into insn 139 and remove insn 187. I think this is an issue for LRA/IRA really in the first place. We did not need to push the variable to the stack in the first place as we are going to Rematerialize the value after the pop anyways. So Vlad might want to debug this to make sure this is not a latent bug.
[Bug target/81813] Inefficient stack pointer adjustment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81813 --- Comment #3 from Andrew Pinski --- There are 3 places where the following pattern exists:

        call    lock_acquire
        call    debug_lockdep_rcu_enabled
        movq    32(%rsp), %rax
        popq    %rdx

In GCC 7-8, only one of the 3 has the expanded pop for some reason.
[Bug target/81813] Inefficient stack pointer adjustment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81813 Andrew Pinski changed: What|Removed |Added Known to fail||8.1.0 --- Comment #2 from Andrew Pinski --- In GCC 8.5 we had:

        pushq   %r12
        .cfi_def_cfa_offset 88
        movl    $2, %ecx
        xorl    %edx, %edx
        xorl    %r9d, %r9d
        xorl    %r8d, %r8d
        xorl    %esi, %esi
        movl    $rcu_lock_map, %edi
        call    lock_acquire
        call    debug_lockdep_rcu_enabled
        movq    32(%rsp), %rax
        leaq    8(%rsp), %rcx
        leaq    32(%rsp), %rdx
        movq    %rcx, %rsp
        .cfi_def_cfa_offset 80
        cmpq    %rax, %rdx
        jne     .L58

In GCC 9.1 (and the trunk) we have:

        pushq   %r13
        .cfi_def_cfa_offset 96
        xorl    %edx, %edx
        xorl    %r9d, %r9d
        xorl    %r8d, %r8d
        movl    $2, %ecx
        xorl    %esi, %esi
        movl    $rcu_lock_map, %edi
        call    lock_acquire
        call    debug_lockdep_rcu_enabled
        movq    32(%rsp), %rax
        popq    %rdx
        .cfi_def_cfa_offset 88
        cmpq    %rax, %rbx
        jne     .L58
[Bug c++/81760] attribute target uses the wrong default function argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81760 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2021-08-15 Component|target |c++ Status|UNCONFIRMED |NEW CC||pinskia at gcc dot gnu.org --- Comment #1 from Andrew Pinski --- Confirmed, I think this should have been rejected.
[Bug tree-optimization/50417] [9/10/11/12 regression]: memcpy with known alignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417 Andrew Pinski changed: What|Removed |Added CC||dragan.mladjenovic at syrmia dot com --- Comment #34 from Andrew Pinski --- *** Bug 101920 has been marked as a duplicate of this bug. ***
[Bug middle-end/101920] memcpy expansion treats unknown pointers as unaligned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101920 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #1 from Andrew Pinski --- Dup of bug 50417. *** This bug has been marked as a duplicate of bug 50417 ***
[Bug target/81496] AVX load from adjacent memory location followed by concatenation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81496 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2021-08-15 Ever confirmed|0 |1 Severity|normal |enhancement Status|UNCONFIRMED |NEW --- Comment #6 from Andrew Pinski ---
The first 2 examples (the __int128 ones) are due to:
(insn 4 3 5 2 (set (reg:TI 85)
        (subreg:TI (reg:DI 86) 0)) "/app/example.cpp":8:31 -1
     (nil))
(insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8)
        (reg:DI 87)) "/app/example.cpp":8:31 -1
     (nil))

Or rather:
(insn 11 8 12 2 (set (reg:V4DI 91)
        (vec_concat:V4DI (subreg:V2DI (reg/v:TI 84 [ x ]) 0)
            (subreg:V2DI (reg/v:TI 88 [ y ]) 0))) "/app/example.cpp":8:40 -1
     (nil))

clang produces interesting results too. It sometimes uses vpunpcklqdq and other times uses vpinsrd.
[Bug middle-end/101920] New: memcpy expansion treats unknown pointers as unaligned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101920
            Bug ID: 101920
           Summary: memcpy expansion treats unknown pointers as unaligned
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dragan.mladjenovic at syrmia dot com
  Target Milestone: ---

I guess it is easiest to observe this on AArch64 with the following code:

#include <string.h>

void test (int *dst, int *src)
{
  (void)memcpy (dst, src, sizeof *src);
}

With -O1 -mno-strict-align we get:
        ldr     w1, [x1]
        str     w1, [x0]
        ret

With -O1 -mstrict-align we get:
        ldrb    w2, [x1]
        strb    w2, [x0]
        ldrb    w2, [x1, 1]
        strb    w2, [x0, 1]
        ldrb    w2, [x1, 2]
        strb    w2, [x0, 2]
        ldrb    w1, [x1, 3]
        strb    w1, [x0, 3]
        ret

Or with -Os:
        mov     x2, 4
        b       memcpy

It seems that builtins.c:get_pointer_alignment finds empty SSA_NAME_PTR_INFO for both pointers and defaults to 8-bit alignment. This can be worked around by applying __builtin_assume_aligned to both src and dst.
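
For reference, the workaround mentioned above might look roughly like the following. This is only a sketch: the function name test_aligned is made up for illustration, and it assumes both pointers really are aligned to at least __alignof__ (int), since passing a misaligned pointer to __builtin_assume_aligned is undefined behavior. Whether it yields the single ldr/str pair under -mstrict-align has not been checked here.

```c
#include <string.h>

/* Hypothetical variant of the reporter's test function.  The builtin
   returns its first argument and lets GCC assume the stated alignment
   for the returned pointer.  */
void test_aligned (int *dst, int *src)
{
  int *d = __builtin_assume_aligned (dst, __alignof__ (int));
  int *s = __builtin_assume_aligned (src, __alignof__ (int));

  (void) memcpy (d, s, sizeof *s);
}
```

The alignment information is attached to the pointers returned by the builtin, so the memcpy call has to use d and s rather than the original dst and src.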