[Bug c++/100261] New: [11/12 Regression] ICE: tree check: expected var_decl or type_decl, have error_mark in emit_tinfo_decl, at cp/rtti.c:1643
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100261

Bug ID: 100261
Summary: [11/12 Regression] ICE: tree check: expected var_decl or type_decl, have error_mark in emit_tinfo_decl, at cp/rtti.c:1643
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: asolokha at gmx dot com
Target Milestone: ---

gcc-11.0.1-alpha20210418 snapshot (g:b412ce8e961052e6becea3bc783a53e1d5feaa0f) ICEs when compiling the following testcase, reduced from libstdc++-v3/testsuite/18_support/type_info/fundamental.cc:

#include <typeinfo>

namespace std {
  namespace decimal {
    class decimal32 {
      float private__decfloat32;
    };
  }
}

void
foo ()
{
  typeid (float);
  typeid (std::decimal::decimal32);
}

% g++-11.0.1 -c dvovnhjr.cc
dvovnhjr.cc:15:34: error: conflicting declaration 'const __class_type_info_pseudo_8 _ZTIf'
   15 |   typeid (std::decimal::decimal32);
      |                                  ^
dvovnhjr.cc:14:16: note: previous declaration as 'const __fundamental_type_info_pseudo_2 _ZTIf'
   14 |   typeid (float);
      |                ^
dvovnhjr.cc:16:1: internal compiler error: tree check: expected var_decl or type_decl, have error_mark in emit_tinfo_decl, at cp/rtti.c:1643
   16 | }
      | ^
0x814248 tree_check_failed(tree_node const*, char const*, int, char const*, ...)
        /var/tmp/portage/sys-devel/gcc-11.0.1_alpha20210418/work/gcc-11-20210418/gcc/tree.c:9816
0x6c3715 tree_check2(tree_node*, char const*, int, char const*, tree_code, tree_code)
        /var/tmp/portage/sys-devel/gcc-11.0.1_alpha20210418/work/gcc-11-20210418/gcc/tree.h:3372
0x6c3715 emit_tinfo_decl(tree_node*)
        /var/tmp/portage/sys-devel/gcc-11.0.1_alpha20210418/work/gcc-11-20210418/gcc/cp/rtti.c:1643
0x99d5cc c_parse_final_cleanups()
        /var/tmp/portage/sys-devel/gcc-11.0.1_alpha20210418/work/gcc-11-20210418/gcc/cp/decl2.c:4994
[Bug tree-optimization/100253] [10/11/12 Regression] wrong code with -O2 -fno-tree-bit-ccp -ftree-slp-vectorize (unaligned movdqa)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253 --- Comment #3 from Hongtao.liu --- (In reply to Andrew Pinski from comment #2) > The problem is right away in expand: > ;; vect__36.383_12 = MEM [(char * > {ref-all})_10 + 16B]; > > (insn 23 22 0 (set (reg:V1TI 88 [ vect__36.383 ]) > (mem:V1TI (plus:DI (reg/f:DI 86 [ _10 ]) > (const_int 16 [0x10])) [0 MEM > [(char * {ref-all})_10 + 16B]+0 S16 A128])) -1 > (nil)) > > > I think SLP did not mark the load as unaligned even though it knows it is > one: But gimple tree is marked as aligned. unit-size align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fffea300a80 precision:128 min max pointer_to_this > unsigned V1TI size unit-size align:128 warn_if_not_align:0 symtab:0 alias-set 31 canonical-type 0x7fffe9a59150 nunits:1 pointer_to_this > arg:0 sizes-gimplified public unsigned type_6 DI size unit-size align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fffea30c498> visited def_stmt _11 = + _214; version:11 ptr-info 0x7fffe9487330> arg:1 constant 16>>
[Bug c/100260] New: DSE: join stores
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100260

Bug ID: 100260
Summary: DSE: join stores
Product: gcc
Version: tree-ssa
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: david.bolvansky at gmail dot com
Target Milestone: ---

#include <string.h>

struct pam {
    void *p1;
    void *p2;
#ifdef LONG
    unsigned long size;
#else
    unsigned int pad;
    unsigned int size;
#endif
};

extern int use(struct pam *param);

unsigned int foo(void)
{
    struct pam s_pam;
    memset(&s_pam, 0, sizeof(struct pam));
    s_pam.size = 1;
    return use(&s_pam);
}

INT

foo():
        sub     rsp, 40
        pxor    xmm0, xmm0
        mov     rdi, rsp
        mov     DWORD PTR [rsp+16], 0
        mov     DWORD PTR [rsp+20], 1
        movaps  XMMWORD PTR [rsp], xmm0
        call    use(pam*)
        add     rsp, 40
        ret

LONG

foo():
        sub     rsp, 40
        pxor    xmm0, xmm0
        mov     rdi, rsp
        movaps  XMMWORD PTR [rsp], xmm0
        mov     QWORD PTR [rsp+16], 1
        call    use(pam*)
        add     rsp, 40
        ret

The stores

        mov     DWORD PTR [rsp+16], 0
        mov     DWORD PTR [rsp+20], 1

can be replaced with a single mov QWORD store.
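A minimal sketch of why the two stores can be merged, assuming 8-bit bytes and a plain little- or big-endian target. The helper names (`two_stores`, `one_store`, `joined_value`, `stores_match`) are illustrative and do not come from the report. Writing `pad = 0` and `size = 1` as two adjacent 32-bit stores leaves the same eight bytes in memory as one 64-bit store of a suitably composed constant:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: detect the byte order at run time.
inline bool is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Mirrors the INT variant: two adjacent 32-bit stores (pad, then size).
inline void two_stores(unsigned char *dst) {
    const std::uint32_t pad = 0, size = 1;
    std::memcpy(dst, &pad, sizeof pad);
    std::memcpy(dst + 4, &size, sizeof size);
}

// The 64-bit constant whose in-memory bytes equal {pad = 0, size = 1}.
inline std::uint64_t joined_value() {
    return is_little_endian() ? (std::uint64_t{1} << 32) : std::uint64_t{1};
}

// The merged form: one 64-bit store.
inline void one_store(unsigned char *dst) {
    const std::uint64_t v = joined_value();
    std::memcpy(dst, &v, sizeof v);
}

// True when both sequences leave identical bytes behind.
inline bool stores_match() {
    unsigned char a[8], b[8];
    two_stores(a);
    one_store(b);
    return std::memcmp(a, b, sizeof a) == 0;
}
```

On x86-64 the merged constant is `1 << 32`, so the INT variant could in principle use one 64-bit store just as the LONG variant does.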
[Bug c++/100248] ICE with global "default" keyword
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100248 --- Comment #1 from 康桓瑋 --- Reduced to no header: struct S {}; bool operator==(S&&, S&&) = default; :2:29: internal compiler error: Segmentation fault 2 | bool operator==(S&&, S&&) = default; | ^~~ 0x1cff019 internal_error(char const*, ...) ???:0 0x1366c00 strip_array_types(tree_node*) ???:0 0x9c9778 cp_type_quals(tree_node const*) ???:0 0x9ba4c3 cp_build_qualified_type_real(tree_node*, int, int) ???:0 0x82f194 defaultable_fn_check(tree_node*) ???:0 0x7b2cc8 cp_finish_decl(tree_node*, tree_node*, bool, tree_node*, int) ???:0 0x8e1f6d c_parse_file() ???:0 0xa621b2 c_common_parse_file() ???:0 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report.
[Bug libstdc++/100259] New: ODR violations in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100259 Bug ID: 100259 Summary: ODR violations in Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: aaron at aarongraham dot com Target Milestone: --- Current implementation in has functions that violate ODR: std::experimental::net::ip::make_error_code std::experimental::net::ip::make_error_condition std::experimental::net::ip::make_network_v4 It seems these should be inline and/or constexpr. There are probably others.
[Bug middle-end/95922] Failure to optimize `((b ^ a) & c) ^ a` to `(a & ~c) | (b & c)` the right way on architectures with andnot
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95922 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
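For reference, a small sketch showing that the two forms named in the summary compute the same bitwise select (take bits of `b` where `c` is set, bits of `a` elsewhere). The function names are made up for illustration; only the two expressions come from the bug title.

```cpp
#include <cassert>
#include <cstdint>

// Form from the bug title: two XORs and one AND.
constexpr std::uint32_t blend_xor(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return ((b ^ a) & c) ^ a;
}

// Equivalent form that maps onto an and-not instruction where one exists.
constexpr std::uint32_t blend_andnot(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return (a & ~c) | (b & c);
}

// Bitwise operators act on each bit position independently, so checking
// the eight single-bit input combinations covers every possible input.
inline bool blends_agree() {
    for (std::uint32_t a = 0; a < 2; ++a)
        for (std::uint32_t b = 0; b < 2; ++b)
            for (std::uint32_t c = 0; c < 2; ++c)
                if (blend_xor(a, b, c) != blend_andnot(a, b, c))
                    return false;
    return true;
}
```

Which form is cheaper depends on the target: the and-not form needs an and-not instruction to avoid a separate negation of `c`.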
[Bug rtl-optimization/94806] Failure to optimize unary minus for 128-bit operand
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94806 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug target/94863] Failure to use blendps over mov when possible
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94863 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug target/94870] Failure to use movhlps instead of separated mov+unpckhpd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94870 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/94930] Failure to optimize out subvsi in expansion of __builtin_memcmp with 1 as the operand with -ftrapv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94930 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug target/100257] poor codegen with vcvtph2ps / stride of 6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/94898] Failure to optimize compare plus sub of same operands into compare
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94898 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/94916] Failure to optimize pattern into difference or zero selector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94916 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/95404] Failure to optimize compare to power of 2 and bitwise and to more direct bitwise and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95404 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/95408] Failure to optimize bitwise and with negated conditional using the same operand to conditional with decremented operand
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95408 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/95738] Failure to optimize comparison of vector after sign xor to unsigned comparison
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95738

Andrew Pinski changed:

What|Removed |Added
Summary|Failure to optimize comparison of float after sign xor to unsigned comparison |Failure to optimize comparison of vector after sign xor to unsigned comparison
Severity|normal |enhancement

--- Comment #1 from Andrew Pinski ---
Vector optimizations are less likely to be implemented, really.
[Bug middle-end/19987] [meta-bug] fold missing optimizations in general
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987 Bug 19987 depends on bug 95914, which changed state. Bug 95914 Summary: Failure to optimize saturated add properly https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95914 What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED
[Bug target/95914] Failure to optimize saturated add properly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95914

Andrew Pinski changed:

What|Removed |Added
Severity|normal |enhancement
Target Milestone|--- |11.0
Resolution|--- |FIXED
Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski ---
.ident "GCC: (GNU) 11.0.1 20210228 (experimental) [master revision 5d9d6c1cd8d:fd96f7217ea:ec9dc4fa0803cb85ae0b981ca0d6a406e8f6669c]"

Produces:
        addw    %di, %si
        movl    $-1, %eax
        cmovnc  %esi, %eax

Which is exactly what you would have expected really.
This is because we find ADD_OVERFLOW now:
  _6 = .ADD_OVERFLOW (a_2(D), b_3(D));
  c_4 = REALPART_EXPR <_6>;
  _7 = IMAGPART_EXPR <_6>;
  if (_7 == 0)
    goto ; [65.00%]
  else
    goto ; [35.00%]

  [local count: 375809640]:

  [local count: 1073741824]:
  # iftmp.0_1 = PHI

So fixed.
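The original test case is not quoted in this comment, so the following is only a guess at the kind of source involved: a 16-bit unsigned saturating add, which matches the `addw` / `cmovnc` sequence above. The function names are hypothetical. `__builtin_add_overflow` is the GCC/Clang builtin that corresponds to the `.ADD_OVERFLOW` internal function in the dump; the second function is the portable compare form that a compiler may recognize as the same thing.

```cpp
#include <cassert>
#include <cstdint>

// Saturating add written with the GCC/Clang overflow builtin.
inline std::uint16_t sat_add_builtin(std::uint16_t a, std::uint16_t b) {
    std::uint16_t sum;
    return __builtin_add_overflow(a, b, &sum) ? std::uint16_t{0xFFFF} : sum;
}

// Portable form: unsigned wraparound means overflow happened iff sum < a.
inline std::uint16_t sat_add_portable(std::uint16_t a, std::uint16_t b) {
    const std::uint16_t sum = static_cast<std::uint16_t>(a + b);
    return sum < a ? std::uint16_t{0xFFFF} : sum;
}
```

Both functions return 0xFFFF whenever the mathematical sum exceeds 65535 and the plain sum otherwise.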
[Bug tree-optimization/95923] Failure to optimize bool checks into and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95923 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/96172] Failure to optimize direct assignment to bitfield through shifts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96172 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/96702] Failure to optimize comparisons involving result of subtraction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96702 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/94893] Sign function not getting optimized to simple compare
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94893 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2021-04-26 Status|UNCONFIRMED |NEW --- Comment #2 from Andrew Pinski --- Confirmed. Note the original code does have one slight undefined case, when x is INT_MIN but that can be fixed by casting to unsigned before doing the negate of x. That is: inline int sign(int x) { return (x >> 31) | (-(unsigned)x >> 31); } CUT Matching this: _1 = x_5(D) >> 31; x.0_2 = (unsigned int) x_5(D); _3 = -x.0_2; _4 = _3 >> 31; _8 = (int) _4; _6 = _1 | _8; Into: t = (int)(x > 0); result = x < 0 ? - 1 : t; Might not be the best thing.
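A sketch comparing the shift-based sign function from the comment with the compare-based form it could be turned into. It assumes a 32-bit `int` and an arithmetic right shift of negative values (what GCC does; guaranteed by the language only from C++20 on). The names `sign_shift` and `sign_compare` are illustrative.

```cpp
#include <cassert>
#include <climits>

static_assert(sizeof(int) == 4 && CHAR_BIT == 8, "sketch assumes a 32-bit int");

// Shift form with the cast to unsigned suggested in the comment, so that
// negating INT_MIN is well defined.
inline int sign_shift(int x) {
    return (x >> 31) | static_cast<int>(-static_cast<unsigned>(x) >> 31);
}

// Compare form: -1 for negative, 0 for zero, 1 for positive.
inline int sign_compare(int x) {
    return x < 0 ? -1 : static_cast<int>(x > 0);
}
```

For negative `x` the first operand is already -1 (all bits set), so the OR cannot change it; for positive `x` the negated unsigned value has its top bit set, which the shift turns into 1.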
[Bug tree-optimization/94893] Sign function not getting optimized to simple compare
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94893 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug middle-end/98710] missing optimization (x | c) & ~(y | c) -> x & ~(y | c)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98710 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Last reconfirmed||2021-04-26 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Andrew Pinski --- I will be implementing this for GCC 12.
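The identity in the summary follows from De Morgan: `~(y | c)` is `~y & ~c`, and `(x | c) & ~c` is `x & ~c`, so both sides reduce to `x & ~y & ~c`. A small compile-time check (C++14 or later; function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the transformation.
constexpr std::uint32_t mask_before(std::uint32_t x, std::uint32_t y, std::uint32_t c) {
    return (x | c) & ~(y | c);
}

// Right-hand side: the redundant "| c" on x is dropped.
constexpr std::uint32_t mask_after(std::uint32_t x, std::uint32_t y, std::uint32_t c) {
    return x & ~(y | c);
}

// Exhaustive check over 4-bit operands; bitwise operators treat every bit
// position alike, so this covers wider operands too.
constexpr bool identity_holds() {
    for (std::uint32_t x = 0; x < 16; ++x)
        for (std::uint32_t y = 0; y < 16; ++y)
            for (std::uint32_t c = 0; c < 16; ++c)
                if (mask_before(x, y, c) != mask_after(x, y, c))
                    return false;
    return true;
}

static_assert(identity_holds(), "(x | c) & ~(y | c) must equal x & ~(y | c)");
```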
[Bug c++/97952] Poor optimization of closure-like construct in C++ as compared to C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97952 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug c++/100252] Internal compiler error during template instantiation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100252 --- Comment #2 from Jeremy R. --- Even more minimal case: https://godbolt.org/z/M3Tv9oqcn
[Bug target/100257] poor codegen with vcvtph2ps / stride of 6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257

--- Comment #1 from Andrew Pinski ---
Looks like a few missed optimizations at the tree level (and a target issue with the store):
  memcpy (&pixel, src_33, 6);
  _1 = pixel.b;
  _2 = pixel.g;
  _3 = pixel.r;
  val_2.0_21 = (short int) _1;
  val_1.1_22 = (short int) _2;
  val_0.2_23 = (short int) _3;
  _24 = {val_0.2_23, val_1.1_22, val_2.0_21, 0, 0, 0, 0, 0};
  _25 = __builtin_ia32_vcvtph2ps (_24);
  _14 = BIT_FIELD_REF <_25, 64, 0>;
  _28 = BIT_FIELD_REF <_25, 32, 64>;
  MEM [(float *)dst_34] = _14;
  MEM[(float *)dst_34 + 8B] = _28;
  MEM[(float *)dst_34 + 12B] = 1.0e+0;

The store issue is now PR 100258. This is more about the missed optimization of the first part, the conversion.
[Bug target/100258] New: constant store pulled out of the loop causes an extra memory load
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100258

Bug ID: 100258
Summary: constant store pulled out of the loop causes an extra memory load
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-linux-gnu

Take:
void f(float *x, int t)
{
  for(int i = 0; i < t; i++)
    x[i*3] = 1.0;
}

Right now, at -O2, this produces:
        testl   %esi, %esi
        jle     .L5
        leal    -1(%rsi), %eax
        leaq    (%rax,%rax,2), %rax
        vmovss  .LC0(%rip), %xmm0
        leaq    12(%rdi,%rax,4), %rax
        .p2align 4,,10
        .p2align 3
.L3:
        vmovss  %xmm0, (%rdi)
        addq    $12, %rdi
        cmpq    %rax, %rdi
        jne     .L3
.L5:
        ret

---- CUT ----
If we don't have a loop, e.g. just a store to *x, we get:
        movl    $0x3f800000, (%rdi)

Which is far more efficient; we just need a loop around that, without doing the load of .LC0.
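For readers wondering where the immediate comes from: the value stored without the loop is simply the IEEE-754 single-precision bit pattern of 1.0f, and the same number appears in decimal as 1065353216 in the `.LC0` constant of PR 100257. A minimal sketch, assuming `float` is IEEE-754 binary32; `float_bits` is a made-up helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes IEEE-754 binary32 floats");

// Reinterpret the object representation of a float as a 32-bit integer.
// memcpy is the well-defined way to do this before C++20's std::bit_cast.
inline std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
```

Because the constant fits in a 32-bit immediate, the store does not need a register loaded from the constant pool.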
[Bug libstdc++/100017] error: 'fenv_t' has not been declared in '::' x86_64-w64-mingw32 host cross toolchain fails to build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100017 --- Comment #12 from Dave Murphy --- Naive patch based on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100017#c7 gets my canadian crosses building. diff --git a/libstdc++-v3/include/c_compatibility/fenv.h b/libstdc++-v3/include/c_compatibility/fenv.h index 0413e3b7c25..56cabaa3635 100644 --- a/libstdc++-v3/include/c_compatibility/fenv.h +++ b/libstdc++-v3/include/c_compatibility/fenv.h @@ -26,6 +26,10 @@ * This is a Standard C++ Library header. */ +#if !defined __cplusplus || defined _GLIBCXX_INCLUDE_NEXT_C_HEADERS +# include_next +#else + #ifndef _GLIBCXX_FENV_H #define _GLIBCXX_FENV_H 1 diff --git a/libstdc++-v3/include/c_global/cfenv b/libstdc++-v3/include/c_global/cfenv index 0b0ec35a837..d24cb1a3c81 100644 --- a/libstdc++-v3/include/c_global/cfenv +++ b/libstdc++-v3/include/c_global/cfenv @@ -37,9 +37,11 @@ #include -#if _GLIBCXX_HAVE_FENV_H -# include -#endif +// Need to ensure this finds the C library's not a libstdc++ +// wrapper that might already be installed later in the include search path. +#define _GLIBCXX_INCLUDE_NEXT_C_HEADERS +#include_next +#undef _GLIBCXX_INCLUDE_NEXT_C_HEADERS #ifdef _GLIBCXX_USE_C99_FENV_TR1
[Bug c/100257] New: poor codegen with vcvtph2ps / stride of 6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257

Bug ID: 100257
Summary: poor codegen with vcvtph2ps / stride of 6
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: witold.baryluk+gcc at gmail dot com
Target Milestone: ---

gcc (Compiler-Explorer-Build) 12.0.0 20210424 (experimental)

https://godbolt.org/z/n6ooMdnz8

This C code:

```
#include <stdint.h>
#include <string.h>
#include <immintrin.h>

struct float3 {
  float f1;
  float f2;
  float f3;
};

struct util_format_r16g16b16_float {
  uint16_t r;
  uint16_t g;
  uint16_t b;
};

static inline struct float3
_mesa_half3_to_float3(uint16_t val_0, uint16_t val_1, uint16_t val_2)
{
#if defined(__F16C__)
  //const __m128i in = {val_0, val_1, val_2};
  //__m128 out;
  //__asm volatile("vcvtph2ps %1, %0" : "=v"(out) : "v"(in));

  const __m128i in = _mm_setr_epi16(val_0, val_1, val_2, 0, 0, 0, 0, 0);
  const __m128 out = _mm_cvtph_ps(in);

  const struct float3 r = {out[0], out[1], out[2]};
  return r;
#endif
}

void
util_format_r16g16b16_float_unpack_rgba_float(void *restrict dst_row,
                                              const uint8_t *restrict src,
                                              unsigned width)
{
  float *dst = dst_row;
  for (unsigned x = 0; x < width; x += 1) {
    const struct util_format_r16g16b16_float pixel;
    memcpy(&pixel, src, sizeof pixel);

    struct float3 r = _mesa_half3_to_float3(pixel.r, pixel.g, pixel.b);
    dst[0] = r.f1; /* r */
    dst[1] = r.f2; /* g */
    dst[2] = r.f3; /* b */
    dst[3] = 1; /* a */

    src += 6;
    dst += 4;
  }
}
```

Is compiled "poorly" by gcc, even worse when compiled on i386 (with -mf16c enabled) when using -fPIE.
Example:

gcc -O3 -m32 -march=znver2 -mfpmath=sse -fPIE

util_format_r16g16b16_float_unpack_rgba_float:
        push    ebp
        push    edi
        push    esi
        push    ebx
        sub     esp, 28
        mov     ecx, DWORD PTR 56[esp]
        mov     edx, DWORD PTR 48[esp]
        call    __x86.get_pc_thunk.ax
        add     eax, OFFSET FLAT:_GLOBAL_OFFSET_TABLE_
        mov     ebx, DWORD PTR 52[esp]
        test    ecx, ecx
        je      .L8
        vmovss  xmm3, DWORD PTR .LC0@GOTOFF[eax]
        xor     esi, esi
        xor     ebp, ebp
        vpxor   xmm2, xmm2, xmm2
.L3:
        mov     eax, DWORD PTR [ebx]
        vmovss  DWORD PTR 12[edx], xmm3
        add     ebx, 6
        add     edx, 16
        inc     esi
        mov     ecx, eax
        vmovd   xmm0, eax
        shr     ecx, 16
        mov     edi, ecx
        movzx   ecx, WORD PTR -2[ebx]
        vpinsrw xmm0, xmm0, edi, 1
        vmovd   xmm1, ecx
        vpinsrw xmm1, xmm1, ebp, 1
        vpunpckldq      xmm0, xmm0, xmm1
        vpunpcklqdq     xmm0, xmm0, xmm2
        vcvtph2ps       xmm0, xmm0
        vmovss  DWORD PTR -16[edx], xmm0
        vextractps      DWORD PTR -12[edx], xmm0, 1
        vextractps      DWORD PTR -8[edx], xmm0, 2
        cmp     DWORD PTR 56[esp], esi
        jne     .L3
.L8:
        add     esp, 28
        pop     ebx
        pop     esi
        pop     edi
        pop     ebp
        ret
.LC0:
        .long   1065353216
__x86.get_pc_thunk.ax:
        mov     eax, DWORD PTR [esp]
        ret

clang:

util_format_r16g16b16_float_unpack_rgba_float: # @util_format_r16g16b16_float_unpack_rgba_float
        mov     eax, dword ptr [esp + 12]
        test    eax, eax
        je      .LBB0_3
        mov     ecx, dword ptr [esp + 8]
        mov     edx, dword ptr [esp + 4]
.LBB0_2:                                # =>This Inner Loop Header: Depth=1
        vmovd   xmm0, dword ptr [ecx]           # xmm0 = mem[0],zero,zero,zero
        vpinsrw xmm0, xmm0, word ptr [ecx + 4], 2
        add     ecx, 6
        vcvtph2ps       xmm0, xmm0
        vmovss  dword ptr [edx], xmm0
        vextractps      dword ptr [edx + 4], xmm0, 1
        vextractps      dword ptr [edx + 8], xmm0, 2
        mov     dword ptr [edx + 12], 1065353216
        add     edx, 16
        dec     eax
        jne     .LBB0_2
.LBB0_3:
        ret

clang code is essentially optimal.

The issue persists if I use `vcvtph2ps` directly via asm, or via intrinsics.

The issue might be the src stride of 6, instead of 8, that is confusing gcc.
Additionally, the constant 1065353216 (which is weird, I would expect it to be 0) is stored in the data section instead of inline as an immediate. This makes the code larger and, in PIE mode, requires extra pointer trickery and, with -m32, even an extra function call.

Even without -fPIE, the main loop has poor codegen on x86-64 / amd64 compared to clang or to what I would consider good code.

gcc -m64 -O3 -march=native

util_format_r16g16b16_float_unpack_rgba_float:
        test    edx, edx
        je      .L8
        mov     edx, edx
        sal     rdx, 4
        vmovss  xmm3, DWORD PTR .LC0[rip]
        lea     rcx, [rdi+rdx]
[Bug tree-optimization/100256] New: spurious stringop-overflow warning with memset(..., sizeof(dest)) on variable-length array at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100256 Bug ID: 100256 Summary: spurious stringop-overflow warning with memset(..., sizeof(dest)) on variable-length array at -O3 Product: gcc Version: 10.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: gandalf at winds dot org Target Milestone: --- When 'j_degree' is unknown per the function below, -O3 causes a stringop-overflow warning to be emitted on memset() with strange region sizes. The code snapshot below is the result of trying to simplify/remove as many lines as I could while still causing the warning to generate. GCC 10.3.0 and GCC 11.0.1 commit a6f018fcc6ce9236ff37eac33b01a0a80103c9f6, running on x86_64-pc-linux-gnu (Gentoo): --- typedef long unsigned int size_t; extern void *memset (void *__s, int __c, size_t __n) __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (1))); extern void *calloc (size_t __nmemb, size_t __size) __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__malloc__)) __attribute__ ((__alloc_size__ (1, 2))) ; static void setup_matrix(double **ppd_xx, double *pd_xy, int j_degree) { int kk; double ad_xsum[j_degree*2 + 1]; memset(ad_xsum,0,sizeof(ad_xsum)); for(kk=0; kk < j_degree*2 + 1; kk++) { ad_xsum[kk]++; if(kk < j_degree + 1) pd_xy[kk]++; } } void polyfit(int j_degree, double ad_coef[], double *pd_xy, double **ppd_xx) { int jj; for(jj=0;jj
[Bug debug/100255] Crosscompiler to ia64-hp-vms: vmsdbgout.c:368:20: error: ISO C++17 does not allow 'register' storage class specifier [-Werror=register]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100255 Jakub Jelinek changed: What|Removed |Added Ever confirmed|0 |1 CC||jakub at gcc dot gnu.org Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2021-04-25 Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
[Bug debug/100255] New: Crosscompiler to ia64-hp-vms: vmsdbgout.c:368:20: error: ISO C++17 does not allow 'register' storage class specifier [-Werror=register]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100255 Bug ID: 100255 Summary: Crosscompiler to ia64-hp-vms: vmsdbgout.c:368:20: error: ISO C++17 does not allow 'register' storage class specifier [-Werror=register] Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: jbg...@lug-owl.de Target Milestone: --- I'm revamping my testing efforts, building cross compilers based on targets listed in ./contrib/config-list.mk. With .../configure --target=ia64-hp-vms --enable-werror-always --enable-languages=all --prefix=/tmp/gcc-ia64-hp-vms (using g++ (Debian 20210320-1) 11.0.1 20210320 (experimental) [master revision 3279a9a5a9a:6526c452d22:5f256a70a05fcfc5a1caf56678ceb12b4f87f781] as the host's compiler), build breaks (as of ed16241c6db23013d70b792a64f29080ad48a414) with this (cf. http://toolchain.lug-owl.de:8080/jobs/gcc-ia64-hp-vms/8): make all-gcc [...] [all 2021-04-25 20:52:34.441260] /usr/lib/gcc-snapshot/bin/g++ -fno-PIE -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../.././gcc -I../.././gcc/. 
-I../.././gcc/../include -I../.././gcc/../libcpp/include -I../.././gcc/../libcody -I../.././gcc/../libdecnumber -I../.././gcc/../libdecnumber/dpd -I../libdecnumber -I../.././gcc/../libbacktrace -o vmsdbgout.o -MT vmsdbgout.o -MMD -MP -MF ./.deps/vmsdbgout.TPo ../.././gcc/vmsdbgout.c [all 2021-04-25 20:52:38.329848] ../.././gcc/vmsdbgout.c: In function 'int write_debug_string(const char*, const char*, int)': [all 2021-04-25 20:52:38.330047] ../.././gcc/vmsdbgout.c:368:20: error: ISO C++17 does not allow 'register' storage class specifier [-Werror=register] [all 2021-04-25 20:52:38.330139] 368 | register int slen = strlen (P); \ [all 2021-04-25 20:52:38.330220] |^~~~ [all 2021-04-25 20:52:38.330296] ../.././gcc/vmsdbgout.c:549:7: note: in expansion of macro 'ASM_OUTPUT_DEBUG_STRING' [all 2021-04-25 20:52:38.330371] 549 | ASM_OUTPUT_DEBUG_STRING (asm_out_file, string); [all 2021-04-25 20:52:38.330447] | ^~~ [all 2021-04-25 20:52:38.331123] ../.././gcc/vmsdbgout.c:369:28: error: ISO C++17 does not allow 'register' storage class specifier [-Werror=register] [all 2021-04-25 20:52:38.331292] 369 | register const char *p = (P); \ [all 2021-04-25 20:52:38.331370] |^ [all 2021-04-25 20:52:38.331439] ../.././gcc/vmsdbgout.c:549:7: note: in expansion of macro 'ASM_OUTPUT_DEBUG_STRING' [all 2021-04-25 20:52:38.331505] 549 | ASM_OUTPUT_DEBUG_STRING (asm_out_file, string); [all 2021-04-25 20:52:38.331577] | ^~~ [all 2021-04-25 20:52:38.331953] ../.././gcc/vmsdbgout.c:370:20: error: ISO C++17 does not allow 'register' storage class specifier [-Werror=register] [all 2021-04-25 20:52:38.332102] 370 | register int i; \ [all 2021-04-25 20:52:38.332181] |^ [all 2021-04-25 20:52:38.332287] ../.././gcc/vmsdbgout.c:549:7: note: in expansion of macro 'ASM_OUTPUT_DEBUG_STRING' [all 2021-04-25 20:52:38.332364] 549 | ASM_OUTPUT_DEBUG_STRING (asm_out_file, string); [all 2021-04-25 20:52:38.332437] | ^~~ [all 2021-04-25 20:52:38.333260] ../.././gcc/vmsdbgout.c:374:24: error: ISO C++17 does not 
allow 'register' storage class specifier [-Werror=register] [all 2021-04-25 20:52:38.333448] 374 | register int c = p[i]; \ [all 2021-04-25 20:52:38.333528] |^ [all 2021-04-25 20:52:38.333600] ../.././gcc/vmsdbgout.c:549:7: note: in expansion of macro 'ASM_OUTPUT_DEBUG_STRING' [all 2021-04-25 20:52:38.333668] 549 | ASM_OUTPUT_DEBUG_STRING (asm_out_file, string); [all 2021-04-25 20:52:38.333734] | ^~~ [all 2021-04-25 20:52:38.386301] ../.././gcc/vmsdbgout.c: At global scope: [all 2021-04-25 20:52:38.386507] ../.././gcc/vmsdbgout.c:1232:42: error: ISO C++17 does not allow 'register' storage class specifier [-Werror=register] [all 2021-04-25 20:52:38.386595] 1232 | vmsdbgout_begin_block (register unsigned line, register unsigned blocknum) [all 2021-04-25 20:52:38.386663] | ^~~~ [all 2021-04-25 20:52:38.387009] ../.././gcc/vmsdbgout.c:1232:66: error: ISO C++17 does not allow 'register' storage class specifier [-Werror=register] [all 2021-04-25 20:52:38.387164] 1232 | vmsdbgout_begin_block (register unsigned
[Bug debug/100254] New: [11/12 Regression] -fcompare-debug failure (length) with -O2 -fno-guess-branch-probability -fipa-pta -fnon-call-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100254 Bug ID: 100254 Summary: [11/12 Regression] -fcompare-debug failure (length) with -O2 -fno-guess-branch-probability -fipa-pta -fnon-call-exceptions Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug Assignee: unassigned at gcc dot gnu.org Reporter: zsojka at seznam dot cz CC: aoliva at gcc dot gnu.org Target Milestone: --- Host: x86_64-pc-linux-gnu Created attachment 50673 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50673=edit auto-reduced testcase (from OpenTTD sources) Compiler output: $ x86_64-pc-linux-gnu-g++ -O2 -mtune=goldmont -fno-guess-branch-probability -fipa-pta -fnon-call-exceptions -fcompare-debug testcase.C testcase.C:20:38: warning: friend declaration 'bool operator!=(_Rb_tree_const_iterator<_Tp>::_Self, _Rb_tree_const_iterator<_Tp>::_Self)' declares a non-template function [-Wnon-template-friend] 20 | friend bool operator!=(_Self, _Self); | ^ testcase.C:20:38: note: (if this is not what you intended, make sure the function template has already been declared and add '<>' after the function name here) testcase.C: In member function 'bool CargoSorter::operator()(const CargoDataEntry*, const CargoDataEntry*) const': testcase.C:82:61: warning: no return statement in function returning non-void [-Wreturn-type] 82 | const CargoDataEntry *) const {} | ^ x86_64-pc-linux-gnu-g++: error: testcase.C: '-fcompare-debug' failure (length) $ x86_64-pc-linux-gnu-g++ -v Using built-in specs. 
COLLECT_GCC=/repo/gcc-trunk/binary-latest/bin/x86_64-pc-linux-gnu-g++ COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r12-100-20210424001429-gbcd77b7b9f3-checking-yes-rtl-df-extra-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++ --enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra --with-cloog --with-ppl --with-isl --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld --with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch --prefix=/repo/gcc-trunk//binary-trunk-r12-100-20210424001429-gbcd77b7b9f3-checking-yes-rtl-df-extra-amd64 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 12.0.0 20210424 (experimental) (GCC)
[Bug c++/100252] Internal compiler error during template instantiation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100252 --- Comment #1 from Jeremy R. --- A more minimal case: https://godbolt.org/z/jxP9e35bz
[Bug fortran/100245] ICE on automatic reallocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100245 --- Comment #2 from José Rui Faustino de Sousa --- Patch posted: https://gcc.gnu.org/pipermail/fortran/2021-April/055982.html
[Bug rtl-optimization/96796] [9 Regression] aarch64: ICE during RTL pass: reload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796 rsandifo at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #12 from rsandifo at gcc dot gnu.org --- Fixed for GCC 9 and above. Thanks for the bug report.
[Bug target/98302] [9 Regression] Wrong code on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98302 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #19 from rsandifo at gcc dot gnu.org --- Fixed.
[Bug tree-optimization/95694] [9 Regression] ICE in trunc_int_for_mode, at explow.c:59 since r9-7156-g33579b59aaf02eb7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95694 rsandifo at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #9 from rsandifo at gcc dot gnu.org --- Fixed.
[Bug target/99929] [8/9 Backport] SVE: Wrong code at -O2 -ftree-vectorize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99929

--- Comment #6 from CVS Commits ---
The releases/gcc-9 branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:1f3c550c9188e1afb30cb9d40c419c3e6ced5cb3

commit r9-9465-g1f3c550c9188e1afb30cb9d40c419c3e6ced5cb3
Author: Richard Sandiford
Date: Sun Apr 25 14:51:16 2021 +0100

    Check for matching CONST_VECTOR encodings [PR99929]

    PR99929 is one of those “how did we get away with this for so long” bugs: the equality routines weren't checking whether two variable-length CONST_VECTORs had the same encoding. This meant that:

       { 1, 0, 0, 0, 0, 0, ... }

    would appear to be equal to:

       { 1, 0, 1, 0, 1, 0, ... }

    since both are represented using the elements { 1, 0 }.

    gcc/
            PR rtl-optimization/99929
            * rtl.h (same_vector_encodings_p): New function.
            * cse.c (exp_equiv_p): Check that CONST_VECTORs have the same encoding.
            * cselib.c (rtx_equal_for_cselib_1): Likewise.
            * jump.c (rtx_renumbered_equal_p): Likewise.
            * lra-constraints.c (operands_match_p): Likewise.
            * reload.c (operands_match_p): Likewise.
            * rtl.c (rtx_equal_p_cb, rtx_equal_p): Likewise.

    (cherry picked from commit a87d3f964df31d4fbceb822c6d293e85c117d992)
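To make the failure mode concrete, here is a toy model of a pattern-encoded vector. It is not GCC's data structure or API: the struct, its fields, and the helpers are hypothetical, and it only handles one or two encoded elements per pattern (no stepped series). It assumes an encoding where `npatterns` interleaved patterns each store their leading elements and the last stored element of each pattern repeats forever.

```cpp
#include <cassert>
#include <vector>

// Toy model of a variable-length vector constant (hypothetical, not GCC code).
struct EncodedVector {
    unsigned npatterns;          // number of interleaved patterns
    unsigned nelts_per_pattern;  // encoded elements per pattern (1 or 2 here)
    std::vector<int> encoded;    // npatterns * nelts_per_pattern values

    // Value of element i of the (conceptually unbounded) vector.
    int elt(unsigned i) const {
        const unsigned pattern = i % npatterns;
        unsigned index = i / npatterns;
        if (index >= nelts_per_pattern)
            index = nelts_per_pattern - 1;  // last encoded element repeats
        return encoded[pattern + index * npatterns];
    }
};

// The flawed comparison: looks only at the encoded elements.
inline bool equal_elements_only(const EncodedVector &a, const EncodedVector &b) {
    return a.encoded == b.encoded;
}

// The shape of the fix: require matching encodings first.
inline bool same_encoding(const EncodedVector &a, const EncodedVector &b) {
    return a.npatterns == b.npatterns && a.nelts_per_pattern == b.nelts_per_pattern;
}

inline bool equal_checked(const EncodedVector &a, const EncodedVector &b) {
    return same_encoding(a, b) && a.encoded == b.encoded;
}

// { 1, 0, 0, 0, ... }: one pattern, two encoded elements.
inline EncodedVector one_then_zeros() { return {1, 2, {1, 0}}; }

// { 1, 0, 1, 0, ... }: two patterns, one encoded element each.
inline EncodedVector alternating() { return {2, 1, {1, 0}}; }
```

Both vectors store the elements { 1, 0 }, so the element-only comparison calls them equal even though they differ from the third element onward; the checked comparison rejects them.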
[Bug target/98136] [8/9 Regression] [aarch64] Internal compiler error with large classes and virtual methods since r8-5967-gf5470a77425a54efebfe1732488c40f05ef176d0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98136 --- Comment #7 from CVS Commits --- The releases/gcc-9 branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:04aaa315db27726e090ca7c3ca3aed9dd5895701 commit r9-9464-g04aaa315db27726e090ca7c3ca3aed9dd5895701 Author: Richard Sandiford Date: Sun Apr 25 14:51:15 2021 +0100 aarch64: Tweak post-RA handling of CONST_INT moves [PR98136] This PR is a regression caused by r8-5967, where we replaced a call to aarch64_internal_mov_immediate in aarch64_add_offset with a call to aarch64_force_temporary, which in turn uses the normal emit_move_insn{,_1} routines. The problem is that aarch64_add_offset can be called while outputting a thunk, where we require all instructions to be valid without splitting. However, the move expanders were not splitting CONST_INT moves themselves. I think the right fix is to make the move expanders work even in this scenario, rather than require callers to handle it as a special case. gcc/ PR target/98136 * config/aarch64/aarch64.md (mov): Pass multi-instruction CONST_INTs to aarch64_expand_mov_immediate when called after RA. gcc/testsuite/ PR target/98136 * g++.dg/pr98136.C: New test. (cherry picked from commit 48c79f054bf435051c95ee093c45a0f8c9de5b4e)
[Bug rtl-optimization/96796] [9 Regression] aarch64: ICE during RTL pass: reload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796 --- Comment #11 from CVS Commits --- The releases/gcc-9 branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:49cc1253d079bbefc18275f29adc526679422176 commit r9-9463-g49cc1253d079bbefc18275f29adc526679422176 Author: Richard Sandiford Date: Sun Apr 25 14:51:14 2021 +0100 lra: Avoid cycling on certain subreg reloads [PR96796] This PR is about LRA cycling for a reload of the form: Changing pseudo 196 in operand 1 of insn 103 on equiv [r105:DI*0x8+r140:DI] Creating newreg=287, assigning class ALL_REGS to slow/invalid mem r287 Creating newreg=288, assigning class ALL_REGS to slow/invalid mem r288 103: r203:SI=r288:SI<<0x1+r196:DI#0 REG_DEAD r196:DI Inserting slow/invalid mem reload before: 316: r287:DI=[r105:DI*0x8+r140:DI] 317: r288:SI=r287:DI#0 The problem is with r287. We rightly give it a broad starting class of POINTER_AND_FP_REGS (reduced from ALL_REGS by preferred_reload_class). However, we never make forward progress towards narrowing it down to a specific choice of class (POINTER_REGS or FP_REGS). I think in practice we rely on two things to narrow a reload pseudo's class down to a specific choice: (1) a restricted class is specified when the pseudo is created This happens for input address reloads, where the class is taken from the target's chosen base register class. It also happens for simple REG reloads, where the class is taken from the chosen alternative's constraints. (2) uses of the reload pseudo as a direct input operand In this case get_reload_reg tries to reuse the existing register and narrow its class, instead of creating a new reload pseudo. However, neither occurs here. As described above, r287 rightly starts out with a wide choice of class, ultimately derived from ALL_REGS, so we don't get (1). 
And as the comments in the PR explain, r287 is never used as an input reload, only the subreg is, so we don't get (2): Choosing alt 13 in insn 317: (0) r (1) w {*movsi_aarch64} Creating newreg=291, assigning class FP_REGS to r291 317: r288:SI=r291:SI Inserting insn reload before: 320: r291:SI=r287:DI#0 IMO, in this case we should rely on the reload of insn 316 to narrow down the class of r287. Currently we do: Choosing alt 7 in insn 316: (0) r (1) m {*movdi_aarch64} Creating newreg=289 from oldreg=287, assigning class GENERAL_REGS to r289 316: r289:DI=[r105:DI*0x8+r140:DI] Inserting insn reload after: 318: r287:DI=r289:DI --- i.e. we create a new pseudo register r289 and give *that* pseudo GENERAL_REGS instead. This is because get_reload_reg only narrows down the existing class for OP_IN and OP_INOUT, not OP_OUT. But if we have a reload pseudo in a reload instruction and have chosen a specific class for the reload pseudo, I think we should simply install it for OP_OUT reloads too, if the class is a subset of the existing class. We will need to pick such a register whatever happens (for r289 in the example above). And as explained in the PR, doing this actually avoids an unnecessary move via the FP registers too. This backport is less aggressive than the trunk version, in that the new code reuses the test for a reload move from in_class_p. We will therefore only narrow OP_OUT classes if the instruction is a register move or memory load that was generated by LRA itself. gcc/ PR rtl-optimization/96796 * lra-constraints.c (in_class_p): Add a default-false allow_all_reload_class_changes_p parameter. Do not treat reload moves specially when the parameter is true. (get_reload_reg): Try to narrow the class of an existing OP_OUT reload if we're reloading a reload pseudo in a reload instruction. gcc/testsuite/ PR rtl-optimization/96796 * gcc.c-torture/compile/pr96796.c: New test.
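The narrowing rule proposed above can be sketched with register classes modelled as bitmasks of hard registers. Everything here is a toy model: the type, the constants and `narrow_out_reload_class` are hypothetical and do not correspond to LRA's data structures or to the real AArch64 class definitions.

```cpp
#include <cstdint>

// Toy model: a register class is a bitmask of hard registers.
using RegClass = std::uint32_t;

constexpr RegClass kGeneralRegs = 0x0000ffffu;                   // hypothetical
constexpr RegClass kFpRegs = 0xffff0000u;                        // hypothetical
constexpr RegClass kPointerAndFpRegs = kGeneralRegs | kFpRegs;   // hypothetical

// True if every register in SUB is also in SUPER.
bool is_subset(RegClass sub, RegClass super) {
    return (sub & ~super) == 0;
}

// For an output reload of a reload pseudo inside a reload instruction,
// install the chosen class if it is a subset of the existing one;
// otherwise leave the existing class alone.
RegClass narrow_out_reload_class(RegClass existing, RegClass chosen,
                                 bool reload_pseudo_in_reload_insn) {
    if (reload_pseudo_in_reload_insn && is_subset(chosen, existing))
        return chosen;
    return existing;
}
```

In the terms of the example, r287 starts with the broad class and the chosen alternative for insn 316 asks for the general registers. Because that is a subset, the sketch narrows r287 directly instead of leaving it broad and introducing a second pseudo.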
[Bug target/98302] [9 Regression] Wrong code on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98302 --- Comment #18 from CVS Commits --- The releases/gcc-9 branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:3fa4752e29a5b44219837ebad5bb09ec98af156e commit r9-9462-g3fa4752e29a5b44219837ebad5bb09ec98af156e Author: Richard Sandiford Date: Sun Apr 25 14:51:14 2021 +0100 vect: Avoid generating out-of-range shifts [PR98302] In this testcase we end up with: unsigned long long x = ...; char y = (char) (x << 37); The overwidening pattern realised that only the low 8 bits of x << 37 are needed, but then tried to turn that into: unsigned long long x = ...; char y = (char) x << 37; which gives an out-of-range shift. In this case y can simply be replaced by zero, but as the comment in the patch says, it's kind-of awkward to do that in the middle of vectorisation. Most of the overwidening stuff is about keeping operations as narrow as possible, which is important for vectorisation but could be counter-productive for scalars (especially on RISC targets). In contrast, optimising y to zero in the above feels like an independent optimisation that would benefit scalar code and that should happen before vectorisation. gcc/ PR tree-optimization/98302 * tree-vect-patterns.c (vect_determine_precisions_from_users): Make sure that the precision remains greater than the shift count. gcc/testsuite/ PR tree-optimization/98302 * gcc.dg/vect/pr98302.c: New test. (cherry picked from commit 58a12b0eadac62e691fcf7325ab2bc2c93d46b61)
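A small sketch of the arithmetic behind this fix, using hypothetical helper names. The first function shows that the low 8 bits of `x << 37` are always zero. The second models the constraint the patch adds, namely that the narrowed precision must stay greater than the shift count. It is an illustration of the rule, not the code in vect_determine_precisions_from_users.

```cpp
#include <algorithm>
#include <cstdint>

// Low 8 bits of a 64-bit left shift. SHIFT must be less than 64.
unsigned char low_byte_of_shift(std::uint64_t x, unsigned shift) {
    return static_cast<unsigned char>(x << shift);
}

// Smallest precision (in bits) at which a left shift by SHIFT_COUNT can be
// performed while still producing NEEDED_BITS of result: the precision has
// to remain strictly greater than the shift count to keep the shift in range.
unsigned min_precision_for_shift(unsigned needed_bits, unsigned shift_count) {
    return std::max(needed_bits, shift_count + 1);
}
```

For the testcase, only 8 bits of the result are needed, but the shift count is 37, so the operation cannot be narrowed below 38 bits. Narrowing it to 8 bits is what produced the out-of-range shift.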
[Bug tree-optimization/95694] [9 Regression] ICE in trunc_int_for_mode, at explow.c:59 since r9-7156-g33579b59aaf02eb7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95694 --- Comment #8 from CVS Commits --- The releases/gcc-9 branch has been updated by Richard Sandiford : https://gcc.gnu.org/g:90ce58cf411f06292dc8f96aba61f3e3d07f22e8 commit r9-9461-g90ce58cf411f06292dc8f96aba61f3e3d07f22e8 Author: Richard Sandiford Date: Sun Apr 25 14:51:13 2021 +0100 expr: Fix REDUCE_BIT_FIELD for constants [PR95694, PR96151] This is yet another PR caused by constant integer rtxes not storing a mode. We were calling REDUCE_BIT_FIELD on a constant integer that didn't fit in poly_int64, and then tripped the as_a assert on VOIDmode. AFAICT REDUCE_BIT_FIELD is always passed rtxes that have TYPE_MODE (rather than some other mode) and it just fills in the redundant sign bits of that TYPE_MODE value. So it should be safe to get the mode from the type instead of the rtx. The patch does that and asserts that the modes agree, where information is available. That on its own is enough to fix the bug, but we might as well extend the folding case to all constant integers, not just those that fit poly_int64. gcc/ PR middle-end/95694 * expr.c (expand_expr_real_2): Get the mode from the type rather than the rtx, and assert that it is consistent with the mode of the rtx (where known). Optimize all constant integers, not just those that can be represented in poly_int64. gcc/testsuite/ PR middle-end/95694 * gcc.dg/pr95694.c: New test. (cherry picked from commit 760df6d296b8fc59796f42dca5eb14012fbfa28b)
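The "fills in the redundant sign bits" step can be pictured with plain integers. The sketch below is an assumption-laden model, not the expr.c implementation: `reduce_bit_field` is a hypothetical function that takes a 64-bit container, a bit-field precision and a signedness, and returns the value with the upper bits replaced by zero or sign extension.

```cpp
#include <cstdint>

// Reduce VALUE to PRECISION bits (1..63) and extend it back to 64 bits
// according to the signedness of the bit-field type.
std::int64_t reduce_bit_field(std::uint64_t value, unsigned precision,
                              bool is_unsigned) {
    std::uint64_t mask = (std::uint64_t{1} << precision) - 1;
    std::uint64_t low = value & mask;          // keep only the field's bits
    if (is_unsigned)
        return static_cast<std::int64_t>(low); // upper bits become zero
    std::uint64_t sign = std::uint64_t{1} << (precision - 1);
    // Classic sign-extension trick: flip the sign bit, then subtract it.
    return static_cast<std::int64_t>((low ^ sign) - sign);
}
```

Note that the function is told the precision and signedness explicitly rather than deriving them from the value. That mirrors the point of the patch: a constant integer carries no mode of its own, so the information has to come from the type.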
[Bug target/82735] _mm256_zeroupper does not invalidate previously computed registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82735 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #7 from Hongtao.liu --- Confirmed, let me fix this.
[Bug tree-optimization/100253] [10/11/12 Regression] wrong code with -O2 -fno-tree-bit-ccp -ftree-slp-vectorize (unaligned movdqa)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Component|rtl-optimization|tree-optimization Last reconfirmed||2021-04-25 Ever confirmed|0 |1 --- Comment #2 from Andrew Pinski --- The problem is right away in expand: ;; vect__36.383_12 = MEM [(char * {ref-all})_10 + 16B]; (insn 23 22 0 (set (reg:V1TI 88 [ vect__36.383 ]) (mem:V1TI (plus:DI (reg/f:DI 86 [ _10 ]) (const_int 16 [0x10])) [0 MEM [(char * {ref-all})_10 + 16B]+0 S16 A128])) -1 (nil)) I think SLP did not mark the load as unaligned even though it knows it is one: t.cc:7:8: note: Vectorizing an unaligned access. t.cc:7:8: note: vect_model_load_cost: unaligned supported by hardware. t.cc:7:8: note: vect_model_load_cost: inside_cost = 24, prologue_cost = 0 . t.cc:7:8: note: ==> examining statement: MEM <__int128 unsigned> [(char * {ref-all}) + 25B] = _36; t.cc:7:8: note: vect_is_simple_use: operand # VUSE <.MEM_30> MEM <__int128 unsignedD.19> [(charD.10 * {ref-all})_10], type of def: internal t.cc:7:8: note: vect_is_simple_use: operand # VUSE <.MEM_35> MEM <__int128 unsignedD.19> [(charD.10 * {ref-all})_19], type of def: internal t.cc:7:8: note: Vectorizing an unaligned access. t.cc:7:8: note: vect_model_store_cost: unaligned supported by hardware. Confirmed. When tree-bit-ccp is turned off (-fno-tree-bit-ccp), the propagation of the unalignedness does not happen.
[Bug rtl-optimization/100253] [10/11/12 Regression] wrong code with -O2 -fno-tree-bit-ccp -ftree-slp-vectorize (unaligned movdqa)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100253 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #1 from Hongtao.liu --- For SSE loads/stores with an aligned address, movdqu is as fast as movdqa, so I'm thinking that the backend could generate only unaligned load/store instructions (which of course may cover up some problems).
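At the source level, the difference between the two instruction forms corresponds to the aligned and unaligned SSE2 load intrinsics. The sketch below is only an illustration of a load that is valid at any address, such as the `+ 25B` offset in the dump. `Bytes16` and `load16_any_alignment` are hypothetical names, and a `memcpy` fallback is used when SSE2 is not available.

```cpp
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// 16 bytes returned by value so the caller does not need vector types.
struct Bytes16 {
    unsigned char b[16];
};

// Load 16 bytes from P, which may have any alignment.
Bytes16 load16_any_alignment(const unsigned char *p) {
    Bytes16 out;
#if defined(__SSE2__)
    // Unaligned load/store intrinsics: valid for any address. The aligned
    // variants (_mm_load_si128 / _mm_store_si128) require 16-byte alignment.
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(p));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out.b), v);
#else
    std::memcpy(out.b, p, sizeof out.b);  // portable fallback
#endif
    return out;
}
```

Using the unaligned form everywhere would make a misaligned address harmless at run time, which is exactly why it could also hide a vectorizer bug like the one reported here.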