[Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741

Bug ID: 114741
Summary: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

void foo(unsigned i, unsigned *p)
{
    *p = i & 1;
}

with gcc -march=armv8-a+sve -O2 compiles to

foo:
        fmov    s31, w0
        and     z31.s, z31.s, #1
        str     s31, [x1]
        ret

instead of

foo:
        and     w0, w0, 1
        str     w0, [x1]
        ret

it is wrong with -mcpu=generic but good e.g. with -mcpu=neoverse-v1
[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
CC               |                 |nsz at gcc dot gnu.org

--- Comment #40 from nsz at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #22)
> BTW, does aarch64 dl-tlsdesc.S save SVE/SME register state (I only see fixed
> offsets in there), or are those call-saved?

call-saved.
[Bug target/112987] [14 Regression][aarch64] ICE in aarch64_do_track_speculation, at config/aarch64/aarch64-speculation.cc:214 since r14-5886-g426fddcbdad674
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112987

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
Status           |NEW              |RESOLVED
Resolution       |---              |FIXED

--- Comment #4 from nsz at gcc dot gnu.org ---
fixed for gcc-14 at 305fe4f136a3a3a78377a48c55d546000a3ba529
[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
CC               |                 |nsz at gcc dot gnu.org

--- Comment #2 from nsz at gcc dot gnu.org ---
is this fortran only?

glibc release is in a week, we can still do something (or backport a fix).

the vector abi does not allow 1 lane in this case
https://github.com/ARM-software/abi-aa/blob/main/vfabia64/vfabia64.rst#L867

c annotation:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/fpu/bits/math-vector.h;h=04837bdcd7c0d0ce91192e09fc2d6614cae289c2;hb=HEAD

fortran annotation:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/fpu/finclude/math-vector-fortran.h;h=92e15f0d6a758258f5728e628bbb2422b176fa95;hb=HEAD

i think the bug can be reproduced with older glibc by adding

!GCC$ builtin (cos) attributes simd (notinbranch)
[Bug target/112987] [14 Regression][aarch64] ICE in aarch64_do_track_speculation, at config/aarch64/aarch64-speculation.cc:214 since r14-5886-g426fddcbdad674
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112987

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
Status           |UNCONFIRMED      |NEW
Ever confirmed   |0                |1
Last reconfirmed |                 |2024-01-17
CC               |                 |nsz at gcc dot gnu.org

--- Comment #2 from nsz at gcc dot gnu.org ---
confirmed.
[Bug tree-optimization/111478] [12/13/14 regression] aarch64 SVE ICE: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:250
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111478

--- Comment #1 from nsz at gcc dot gnu.org ---
see also bug 111479
[Bug tree-optimization/111479] New: [12/13 regression] aarch64 SVE ICE: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:248
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111479

Bug ID: 111479
Summary: [12/13 regression] aarch64 SVE ICE: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:248
Product: gcc
Version: 13.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

maybe related to bug 111478

$ cat bug.c
float a, b, c;
void *d;
int e, f, g;
void p() {
  float i = a;
  for (;;)
    for (e = 0; e < g; e++) {
      float j = c, k = b, l = k, h = j, m = 0.0, n = 0.0;
      for (f = 0; f < e; f++) {
        float o = b;
        m = n = o;
      }
      ((float *)d)[2 * e] = l;
      ((float *)d)[e] = h;
      ((float *)d)[2 * e] += m - i * n;
      ((float *)d)[2 * e + 1] += n + i * m;
    }
}

$ gcc -c -O3 -march=armv8-a+sve bug.c
during GIMPLE pass: vect
: In function 'p':
:4:6: internal compiler error: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:248
    4 | void p() {
      |      ^
0x10ca603 compute_live_loop_exits
        /data/jenkins/workspace/GNU-toolchain/fsf-13/src/gcc/gcc/tree-ssa-loop-manip.cc:248
0x10ca603 add_exit_phis_var
        /data/jenkins/workspace/GNU-toolchain/fsf-13/src/gcc/gcc/tree-ssa-loop-manip.cc:330
0x10ca603 add_exit_phis
        /data/jenkins/workspace/GNU-toolchain/fsf-13/src/gcc/gcc/tree-ssa-loop-manip.cc:391
0x10ca603 rewrite_into_loop_closed_ssa_1
        /data/jenkins/workspace/GNU-toolchain/fsf-13/src/gcc/gcc/tree-ssa-loop-manip.cc:604
0x10ca603 rewrite_into_loop_closed_ssa(bitmap_head*, unsigned int)
        /data/jenkins/workspace/GNU-toolchain/fsf-13/src/gcc/gcc/tree-ssa-loop-manip.cc:626
0x1262514 execute
        /data/jenkins/workspace/GNU-toolchain/fsf-13/src/gcc/gcc/tree-vectorizer.cc:1361
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
Compiler returned: 1
[Bug tree-optimization/111478] New: [12/13/14 regression] aarch64 SVE ICE: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:250
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111478

Bug ID: 111478
Summary: [12/13/14 regression] aarch64 SVE ICE: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:250
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

$ cat bug.c
float a, d, e, f, g;
int b, c;
void h() {
  for (; b; b++) {
    for (; c;) {
      float i = d = i;
    }
    a += f - e * g;
    a += g + e * f;
  }
}

$ gcc -c -O3 -march=armv8-a+sve bug.c
during GIMPLE pass: vect
: In function 'h':
:3:6: internal compiler error: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:250
    3 | void h() {
      |      ^
0x1168eb4 compute_live_loop_exits
        /data/jenkins/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/tree-ssa-loop-manip.cc:250
0x1168eb4 add_exit_phis_var
        /data/jenkins/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/tree-ssa-loop-manip.cc:332
0x1168eb4 add_exit_phis
        /data/jenkins/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/tree-ssa-loop-manip.cc:393
0x1168eb4 rewrite_into_loop_closed_ssa_1
        /data/jenkins/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/tree-ssa-loop-manip.cc:606
0x1168eb4 rewrite_into_loop_closed_ssa(bitmap_head*, unsigned int)
        /data/jenkins/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/tree-ssa-loop-manip.cc:628
0x130a1e8 execute
        /data/jenkins/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/tree-vectorizer.cc:1358
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
Compiler returned: 1
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #12 from nsz at gcc dot gnu.org ---
(In reply to Jiangning Liu from comment #11)
> Hi Wilco,
>
> > "it means we will need a linker optimization to remove those redundant BTIs
> > (eg. by changing them into NOPs)"
>
> It will be only for performance optimization, right? If we don't care about
> performance, the linker doesn't need to optimize it to be NOP, right? It
> could still be useful if we only do this operation for a specific module.

no, this is a security feature, we want as few BTI c in an executable
segment as possible.
[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
CC               |                 |nsz at gcc dot gnu.org
Status           |NEW              |WAITING

--- Comment #7 from nsz at gcc dot gnu.org ---
fixed in bfd ld 2.41, see
https://sourceware.org/bugzilla/show_bug.cgi?id=30076

we can also fix gcc to work with older ld (emit bti c in local functions),
but i don't plan to do that unless there is a reason to do so.

(it increases the emitted bti c considerably in some workloads, e.g. linux
kernel, while the linker fix is less intrusive in the common case with
small binaries and no weird section hacks).
[Bug target/104689] aarch64: libgcc: DW_CFA_val_expression is not supported for RA_SIGN_SATE register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104689

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
Target Milestone |---              |13.0
Resolution       |---              |FIXED
Status           |UNCONFIRMED      |RESOLVED

--- Comment #2 from nsz at gcc dot gnu.org ---
fixed for gcc-13
[Bug ipa/105160] New: [12 regression] ipa modref marks functions with asm volatile as const or pure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160

Bug ID: 105160
Summary: [12 regression] ipa modref marks functions with asm volatile as const or pure
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
CC: marxin at gcc dot gnu.org
Target Milestone: ---

the following code is miscompiled with gcc -O1

#define sysreg_read(regname)            \
({                                      \
        unsigned long __sr_val;         \
        asm volatile(                   \
        "mrs %0, " #regname "\n"        \
        : "=r" (__sr_val));             \
                                        \
        __sr_val;                       \
})

#define sysreg_write(regname, __sw_val) \
do {                                    \
        asm volatile(                   \
        "msr " #regname ", %0\n"        \
        :                               \
        : "r" (__sw_val));              \
} while (0)

#define isb()                           \
do {                                    \
        asm volatile(                   \
        "isb"                           \
        :                               \
        :                               \
        : "memory");                    \
} while (0)

static unsigned long sctlr_read(void)
{
        return sysreg_read(sctlr_el1);
}

static void sctlr_write(unsigned long val)
{
        sysreg_write(sctlr_el1, val);
}

static void sctlr_rmw(void)
{
        unsigned long val;

        val = sctlr_read();
        val |= 1UL << 7;
        sctlr_write(val);
}

void sctlr_read_multiple(void)
{
        sctlr_read();
        sctlr_read();
        sctlr_read();
        sctlr_read();
}

void sctlr_write_multiple(void)
{
        sctlr_write(0);
        sctlr_write(0);
        sctlr_write(0);
        sctlr_write(0);
        sctlr_write(0);
}

void sctlr_rmw_multiple(void)
{
        sctlr_rmw();
        sctlr_rmw();
        sctlr_rmw();
        sctlr_rmw();
}

void function(void)
{
        sctlr_read_multiple();
        sctlr_write_multiple();
        sctlr_rmw_multiple();

        isb();
}

aarch64-linux-gnu-gcc -O1 compiles it to (note 'function' and
'sctlr_rmw_multiple'):

sctlr_rmw:
        mrs     x0, sctlr_el1
        orr     x0, x0, 128
        msr     sctlr_el1, x0
        ret
sctlr_read_multiple:
        mrs     x0, sctlr_el1
        mrs     x0, sctlr_el1
        mrs     x0, sctlr_el1
        mrs     x0, sctlr_el1
        ret
sctlr_write_multiple:
        mov     x0, 0
        msr     sctlr_el1, x0
        msr     sctlr_el1, x0
        msr     sctlr_el1, x0
        msr     sctlr_el1, x0
        msr     sctlr_el1, x0
        ret
sctlr_rmw_multiple:
        ret
function:
        isb
        ret

a similar issue in linux (but with a larger source file) got bisected to
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=1b62cddcf091fb8cadf575246a7d3ff778650a6b

commit 1b62cddcf091fb8cadf575246a7d3ff778650a6b
Author: Jan Hubicka
Date: Fri Nov 12 14:00:47 2021 +0100

    Fix ipa-modref pure/const discovery

    PR ipa/103200
    * ipa-modref.c (analyze_function, modref_propagate_in_scc): Do
    not mark pure/const function if there are side-effects.

with -fdump-ipa-all

$ grep found t.c.087i.modref
Function found to be const: sctlr_rmw/2
Function found to be const: sctlr_read_multiple/3
Function found to be const: sctlr_write_multiple/4
Function found to be const: sctlr_rmw_multiple/5

even though t.c.086i.pure-const correctly identifies asm volatile as
not const/pure.
[Bug target/104689] New: aarch64: libgcc: DW_CFA_val_expression is not supported for RA_SIGN_SATE register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104689

Bug ID: 104689
Summary: aarch64: libgcc: DW_CFA_val_expression is not supported for RA_SIGN_SATE register
Product: gcc
Version: 10.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

gcc emits DW_CFA_AARCH64_negate_ra_state (DW_CFA_window_save) for pac-ret
but it's valid to set the RA_SIGN_STATE pseudo register via other dwarf
instructions.

currently libgcc unwinder can crash if DW_CFA_val_expression is used to
set the register value directly.

(reportedly the cranelift compiler can generate such code.)
[Bug target/102768] [feature request] Add compiler support for aarch64 shadow call stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
Resolution       |---              |FIXED
Status           |UNCONFIRMED      |RESOLVED
Target Milestone |---              |12.0

--- Comment #9 from nsz at gcc dot gnu.org ---
i'm closing this as fixed. open separate bugs for further improvements.

Fixed by
https://gcc.gnu.org/g:ce09ab17ddd21f73ff2caf6eec3b0ee9b0e1a11e

commit ce09ab17ddd21f73ff2caf6eec3b0ee9b0e1a11e
Author: Dan Li
AuthorDate: 2022-02-21 20:01:14 +

    aarch64: Add compiler support for Shadow Call Stack

    Shadow Call Stack can be used to protect the return address of a
    function at runtime, and clang already supports this feature[1].

    To enable SCS in user mode, in addition to compiler, other support
    is also required (as discussed in [2]). This patch only adds basic
    support for SCS from the compiler side, and provides convenience
    for users to enable SCS.

    For linux kernel, only the support of the compiler is required.

    [1] https://clang.llvm.org/docs/ShadowCallStack.html
    [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

    Signed-off-by: Dan Li

    gcc/ChangeLog:

    * config/aarch64/aarch64.cc (SLOT_REQUIRED):
    Change wb_candidate[12] to wb_push_candidate[12].
    (aarch64_layout_frame): Likewise, and
    change callee_adjust when scs is enabled.
    (aarch64_save_callee_saves):
    Change wb_candidate[12] to wb_push_candidate[12].
    (aarch64_restore_callee_saves):
    Change wb_candidate[12] to wb_pop_candidate[12].
    (aarch64_get_separate_components):
    Change wb_candidate[12] to wb_push_candidate[12].
    (aarch64_expand_prologue): Push x30 onto SCS before it's
    pushed onto stack.
    (aarch64_expand_epilogue): Pop x30 frome SCS, while
    preventing it from being popped from the regular stack again.
    (aarch64_override_options_internal): Add SCS compile option check.
    (TARGET_HAVE_SHADOW_CALL_STACK): New hook.
    * config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled,
    wb_pop_candidate[12], and rename wb_candidate[12] to
    wb_push_candidate[12].
    * config/aarch64/aarch64.md (scs_push): New template.
    (scs_pop): Likewise.
    * doc/invoke.texi: Document -fsanitize=shadow-call-stack.
    * doc/tm.texi: Regenerate.
    * doc/tm.texi.in: Add hook have_shadow_call_stack.
    * flag-types.h (enum sanitize_code):
    Add SANITIZE_SHADOW_CALL_STACK.
    * opts.cc (parse_sanitizer_options): Add shadow-call-stack
    and exclude SANITIZE_SHADOW_CALL_STACK.
    * target.def: New hook.
    * toplev.cc (process_options): Add SCS compile option check.
    * ubsan.cc (ubsan_expand_null_ifn): Enum type conversion.

    gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/shadow_call_stack_1.c: New test.
    * gcc.target/aarch64/shadow_call_stack_2.c: New test.
    * gcc.target/aarch64/shadow_call_stack_3.c: New test.
    * gcc.target/aarch64/shadow_call_stack_4.c: New test.
    * gcc.target/aarch64/shadow_call_stack_5.c: New test.
    * gcc.target/aarch64/shadow_call_stack_6.c: New test.
    * gcc.target/aarch64/shadow_call_stack_7.c: New test.
    * gcc.target/aarch64/shadow_call_stack_8.c: New test.
[Bug middle-end/104504] New: spurious -Wswitch-unreachable warning with -ftrivial-auto-var-init=zero
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104504

Bug ID: 104504
Summary: spurious -Wswitch-unreachable warning with -ftrivial-auto-var-init=zero
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

reduced from linux code on which gcc-12 warns now:

int foo(int x) {
    switch(x) {
        int y; /* spuriously warns with -ftrivial-auto-var-init=zero */
    default:
        y = x * 2;
        return y;
    }
}

$ gcc -Wall -ftrivial-auto-var-init=zero -c a.c
a.c: In function 'foo':
a.c:3:13: warning: statement will never be executed [-Wswitch-unreachable]
    3 |         int y;
      |             ^

i can see why gcc warns, but it would be better not to.
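A possible source-level way to sidestep the warning, which the report itself does not propose, is to hoist the declaration out of the switch body so that nothing precedes the first label. This is only a sketch and assumes the variable does not need to be scoped to the switch:

```c
/* Same computation as the reduced test case, with the declaration moved
   before the switch so no statement sits in front of the first label. */
int foo(int x)
{
    int y;

    switch (x) {
    default:
        y = x * 2;
        return y;
    }
}
```

Whether this is acceptable in the original linux code depends on how the variable is used there.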
[Bug target/102768] [feature request] Add support for aarch64 shadow call stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

--- Comment #3 from nsz at gcc dot gnu.org ---
well, protection mechanisms are rarely equivalent. neither scs nor the
traditional stack protector is perfect.

to me compiler support for freestanding environments such as linux makes
sense. i cannot immediately tell if libc support would work.

(android is not a good indicator of what can be done in linux userspace:
the android abi is broken between releases while glibc is abi stable,
bionic can do hacks in longjmp/setcontext that are not acceptable in glibc
and android does not have mixed toolchain issues such as an old unwinder
trying to unwind across a new binary.)
[Bug target/102768] [feature request] Add support for aarch64 shadow call stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
CC               |                 |nsz at gcc dot gnu.org

--- Comment #1 from nsz at gcc dot gnu.org ---
note that this at least
- requires runtime support (to manage the shadow stack),
- needs a reserved register (x18),
- affects unwinding (shadow stack must be unwound too),
- affects longjmp and jmp_buf abi.

i guess these are taken care of in the linux context and in that case i
think it makes sense to have the gcc support upstream instead of in a
plugin.

however the general support in user-space is not trivial (the required
libc changes may not be possible in a backward compatible way such as
changing jmp_buf, or reliably such as allocating the size of shadow stack
and dealing with related failures, or with good ui e.g. opt-in mechanism
for binaries that require shadow stack so there is no regression for
non-shadow-stack binaries, etc.) and there are existing stack protection
mechanisms implemented.

i just wanted to note here that the linux kernel use-case can be treated
separately from user-space applications and likely less effort and less
controversial if you scope the feature right.
[Bug target/100354] New: [9 regression] aarch64: non-deligitimized UNSPEC UNSPEC_TLS (76) found in variable location
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100354

Bug ID: 100354
Summary: [9 regression] aarch64: non-deligitimized UNSPEC UNSPEC_TLS (76) found in variable location
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

i see this note/warning a lot during an aarch64 glibc build since gcc-9,
it seems to require -O -g, and seems to be harmless wrt code generation,
just annoying.

$ cat bug.c
struct s {
  void *p;
  int n;
};

void foo(struct s *x)
{
  void *p = __builtin_thread_pointer();
  if (x->p != p)
    x->p = p;
  x->n++;
}

$ aarch64-none-linux-gnu-gcc -S -O1 -g bug.c
bug.c: In function ‘foo’:
bug.c:6:6: note: non-delegitimized UNSPEC UNSPEC_TLS (76) found in variable location
    6 | void foo(struct s *x)
      |      ^~~

may be related to bug 89006
[Bug target/99551] New: aarch64: csel is used for cold scalar computation which affects performance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99551

Bug ID: 99551
Summary: aarch64: csel is used for cold scalar computation which affects performance
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

this is an optimization bug, i don't know which layer it should be fixed
so i report it as target bug.

cold path affects performance of hot code because csel is used:

long foo(long x, int c)
{
    if (__builtin_expect(c,0))
        x = (x + 15) & ~15;
    return x;
}

compiles to

foo:
        cmp     w1, 0
        add     x1, x0, 15
        and     x1, x1, -16
        csel    x0, x1, x0, ne
        ret

i think it would be better to use a branch if the user explicitly marked
the computation cold. e.g. this is faster if c is always 0:

long foo(long x, int c)
{
    if (__builtin_expect(c,0)) {
        asm ("");
        x = (x + 15) & ~15;
    }
    return x;
}

foo:
        cbnz    w1, .L7
        ret
.L7:
        add     x0, x0, 15
        and     x0, x0, -16
        ret
[Bug target/98747] New: aarch64: __ARM_FEATURE_MEMORY_TAGGING is defined on ilp32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98747

Bug ID: 98747
Summary: aarch64: __ARM_FEATURE_MEMORY_TAGGING is defined on ilp32
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

memory tagging intrinsics should be available when arm_acle.h is included
and __ARM_FEATURE_MEMORY_TAGGING is defined.

memory tagging is not supported with ILP32 so the feature test macro
should not be defined either, but gcc seems to define it

$ gcc -march=armv8.5-a+memtag -mabi=lp64 -E -dM -

int *foo(int *p, unsigned long m)
{
#ifdef __ARM_FEATURE_MEMORY_TAGGING
    return __arm_mte_create_random_tag(p, m);
#else
    return p;
#endif
}

but with -march=armv8.5-a+memtag -mabi=ilp32 it fails

In file included from :1:
: In function 'foo':
:6:12: error: Memory Tagging Extension does not support '-mabi=ilp32'
    6 |     return __arm_mte_create_random_tag(p, m);
      |            ^~~
Compiler returned: 1
[Bug target/98618] aarch64: oob adrp offset causes relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98618

--- Comment #5 from nsz at gcc dot gnu.org ---
(In reply to Wilco from comment #3)
> I fixed this in GCC10:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=7d3b27ff12610fde9d6c4b56abc70c6ee9b6b3db
>
> So this just needs to be backported.

thanks, i'll try that. i'm still looking for a simple workaround in glibc;
this affects this code in elf_get_dynamic_info:

...
      else if ((d_tag_utype) DT_VERSIONTAGIDX (dyn->d_tag) < DT_VERSIONTAGNUM)
        info[VERSYMIDX (dyn->d_tag)] = dyn;
      else if ((d_tag_utype) DT_EXTRATAGIDX (dyn->d_tag) < DT_EXTRANUM)
        info[DT_EXTRATAGIDX (dyn->d_tag) + DT_NUM + DT_THISPROCNUM
             + DT_VERSIONTAGNUM] = dyn;
      else if ((d_tag_utype) DT_VALTAGIDX (dyn->d_tag) < DT_VALNUM)
        info[DT_VALTAGIDX (dyn->d_tag) + DT_NUM + DT_THISPROCNUM
             + DT_VERSIONTAGNUM + DT_EXTRANUM] = dyn;
      else if ((d_tag_utype) DT_ADDRTAGIDX (dyn->d_tag) < DT_ADDRNUM)
        info[DT_ADDRTAGIDX (dyn->d_tag) + DT_NUM + DT_THISPROCNUM
             + DT_VERSIONTAGNUM + DT_EXTRANUM + DT_VALNUM] = dyn;
...
[Bug target/98618] aarch64: oob adrp offset causes relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98618

--- Comment #4 from nsz at gcc dot gnu.org ---
(In reply to Florian Weimer from comment #1)
> Is the test case really valid? It involves an out-of-bounds array access,
> after all.

sorry you are right the indexes are too far, a better test is

long n;
struct s { long a[100]; };
extern struct s obj __attribute__((visibility("hidden")));
void foo()
{
    long *a = obj.a;
    a[n - 0x70000000] = n;
    a[0x70000000 - n + 99] = n;
}

(i wanted to have an example with both + and - offset)

it compiles to

foo:
        adrp    x0, :got:n
        adrp    x2, obj-15032385536
        add     x2, x2, :lo12:obj-15032385536
        adrp    x1, obj+15032386328
        ldr     x0, [x0, #:got_lo12:n]
        add     x1, x1, :lo12:obj+15032386328
        ldr     x0, [x0]
        neg     x3, x0, lsl 3
        str     x0, [x2, x0, lsl 3]
        str     x0, [x3, x1]
        ret
[Bug target/98618] aarch64: oob adrp offset causes relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98618

--- Comment #2 from nsz at gcc dot gnu.org ---
(In reply to Florian Weimer from comment #1)
> Is the test case really valid? It involves an out-of-bounds array access,
> after all.

no it doesn't, n is signed long and its value can be such that the access
is in bounds (and that's what the compiler must assume, so adrp must be
anchored accordingly).
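The in-bounds argument can be checked with a little index arithmetic. The sketch below is not from the bug report: the helper names are hypothetical, and the bias 0x70000000 is inferred from the byte offset 15032385536 (0x70000000 elements of 8 bytes) in the assembly of comment #4. It models the two accesses a[n - BIAS] and a[BIAS - n + 99] on a 100-element array.

```c
#include <stdbool.h>

/* Hypothetical model of the index arithmetic only; no memory is accessed. */
#define BIAS  0x70000000L   /* inferred from obj-15032385536 = BIAS * 8 */
#define NELEM 100L          /* struct s { long a[100]; } */

static long low_index(long n)  { return n - BIAS; }
static long high_index(long n) { return BIAS - n + 99; }

static bool in_bounds(long idx) { return idx >= 0 && idx < NELEM; }

/* Both accesses are valid exactly when BIAS <= n < BIAS + NELEM. */
static bool both_in_bounds(long n)
{
    return in_bounds(low_index(n)) && in_bounds(high_index(n));
}
```

For example, n == BIAS gives indexes 0 and 99, so a compiler cannot treat the large constant offsets as evidence of an out-of-bounds access.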
[Bug target/98618] New: aarch64: oob adrp offset causes relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98618

Bug ID: 98618
Summary: aarch64: oob adrp offset causes relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21
Product: gcc
Version: 8.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

gcc-8 and earlier can generate adrp with out of bounds offset for hidden
and local symbols. i haven't yet found the change that fixed this in gcc-9.

this affects glibc since
https://sourceware.org/git/?p=glibc.git;a=commit;h=2f056e8a5dd4dc0f075413f931e82cede37d1057

$ cat bug.c
long n;
struct s { long a[100]; };
extern struct s obj __attribute__((visibility("hidden")));
void foo()
{
    long *a = obj.a;
    a[n - 0x70000000 + 35] = n;
    a[0x6dffffff - n + 35 + 6 + 16 + 3] = n;
}

$ gcc -fPIC -O2 -c bug.c
$ objdump -rd bug.o

bug.o: file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   90000000        adrp    x0, 8
                        0: R_AARCH64_ADR_GOT_PAGE       n
   4:   90000002        adrp    x2, 0
                        4: R_AARCH64_ADR_PREL_PG_HI21   obj-0x37ffffee8
   8:   91000042        add     x2, x2, #0x0
                        8: R_AARCH64_ADD_ABS_LO12_NC    obj-0x37ffffee8
   c:   90000001        adrp    x1, 0
                        c: R_AARCH64_ADR_PREL_PG_HI21   obj+0x3700001d8
  10:   f9400000        ldr     x0, [x0]
                        10: R_AARCH64_LD64_GOT_LO12_NC  n
  14:   91000021        add     x1, x1, #0x0
                        14: R_AARCH64_ADD_ABS_LO12_NC   obj+0x3700001d8
  18:   f9400000        ldr     x0, [x0]
  1c:   cb000fe3        neg     x3, x0, lsl #3
  20:   f8207840        str     x0, [x2, x0, lsl #3]
  24:   f8216860        str     x0, [x3, x1]
  28:   d65f03c0        ret

$ gcc -shared bug.o obj.o
bug.o: In function `foo':
bug.c:(.text+0x4): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `obj' defined in .data section in obj.o
bug.c:(.text+0xc): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `obj' defined in .data section in obj.o
collect2: error: ld returned 1 exit status
[Bug libgcc/98251] libgcc on 32-bit soft-float ARM narrows -NaN incorrectly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98251

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
CC               |                 |nsz at gcc dot gnu.org

--- Comment #1 from nsz at gcc dot gnu.org ---
i believe ieee-754 only specifies the sign bit of a nan after copy,
negate, abs and copysign operations.

iso c does not specify further requirements about the sign bit of a nan
either.

so i think gcc should not assume that conversions preserve the sign bit.
(there may be real hw where that is not the case, independently from what
libgcc is doing.)
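To make the distinction concrete, here is a small sketch with hypothetical helper names that are not from the report. It relies on the sign bit of a NaN only after copysign, abs and negate, and on nothing more than NaN-ness after a narrowing conversion. It assumes an implementation where the NAN macro from math.h is available.

```c
#include <math.h>
#include <stdbool.h>

/* Build a NaN with the sign bit set; copysign is one of the operations
   whose effect on a NaN's sign is specified. */
static double negative_nan(void)
{
    return copysign(NAN, -1.0);
}

/* true if fabs clears the sign bit of x (specified even for a NaN). */
static bool abs_clears_sign(double x)
{
    return !signbit(fabs(x));
}

/* true if unary minus flips the sign bit of x (specified even for a NaN). */
static bool negate_flips_sign(double x)
{
    return !signbit(-x) != !signbit(x);
}

/* true if x is a NaN and is still a NaN after narrowing to float.
   The sign of the narrowed value is deliberately not inspected. */
static bool narrowing_keeps_nan(double x)
{
    float f = (float)x;
    return isnan(x) && isnan(f);
}
```

Code written this way should not be affected by whether a particular soft-float or hardware conversion keeps the sign of a NaN.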
[Bug target/97638] New: aarch64: bti c is missing at function entry with branch-protection
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97638

Bug ID: 97638
Summary: aarch64: bti c is missing at function entry with branch-protection
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

gcc-10 (and trunk) with -mbranch-protection=bti (or standard) fails to
generate bti c at function entry in some cases:

char *foo (const char *s, const int c)
{
    const char *p = 0;
    for (;;)
    {
        if (*s == c)
            p = s;
        if (p != 0 || *s++ == 0)
            break;
    }
    return (char *)p;
}

gcc -O2 -mbranch-protection=bti is

foo:
.L3:
        ldrb    w2, [x0]
        cmp     w2, w1
        beq     .L2
        add     x0, x0, 1
        cbnz    w2, .L3
        mov     x0, 0
.L2:
        ret
[Bug c/97321] New: add warning for pointer casts that may lead to aliasing violation when dereferenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97321

Bug ID: 97321
Summary: add warning for pointer casts that may lead to aliasing violation when dereferenced
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

consider:

int f(unsigned char **);
int g(char *p)
{
    return f((unsigned char **)&p);
}

such code is almost surely wrong (if f dereferences its argument).
this is a common mistake and it seems gcc-11 will optimize such code more
aggressively, which can lead to broken behavior, see bug 97264.

so it would be useful to simply warn about casts between pointer types
that cannot alias. e.g.:

"warning: dangerous cast from `char **` to `unsigned char **` can lead to
aliasing violation [-Wpointer-cast]"

does not have to be in -Wall, but the current aliasing warnings are too
weak to catch bugs like in the example.
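For comparison, one way such a call might be written without the pointer-to-pointer cast is to go through an object of the type the callee expects. This is a sketch, not something taken from the report: the body of f is hypothetical and only exists so the example is complete.

```c
/* Hypothetical callee: reads one byte and advances the caller's pointer. */
static int f(unsigned char **pp)
{
    int c = **pp;
    (*pp)++;
    return c;
}

/* Instead of f((unsigned char **)&p), convert the pointer value and pass
   the address of a real unsigned char * object. f then modifies q, an
   object of the type it was declared to access, so no char * object is
   accessed through an unsigned char * lvalue. */
int g(char *p)
{
    unsigned char *q = (unsigned char *)p;
    return f(&q);
}
```

If the caller needs the updated pointer, it can be converted back with p = (char *)q after the call.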
[Bug target/94891] aarch64: there is no way to strip PAC from a return address in c code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94891

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
Target Milestone |---              |11.0
Status           |UNCONFIRMED      |RESOLVED
Resolution       |---              |FIXED

--- Comment #13 from nsz at gcc dot gnu.org ---
fixed for gcc-11, gcc-10.2, gcc-9.4
[Bug target/94791] aarch64: -pg profiling is broken with pac-ret
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94791

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
Status           |UNCONFIRMED      |RESOLVED
Resolution       |---              |FIXED
Target Milestone |---              |11.0

--- Comment #4 from nsz at gcc dot gnu.org ---
fixed for gcc-11, gcc-10.2, gcc-9.4
[Bug libgcc/96001] aarch64: bti is missing from lse.S when built with branch protection
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96001

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
Resolution       |---              |FIXED
Target Milestone |---              |11.0
Status           |UNCONFIRMED      |RESOLVED

--- Comment #4 from nsz at gcc dot gnu.org ---
fixed for gcc-11, gcc-10.2, gcc-9.4
[Bug libfortran/95920] Implicit declaration of function 'feenableexcept' in fpu-target.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95920

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
Status           |UNCONFIRMED      |RESOLVED
CC               |                 |nsz at gcc dot gnu.org
Resolution       |---              |INVALID

--- Comment #1 from nsz at gcc dot gnu.org ---
this is a newlib bug.
[Bug tree-optimization/95966] New: soft float operations are not tail called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95966

Bug ID: 95966
Summary: soft float operations are not tail called
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

i'd expect this to be a tail call into the soft float add operation on
soft float targets:

fp_t foo(fp_t a, fp_t b)
{
    return a + b;
}

e.g. on x86 with 'typedef __float128 fp_t' the generated code is

foo:
        sub     rsp, 8
        call    __addtf3
        add     rsp, 8
        ret

on aarch64 with 'typedef long double fp_t' the generated code is

foo:
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      __addtf3
        ldp     x29, x30, [sp], 16
        ret

i see similar code on other softfp targets.
[Bug target/94986] missing diagnostic on ARM thumb2 compilation with -pg when using r7 in inline asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94986

--- Comment #5 from nsz at gcc dot gnu.org ---
(In reply to Nick Desaulniers from comment #4)
> (In reply to nsz from comment #2)
> > ideally r7 clobber would just work with -pg -fomit-frame-pointer.
> > the alloca problem is a separate issue (that r7 clobber may not
> > work with alloca).
>
> Should GCC change this for aaarch32 then (rather than closing the bug)?

yes, but that's bug 69690.
[Bug target/94986] missing diagnostic on ARM thumb2 compilation with -pg when using r7 in inline asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94986

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
CC               |                 |nsz at gcc dot gnu.org

--- Comment #2 from nsz at gcc dot gnu.org ---
on arm the -pg abi is

func:
        push {lr}
        bl __gnu_mcount_nc
        ...

so no frame pointer is involved. -pg implying -fno-omit-frame-pointer is
a historical mistake i think (because some targets required fp for -pg,
but most don't).

ideally r7 clobber would just work with -pg -fomit-frame-pointer.
the alloca problem is a separate issue (that r7 clobber may not
work with alloca).
[Bug target/94748] aarch64: many unnecessary bti j emitted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94748

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
Status           |UNCONFIRMED      |RESOLVED
Resolution       |---              |FIXED

--- Comment #4 from nsz at gcc dot gnu.org ---
fixed for gcc-10.1 and on the gcc-9 branch.
[Bug target/94697] aarch64: bti j at function start instead of bti c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697

nsz at gcc dot gnu.org changed:

What             |Removed          |Added
Status           |NEW              |RESOLVED
Resolution       |---              |FIXED

--- Comment #8 from nsz at gcc dot gnu.org ---
fixed for gcc-10.1 and on the gcc-9 branch.
[Bug target/94515] aarch64: broken unwind information for pac-ret
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94515 nsz at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #5 from nsz at gcc dot gnu.org --- fixed for gcc-10.1 and on gcc-9 and gcc-8 branches.
[Bug target/94514] aarch64: unwinding across mixed pac-ret and non-pac-ret frames is broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94514 nsz at gcc dot gnu.org changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #5 from nsz at gcc dot gnu.org --- fixed for gcc-10.1 and on gcc-9 and gcc-8 branches.
[Bug target/94515] aarch64: broken unwind information for pac-ret
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94515 Bug 94515 depends on bug 94514, which changed state. Bug 94514 Summary: aarch64: unwinding across mixed pac-ret and non-pac-ret frames is broken https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94514 What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED
[Bug target/95129] aarch64: make outline-atomics work on non-gnu targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95129 --- Comment #1 from nsz at gcc dot gnu.org --- i also opened bug 95128 to just configure the outline-atomics away.
[Bug target/95128] aarch64: configure option for outline-atomics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95128 --- Comment #2 from nsz at gcc dot gnu.org --- i also opened bug 95129 to fix the runtime detection.
[Bug target/95129] New: aarch64: make outline-atomics work on non-gnu targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95129 Bug ID: 95129 Summary: aarch64: make outline-atomics work on non-gnu targets Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: --- the initializer in libgcc uses __getauxval which is not available on non-gnu targets so outlining atomics is ineffective. change the runtime lse check in libgcc such that non-glibc targets can implement it too (e.g. calling __getauxval via a weak reference and no #ifdef __gnu_linux__ check allows a libc to implement it later, unfortunately a non-linux os may not have the same hwcap mechanism so a more generic libc<->libgcc abi would be better).
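A rough sketch of the weak-reference idea described above, for illustration only. It is not the libgcc initializer: `have_lse_atomics` is a hypothetical name, and the `AT_HWCAP` value (16) and the aarch64 linux `HWCAP_ATOMICS` bit (1 << 8) are spelled out here as assumptions instead of being taken from system headers.

```c
/* Weak reference: the symbol address is null when the libc does not
   provide __getauxval, so no #ifdef __gnu_linux__ check is needed. */
extern unsigned long __getauxval(unsigned long) __attribute__((weak));

#define SKETCH_AT_HWCAP      16UL        /* assumed: linux AT_HWCAP */
#define SKETCH_HWCAP_ATOMICS (1UL << 8)  /* assumed: aarch64 linux HWCAP_ATOMICS */

/* hypothetical helper: returns 1 if the hwcap bit is set, 0 if it is
   clear or if the libc has no __getauxval at all */
static int have_lse_atomics(void)
{
    if (!__getauxval)
        return 0; /* no hwcap source: fall back to non-lse atomics */
    return (__getauxval(SKETCH_AT_HWCAP) & SKETCH_HWCAP_ATOMICS) != 0;
}
```

On a non-aarch64 host the bit tested has a different meaning, so the result is only meaningful on aarch64 linux; the point of the sketch is that a libc can add `__getauxval` later without libgcc changes.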
[Bug target/95128] New: aarch64: configure option for outline-atomics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95128 Bug ID: 95128 Summary: aarch64: configure option for outline-atomics Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: --- on aarch64, non-gnu targets likely want to turn outline atomics off in their toolchain (since outlining is ineffective without the hwcap based initializer that can select lse atomics at runtime).
[Bug target/94697] aarch64: bti j at function start instead of bti c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697 --- Comment #6 from nsz at gcc dot gnu.org --- this is fixed for gcc 10.1, just not backported yet so i kept the bug open
[Bug target/94891] New: aarch64: there is no way to strip PAC from a return address in c code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94891 Bug ID: 94891 Summary: aarch64: there is no way to strip PAC from a return address in c code Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: --- Neither __builtin_return_address nor __builtin_extract_return_address strips the pointer authentication code (PAC) when compiling with -mbranch-protection=pac-ret. Currently inline asm is the only way to get the actual return address in pac-ret code (xpaclri instruction strips PAC without authenticating the pointer), so users will have to disable pac-ret for code that uses the builtins or add aarch64 asm. It seems the only code that requires __builtin_return_address to return the signed return address is the libgcc unwinder so it seems that would be easier to fix than all other code. (Note that having PAC in __builtin_return_address is not compatible with ilp32 and thus currently pac-ret is disabled with -mabi=ilp32) __builtin_extract_return_addr is required to be invertible with __builtin_frob_return_addr which does not work for PAC. So it seems aarch64 needs new builtins or existing builtins need to change.
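The inline asm workaround mentioned above might look like the following sketch. `strip_pac` and `caller_address` are hypothetical helper names, not gcc builtins. The sketch assumes that `xpaclri` only operates on x30 and that it is encoded as `hint 7`, so it executes as a nop on cpus without pointer authentication; on other architectures the helper returns its argument unchanged.

```c
/* hypothetical helper: strip the PAC from a code address without
   authenticating it */
static inline void *strip_pac(void *addr)
{
#if defined(__aarch64__)
    /* xpaclri works on x30 only, so pin the value to that register */
    register void *x30 __asm__("x30") = addr;
    __asm__("hint 7 // xpaclri" : "+r"(x30));
    return x30;
#else
    return addr; /* nothing to strip on other targets */
#endif
}

/* hypothetical helper: the return address of this function with any
   PAC removed; noinline so that level 0 refers to this frame */
__attribute__((noinline)) void *caller_address(void)
{
    return strip_pac(__builtin_return_address(0));
}
```

Code that passes the result to something like dladdr can then be built with -mbranch-protection=pac-ret without disabling it per function.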
[Bug target/94791] New: aarch64: -pg profiling is broken with pac-ret
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94791 Bug ID: 94791 Summary: aarch64: -pg profiling is broken with pac-ret Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: ---
int foo(int x) { return x; }
gcc -pg -mbranch-protection=pac-ret gives
foo:
        hint 25 // paciasp
        stp x29, x30, [sp, -32]!
        mov x29, sp
        mov x1, x30
        str w0, [sp, 28]
        mov x0, x1 // passing signed return address
        bl _mcount
        ldr w0, [sp, 28]
        ldp x29, x30, [sp], 32
        hint 29 // autiasp
        ret
_mcount needs a valid code address as argument so different calls from the same call site can be correlated and the caller can be identified (e.g. with dladdr). either pac should be removed with xpaclri or x30 saved into another temp reg before paciasp.
[Bug target/94748] New: aarch64: many unnecessary bti j emitted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94748 Bug ID: 94748 Summary: aarch64: many unnecessary bti j emitted Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: ---
__attribute__((target("branch-protection=bti")))
int foo(void)
{
label:
  return 0;
}
compiles to
foo:
        hint 34 // bti c
        hint 36 // bti j
        mov w0, 0
        ret
the bti j is not necessary and bti j should be rarely emitted otherwise the security architecture is weakened.
[Bug target/94697] aarch64: bti j at function start instead of bti c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697 nsz at gcc dot gnu.org changed: What|Removed |Added Target Milestone|--- |10.0
[Bug target/94729] New: aarch64: __attribute__((target("branch-protection=pac-ret"))) is accepted in ilp32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94729 Bug ID: 94729 Summary: aarch64: __attribute__((target("branch-protection=pac-ret"))) is accepted in ilp32 Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: ---
-mbranch-protection=pac-ret is not supported in ilp32 so i would expect the related attribute to be a compile time error. (or a warning that it is ignored on ilp32)
gcc generates pac-ret instructions with -mabi=ilp32 which will almost surely fail at runtime on a pac-ret enabled system:
long bar(void);
__attribute__((target("branch-protection=pac-ret")))
long foo(void)
{
  return bar()+1;
}
becomes
foo:
        hint 25 // paciasp
        stp x29, x30, [sp, -16]!
        mov x29, sp
        bl bar
        add w0, w0, 1
        ldp x29, x30, [sp], 16
        hint 29 // autiasp
        ret
[Bug target/94697] New: aarch64: bti j at function start instead of bti c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697 Bug ID: 94697 Summary: aarch64: bti j at function start instead of bti c Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: ---
function that may be indirectly called does not start with bti c:
void bar(int *);
void *addr;
int foo(int x)
{
label:
  addr=&&label;
  bar(&x);
  return x;
}
with -O2 -mbranch-protection=bti+pac-ret
foo:
.L2:
        hint 36 // bti j
        hint 25 // paciasp
        adrp x1, .L2
        stp x29, x30, [sp, -32]!
        add x1, x1, :lo12:.L2
        adrp x2, .LANCHOR0
        mov x29, sp
        str x1, [x2, #:lo12:.LANCHOR0]
        str w0, [sp, 28]
        add x0, sp, 28
        bl bar
        ldr w0, [sp, 28]
        ldp x29, x30, [sp], 32
        hint 29 // autiasp
        ret
        .set .LANCHOR0,. + 0
addr:
        .zero 8
this happens if the function starts with a label that may be an indirect jump target, so a bti j is inserted, but there is a paciasp at the beginning which would normally act as implicit bti c when it's the first instruction.
[Bug target/94515] aarch64: broken unwind information for pac-ret
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94515 --- Comment #1 from nsz at gcc dot gnu.org ---
i had a fix but it's not enough, so here is another test case:
__attribute__((noreturn)) void unwind(void);
int bar(void);
int global;
int foo(int x)
{
  if (x==1)
    return 2;
  int y = bar();
  if (y > global)
    global=y;
  if (y==3)
    unwind();
  return 0;
}
-O2 -S -mbranch-protection=pac-ret
the asm:
foo:
        .cfi_startproc
        cmp w0, 1
        beq .L4
        hint 25 // paciasp
        .cfi_window_save // pauth on
        stp x29, x30, [sp, -16]!
        .cfi_def_cfa_offset 16
        .cfi_offset 29, -16
        .cfi_offset 30, -8
        mov x29, sp
        bl bar
        mov w1, w0
        adrp x2, .LANCHOR0
        ldr w0, [x2, #:lo12:.LANCHOR0]
        cmp w0, w1
        blt .L11
.L3:
        mov w0, 0
        cmp w1, 3
        beq .L12
        ldp x29, x30, [sp], 16
        .cfi_remember_state
        .cfi_restore 30
        .cfi_restore 29
        .cfi_def_cfa_offset 0
        hint 29 // autiasp
        .cfi_window_save // pauth off
        ret
        .p2align 2,,3
.L11:
        .cfi_restore_state // pauth on
        str w1, [x2, #:lo12:.LANCHOR0]
        b .L3
        .p2align 2,,3
.L4:
        .cfi_def_cfa_offset 0
        .cfi_restore 29
        .cfi_restore 30
        mov w0, 2 // pauth should be off but it's on
        ret
.L12:
        .cfi_def_cfa_offset 16
        .cfi_offset 29, -16
        .cfi_offset 30, -8
        bl unwind
        .cfi_endproc
[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970 nsz at gcc dot gnu.org changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2020-04-18 --- Comment #11 from nsz at gcc dot gnu.org --- confirmed
[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970 nsz at gcc dot gnu.org changed: What|Removed |Added CC||bugdal at aerifal dot cx --- Comment #10 from nsz at gcc dot gnu.org --- *** Bug 94646 has been marked as a duplicate of this bug. ***
[Bug target/94646] [arm] invalid codegen for conversion from 64-bit int to double hardfloat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94646 nsz at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE CC||nsz at gcc dot gnu.org --- Comment #1 from nsz at gcc dot gnu.org --- dup *** This bug has been marked as a duplicate of bug 91970 ***
[Bug target/94515] New: aarch64: broken unwind information for pac-ret
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94515 Bug ID: 94515 Summary: aarch64: broken unwind information for pac-ret Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: ---
pac-ret uses the .cfi_window_save directive to toggle between signed/unsigned return address, alternatively .cfi_remember_state and .cfi_restore_state pair can be used to keep track of the "return address signedness" state.
in some cases, when there are several return paths, gcc fails to generate the correct cfi directives for all return paths which can cause the unwinder not to authenticate a signed return address leading to a runtime crash on pauth enabled systems.
example c++ test that segfaults (after fixing bug 94514):
volatile int zero = 0;
__attribute__((noinline))
void unwind (void)
{
  if (zero == 0)
    throw 42;
}
__attribute__((noinline,noipa))
static int test (int z)
{
  if (z) {
    asm volatile("":::"x20","x21");
    unwind();
    return 1;
  } else {
    unwind();
    return 2;
  }
}
int main ()
{
  try {
    test (zero);
    __builtin_abort ();
  } catch (...) {
    return 0;
  }
  __builtin_abort ();
}
the test() function with -mbranch-protection=standard -O2 compiles to
_ZL4testi:
.LFB1:
        .cfi_startproc
        hint 25 // paciasp
        .cfi_window_save // pauth on
        stp x29, x30, [sp, -32]!
        .cfi_def_cfa_offset 32
        .cfi_offset 29, -32
        .cfi_offset 30, -24
        mov x29, sp
        cbz w0, .L9
        stp x20, x21, [sp, 16]
        .cfi_offset 21, -8
        .cfi_offset 20, -16
        bl _Z6unwindv
        mov w0, 1
        ldp x20, x21, [sp, 16]
        .cfi_restore 21
        .cfi_restore 20
        ldp x29, x30, [sp], 32
        .cfi_restore 30
        .cfi_restore 29
        .cfi_def_cfa_offset 0
        hint 29 // autiasp
        .cfi_window_save // pauth off
        ret
        .p2align 2,,3
.L9: // ret addr pauth state is wrong here !
        .cfi_def_cfa_offset 32
        .cfi_offset 29, -32
        .cfi_offset 30, -24
        bl _Z6unwindv
        ldp x29, x30, [sp], 32
        .cfi_restore 30
        .cfi_restore 29
        .cfi_def_cfa_offset 0
        hint 29 // autiasp
        .cfi_window_save
        mov w0, 2
        ret
        .cfi_endproc
.LFE1:
        .size _ZL4testi, .-_ZL4testi
[Bug target/94514] New: aarch64: unwinding across mixed pac-ret and non-pac-ret frames is broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94514 Bug ID: 94514 Summary: aarch64: unwinding across mixed pac-ret and non-pac-ret frames is broken Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: --- libgcc unwinder on aarch64 fails to keep track of pauth state and may try to authenticate return addresses that were not signed causing a runtime crash. example c++ code that segfaults in the unwinder on a pauth enabled system: __attribute__((noinline, target("branch-protection=pac-ret"))) static void do_throw (void) { throw 42; __builtin_abort (); } __attribute__((noinline, target("branch-protection=none"))) static void no_pac_ret (void) { do_throw (); __builtin_abort (); } int main () { try { no_pac_ret (); } catch (...) { return 0; } __builtin_abort (); }
[Bug libgomp/91938] libgomp (and libitm) DSOs are incorrectly built with initial-exec tls-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91938 --- Comment #7 from nsz at gcc dot gnu.org --- (In reply to Martin Liška from comment #6) > Can we close this issue now? as far as *-musl* is concerned the bug is fixed, but e.g. now android uses elf tls too, i'm not sure what happens there. i'm fine closing the bug with target milestone gcc-10 and let other *-linux* targets open new bugs if they care (don't reserve tls surplus).
[Bug target/92424] [aarch64] Broken code with -fpatchable-function-entry and BTI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 nsz at gcc dot gnu.org changed: What|Removed |Added Target|aarch64, x86|aarch64 Status|NEW |RESOLVED Resolution|--- |FIXED Target Milestone|--- |10.0 --- Comment #7 from nsz at gcc dot gnu.org --- fixed for gcc-10 and gcc-9.3, opened bug 93492 for the x86 case.
[Bug target/93492] New: Broken code with -fpatchable-function-entry and -fcf-protection=full
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93492 Bug ID: 93492 Summary: Broken code with -fpatchable-function-entry and -fcf-protection=full Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: ---
x86 version of bug 92424
endbr64 is not right at the function label with -fcf-protection=full -fpatchable-function-entry=1
void f(){}
is compiled to
f:
        .section __patchable_function_entries,"aw",@progbits
        .quad .LPFE1
        .text
.LPFE1:
        nop
.LFB0:
        .cfi_startproc
        endbr64
        ret
        .cfi_endproc
.LFE0:
        .size f, .-f
[Bug target/93455] New: aarch64: Q constraint address is recomputed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93455 Bug ID: 93455 Summary: aarch64: Q constraint address is recomputed Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: ---
gcc may recompute the address used in a Q constraint (which may be used for atomic load and stores).
static volatile int x[1];
int f()
{
  int r;
  asm volatile ("A %w0 %1" : "=r"(r) : "Q"(*x));
  asm volatile ("B %0" : "=Q"(*x));
  return r;
}
with -O3 gcc generates
f:
        adrp x1, .LANCHOR0
        add x0, x1, :lo12:.LANCHOR0
        A w0 [x0]
        add x1, x1, :lo12:.LANCHOR0
        B [x1]
        ret
i expected one address computation.
[Bug c/91113] add declare_simd_variant attribute support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91113 nsz at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |WONTFIX --- Comment #1 from nsz at gcc dot gnu.org --- i think this will have to be done differently (the attribute syntax).
[Bug target/92424] [aarch64] Broken code with -fpatchable-function-entry and BTI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 nsz at gcc dot gnu.org changed: What|Removed |Added Target|aarch64 |aarch64, x86 CC||nsz at gcc dot gnu.org --- Comment #4 from nsz at gcc dot gnu.org --- also affects x86 with -fcf-protection=branch -fpatchable-function-entry=N that's the same issue so this should not be target specific.
[Bug target/92822] [10 Regression] testsuite failures on aarch64 after r278938
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92822 --- Comment #3 from nsz at gcc dot gnu.org --- it seems at least the following neon intrinsics are affected: float32x2_t vmulx_laneq_f32 (float32x2_t, float32x4_t, const int); float32x2_t vmul_laneq_f32 (float32x2_t, float32x4_t, const int); float32x2_t vfma_laneq_f32 (float32x2_t, float32x2_t, float32x4_t, const int); float32x2_t vfms_laneq_f32 (float32x2_t, float32x2_t, float32x4_t, const int); float64x1_t vmul_laneq_f64 (float64x1_t, float64x2_t, const int);
[Bug target/92822] [10 Regression] testsuite failures on aarch64 after r278938
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92822 nsz at gcc dot gnu.org changed: What|Removed |Added CC||nsz at gcc dot gnu.org --- Comment #2 from nsz at gcc dot gnu.org ---
e.g.
#include <arm_neon.h>
float32x2_t
foo (float32x2_t v0, float32x4_t v1)
{
  return vmulx_laneq_f32 (v0, v1, 0);
}
used to get translated to
foo:
        fmulx v0.2s, v0.2s, v1.s[0]
        ret
now it is
foo:
        adrp x0, .LC0
        ldr q2, [x0, #:lo12:.LC0]
        tbl v1.16b, {v1.16b}, v2.16b
        fmulx v0.2s, v0.2s, v1.2s
        ret
        .size foo, .-foo
        .section .rodata.cst16,"aM",@progbits,16
        .align 4
.LC0:
        .byte 0
        .byte 1
        .byte 2
        .byte 3
        .byte 0
        .byte 1
        .byte 2
        .byte 3
        .byte 0
        .byte 1
        .byte 2
        .byte 3
        .byte 4
        .byte 5
        .byte 6
        .byte 7
[Bug libgomp/91938] libgomp (and libitm) DSOs are incorrectly built with initial-exec tls-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91938 --- Comment #5 from nsz at gcc dot gnu.org ---
Author: nsz
Date: Tue Dec 3 11:13:38 2019
New Revision: 278932
URL: https://gcc.gnu.org/viewcvs?rev=278932&root=gcc&view=rev
Log:
musl: Fix invalid tls model in libgomp and libitm PR91938
Musl does not support initial-exec tls in dynamically loaded shared libraries.
libgomp/ChangeLog:
2019-12-03 Szabolcs Nagy
        PR libgomp/91938
        * configure.tgt: Avoid IE tls on *-*-musl*.
libitm/ChangeLog:
2019-12-03 Szabolcs Nagy
        PR libgomp/91938
        * configure.tgt: Avoid IE tls on *-*-musl*.
Modified:
        trunk/libgomp/ChangeLog
        trunk/libgomp/configure.tgt
        trunk/libitm/ChangeLog
        trunk/libitm/configure.tgt
[Bug libgcc/91737] On Alpine Linux (libmusl) a statically linked C++ program which throws the first exception in two threads at the same time can busy spin on shutdown after main().
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91737 nsz at gcc dot gnu.org changed: What|Removed |Added Target Milestone|--- |10.0 --- Comment #6 from nsz at gcc dot gnu.org --- fixed in r278399 for gcc-10
[Bug target/65649] gcc generates overlarge constants for microblaze-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65649 nsz at gcc dot gnu.org changed: What|Removed |Added CC||nsz at gcc dot gnu.org Target Milestone|--- |10.0
[Bug target/65649] gcc generates overlarge constants for microblaze-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65649 --- Comment #7 from nsz at gcc dot gnu.org ---
Author: nsz
Date: Fri Nov 15 17:39:14 2019
New Revision: 278308
URL: https://gcc.gnu.org/viewcvs?rev=278308&root=gcc&view=rev
Log:
microblaze: fix PR65649
microblaze-linux-musl build fails without this. (This is a rebase of an earlier patch posted on bugzilla.)
gcc/ChangeLog:
2019-11-15 Nick Clifton
           Szabolcs Nagy
        PR target/65649
        * config/microblaze/microblaze.c (print_operand): Print value as long.
Modified:
        trunk/gcc/ChangeLog
        trunk/gcc/config/microblaze/microblaze.c
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #31 from nsz at gcc dot gnu.org --- (In reply to Segher Boessenkool from comment #28) > [ "ws" needs at least a Power7, btw. ] powerpc64le-* implies power8 and that's where this came up.
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #30 from nsz at gcc dot gnu.org --- i think it is not the end of the world if the asm constraint api changes in this case: fixing musl is easy because it's not super important to optimize fmin, fminf, fmax, fmaxf in libc (if it were important then gcc should inline them instead of calling into libc, currently it seems gcc is not able to do that without -ffast-math). the change breaks the build of old musl releases with new gcc, so as a general principle it makes more sense to me to keep documented apis working (e.g. when glibc removed ustat, the gcc devs asked for 5 years advance notice via deprecation warnings), but it's up to the gcc maintainers to decide.
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #6 from nsz at gcc dot gnu.org --- (In reply to Segher Boessenkool from comment #5) > -- LLVM should support "wa", since that is *the* constraint for VSX > registers. > -- musl should use the "wa" constraint in its inline asm. > -- If after those two you still want "ws" (for compiling legacy code, say), I >can add that back to GCC 10 (it will do just the same as "wa"). > > Is that a plan? llvm only accepts vector types for wa, not scalar types, so there is a difference between wa and ws in llvm. i guess musl can switch to wa and configure check if it works (and disable the asm on compilers where it does not) but i would prefer if ws and ww were kept as alias to wa in gcc to avoid breaking existing code (this should not have huge maintenance cost).
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #2 from nsz at gcc dot gnu.org --- note that "ws" is now supported by clang, but "wa" is not.
[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 --- Comment #1 from nsz at gcc dot gnu.org --- seems to be broken since r271916
[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970 --- Comment #9 from nsz at gcc dot gnu.org --- ok i was looking at the wrong code, didn't know libgcc2, i agree that's the right way to fix this.
[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970 --- Comment #7 from nsz at gcc dot gnu.org --- i think the code snippet i posted is more efficient and significantly smaller than using libgcc (which also sounds hard to wire up to do the right thing). the code sequence can possibly be even inlined. (and i don't mind if ucontrollers with single precision only fpu don't have correct fenv behaviour)
[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970 --- Comment #5 from nsz at gcc dot gnu.org --- ok so the real problem is that libgcc does not define FP_INIT_ROUNDMODE and FP_HANDLE_EXCEPTIONS etc for hardfloat arm targets.
[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970 --- Comment #3 from nsz at gcc dot gnu.org --- (In reply to Andreas Schwab from comment #2) > Don't you need #pragma STDC FENV_ACCESS? yes, for iso c conformance you need it, but gcc does not handle it anyway, instead it requires -frounding-math. however if double prec instructions are available, using them may be even faster in the difficult inexact case, e.g. double uconv64(uint64_t x) { double lo = uconv32(x); // single instruction, always exact double hi = uconv32(x>>32); return lo + hi*0x1p32; } so i would not make the fix depend on -frounding-math, just always use hardfloat instructions on hardfloat targets to do the conv. (i suspect it affects more than just armhf)
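A self-contained version of the uconv64 idea above, as a sketch only. `uconv32` is a plain C stand-in for the single hardware conversion instruction, which is an assumption for illustration, and the truncation of the low half is made explicit with a cast.

```c
#include <stdint.h>

/* stand-in for the hardware 32-bit unsigned to double conversion:
   every 32-bit value is exactly representable in a double */
static double uconv32(uint32_t x)
{
    return (double)x;
}

/* 64-bit unsigned to double using only 32-bit conversions */
static double uconv64(uint64_t x)
{
    double lo = uconv32((uint32_t)x);         /* low 32 bits, exact */
    double hi = uconv32((uint32_t)(x >> 32)); /* high 32 bits, exact */
    /* hi*0x1p32 is exact too, so the addition is the only operation
       that can round and it rounds in the current rounding mode */
    return lo + hi * 0x1p32;
}
```

Because the only inexact step is one hardware addition, the inexact exception and directed rounding come for free, provided the compiler does not fold the arithmetic at compile time (hence -frounding-math when the input is a constant).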
[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970 --- Comment #1 from nsz at gcc dot gnu.org --- floating-point exceptions are also missing for the same reason.
[Bug target/91970] New: arm: 64bit int to double conversion does not respect rounding mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970 Bug ID: 91970 Summary: arm: 64bit int to double conversion does not respect rounding mode Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: ---
on arm-* with
#include <fenv.h>
#include <stdio.h>
int main()
{
  long long x = (1LL << 60) - 1;
  double y;
  fesetround(FE_DOWNWARD);
  __asm__ __volatile__ ("" : "+m" (x));
  y = x;
  __asm__ __volatile__ ("" : "+m" (y));
  fesetround(FE_TONEAREST);
  printf("%a\n", y);
}
i get 0x1p+60 instead of 0x1.fffffffffffffp+59
i assume this is because the conversion is handled by __aeabi_l2d (also known as __floatdidf in libgcc) which is not rounding mode aware.
this affects hardfloat targets which otherwise support directed rounding modes.
[Bug libgomp/91938] libgomp (and libitm) DSOs are incorrectly built with initial-exec tls-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91938 --- Comment #3 from nsz at gcc dot gnu.org --- i opened a glibc bug https://sourceware.org/bugzilla/show_bug.cgi?id=25051 but i think this bug should be kept open for non *-linux*-gnu* targets.
[Bug libgomp/91938] libgomp (and libitm) DSOs are incorrectly built with initial-exec tls-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91938 --- Comment #2 from nsz at gcc dot gnu.org ---
if you really want this optimization then libgomp has to do checks to guarantee that the target libc supports this usage and only enable it when it's 100% safe. (e.g. musl or bionic does not support this, my guess is nothing really supports this other than glibc and even glibc has trouble because users abuse it)
i don't believe the unacceptable performance claims, since with tlsdesc there should be only a small performance difference between initial-exec tls vs general tls access, so instead of building broken binaries with initial-exec tls-model maybe the tls-dialect should be changed to tlsdesc (when supported).
[Bug libgomp/91938] New: libgomp (and libitm) DSOs are incorrectly built with initial-exec tls-model
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91938 Bug ID: 91938 Summary: libgomp (and libitm) DSOs are incorrectly built with initial-exec tls-model Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org CC: jakub at gcc dot gnu.org Target Milestone: ---
initial-exec tls is only valid in a dso if there is a guarantee that the dso is never dynamically loaded or the c runtime has tls reserved specifically for that dso to use.
gcc target libs don't provide such guarantee nor does glibc have special tls for libgomp or libitm so they are broken on *-linux*.
optimizing tls access is only acceptable if it does not break correctness.
(side note: initial-exec tls is required on glibc for as-safe tls access, if that's necessary then the fix will need glibc discussions, but the default should be safe for other libcs.)
this hits targets like aarch64 and powerpc* harder where glibc can optimize dynamic tls in dsos to use the preallocated static tls area if available so it runs out faster than on targets where no such optimization is done. (initial-exec tls usage is actually less performance relevant on those targets for the same reason.)
in principle that glibc logic can be changed to be more consistent across targets, but i would only support that if there is a way to coordinate the use of preallocated tls otherwise it's unsupportable.
see the libc-alpha discussion https://sourceware.org/ml/libc-alpha/2019-09/msg00512.html
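For readers unfamiliar with the distinction, a small sketch of how a tls model is selected per variable in gcc. The variable and function names are hypothetical and this is not libgomp code; it only uses the documented `tls_model` variable attribute.

```c
/* model left to the compiler: in -fPIC code for a dso this is a
   dynamic model that works even when the dso is dlopen()ed */
__thread int counter_default;

/* forced initial-exec: the access is cheaper, but the variable must
   fit in the static tls area set up at program start, so a dso
   using it is only safe if it is never dynamically loaded (or the
   libc reserves tls for it) */
__thread int counter_ie __attribute__((tls_model("initial-exec")));

int bump_default(void) { return ++counter_default; }
int bump_ie(void) { return ++counter_ie; }
```

Both functions behave identically in a normal executable; the difference only shows up as a load failure when a dso containing the initial-exec variable is dlopen()ed and the libc has no static tls left for it.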
[Bug c/82542] -fdump-lang-raw (formerly -fdump-translation-unit) no longer available for C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82542 nsz at gcc dot gnu.org changed: What|Removed |Added CC||nsz at gcc dot gnu.org --- Comment #13 from nsz at gcc dot gnu.org ---
on <=gcc-7 this option allows seeing all global identifiers (types, builtins, etc) that the compiler predefines, currently i don't see a way to do that for c.
e.g. currently there is no way to tell what _FloatN variants gcc understands, even though -fdump-translation-unit with empty tu worked for it reliably previously.
(i guess i can attach gdb to cc1 and hope there is enough debug info in cc1 to print things from gcc internal data structures.. but that's not exactly userfriendly)
[Bug target/91900] New: [10 regression] mipsisa64r6-*-* rejects lo clobber
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91900 Bug ID: 91900 Summary: [10 regression] mipsisa64r6-*-* rejects lo clobber Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: ---
mips64 syscall code in musl is like
#define __NR_getpid 5038
static inline long __syscall0(long n)
{
  register long r7 __asm__("$7");
  register long r2 __asm__("$2") = n;
  __asm__ __volatile__ (
    "syscall"
    : "+&r"(r2), "=r"(r7)
    :
    : "$1", "$3", "$10", "$11", "$12", "$13", "$14", "$15", "$24", "$25", "hi", "lo", "memory");
  return r7 ? -r2 : r2;
}
int getpid()
{
  return __syscall0(__NR_getpid);
}
because linux clobbers all sorts of registers.
this compiles with mips64-linux-musl-gcc and up to gcc-9 with mipsisa64r6-linux-musl-gcc too, but mipsisa64r6-* fails with trunk gcc (gcc version 10.0.0 20190924):
t.c: In function '__syscall0':
t.c:7:2: error: the register 'lo' cannot be clobbered in 'asm' for the current target
    7 |  __asm__ __volatile__ (
      |  ^~~
[Bug target/91886] New: [10 regression] powerpc64 impossible register constraint in asm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886 Bug ID: 91886 Summary: [10 regression] powerpc64 impossible register constraint in asm Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: --- this used to work for me: double fmax(double x, double y) { __asm__ ("xsmaxdp %x0, %x1, %x2" : "=ws"(x) : "ws"(x), "ws"(y)); return x; } compiled to fmax: xsmaxdp 1, 1, 2 blr now (gcc version 10.0.0 20190924) i get fmax.c: In function 'fmax': fmax.c:3:2: error: impossible constraint in 'asm' 3 | __asm__ ("xsmaxdp %x0, %x1, %x2" : "=ws"(x) : "ws"(x), "ws"(y)); | ^~~
[Bug c++/91809] New: in c++ bit-field is not promoted to int in printf argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91809 Bug ID: 91809 Summary: in c++ bit-field is not promoted to int in printf argument Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: ---
may be a -Wformat bug only, but the c++ front-end seems to use the wrong type:
#include <stdio.h>
struct X {
  unsigned long long a: 1;
} x;
void foo()
{
  printf("%d", x.a);
}
gcc -Wformat -xc++ says
a.c: In function 'void foo()':
a.c:9:12: warning: format '%d' expects argument of type 'int', but argument 2 has type 'long long unsigned int' [-Wformat=]
    9 |  printf("%d", x.a);
      |          ~^   ~~~
      |           |     |
      |           int   long long unsigned int
      |          %lld
the warning is not present with -xc, which is the expected behaviour: bit-field should be promoted to int in this context, i don't think c++ should behave differently.
(not a new regression, at least present since gcc-4.8)
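Until the front ends agree, one way to sidestep the question is an explicit cast, sketched below. `format_bit` is a hypothetical helper; the assumption is simply that after the cast the variadic argument has type int in both languages, so `%d` matches regardless of how the bit-field itself is promoted.

```c
#include <stdio.h>

struct X {
    unsigned long long a : 1;
};

/* hypothetical helper: formats the 1-bit field into buf and returns
   the number of characters written, as snprintf does */
static int format_bit(char *buf, size_t n, const struct X *x)
{
    /* the cast fixes the argument type to int, so "%d" is correct
       when this is compiled as c and as c++ */
    return snprintf(buf, n, "%d", (int)x->a);
}
```

The cast is value preserving here because a 1-bit unsigned field can only hold 0 or 1.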
[Bug libgcc/91737] On Alpine Linux (libmusl) a statically linked C++ program which throws the first exception in two threads at the same time can busy spin on shutdown after main().
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91737

nsz at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |NEW
   Last reconfirmed|                            |2019-09-17
                 CC|                            |nsz at gcc dot gnu.org
         Resolution|MOVED                       |---
     Ever confirmed|0                           |1

--- Comment #5 from nsz at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #1)
> Glibc has a similar bug and been discussed how to fix it.
> The way Glibc is going to fix it (though it has not yet) is that
> libpthread.a will be really just include one object file which includes all
> of the pthread library.

citation needed.

the plan in glibc is to provide an "is single threaded" api.
https://sourceware.org/ml/libc-alpha/2019-08/msg00438.html
once that's in, then in principle any library (like libstdc++) can do
single thread optimizations without hacks.
(another glibc plan is to move libpthread.so into libc.so so there are
no awkward internal abis between them, and then avoiding the pthread
dependency is no longer relevant.)

i think that should work for the unwinder in libgcc too.

on the musl side, we want to disable this hack before that happens:
it's better to not do any single thread optimizations than to silently
break things. so the right fix is something equivalent to
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=222329
i.e. libgcc should be compiled with GTHREAD_USE_WEAK=0 on *musl*.
[Bug tree-optimization/91723] New: builtin fma is not optimized or vectorized as *+
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91723

Bug ID: 91723
Summary: builtin fma is not optimized or vectorized as *+
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

i'd expect a*b+c to generate the same code as __builtin_fmaf(a,b,c)
when a hw instruction is available for fmaf, but the latter generates
significantly worse code in some cases, e.g. when vectorization is
involved. consider:

void
foo (float *restrict r, const float *restrict a, const float *restrict b,
     const float *restrict c)
{
	for (int i=0; i < 4; i++) {
		float x;
#ifdef BUILTIN
		x = __builtin_fmaf(a[i],b[i],c[i]);
		x = __builtin_fmaf(a[i],b[i],x);
#else
		x = a[i]*b[i]+c[i];
		x = a[i]*b[i]+x;
#endif
		r[i] = x;
	}
}

with gcc -O3 -mfma -mavx -ffp-contract=fast -fno-math-errno i get good code:

foo:
	vmovups (%rdx), %xmm0
	vmovups (%rsi), %xmm1
	vmovaps %xmm0, %xmm2
	vfmadd213ps (%rcx), %xmm1, %xmm2
	vfmadd132ps %xmm1, %xmm2, %xmm0
	vmovups %xmm0, (%rdi)
	ret

but if i add -DBUILTIN i get

foo:
	vmovss (%rsi), %xmm0
	vmovss (%rdx), %xmm1
	vmovaps %xmm0, %xmm2
	vfmadd213ss (%rcx), %xmm1, %xmm2
	vfmadd132ss %xmm1, %xmm2, %xmm0
	vmovss 4(%rdx), %xmm1
	vmovss %xmm0, (%rdi)
	vmovss 4(%rsi), %xmm0
	vmovaps %xmm0, %xmm2
	vfmadd213ss 4(%rcx), %xmm1, %xmm2
	vfmadd132ss %xmm1, %xmm2, %xmm0
	vmovss 8(%rdx), %xmm1
	vmovss %xmm0, 4(%rdi)
	vmovss 8(%rsi), %xmm0
	vmovaps %xmm0, %xmm2
	vfmadd213ss 8(%rcx), %xmm1, %xmm2
	vfmadd132ss %xmm1, %xmm2, %xmm0
	vmovss 12(%rdx), %xmm1
	vmovss %xmm0, 8(%rdi)
	vmovss 12(%rsi), %xmm0
	vmovaps %xmm0, %xmm2
	vfmadd213ss 12(%rcx), %xmm1, %xmm2
	vfmadd132ss %xmm1, %xmm2, %xmm0
	vmovss %xmm0, 12(%rdi)
	ret

i expected identical results. the same happens on other targets.
[Bug lto/91299] LTO inlines a weak definition in presence of a non-weak definition from an ELF file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91299 nsz at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-07-30 CC||nsz at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from nsz at gcc dot gnu.org --- happens on trunk gcc too and target independent.
[Bug c/91113] New: add declare_simd_variant attribute support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91113 Bug ID: 91113 Summary: add declare_simd_variant attribute support Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: nsz at gcc dot gnu.org Target Milestone: --- to declare vector functions on aarch64 for one simd architecture only, support for the openmp 5.0 declare variant syntax is required, but full support for the omp declare variant pragma is excessive. (for the aarch64 use-case, see user defined vector functions in https://developer.arm.com/docs/101129/latest ) I suggest introducing an attribute in gcc that can handle a subset of omp declare variant pragma and works in c and fortran declarations for declare simd functions. I think the syntax and semantics for the attribute should follow the proposal for clang (without the clang_ prefix): http://lists.llvm.org/pipermail/llvm-dev/2019-June/132987.html ``` declare_simd_variant (, {, }) := The name of a function variant that is a base language identifier, or, for C++, a template-id. := , {, } := simdlen() | simdlen("scalable") := inbranch | notinbranch := | | | {,} := linear_ref(,) | linear_var(, ) | linear_uval(, ) | linear(, ) := | := uniform() := align(, ) := Name of a parameter in the scalar function declaration/definition := ... | -2 | -1 | 1 | 2 | ... := 1 | 2 | 3 | ... := {}{,} {} := isa(target-specific-value) := arch(target-specific-value) ``` example usage: ``` __attribute__(declare_simd_variant("vfoo", simdlen(2), notinbranch, isa("simd")) double foo(double x); float64x2_t vfoo(float64x2_t vx); ``` should be equivalent to the openmp 5.0 code ``` #pragma omp declare variant(vfoo) \ match(construct={simd(simdlen(2), notinbranch)}, device={isa("simd")}) double foo(double x); float64x2_t vfoo(float64x2_t vx); ```
[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539 nsz at gcc dot gnu.org changed: What|Removed |Added CC||nsz at gcc dot gnu.org --- Comment #9 from nsz at gcc dot gnu.org --- spec2017 521.wrf_r never finishes on aarch64 gcc rev 271291 runs fine gcc rev 271380 does not finish (possibly a crash that the spec scripts don't detect)
[Bug middle-end/90478] [10 Regression] ICE in emit_case_dispatch_table at gcc/stmt.c:796
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90478 nsz at gcc dot gnu.org changed: What|Removed |Added CC||nsz at gcc dot gnu.org --- Comment #8 from nsz at gcc dot gnu.org --- i see FAIL: gcc.dg/tree-ssa/pr90478-2.c (internal compiler error) on aarch64-none-elf, aarch64_be-none-elf, arm-none-eabi targets.
[Bug target/89628] New: aarch64_vector_pcs does not use v24-v31 as temp regs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89628

Bug ID: 89628
Summary: aarch64_vector_pcs does not use v24-v31 as temp regs
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nsz at gcc dot gnu.org
Target Milestone: ---

consider

typedef __Float32x4_t vec;

__attribute__((aarch64_vector_pcs))
vec f(vec a0, vec a1, vec a2, vec a3, vec a4, vec a5, vec a6, vec a7)
{
	vec t0, t1, t2, t3, t4, t5, t6, t7, s0, s1, s2, s3;
	t0 = a0 - a7;
	t1 = a1 - a6;
	t2 = a2 - a5;
	t3 = a3 - a4;
	t4 = a4 - a3;
	t5 = a5 - a2;
	t6 = a6 - a1;
	t7 = a7 - a0;
	s0 = t0 * t1;
	s1 = t2 * t3;
	s2 = t4 * t5;
	s3 = t6 * t7;
	return s0 * s1 * s2 * s3 * a0 * a1 * a2 * a3 * a4 * a5 * a6 * a7;
}

the aarch64 vpcs has 8 arg + 8 temp regs to use, so i think such code
should not need to spill, however current gcc seems to compile it as

f:
	stp q16, q17, [sp, -96]!
	fsub v16.4s, v2.4s, v5.4s
	stp q18, q19, [sp, 32]
	fsub v17.4s, v0.4s, v7.4s
	stp q20, q21, [sp, 64]
	fsub v18.4s, v1.4s, v6.4s
	fsub v20.4s, v3.4s, v4.4s
	fsub v21.4s, v5.4s, v2.4s
	fsub v19.4s, v4.4s, v3.4s
	fmul v17.4s, v17.4s, v18.4s
	fmul v16.4s, v16.4s, v20.4s
	fsub v18.4s, v6.4s, v1.4s
	fsub v20.4s, v7.4s, v0.4s
	fmul v19.4s, v19.4s, v21.4s
	fmul v16.4s, v17.4s, v16.4s
	fmul v17.4s, v18.4s, v20.4s
	ldp q20, q21, [sp, 64]
	fmul v16.4s, v16.4s, v19.4s
	ldp q18, q19, [sp, 32]
	fmul v16.4s, v16.4s, v17.4s
	fmul v16.4s, v16.4s, v0.4s
	fmul v1.4s, v16.4s, v1.4s
	ldp q16, q17, [sp], 96
	fmul v2.4s, v1.4s, v2.4s
	fmul v3.4s, v2.4s, v3.4s
	fmul v4.4s, v3.4s, v4.4s
	fmul v5.4s, v4.4s, v5.4s
	fmul v6.4s, v5.4s, v6.4s
	fmul v0.4s, v6.4s, v7.4s
	ret

note that v24..v31 regs are not used but there are 6 spills.
[Bug libfortran/78314] [aarch64] ieee_support_halting does not report unsupported fpu traps correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314

--- Comment #18 from nsz at gcc dot gnu.org ---
(In reply to Christophe Lyon from comment #16)
> I've noticed this problem on arm and aarch64 native builds too.
> But my cross-compilers (using QEMU as simulator) still pass this test. Does
> this mean there is a bug in QEMU?

qemu-user will just translate each guest fp operation to host
fp operations, so if the host supports traps then you will see
traps working.

it's not a bug in the sense that the arm architecture allows
trap support (it's just not required), but it's buggy in that it
does not report the support correctly (e.g. enabling traps
always succeeds under qemu, but traps don't happen if the
underlying hw has no support)
[Bug libfortran/78314] [aarch64] ieee_support_halting does not report unsupported fpu traps correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314 nsz at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|NEW Assignee|nsz at gcc dot gnu.org |unassigned at gcc dot gnu.org --- Comment #15 from nsz at gcc dot gnu.org --- i unassigned myself as i'm not working on this right now.
[Bug libfortran/78314] [aarch64] ieee_support_halting does not report unsupported fpu traps correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314 --- Comment #14 from nsz at gcc dot gnu.org --- (In reply to Uroš Bizjak from comment #13) > (In reply to nsz from comment #12) > > i don't know how to change this to false for IEEE_SUPPORT_HALTING > > on aarch64 and arm targets, but that would be a possible fix. > > --cut here-- > Index: libgfortran/config/fpu-glibc.h that only turns the runtime check into "always false" but the compile time check is still "always true". which is still broken.
[Bug libfortran/78314] [aarch64] ieee_support_halting does not report unsupported fpu traps correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314

--- Comment #12 from nsz at gcc dot gnu.org ---
this got reverted because of bug 88678 and because compile time
and runtime support_halting are different.

the compile time value is unconditionally true, which is wrong
for aarch64 and arm:

gcc/fortran/simplify.c:

gfc_expr *
simplify_ieee_support (gfc_expr *expr)
{
  /* We consider that if the IEEE modules are loaded, we have full support
     for flags, halting and rounding, which are the three functions
     (IEEE_SUPPORT_{FLAG,HALTING,ROUNDING}) allowed in constant
     expressions. One day, we will need libgfortran to detect support and
     communicate it back to us, allowing for partial support.  */

  return gfc_get_logical_expr (gfc_default_logical_kind, &expr->where,
			       true);
}

i don't know how to change this to false for IEEE_SUPPORT_HALTING
on aarch64 and arm targets, but that would be a possible fix.
[Bug fortran/88678] [9 regression] Many gfortran.dg/ieee/ieee_X.f90 test cases fail starting with r267465
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88678 --- Comment #21 from nsz at gcc dot gnu.org --- this fix undid the change for bug 78314 do you plan to backport it to gcc 7,8 branches ? note that in principle on targets where trapping is not supported the "immediate alternate exception handling" mechanism of ieee 754 can be emulated by save/clear/check/restore status flags around each fp operation, but i don't think gcc currently supports that (and it's not very practical unless somebody uses it for debugging fp issues).
[Bug fortran/88678] [9 regression] Many gfortran.dg/ieee/ieee_X.f90 test cases fail starting with r267465
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88678 nsz at gcc dot gnu.org changed: What|Removed |Added CC||nsz at gcc dot gnu.org --- Comment #19 from nsz at gcc dot gnu.org --- that code was there for a reason.. now aarch64 fails because it cannot detect if the flags are supported or not. so if detection is turned off then on aarch64 "supports trapping" should always be false and likewise on any target that allows an implementation without trapping exceptions.
[Bug target/88954] __attribute__((noplt)) doesn't work with function pointers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88954

nsz at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nsz at gcc dot gnu.org

--- Comment #7 from nsz at gcc dot gnu.org ---
note that with

void f_noplt(void) __attribute__((noplt));
void (*p)(void) = f_noplt;

the linker may create a PLT for f_noplt and use its address to initialize p
in case of non-pie linking.

alternatively the linker may emit a dynamic relocation for p so it is
filled in by the dynamic linker with the actual address of f_noplt.

it seems the bfd linker on x86_64 does the latter (if there is otherwise
no PLT), but e.g. the gold linker does the former. (as far as the sysv abi
is concerned both behaviours are correct, the linker does not know about
the noplt attr.)

this means that (depending on linker behaviour) a noplt function may get a
PLT in non-pie executables (so noplt can only avoid lazy binding and jump
slot relocs reliably in pic code). maybe linkers should be fixed so noplt
always avoids the PLT (on x86_64, other targets have other issues with
non-pic), but then this has to be abi to be reliable.