https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112465
Bug ID: 112465
Summary: libgcc: aarch64: lse runtime does not work with big data segments
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgcc
Assignee: unassigned at gcc dot gnu.org
Reporter: jemarch at gcc dot gnu.org
Target Milestone: ---

While compiling and linking the STREAM benchmark (http://www.cs.virginia.edu/stream/ref.html) on aarch64 with very big arrays, the link fails:

  $ gcc -O2 -DSTREAM_ARRAY_SIZE=178956970 -mcmodel=large -fopenmp -o stream.4gb stream.c
  libgcc.a(lse-init.o): in function `init_have_lse_atomics':
  (.text.startup+0x14): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against `.bss'
  libgcc.a(ldadd_4_1.o): in function `__aarch64_ldadd4_relax':
  (.text+0x4): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21 against symbol `__aarch64_have_lse_atomics' defined in .bss section in
  collect2: error: ld returned 1 exit status

The LSE machinery in libgcc assumes that the global __aarch64_have_lse_atomics is within ±4 GiB of the code that references it. This is due to code like this:

  .macro JUMP_IF_NOT_LSE label
          adrp    x(tmp0), __aarch64_have_lse_atomics
          ldrb    w(tmp0), [x(tmp0), :lo12:__aarch64_have_lse_atomics]
          cbz     w(tmp0), \label
  .endm

This macro is placed in the prologue of every LSE helper in libgcc (such as __aarch64_ldadd4_relax, which the little reproducer below ends up calling). The same assumption appears in the initialization routine, also part of libgcc:

  static void __attribute__((constructor (90)))
  init_have_lse_atomics (void)
  {
    unsigned long hwcap = __getauxval (AT_HWCAP);
    __aarch64_have_lse_atomics = (hwcap & HWCAP_ATOMICS) != 0;
  }

The code generated for the final assignment in that function also uses an adrp-based instruction sequence. The adrp+ldrb addressing mode can only reach targets within ±4 GiB of the 4 KiB page containing the instruction.
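To make the ±4 GiB limit concrete, here is a minimal C model of the overflow check the linker applies to R_AARCH64_ADR_PREL_PG_HI21. It assumes the definition from the AArch64 ELF ABI: the relocated value is X = Page(S+A) - Page(P), where Page() clears the low 12 bits, and X must satisfy -2^32 <= X < 2^32. The names page_of and adrp_fits are hypothetical helpers for illustration, not code taken from binutils or libgcc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ADRP encodes a signed 21-bit page count; one page is 4 KiB. */
#define ADRP_PAGE_SIZE  4096ULL
#define ADRP_MAX_PAGES  (1ULL << 20)  /* signed 21 bits: [-2^20, 2^20) pages */
#define ADRP_REACH      (ADRP_MAX_PAGES * ADRP_PAGE_SIZE)

/* 2^20 pages * 4 KiB = 4 GiB in each direction. */
static_assert(ADRP_REACH == (1ULL << 32), "adrp reach is +-4 GiB");

/* Page() as defined by the AArch64 ELF ABI: clear the low 12 bits. */
static uint64_t page_of(uint64_t addr)
{
    return addr & ~(ADRP_PAGE_SIZE - 1);
}

/* Hypothetical model of the R_AARCH64_ADR_PREL_PG_HI21 overflow check.
   p      : address of the adrp instruction (P)
   target : address being referenced (S + A)
   Returns true if the page delta fits, false if ld would report
   "relocation truncated to fit". */
static bool adrp_fits(uint64_t p, uint64_t target)
{
    /* Unsigned subtraction wraps; the cast yields the signed delta
       (GCC defines this conversion as modulo 2^64). */
    int64_t x = (int64_t)(page_of(target) - page_of(p));
    return x >= -(int64_t)ADRP_REACH && x < (int64_t)ADRP_REACH;
}
```

For example, with a hypothetical adrp at address 0x400000, adrp_fits(0x400000, 0x420010) holds, while a target 4 GiB further away, 0x420010 + (1ULL << 32), is rejected; that is the "relocation truncated to fit" case in the link log above.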
In the stream.c benchmark, and also in this little reproducer:

  static int foo;
  static double a[178956970], b[178956970], c[178956970];

  int main ()
  {
  #pragma omp atomic
    foo++;
    return foo + a[0] + b[0] + c[0];
  }

the variables a, b and c are allocated in .bss. It happens that __aarch64_have_lse_atomics also goes to .bss:

  /* Define the symbol gating the LSE implementations.  */
  _Bool __aarch64_have_lse_atomics
    __attribute__((visibility("hidden"), nocommon));

but it is placed _after_ a, b and c. So it is the offset of __aarch64_have_lse_atomics within .bss, pushed past the three arrays of about 1.33 GiB each, that overflows the R_AARCH64_ADR_PREL_PG_HI21 relocation of the adrp instruction.
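The arithmetic for the reproducer shows how close the arrays alone come to that limit. The sketch below assumes an 8-byte double (as on aarch64) and, for the distance estimate, ignores page rounding and alignment padding; distance_to_flag and its parameters are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STREAM_ARRAY_SIZE 178956970ULL  /* value passed via -D in the report */
#define ADRP_REACH        (1ULL << 32)  /* +-4 GiB reach of adrp */

/* Bytes taken by a[], b[] and c[] together. */
#define ARRAYS_BYTES (3ULL * STREAM_ARRAY_SIZE * sizeof(double))

static_assert(sizeof(double) == 8, "assumes 8-byte double, as on aarch64");
static_assert(ARRAYS_BYTES == 4294967280ULL, "arrays total 4 GiB minus 16 bytes");
static_assert(ARRAYS_BYTES < ADRP_REACH, "arrays alone still fit in adrp reach");

/* Hypothetical, approximate distance from an adrp in .text to a symbol
   placed after the three arrays in .bss.
   gap   : bytes between the adrp and the start of .bss
   other : any other .bss bytes placed before the symbol
   Page rounding and alignment padding are ignored. */
static uint64_t distance_to_flag(uint64_t gap, uint64_t other)
{
    return gap + other + ARRAYS_BYTES;
}
```

So a, b and c together take 4,294,967,280 bytes, only 16 bytes short of 4 GiB. Any .text-to-.bss gap plus whatever precedes the flag in .bss, for instance distance_to_flag(0x10000, 4), pushes __aarch64_have_lse_atomics out of adrp range.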