[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 Florian Weimer changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #18 from Florian Weimer --- This is a glibc bug.
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #17 from Florian Weimer --- Jakub's glibc test failures were due to --prefix=/usr/local, so that glibc wouldn't find the installed system libgcc_s in /usr/lib64.
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #16 from Jakub Jelinek --- In the glibc bugzilla or on private IRC I've suggested either: diff --git a/sysdeps/x86_64/nptl/tls.h b/sysdeps/x86_64/nptl/tls.h index a08bf972de..ccb5f24d92 100644 --- a/sysdeps/x86_64/nptl/tls.h +++ b/sysdeps/x86_64/nptl/tls.h @@ -180,11 +180,16 @@ _Static_assert (offsetof (tcbhead_t, __glibc_unused2) == 0x80, assignments like pthread_descr self = thread_self(); do not get optimized away. */ -# define THREAD_SELF \ +# if __GNUC_PREREQ (6, 0) +# define THREAD_SELF \ + (*(struct pthread *__seg_fs *) offsetof (struct pthread, header.self)) +# else +# define THREAD_SELF \ ({ struct pthread *__self; \ asm ("mov %%fs:%c1,%0" : "=r" (__self) \ : "i" (offsetof (struct pthread, header.self)));\ __self;}) +# endif /* Magic for libthread_db to know how to do THREAD_SELF. */ # define DB_THREAD_SELF_INCLUDE /* For the FS constant. */ or diff --git a/sysdeps/x86_64/nptl/tls.h b/sysdeps/x86_64/nptl/tls.h index a08bf972de..f3cbae9cba 100644 --- a/sysdeps/x86_64/nptl/tls.h +++ b/sysdeps/x86_64/nptl/tls.h @@ -180,11 +180,21 @@ _Static_assert (offsetof (tcbhead_t, __glibc_unused2) == 0x80, assignments like pthread_descr self = thread_self(); do not get optimized away. */ -# define THREAD_SELF \ +# if __GNUC_PREREQ (6, 0) +# define THREAD_SELF \ + ({ struct pthread *__self; \ + asm ("mov %%fs:%c1,%0" : "=r" (__self) \ + : "i" (offsetof (struct pthread, header.self)), \ + "m" (*(struct pthread *__seg_fs *)\ +offsetof (struct pthread, header.self)));\ + __self;}) +# else +# define THREAD_SELF \ ({ struct pthread *__self; \ asm ("mov %%fs:%c1,%0" : "=r" (__self) \ : "i" (offsetof (struct pthread, header.self)));\ __self;}) +# endif /* Magic for libthread_db to know how to do THREAD_SELF. */ # define DB_THREAD_SELF_INCLUDE /* For the FS constant. */ With make check I'm seeing 153 glibc FAILs with trunk with the first patch, about to make check the second one, but maybe I'm doing something wrong or there are other gcc or glibc bugs unrelated to this one. I think it would be good to try these patches with GCC 10 first to see if those patches don't break stuff with known working compilers and only then look at the failed tests if any. Florian said he'll take it over on the glibc side.
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #15 from Richard Biener --- (In reply to Jakub Jelinek from comment #13) > And no, the asm isn't marked volatile, that would have prevented it too: > # define THREAD_SELF \ > ({ struct pthread *__self; > \ > asm ("mov %%fs:%c1,%0" : "=r" (__self) > \ > : "i" (offsetof (struct pthread, header.self))); > \ > __self;}) > This all boils down to: > void > foo (int *p) > { > for (int i = 0; i < 64; i++) > { > if (p[i]) > { > int *q; > asm ("mov %%fs:%c1,%0" : "=r" (q) : "i" (16)); > q[0]++; > } > } > } > which would hoist the inline asm to the function prologue even in GCC 4.1. I suppose we'd need to mark it possibly trapping (which it actually does), but I think there's no good way to do this.
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #14 from Jakub Jelinek --- I guess with GCC 6 and later one can use: void foo (int *p) { for (int i = 0; i < 64; i++) { if (p[i]) { int *q; //asm ("mov %%fs:%c1,%0" : "=r" (q) : "i" (16)); q = *(int *__seg_fs *) 16; q[0]++; } } } instead (but I haven't done any testing beyond this testcase), for older GCC I guess you can keep using what it did because we've been lucky (dunno why).
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #13 from Jakub Jelinek --- And no, the asm isn't marked volatile, that would have prevented it too: # define THREAD_SELF \ ({ struct pthread *__self; \ asm ("mov %%fs:%c1,%0" : "=r" (__self) \ : "i" (offsetof (struct pthread, header.self)));\ __self;}) This all boils down to: void foo (int *p) { for (int i = 0; i < 64; i++) { if (p[i]) { int *q; asm ("mov %%fs:%c1,%0" : "=r" (q) : "i" (16)); q[0]++; } } } which would hoist the inline asm to the function prologue even in GCC 4.1.
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #12 from Florian Weimer --- (In reply to Jakub Jelinek from comment #11) > Ah, RTL loop_invariant. Perhaps because the inline asm is buggy? > asm ("mov %%fs:%c1,%0" : "=r" (__self) : "i" (__builtin_offsetof (struct > pthread, header.self))); > The only input of the asm is the constant, so really nothing tells the > optimizers it can't move it arbitrarily. I think it needs to have some > memory input or clobber to properly model that it reads from unknown memory. Ahh, do you mean the THREAD_SELF macro? # define THREAD_SELF \ ({ struct pthread *__self; \ asm ("mov %%fs:%c1,%0" : "=r" (__self) \ : "i" (offsetof (struct pthread, header.self)));\ __self;}) I had only looked at the other macros. glibc relies on GCC optimizing away THREAD_SELF reads on x86-64 because most of the TCB access does not need it.
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #11 from Jakub Jelinek --- Ah, RTL loop_invariant. Perhaps because the inline asm is buggy? asm ("mov %%fs:%c1,%0" : "=r" (__self) : "i" (__builtin_offsetof (struct pthread, header.self))); The only input of the asm is the constant, so really nothing tells the optimizers it can't move it arbitrarily. I think it needs to have some memory input or clobber to properly model that it reads from unknown memory.
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #10 from Jakub Jelinek --- Looking at *.optimized dump, mov %%fs:%c1,%0 appears in there only bb 64, guarded by flags_90(D) & 4 check, so exactly the check that appears in the source code - flags & DL_LOOKUP_GSCOPE_LOCK (among various other earlier checks). But in the assembly it is unconditional at the start of the function, so something must have hoisted that asm insn to the start of the function.
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #9 from Florian Weimer --- And we should not end up in the add_dependency part, either because l_type won't be lt_loaded and the DL_LOOKUP_ADD_DEPENDENCY flag hasn't been set, either. The inline asm is marked as volatile, and that should prevent it from being moved across these checks.
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #8 from Florian Weimer --- (In reply to Jakub Jelinek from comment #6) > on the mov %fs:0x10,%rax perhaps %fs isn't initialized yet? Yes, that's why the access is guarded by flags & DL_LOOKUP_GSCOPE_LOCK. During initial relocation, _dl_lookup_symbol_x is called without this flag.
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #7 from Martin Liška --- Hm, adding -fno-ipa-sra does not help while -O1 does in order to remove the miscompilation.
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #6 from Jakub Jelinek --- The crash seems to be: => 0x77fdbe20 <_dl_lookup_symbol_x+0>: push %r15 0x77fdbe22 <_dl_lookup_symbol_x+2>: push %r14 0x77fdbe24 <_dl_lookup_symbol_x+4>: push %r13 0x77fdbe26 <_dl_lookup_symbol_x+6>: push %r12 0x77fdbe28 <_dl_lookup_symbol_x+8>: mov%rdi,%r12 0x77fdbe2b <_dl_lookup_symbol_x+11>: push %rbp 0x77fdbe2c <_dl_lookup_symbol_x+12>: mov%rdx,%rbp 0x77fdbe2f <_dl_lookup_symbol_x+15>: push %rbx 0x77fdbe30 <_dl_lookup_symbol_x+16>: mov%fs:0x10,%rax 0x77fdbe39 <_dl_lookup_symbol_x+25>: sub$0xa8,%rsp End of assembler dump. (gdb) bt #0 _dl_lookup_symbol_x (undef_name=0x77ff414a "__vdso_clock_gettime", undef_map=0x77ffe850, ref=0x7fffd598, symbol_scope=0x77ffebe8, version=0x7fffd5c0, type_class=0, flags=0, skip_map=0x0) at dl-lookup.c:835 #1 0x77fd463f in dl_vdso_vsym (name=0x77ff414a "__vdso_clock_gettime") at ../sysdeps/unix/sysv/linux/dl-vdso.h:52 #2 setup_vdso_pointers () at ../sysdeps/unix/sysv/linux/dl-vdso-setup.h:30 #3 dl_main (phdr=, phnum=12, user_entry=, auxv=0x7fffdac8) at rtld.c:1620 #4 0x77feb1e7 in _dl_sysdep_start (start_argptr=start_argptr@entry=0x7fffd8f0, dl_main=dl_main@entry=0x77fd34e0 ) at ../elf/dl-sysdep.c:252 #5 0x77fd3015 in _dl_start_final (arg=0x7fffd8f0) at rtld.c:485 #6 _dl_start (arg=0x7fffd8f0) at rtld.c:578 #7 0x77fd2058 in _start () on the mov %fs:0x10,%rax perhaps %fs isn't initialized yet?
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #5 from Jakub Jelinek --- (In reply to Martin Liška from comment #3) > Do we have a simple reproducer for the miscompiled Glibc library? All one needs to do is with glibc git trunk CC=trunk-gcc CXX=trunk-g++ ../configure --disable-sanity-check make -jN elf/ld.so ./libc.so.6 Segmentation fault (core dumped)
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 --- Comment #4 from Jakub Jelinek --- _value is passed to do_lookup_x call which does modify it in some cases, e.g. result->s = sym; result->m = (struct link_map *) map; or passes it to other function, do_lookup_unique, which can perform similar changes. But it happens only early: for (size_t start = i; *scope != # 854 "dl-lookup.c" 3 4 ((void *)0) # 854 "dl-lookup.c" ; start = 0, ++scope) if (do_lookup_x (undef_name, new_hash, _hash, *ref, _value, *scope, start, version, flags, skip_map, type_class, undef_map) != 0) break; and then current_value is no longer modified, so unless I've missed do_lookup_x remembering the address of current_value somewhere, that optimization seems to be ok.
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 Martin Liška changed: What|Removed |Added CC||marxin at gcc dot gnu.org --- Comment #3 from Martin Liška --- Do we have a simple reproducer for the miscompiled Glibc library?
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #2 from Jakub Jelinek --- I see first real code changes between r11-5028 and r11-5029 during fre3 which optimizes away some rereads of current_value.m in _dl_lookup_symbol_x, which passes its address to other functions. # DEBUG BEGIN_STMT # PT = nonlocal escaped null _47 = current_value.m; _48 = _47->l_used; _49 = _48 == 0; ... _335 = val.m; - # PT = nonlocal escaped null - _336 = current_value.m; - if (_335 != _336) + if (_47 != _335) ... [local count: 3147328]: - # PT = nonlocal escaped null - _383 = current_value.m; - iftmp.45_384 = _383->l_map_start; + iftmp.45_384 = _47->l_map_start; ... - # PT = nonlocal escaped null - _120 = current_value.m; [local count: 114862414]: # PT = nonlocal escaped null - # _66 = PHI <0B(30), _116(153), _120(216)> + # _66 = PHI <0B(30), _116(153), _47(216)> (plus of course since the modref pass points to info changes).
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 Jakub Jelinek changed: What|Removed |Added CC||romain.geissler at amadeus dot com --- Comment #1 from Jakub Jelinek --- *** Bug 98106 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 H.J. Lu changed: What|Removed |Added Target Milestone|--- |11.0
[Bug tree-optimization/98110] [11 Regression] dl-lookup.c in glibc is miscompiled by r11-5029
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98110 H.J. Lu changed: What|Removed |Added Priority|P3 |P1