[Bug 1863162]
The release/2.34/master branch has been updated by Florian Weimer:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=024a7640ab9ecea80e527f4e4d7f7a1868e952c5

commit 024a7640ab9ecea80e527f4e4d7f7a1868e952c5
Author: Szabolcs Nagy
Date: Wed Sep 15 15:16:19 2021 +0100

elf: Avoid deadlock between pthread_create and ctors [BZ #28357]

The fix for bug 19329 caused a regression such that pthread_create can deadlock when concurrent ctors from dlopen are waiting for it to finish. Use a new GL(dl_load_tls_lock) in pthread_create that is not taken around ctors in dlopen.

The new lock is also used in __tls_get_addr instead of GL(dl_load_lock).

The new lock is held in _dl_open_worker and _dl_close_worker around most of the logic before/after the init/fini routines. When init/fini routines are running then TLS is in a consistent, usable state. In _dl_open_worker the new lock requires catching and reraising dlopen failures that happen in the critical section.

The new lock is reinitialized in a fork child, to keep the existing behaviour and it is kept recursive in case malloc interposition or TLS access from signal handlers can retake it. It is not obvious if this is necessary or helps, but avoids changing the preexisting behaviour.

The new lock may be more appropriate for dl_iterate_phdr too than GL(dl_load_write_lock), since TLS state of an incompletely loaded module may be accessed. If the new lock can replace the old one, that can be a separate change.

Fixes bug 28357.

Reviewed-by: Adhemerval Zanella
(cherry picked from commit 83b5323261bb72313bffcf37476c1b8f0847c736)

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1863162

Title: Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
To manage notifications about this bug go to: https://bugs.launchpad.net/glibc/+bug/1863162/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1863162]
The master branch has been updated by Szabolcs Nagy:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=83b5323261bb72313bffcf37476c1b8f0847c736

commit 83b5323261bb72313bffcf37476c1b8f0847c736
Author: Szabolcs Nagy
Date: Wed Sep 15 15:16:19 2021 +0100

elf: Avoid deadlock between pthread_create and ctors [BZ #28357]

The fix for bug 19329 caused a regression such that pthread_create can deadlock when concurrent ctors from dlopen are waiting for it to finish. Use a new GL(dl_load_tls_lock) in pthread_create that is not taken around ctors in dlopen.

The new lock is also used in __tls_get_addr instead of GL(dl_load_lock).

The new lock is held in _dl_open_worker and _dl_close_worker around most of the logic before/after the init/fini routines. When init/fini routines are running then TLS is in a consistent, usable state. In _dl_open_worker the new lock requires catching and reraising dlopen failures that happen in the critical section.

The new lock is reinitialized in a fork child, to keep the existing behaviour and it is kept recursive in case malloc interposition or TLS access from signal handlers can retake it. It is not obvious if this is necessary or helps, but avoids changing the preexisting behaviour.

The new lock may be more appropriate for dl_iterate_phdr too than GL(dl_load_write_lock), since TLS state of an incompletely loaded module may be accessed. If the new lock can replace the old one, that can be a separate change.

Fixes bug 28357.

Reviewed-by: Adhemerval Zanella
[Bug 1863162]
i opened bug 28357 for the ABBA deadlock in pthread_create.
[Bug 1863162]
i'm trying to fix the ABBA deadlock by introducing a new lock into dlopen that protects tls state and gets released before init functions are run. using that lock at thread creation would fix the issue.

this only requires a small amount of changes, but it seems to be difficult to ensure that the new lock is released on all failure paths within _dl_open_worker (which sometimes uses longjmp for error handling).
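The difficulty described in this comment, releasing a lock on every exit path when errors unwind with longjmp, can be sketched with a toy catch-and-re-raise pattern. This is only an illustration under stated assumptions, not glibc's code: `tls_lock`, `catch_error`, `raise_error`, `open_worker` and `toy_dlopen` are hypothetical names, the error code 42 is arbitrary, and the "lock" is a plain counter so the sketch stays single-threaded.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock (hypothetical): a counter is enough to check that every path releases it. */
static int tls_lock_held;
static void tls_lock(void)   { tls_lock_held++; }
static void tls_unlock(void) { assert(tls_lock_held > 0); tls_lock_held--; }

/* Innermost active handler and the code of the error being raised. */
static jmp_buf *current_handler;
static int raised_code;
static int ctors_run;

/* "Raise" an error: unwind to the innermost catch_error call. */
static void raise_error(int code)
{
    assert(current_handler != NULL && code != 0);
    raised_code = code;
    longjmp(*current_handler, 1);
}

/* Run fn(arg); return 0 on success or the raised error code. */
static int catch_error(void (*fn)(void *), void *arg)
{
    jmp_buf env;
    jmp_buf *outer = current_handler;  /* not modified after setjmp */

    if (setjmp(env) == 0) {
        current_handler = &env;
        fn(arg);
        current_handler = outer;
        return 0;
    }
    current_handler = outer;
    return raised_code;
}

/* Work done under the lock; it may fail by unwinding. */
static void critical_section(void *arg)
{
    const bool *should_fail = arg;
    if (*should_fail)
        raise_error(42);
}

static void open_worker(void *arg)
{
    tls_lock();
    int err = catch_error(critical_section, arg);  /* catch the failure ... */
    tls_unlock();                                   /* ... release on both paths ... */
    if (err != 0)
        raise_error(err);                           /* ... then re-raise it */

    assert(tls_lock_held == 0);  /* constructors would run here, lock not held */
    ctors_run++;
}

static int toy_dlopen(bool should_fail)
{
    return catch_error(open_worker, &should_fail);
}
```

Calling `toy_dlopen(true)` reports the error to the caller while leaving `tls_lock_held` at zero. The point of the inner `catch_error` is that the unlock sits on a path that both the normal return and the unwinding take.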
[Bug 1863162]
(In reply to xujing from comment #39)
> I'm sorry, I misled you. I think there is an ABBA deadlock issue in some
> scenarios.
>
> If I have a c++ dynamic library(named libA.so) that contains a global
> object, the global object will call the post-constructor at initialization
> and hold it's own lock(named A_lock) when dlopen loads libA.so. Assume that
> two threads execute the following process:
> Thread1:dlopen(libA.so) => hold dl_load_lock => load libA.so => init global
> object from libA.so => wait for hold A_lock
> Thread2:my own code hold A_lock => pthread_create => _dl_allocate_tls_init
> => wait for hold dl_load_lock
> In this case, an ABBA deadlock occurs. Is this a bug?

yes i think this should work (it is a lock ordering issue between a user and libc internal lock, which is only possible if user code is run while a libc lock is held)

note that if you replace pthread_create with dlopen that deadlocks too. so it's still bug 15686. but it may be more common than i expected. i think we need to look at fixing that bug.

(fixing the dynamic tls race of this bug without locks in pthread_create is very hard, so i don't think we can revert the quoted patch)
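The lock-order inversion discussed in this exchange can be reproduced in miniature. The sketch below is an assumption-laden model, not the reporter's program: `load_lock` and `a_lock` are ordinary mutexes standing in for dl_load_lock and A_lock, the function names are hypothetical, and `pthread_mutex_trylock` replaces the blocking lock so that the example reports the conflict instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;  /* stand-in for dl_load_lock */
static pthread_mutex_t a_lock = PTHREAD_MUTEX_INITIALIZER;     /* stand-in for A_lock */
static pthread_barrier_t rendezvous;

struct side {
    pthread_mutex_t *first;   /* lock taken first by this thread */
    pthread_mutex_t *second;  /* lock it then wants */
    int second_rc;            /* result of trying the second lock */
};

static void *take_in_order(void *arg)
{
    struct side *s = arg;

    pthread_mutex_lock(s->first);
    pthread_barrier_wait(&rendezvous);  /* both threads now hold their first lock */
    s->second_rc = pthread_mutex_trylock(s->second);  /* a blocking lock would hang */
    pthread_barrier_wait(&rendezvous);  /* keep the first lock until both have tried */
    if (s->second_rc == 0)
        pthread_mutex_unlock(s->second);
    pthread_mutex_unlock(s->first);
    return NULL;
}

/* Returns true when each thread found its second lock held by the other. */
static bool abba_would_deadlock(void)
{
    struct side dlopen_side = { &load_lock, &a_lock, 0 };  /* Thread1 in the report */
    struct side create_side = { &a_lock, &load_lock, 0 };  /* Thread2 in the report */
    pthread_t t1, t2;
    int rc;

    rc = pthread_barrier_init(&rendezvous, NULL, 2);
    assert(rc == 0);
    rc = pthread_create(&t1, NULL, take_in_order, &dlopen_side);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, take_in_order, &create_side);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_barrier_destroy(&rendezvous);
    (void) rc;

    return dlopen_side.second_rc == EBUSY && create_side.second_rc == EBUSY;
}
```

Build with `-pthread` where the toolchain requires it. With blocking `pthread_mutex_lock` calls in place of the trylock, both threads would wait forever, which is the situation shown in the reporter's backtraces.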
[Bug 1863162]
(In reply to Szabolcs Nagy from comment #38)
> (In reply to xujing from comment #35)
> > (In reply to cvs-com...@gcc.gnu.org from comment #31)
> > > commit 1387ad6225cf027790e3f460e31aa5dd2c54
> > > Author: Szabolcs Nagy
> > > Date: Wed Dec 30 19:19:37 2020 +
> > >
> > > elf: Fix data races in pthread_create and TLS access [BZ #19329]
> > >
> >
> > this patch use dl_load_lock in _dl_allocate_tls_init, is there a problem
> > when dlopen a dynamic library which will call pthread_create? I think it
> > will cause dl_load_lock and dl_load_lock dead lock.
>
> the real bug is that ctors are run with the dlopen lock held.
> that can causes deadlocks anyway (a ctor can create threads
> and that thread can call dlopen). this is bug 15686 which is not
> easy to fix, but that's the right solution. (in general, running
> user callbacks while libc internal locks are held is wrong.)
>
> that bug is now more exposed because the lock is also taken
> at _dl_allocate_tls_init during thread creation. however i
> expect that to be called in the parent thread only, so there
> should be no deadlock when ctor calls pthread_create, only
> when the child thread calls it again (which i considered rare).
>
> if you have example code that you think should work but now
> deadlocks, then please report it.

I'm sorry, I misled you. I think there is an ABBA deadlock issue in some scenarios.

Suppose I have a C++ dynamic library (named libA.so) that contains a global object. When dlopen loads libA.so, the global object calls its post-constructor at initialization and takes its own lock (named A_lock). Assume that two threads execute the following:

Thread1: dlopen(libA.so) => hold dl_load_lock => load libA.so => init global object from libA.so => wait for A_lock
Thread2: my own code holds A_lock => pthread_create => _dl_allocate_tls_init => wait for dl_load_lock

In this case, an ABBA deadlock occurs. Is this a bug?

My stack looks like this:

Thread 1 (LWP 136013):
#0 0x7f57a108510d in ?? () from /usr/lib64/libpthread.so.0
#1 0x7f57a107e4d1 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
#1 stack waiting for holding A_lock
...
#6 0x7f5781c1bb8b in LogProcess::Init (strProcName=..., nProcHandle=nProcHandle@entry=0) at ./service/biz_frame/code/server/src/logging/logprocess.cpp:107
...
#20 0x7f57a0fef21f in _dl_catch_exception () from /usr/lib64/libc.so.6
#21 0x7f57a786442b in ?? () from /lib64/ld-linux-x86-64.so.2
#22 0x7f57a3de2296 in ?? () from /usr/lib64/libdl.so.2
#23 0x7f57a0fef21f in _dl_catch_exception () from /usr/lib64/libc.so.6
#24 0x7f57a0fef2af in _dl_catch_error () from /usr/lib64/libc.so.6
#25 0x7f57a3de2985 in ?? () from /usr/lib64/libdl.so.2
#26 0x7f57a3de2351 in dlopen () from /usr/lib64/libdl.so.2
...
...
#38 0x7f57a0fb3520 in clone () from /usr/lib64/libc.so.6

Thread 2 (LWP 134627):
#0 0x7f57a108510d in ?? () from /usr/lib64/libpthread.so.0
#1 0x7f57a107e580 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
#2 0x7f57a7863835 in _dl_allocate_tls_init () from /lib64/ld-linux-x86-64.so.2
#3 0x7f57a107cb7c in pthread_create () from /usr/lib64/libpthread.so.0
...
#10 Stack holding A_lock
...
#14 0x561689e0d579 in main ()
[Bug 1863162] Re: Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
** Bug watch added: Sourceware.org Bugzilla #28331
   https://sourceware.org/bugzilla/show_bug.cgi?id=28331
[Bug 1863162]
(In reply to xujing from comment #35)
> (In reply to cvs-com...@gcc.gnu.org from comment #31)
> > commit 1387ad6225cf027790e3f460e31aa5dd2c54
> > Author: Szabolcs Nagy
> > Date: Wed Dec 30 19:19:37 2020 +
> >
> > elf: Fix data races in pthread_create and TLS access [BZ #19329]
> >
>
> this patch use dl_load_lock in _dl_allocate_tls_init, is there a problem
> when dlopen a dynamic library which will call pthread_create? I think it
> will cause dl_load_lock and dl_load_lock dead lock.

the real bug is that ctors are run with the dlopen lock held. that can cause deadlocks anyway (a ctor can create threads and that thread can call dlopen). this is bug 15686 which is not easy to fix, but that's the right solution. (in general, running user callbacks while libc internal locks are held is wrong.)

that bug is now more exposed because the lock is also taken at _dl_allocate_tls_init during thread creation. however i expect that to be called in the parent thread only, so there should be no deadlock when ctor calls pthread_create, only when the child thread calls it again (which i considered rare).

if you have example code that you think should work but now deadlocks, then please report it.
[Bug 1863162]
(In reply to cvs-com...@gcc.gnu.org from comment #31)
> commit 1387ad6225cf027790e3f460e31aa5dd2c54
> Author: Szabolcs Nagy
> Date: Wed Dec 30 19:19:37 2020 +
>
> elf: Fix data races in pthread_create and TLS access [BZ #19329]

Hi! I think this commit may cause an ABBA deadlock problem which i mentioned in https://sourceware.org/bugzilla/show_bug.cgi?id=28331.
[Bug 1863162]
(In reply to xujing from comment #35)
> (In reply to cvs-com...@gcc.gnu.org from comment #31)
> > commit 1387ad6225cf027790e3f460e31aa5dd2c54
> > Author: Szabolcs Nagy
> > Date: Wed Dec 30 19:19:37 2020 +
> >
> > elf: Fix data races in pthread_create and TLS access [BZ #19329]
>
> this patch use dl_load_lock in _dl_allocate_tls_init, is there a problem
> when dlopen a dynamic library which will call pthread_create? I think it
> will cause dl_load_lock and dl_load_lock dead lock.

dlopen holds dl_load_lock, and pthread_create calls _dl_allocate_tls_init, which then tries to take dl_load_lock as well. The result is a deadlock.
[Bug 1863162]
(In reply to cvs-com...@gcc.gnu.org from comment #31)
> commit 1387ad6225cf027790e3f460e31aa5dd2c54
> Author: Szabolcs Nagy
> Date: Wed Dec 30 19:19:37 2020 +
>
> elf: Fix data races in pthread_create and TLS access [BZ #19329]

This patch takes dl_load_lock in _dl_allocate_tls_init. Is there a problem when dlopen loads a dynamic library that calls pthread_create? I think both sides would then contend for dl_load_lock and deadlock.
[Bug 1863162] Re: Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
** Changed in: glibc
   Status: Confirmed => Fix Released
[Bug 1863162]
fixed for 2.34
[Bug 1863162]
The master branch has been updated by Szabolcs Nagy:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=572bd547d57a39b6cf0ea072545dc4048921f4c3

commit 572bd547d57a39b6cf0ea072545dc4048921f4c3
Author: Szabolcs Nagy
Date: Thu Dec 31 13:59:38 2020 +

elf: Fix DTV gap reuse logic [BZ #27135]

For some reason only dlopen failure caused dtv gaps to be reused.

It is possible that the intent was to never reuse modids for a different module, but after dlopen failure all gaps are reused not just the ones caused by the unfinished dlopened.

So the code has to handle reused modids already which seems to work, however the data races at thread creation and tls access (see bug 19329 and bug 27111) may be more severe if slots are reused so this is scheduled after those fixes. I think fixing the races are not simpler if reuse is disallowed and reuse has other benefits, so set GL(dl_tls_dtv_gaps) whenever entries are removed from the middle of the slotinfo list. The value does not have to be correct: incorrect true value causes the next modid query to do a slotinfo walk, incorrect false will leave gaps and new entries are added at the end.

Fixes bug 27135.

Reviewed-by: Adhemerval Zanella
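The role of the gaps flag as a hint, as this commit message describes it, can be sketched with a toy module-id allocator. This is a simplified model under my own assumptions, not the glibc data structure: `slot_used`, `max_modid`, `have_gaps`, `assign_modid` and `release_modid` are hypothetical names, the table has a fixed size, and shrinking the tail when the last id is released is a simplification chosen for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MODULES 8

static bool slot_used[MAX_MODULES + 1];  /* index 0 unused: module ids start at 1 */
static size_t max_modid;                 /* highest id currently handed out */
static bool have_gaps;                   /* hint only, it may be stale */

/* Pick an id for a new module, reusing a gap when the hint says one may exist. */
static size_t assign_modid(void)
{
    if (have_gaps) {
        for (size_t id = 1; id <= max_modid; id++) {
            if (!slot_used[id]) {
                slot_used[id] = true;
                return id;
            }
        }
        have_gaps = false;  /* the walk found nothing: the hint was stale */
    }
    assert(max_modid < MAX_MODULES);
    slot_used[++max_modid] = true;  /* no gap known: append at the end */
    return max_modid;
}

/* Give an id back; removing from the middle sets the hint. */
static void release_modid(size_t id)
{
    assert(id >= 1 && id <= max_modid && slot_used[id]);
    slot_used[id] = false;
    if (id == max_modid)
        max_modid--;
    else
        have_gaps = true;
}
```

A wrongly true `have_gaps` only costs one extra walk before appending, and a wrongly false value only leaves a hole unused, which matches the tolerance for an incorrect value that the commit message states.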
[Bug 1863162]
The master branch has been updated by Szabolcs Nagy:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f4f8f4d4e0f92488431b268c8cd9555730b9afe9

commit f4f8f4d4e0f92488431b268c8cd9555730b9afe9
Author: Szabolcs Nagy
Date: Wed Dec 30 19:19:37 2020 +

elf: Use relaxed atomics for racy accesses [BZ #19329]

This is a follow up patch to the fix for bug 19329. This adds relaxed MO atomics to accesses that were previously data races but are now race conditions, and where relaxed MO is sufficient.

The race conditions all follow the pattern that the write is behind the dlopen lock, but a read can happen concurrently (e.g. during tls access) without holding the lock. For slotinfo entries the read value only matters if it reads from a synchronized write in dlopen or dlclose, otherwise the related dtv entry is not valid to access so it is fine to leave it in an inconsistent state. The same applies for GL(dl_tls_max_dtv_idx) and GL(dl_tls_generation), but there the algorithm relies on the fact that the read of the last synchronized write is an increasing value.

Reviewed-by: Adhemerval Zanella
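The access pattern this commit message describes, writes made under a lock and reads made without it, might look roughly like the following with C11 atomics. It is a sketch under assumptions rather than the glibc source: `load_lock`, `tls_generation`, `bump_generation` and `read_generation` are hypothetical names, and glibc uses its own internal atomic macros instead of `<stdatomic.h>`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;  /* stand-in for the dlopen lock */
static atomic_size_t tls_generation;                           /* stand-in for a generation counter */

/* Writer: only ever runs with the lock held, so writers never race each other. */
static size_t bump_generation(void)
{
    pthread_mutex_lock(&load_lock);
    size_t next = atomic_load_explicit(&tls_generation, memory_order_relaxed) + 1;
    assert(next != 0);  /* the sketch does not handle counter overflow */
    atomic_store_explicit(&tls_generation, next, memory_order_relaxed);
    pthread_mutex_unlock(&load_lock);
    return next;
}

/* Reader: may run concurrently with a writer and takes no lock.
   The relaxed load avoids a data race but gives no ordering by itself. */
static size_t read_generation(void)
{
    return atomic_load_explicit(&tls_generation, memory_order_relaxed);
}
```

The relaxed load only guarantees that the reader sees some value that was actually stored. Any ordering the algorithm needs has to come from elsewhere, which is why the commit message stresses that the counter is increasing.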
[Bug 1863162]
The master branch has been updated by Szabolcs Nagy:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1387ad6225cf027790e3f460e31aa5dd2c54

commit 1387ad6225cf027790e3f460e31aa5dd2c54
Author: Szabolcs Nagy
Date: Wed Dec 30 19:19:37 2020 +

elf: Fix data races in pthread_create and TLS access [BZ #19329]

DTV setup at thread creation (_dl_allocate_tls_init) is changed to take the dlopen lock, GL(dl_load_lock). Avoiding data races here without locks would require design changes: the map that is accessed for static TLS initialization here may be concurrently freed by dlclose. That use after free may be solved by only locking around static TLS setup or by ensuring dlclose does not free modules with static TLS, however currently every link map with TLS has to be accessed at least to see if it needs static TLS. And even if that's solved, still a lot of atomics would be needed to synchronize DTV related globals without a lock. So fix both bug 19329 and bug 27111 with a lock that prevents DTV setup running concurrently with dlopen or dlclose.

_dl_update_slotinfo at TLS access still does not use any locks so CONCURRENCY NOTES are added to explain the synchronization. The early exit from the slotinfo walk when max_modid is reached is not strictly necessary, but does not hurt either.

An incorrect acquire load was removed from _dl_resize_dtv: it did not synchronize with any release store or fence and synchronization is now handled separately at thread creation and TLS access time.

There are still a number of racy read accesses to globals that will be changed to relaxed MO atomics in a followup patch. This should not introduce regressions compared to existing behaviour and avoid cluttering the main part of the fix.

Not all TLS access related data races got fixed here: there are additional races at lazy tlsdesc relocations see bug 27137.

Reviewed-by: Adhemerval Zanella
[Bug 1863162]
(In reply to Szabolcs Nagy from comment #29)
> i have a new patch set that includes a different fix for this bug:
> https://sourceware.org/pipermail/libc-alpha/2021-February/122626.html
>
> the new fix takes the dlopen lock at thread creation time instead
> of just using atomics (which cannot work for fixing the same race
> with dlclose: bug 27111).
>
> using atomics is still necessary for tls access.
>
> it will likely take a few review iterations to get this in glibc.

Hi Szabolcs,

Do you know when these patches will be reviewed? Their Delegate is still Nobody: https://patchwork.sourceware.org/project/glibc/list/?series=1673
[Bug 1863162] Re: Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
Launchpad has imported 30 comments from the remote bug at https://sourceware.org/bugzilla/show_bug.cgi?id=19329.

If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking.

On 2015-12-04T12:37:03+00:00 nszabolcs wrote:

(this is a continuation of bug 17918, but it turns out to be a different issue that was originally reported there.)

failure:

Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation' failed!

caused by dlopen (in _dl_add_to_slotinfo and in dl_open_worker) doing

  listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
  //...
  if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))

while pthread_create (in _dl_allocate_tls_init) concurrently doing

  assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));

so

  T1:
    y = x + 1;
    ++x;

  T2:
    assert(y <= x);

this is hard to trigger as the race window is short compared to the time dlopen and pthread_create takes, however if i add a usleep(1000) between the two operations in T1, it is triggered all the time.

the slotinfo and tls generation update lack any sort of synchronization or atomics in _dl_allocate_tls_init (dlopen holds GL(dl_load_lock)).
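The T1/T2 interleaving above can be replayed deterministically in a single thread, which makes the failing window easy to see. This is a toy model under assumptions: `tls_generation` and `slot_gen` are plain variables standing in for GL(dl_tls_generation) and listp->slotinfo[cnt].gen, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static size_t tls_generation;  /* "x": models GL(dl_tls_generation) */
static size_t slot_gen;        /* "y": models listp->slotinfo[cnt].gen */

/* T1, step 1: stamp the new slot with the next generation (y = x + 1). */
static void writer_stamp_slot(void) { slot_gen = tls_generation + 1; }

/* T1, step 2: publish the new generation (++x). */
static void writer_bump_generation(void) { ++tls_generation; }

/* T2: the condition that _dl_allocate_tls_init asserts (y <= x). */
static bool reader_invariant_holds(void) { return slot_gen <= tls_generation; }

/* Replay "T1 step 1, T2 check, T1 step 2" and report whether T2's
   check failed inside the window between the two writer steps. */
static bool check_fails_in_window(void)
{
    writer_stamp_slot();
    bool failed = !reader_invariant_holds();
    writer_bump_generation();
    assert(reader_invariant_holds());  /* consistent again once both steps are done */
    return failed;
}
```

`check_fails_in_window()` returns true on every call, because the reader's check sits exactly where the added usleep(1000) widens the window in the real reproduction.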
on x86_64 with added usleep:

(gdb) p _rtld_local._dl_tls_dtv_slotinfo_list->slotinfo[0]@64
$11 = {{gen = 0, map = 0x77ff94e8}, {gen = 1, map = 0x77ff94e8}, {gen = 2, map = 0x7910}, {gen = 0, map = 0x0} }
(gdb) p _rtld_local._dl_tls_generation
$12 = 1

T1:
#0 0x77df2097 in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1 0x77df1f74 in usleep (useconds=) at ../sysdeps/posix/usleep.c:32
#2 0x77decc6b in dl_open_worker (a=a@entry=0x77611c80) at dl-open.c:527
#3 0x77de8314 in _dl_catch_error (objname=objname@entry=0x77611c70, errstring=errstring@entry=0x77611c78, mallocedp=mallocedp@entry=0x77611c6f, operate=operate@entry=0x77dec720 , args=args@entry=0x77611c80) at dl-error.c:187
#4 0x77dec2a9 in _dl_open (file=0x77611ee0 "mod-0.so", mode=-2147483646, caller_dlopen=0x4007e2 , nsid=-2, argc=, argv=, env=0x7fffe378) at dl-open.c:652
#5 0x77bd5ee9 in dlopen_doit (a=a@entry=0x77611eb0) at dlopen.c:66
#6 0x77de8314 in _dl_catch_error (objname=0x78d0, errstring=0x78d8, mallocedp=0x78c8, operate=0x77bd5e90 , args=0x77611eb0) at dl-error.c:187
#7 0x77bd6521 in _dlerror_run (operate=operate@entry=0x77bd5e90 , args=args@entry=0x77611eb0) at dlerror.c:163
#8 0x77bd5f82 in __dlopen (file=file@entry=0x77611ee0 "mod-0.so", mode=mode@entry=2) at dlopen.c:87
#9 0x004007e2 in start (a=) at a.c:19
#10 0x779bf3d4 in start_thread (arg=0x77612700) at pthread_create.c:333
#11 0x776feedd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

T2:
#0 __GI___assert_fail (assertion=0x77df8840 "listp->slotinfo[cnt].gen <= GL(dl_tls_generation)", file=0x77df68e6 "dl-tls.c", line=493, function=0x77df9020 <__PRETTY_FUNCTION__.9528> "_dl_allocate_tls_init") at dl-minimal.c:220
#1 0x77deb492 in __GI__dl_allocate_tls_init (result=0x7fffb7fff700) at dl-tls.c:493
#2 0x779bff67 in allocate_stack (stack=, pdp=, attr=0x7fffdf90) at allocatestack.c:579
#3 __pthread_create_2_1 (newthread=newthread@entry=0x7fffe078, attr=attr@entry=0x0, start_routine=start_routine@entry=0x4007c0 , arg=arg@entry=0xd) at pthread_create.c:526
#4 0x0040062a in main () at a.c:34

i think

  GL(dl_tls_generation)
  GL(dl_tls_dtv_slotinfo_list)
  listp->slotinfo[i].map
  listp->slotinfo[i].gen
  listp->next

may all be accessed concurrently by pthread_create and dlopen without any synchronization.

this can also cause wrong maxgen computation into dtv[0].counter

Reply at: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/0

On 2015-12-29T10:51:48+00:00 I-palachev wrote:

Hi,

I've suggested a patch for this bug:
https://sourceware.org/ml/libc-alpha/2015-12/msg00570.html

Reply at: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/1

On 2016-01-08T18:19:09+00:00 nszabolcs wrote:

Created attachment 8893
test case (main module)

Reply at: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/comments/2

On 2016-01-08T18:20:10+00:00 nszabolcs wrote:

Created atta
[Bug 1863162] Re: Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
The upstream fix is still not finalized. Please participate in the
upstream discussion; once a patch is accepted there, it can be
backported.

** Bug watch added: Sourceware.org Bugzilla #19329
   https://sourceware.org/bugzilla/show_bug.cgi?id=19329

** Also affects: glibc via
   https://sourceware.org/bugzilla/show_bug.cgi?id=19329
   Importance: Unknown
       Status: Unknown
[Bug 1863162] Re: Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: glibc (Ubuntu)
       Status: New => Confirmed
[Bug 1863162] [NEW] Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
generation number read below is for a valid entry.
+     TODO: remove_slotinfo in dlclose is not synchronized.  */
+  map = atomic_load_acquire (&listp->slotinfo[cnt].map);
   if (map == NULL)
     /* Unused entry.  */
     continue;

+  size_t gen = listp->slotinfo[cnt].gen;
+  if (gen > gen_count)
+    /* New, concurrently loaded entry.  */
+    continue;
+
   /* Keep track of the maximum generation number.  This might
      not be the generation counter.  */
-  assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));
-  maxgen = MAX (maxgen, listp->slotinfo[cnt].gen);
+  maxgen = MAX (maxgen, gen);

   dtv[map->l_tls_modid].pointer.val = TLS_DTV_UNALLOCATED;
   dtv[map->l_tls_modid].pointer.is_static = false;
@@ -518,11 +576,14 @@ _dl_allocate_tls_init (void *result)
 }

   total += cnt;
-  if (total >= GL(dl_tls_max_dtv_idx))
+  if (total >= dtv_slots)
     break;

-  listp = listp->next;
-  assert (listp != NULL);
+  /* Synchronize with the release mo store in _dl_add_to_slotinfo
+     so only initialized slotinfo nodes are looked at.  */
+  listp = atomic_load_acquire (&listp->next);
+  if (listp == NULL)
+    break;
 }

   /* The DTV version is up-to-date now.  */
@@ -916,7 +977,7 @@ _dl_add_to_slotinfo (struct link_map *l)
      the first slot.  */
   assert (idx == 0);

-  listp = prevp->next = (struct dtv_slotinfo_list *)
+  listp = (struct dtv_slotinfo_list *)
     malloc (sizeof (struct dtv_slotinfo_list)
             + TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
   if (listp == NULL)
@@ -939,9 +1000,15 @@ cannot create TLS data structures"));
   listp->next = NULL;
   memset (listp->slotinfo, '\0',
           TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
+  /* _dl_allocate_tls_init concurrently walks this list at thread
+     creation, it must only observe initialized nodes in the list.
+     See the CONCURRENCY NOTES there.  */
+  atomic_store_release (&prevp->next, listp);
 }

   /* Add the information into the slotinfo data structure.  */
-  listp->slotinfo[idx].map = l;
   listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
+  /* Synchronize with the acquire load in _dl_allocate_tls_init.
+     See the CONCURRENCY NOTES there.  */
+  atomic_store_release (&listp->slotinfo[idx].map, l);
 }

PATCH 2

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 073321c..2c9ad2a 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -571,7 +571,7 @@ _dl_allocate_tls_init (void *result)
 }

   total += cnt;
-  if (total >= dtv_slots)
+  if (total > dtv_slots)
     break;

   /* Synchronize with dl_add_to_slotinfo.  */

** Affects: glibc (Ubuntu)
     Importance: Undecided
         Status: New
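The pattern the first patch relies on (write gen, then publish map with a release store; on the reader side, load map with acquire and skip entries whose gen is newer than the generation sampled at the start) can be sketched outside glibc roughly as follows. This is an illustration under assumptions, not the patch itself: it uses C11 `<stdatomic.h>` in place of glibc's internal atomic macros, and `struct slot`, `struct slot_list`, `SLOTS`, `add_slot` and `max_visible_gen` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SLOTS 4   /* hypothetical fixed size, for illustration only */

struct slot {
  size_t gen;                 /* plain field, written before map is published */
  _Atomic(const char *) map;  /* NULL means unused; stands in for the link_map */
};

struct slot_list {
  struct slot slotinfo[SLOTS];
};

/* Writer side (the dlopen role): fill in gen first, then publish the
   entry with a release store so a reader that sees map also sees gen.  */
static void add_slot(struct slot_list *l, size_t idx,
                     const char *map, size_t next_gen)
{
  assert(idx < SLOTS);
  l->slotinfo[idx].gen = next_gen;
  atomic_store_explicit(&l->slotinfo[idx].map, map, memory_order_release);
}

/* Reader side (the pthread_create role): gen_count is the generation
   sampled once up front.  Entries newer than that are skipped instead
   of tripping an assertion.  */
static size_t max_visible_gen(struct slot_list *l, size_t gen_count)
{
  size_t maxgen = 0;
  for (size_t i = 0; i < SLOTS; i++) {
    const char *map = atomic_load_explicit(&l->slotinfo[i].map,
                                           memory_order_acquire);
    if (map == NULL)
      continue;               /* unused entry */
    size_t gen = l->slotinfo[i].gen;
    if (gen > gen_count)
      continue;               /* new, concurrently loaded entry */
    if (gen > maxgen)
      maxgen = gen;
  }
  return maxgen;
}
```

With two entries published at generations 1 and 2, a reader that sampled generation 1 ignores the second entry and reports 1, where the original code would have hit the assertion.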