On Fri, Dec 24, 2021 at 06:24:43AM -1000, Steve Sakoman wrote:
> On Fri, Dec 24, 2021 at 12:42 AM akash hadke via
> lists.openembedded.org <[email protected]>
> wrote:
> >
> > Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init:
> > Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation'
> > failed!
> > caused by dlopen (in _dl_add_to_slotinfo and in dl_open_worker) doing
> >   listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
> >   //...
> >   if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))
> > while pthread_create (in _dl_allocate_tls_init) concurrently doing
> >   assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));
> >
> > Backported below patch that can fix the following bugs with a lock
> > that prevents DTV setup running concurrently with dlopen or dlclose.
> >
> > Bug 19329: https://sourceware.org/bugzilla/show_bug.cgi?id=19329
> > Bug 27111: https://sourceware.org/bugzilla/show_bug.cgi?id=27111
> >
> > Patch:
> > 0031-elf-Fix-data-races-in-pthread_create-and-TLS-access-BZ-19329.patch
> > Link:
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=1387ad6225c2222f027790e3f460e31aa5dd2c54
> >
> > It requires a supporting patch
> > 0030-elf-Refactor_dl_update-slotinfo-to-avoid-use-after-free.patch
> > Link:
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=c0669ae1a629e16b536bf11cdd0865e0dbcf4bee
> >
> > After adding the above fix there is a number of racy read accesses
> > to globals that will be changed to relaxed MO atomics in follow-up
> > patch given below.
> >
> > This fixes the regressions and avoids cluttering the main part
> > of the fix.
> >
> > 0032-elf-Use-relaxed-atomics-for-racy-accesses-BZ-19329.patch
> > Link:
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=f4f8f4d4e0f92488431b268c8cd9555730b9afe9
> >
> > Backported the below patch to add the test to check the added fix.
> > 0033-elf-Add-test-case-for-BZ-19329.patch
> > Link:
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=9d0e30329c23b5ad736fda3f174208c25970dbce
> >
> > Previously modids were never reused for a
> > different module, but after dlopen failure all gaps are reused
> > not just the ones caused by the unfinished dlopened.
> >
> > The code has to handle reused modids already which seems to
> > work, however the data races at thread creation and tls access
> > (see bug 19329 and bug 27111) may be more severe if slots are
> > reused. Fixing the races are not simpler if reuse is disallowed
> > and reuse has other benefits so upstream added fix
> > https://sourceware.org/git/?p=glibc.git;a=commit;h=572bd547d57a39b6cf0ea072545dc4048921f4c3
> > for the following bug.
> >
> > Bug 27135: https://sourceware.org/bugzilla/show_bug.cgi?id=27135
> >
> > But in glibc upstream the commit 572bd547d57a was reverted as the
> > issue with 572bd547d57a patch was the DTV entry only updated on
> > dl_open_worker() with the update_tls_slotinfo() call after all
> > dependencies are being processed by _dl_map_object_deps(). However
> > _dl_map_object_deps() itself might call _dl_next_tls_modid(),
> > and since the _dl_tls_dtv_slotinfo_list::map was not yet set the
> > entry can be wrongly reused.
> >
> > So added below patch to fix Bug 27135.
> > 0034-elf-Fix-DTV-gap-reuse-logic-BZ-27135.patch
> > Link:
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=ba33937be210da5d07f7f01709323743f66011ce
> >
> > Not all TLS access related data races got fixed by adding
> > 0031-elf-Fix-data-races-in-pthread_create-and-TLS-access-BZ-19329.patch,
> > there are additional races at lazy tlsdesc relocations.
> > Bug 27137: https://sourceware.org/bugzilla/show_bug.cgi?id=27137
> >
> > Backported below patches to fix this issue.
> >
> > 0035-x86_64-Avoid-lazy-relocation-of-tlsdesc-BZ-27137.patch
> > Link:
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=8f7e09f4dbdb5c815a18b8285fbc5d5d7bc17d86
> >
> > 0036-i386-Avoid-lazy-relocation-of-tlsdesc-BZ-27137.patch
> > Link:
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=ddcacd91cc10ff92d6201eda87047d029c14158d
> >
> > The fix
> > 0031-elf-Fix-data-races-in-pthread_create-and-TLS-access-BZ-19329.patch
> > for bug 19329 caused a regression such that pthread_create can
> > deadlock when concurrent ctors from dlopen are waiting for it
> > to finish.
> > Bug 28357: https://sourceware.org/bugzilla/show_bug.cgi?id=28357
> >
> > Backported below patch to fix this issue.
> > 0037-Avoid-deadlock-between-pthread_create-and-ctors.patch
> > Link:
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=024a7640ab9ecea80e527f4e4d7f7a1868e952c5
> >
> > Signed-off-by: Akash Hadke <[email protected]>
> > Signed-off-by: Akash Hadke <[email protected]>
>
> This is a quite extensive change set. I'd like to get Khem's opinion
> as glibc maintainer. Thoughts, Khem?

In case it is not clear, this fixes a rare crash in dlopen(). We saw it
in automated testing, but it is quite rare and non-trivial to hit.

-Mikko
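
For anyone who wants to see the shape of the race without reading the glibc sources, here is a minimal toy model in C. It is only a sketch and not glibc code: `load_lock`, `tls_generation`, `slot_gen`, `loader`, `checker` and `run_model` are hypothetical names that stand in for the loader lock, GL(dl_tls_generation) and listp->slotinfo[i].gen. One thread plays dlopen (store generation + 1 into a slot, then bump the counter), the other plays the DTV setup done by pthread_create (check that no slot is ahead of the counter). Both take the same lock, which is the approach the quoted patch description attributes to the fix.

```c
#include <pthread.h>

/* Toy model only; every name here is hypothetical, none is a glibc symbol. */
#define SLOTS  64
#define ROUNDS 20000

static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long tls_generation;   /* models GL(dl_tls_generation) */
static unsigned long slot_gen[SLOTS];  /* models listp->slotinfo[i].gen */

/* Models dlopen: publish a slot with generation + 1, then bump the counter. */
static void *loader(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        pthread_mutex_lock(&load_lock);
        slot_gen[i % SLOTS] = tls_generation + 1;
        tls_generation++;
        pthread_mutex_unlock(&load_lock);
    }
    return NULL;
}

/* Models DTV setup in pthread_create: count slots that are ahead of the
   global generation. Holding the same lock means the window between the
   two stores in loader() is never observed. */
static void *checker(void *arg)
{
    long *violations = arg;
    for (int i = 0; i < ROUNDS; i++) {
        pthread_mutex_lock(&load_lock);
        for (int s = 0; s < SLOTS; s++)
            if (slot_gen[s] > tls_generation)
                (*violations)++;
        pthread_mutex_unlock(&load_lock);
    }
    return NULL;
}

/* Runs both threads to completion and returns the number of times the
   invariant slot_gen[s] <= tls_generation was seen to be broken. */
long run_model(void)
{
    pthread_t a, b;
    long violations = 0;

    tls_generation = 0;
    for (int s = 0; s < SLOTS; s++)
        slot_gen[s] = 0;

    pthread_create(&a, NULL, loader, NULL);
    pthread_create(&b, NULL, checker, &violations);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return violations;
}
```

Call run_model() from a small main() and build with -pthread. With the lock held on both sides it returns 0. If the lock is dropped from checker(), the model has a data race and the check can land between the two stores in loader(), which is the window the assertion in the quoted report trips on.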
