On Fri, Dec 24, 2021 at 06:24:43AM -1000, Steve Sakoman wrote:
> On Fri, Dec 24, 2021 at 12:42 AM akash hadke via
> lists.openembedded.org <[email protected]>
> wrote:
> >
> > Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: 
> > Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation' 
> > failed!
> > This is caused by dlopen (in _dl_add_to_slotinfo and in dl_open_worker) doing
> >   listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
> >   //...
> >   if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))
> > while pthread_create (in _dl_allocate_tls_init) concurrently doing
> >   assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));
> >
> > Backported the patch below, which fixes the following bugs by adding a lock
> > that prevents DTV setup from running concurrently with dlopen or dlclose.
> >
> > Bug 19329: https://sourceware.org/bugzilla/show_bug.cgi?id=19329
> > Bug 27111: https://sourceware.org/bugzilla/show_bug.cgi?id=27111
> >
> > Patch: 
> > 0031-elf-Fix-data-races-in-pthread_create-and-TLS-access-BZ-19329.patch
> > Link: 
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=1387ad6225c2222f027790e3f460e31aa5dd2c54
> >
> > It requires a supporting patch:
> > 0030-elf-Refactor_dl_update-slotinfo-to-avoid-use-after-free.patch
> > Link: 
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=c0669ae1a629e16b536bf11cdd0865e0dbcf4bee
> >
> > After adding the above fix, a number of racy read accesses to globals
> > remain; these are changed to relaxed MO atomics in the follow-up patch
> > below.
> >
> > This should not introduce regressions compared to older glibc versions,
> > and keeping it separate avoids cluttering the main part of the fix.
> >
> > 0032-elf-Use-relaxed-atomics-for-racy-accesses-BZ-19329.patch
> > Link: 
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=f4f8f4d4e0f92488431b268c8cd9555730b9afe9
> >
> > Backported the patch below, which adds a test case for the fix:
> > 0033-elf-Add-test-case-for-BZ-19329.patch
> > Link: 
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=9d0e30329c23b5ad736fda3f174208c25970dbce
> >
> > Previously, modids were never reused for a different module, but
> > after a dlopen failure all gaps are reused, not just the ones caused
> > by the unfinished dlopen.
> >
> > The code already has to handle reused modids, which seems to work;
> > however, the data races at thread creation and TLS access (see bug
> > 19329 and bug 27111) may be more severe if slots are reused. Fixing
> > the races is not simpler if reuse is disallowed, and reuse has other
> > benefits, so upstream added the fix
> > https://sourceware.org/git/?p=glibc.git;a=commit;h=572bd547d57a39b6cf0ea072545dc4048921f4c3
> > for the following bug:
> >
> > Bug 27135: https://sourceware.org/bugzilla/show_bug.cgi?id=27135
> >
> > However, upstream glibc reverted commit 572bd547d57a because the DTV
> > entry was only updated in dl_open_worker(), by the
> > update_tls_slotinfo() call made after all dependencies had been
> > processed by _dl_map_object_deps(). But _dl_map_object_deps() itself
> > might call _dl_next_tls_modid(), and since
> > _dl_tls_dtv_slotinfo_list::map was not yet set, the entry could be
> > wrongly reused.
> >
> > So the patch below was added to fix bug 27135:
> > 0034-elf-Fix-DTV-gap-reuse-logic-BZ-27135.patch
> > Link: 
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=ba33937be210da5d07f7f01709323743f66011ce
> >
> > Not all TLS-access-related data races are fixed by
> > 0031-elf-Fix-data-races-in-pthread_create-and-TLS-access-BZ-19329.patch;
> > there are additional races in lazy tlsdesc relocations.
> > Bug 27137: https://sourceware.org/bugzilla/show_bug.cgi?id=27137
> >
> > Backported the patches below to fix this issue:
> >
> > 0035-x86_64-Avoid-lazy-relocation-of-tlsdesc-BZ-27137.patch
> > Link: 
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=8f7e09f4dbdb5c815a18b8285fbc5d5d7bc17d86
> >
> > 0036-i386-Avoid-lazy-relocation-of-tlsdesc-BZ-27137.patch
> > Link: 
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=ddcacd91cc10ff92d6201eda87047d029c14158d
> >
> > The fix 
> > 0031-elf-Fix-data-races-in-pthread_create-and-TLS-access-BZ-19329.patch
> > for bug 19329 caused a regression such that pthread_create can
> > deadlock when concurrent ctors from dlopen are waiting for it
> > to finish.
> > Bug 28357: https://sourceware.org/bugzilla/show_bug.cgi?id=28357
> >
> > Backported the patch below to fix this issue:
> > 0037-Avoid-deadlock-between-pthread_create-and-ctors.patch
> > Link: 
> > https://sourceware.org/git/?p=glibc.git;a=patch;h=024a7640ab9ecea80e527f4e4d7f7a1868e952c5
> >
> > Signed-off-by: Akash Hadke <[email protected]>
> > Signed-off-by: Akash Hadke <[email protected]>
> 
> This is a quite extensive change set.  I'd like to get Khem's opinion
> as glibc maintainer.  Thoughts, Khem?

In case it is not clear, this fixes a rare crash in dlopen(). We saw it in
automated testing, but it is quite rare and non-trivial to hit.

-Mikko