Public bug reported: When using glibc as part of our NSX product, we are running into the above mentioned glibc assert case sometimes.
Here's relevant revision information : Ubuntu - 16.04 glibc version - 2.23. This is a known issue with resolution identified as per thread link below : https://sourceware.org/ml/libc-alpha/2016-01/msg00480.htm and in addition see Comment 9 in https://sourceware.org/bugzilla/show_bug.cgi?id=19329. We have applied this patch in our product and it seems to be working fine. Is there a way to upstream these changes and make those available in standard glibc upstream? Please let us know. Here are the two patches: PATCH1 ============================= diff --git a/elf/dl-open.c b/elf/dl-open.c index 6f178b3..2b97605 100644 --- a/elf/dl-open.c +++ b/elf/dl-open.c @@ -524,9 +524,16 @@ dl_open_worker (void *a) } /* Bump the generation number if necessary. */ - if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0)) - _dl_fatal_printf (N_("\ + if (any_tls) + { + size_t newgen = GL(dl_tls_generation) + 1; + if (__builtin_expect (newgen == 0, 0)) + _dl_fatal_printf (N_("\ TLS generation counter wrapped! Please report this.")); + /* Synchronize with the load acquire in _dl_allocate_tls_init. + See the CONCURRENCY NOTES there in dl-tls.c. */ + atomic_store_release (&GL(dl_tls_generation), newgen); + } /* We need a second pass for static tls data, because _dl_update_slotinfo must not be run while calls to _dl_add_to_slotinfo are still pending. */ diff --git a/elf/dl-tls.c b/elf/dl-tls.c index ed13fd9..7184a54 100644 --- a/elf/dl-tls.c +++ b/elf/dl-tls.c @@ -443,6 +443,48 @@ _dl_resize_dtv (dtv_t *dtv) } +/* CONCURRENCY NOTES: + + During dynamic TLS and DTV allocation and setup various objects may be + accessed concurrently: + + GL(dl_tls_max_dtv_idx) + GL(dl_tls_generation) + listp->slotinfo[i].map + listp->slotinfo[i].gen + listp->next + + where listp is a node in the GL(dl_tls_dtv_slotinfo_list) list. The public + APIs that may access them are + + Writers: dlopen, dlclose and dynamic linker start up code. + Readers only: pthread_create and __tls_get_addr (TLS access). + + The writers hold the GL(dl_load_lock), but the readers don't, so atomics + should be used when accessing these globals. + + dl_open_worker (called from dlopen) for each loaded module increases + GL(dl_tls_max_dtv_idx), sets the link_map of the module up, adds a new + slotinfo entry to GL(dl_tls_dtv_slotinfo_list) with the new link_map and + the next generation number GL(dl_tls_generation)+1. Then it increases + GL(dl_tls_generation) which sinals that the new slotinfo entries are ready. + This last write is release mo so previous writes can be synchronized. + + GL(dl_tls_max_dtv_idx) is always an upper bound of the modids of the ready + entries. The slotinfo list might be shorter than that during dlopen. + Entries in the slotinfo list might have gen > GL(dl_tls_generation) and + map == NULL. + + _dl_allocate_tls_init is called from pthread_create and it looks through + the slotinfo list to do the dynamic TLS and DTV setup for the new thread. + It first loads the current GL(dl_tls_generation) with acquire mo and only + considers modules up to that generation ignoring any later change to the + slotinfo list. + + TODO: Entries might get changed and freed in dlclose without sync. + TODO: __tls_get_addr is not yet synchronized with dlopen and dlclose. +*/ + void * internal_function _dl_allocate_tls_init (void *result) @@ -455,9 +497,18 @@ _dl_allocate_tls_init (void *result) struct dtv_slotinfo_list *listp; size_t total = 0; size_t maxgen = 0; - - /* Check if the current dtv is big enough. */ - if (dtv[-1].counter < GL(dl_tls_max_dtv_idx)) + size_t gen_count; + size_t dtv_slots; + + /* Synchronize with the release mo store in dl_open_worker, modules with + larger generation number are ignored. */ + gen_count = atomic_load_acquire (&GL(dl_tls_generation)); + /* Check if the current dtv is big enough. GL(dl_tls_max_dtv_idx) is + concurrently modified, but after the release mo store to + GL(dl_tls_generation) it always remains a modid upper bound for + previously loaded modules so relaxed access is enough. */ + dtv_slots = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx)); + if (dtv[-1].counter < dtv_slots) { /* Resize the dtv. */ dtv = _dl_resize_dtv (dtv); @@ -480,18 +531,25 @@ _dl_allocate_tls_init (void *result) void *dest; /* Check for the total number of used slots. */ - if (total + cnt > GL(dl_tls_max_dtv_idx)) + if (total + cnt > dtv_slots) break; - map = listp->slotinfo[cnt].map; + /* Synchronize with the release mo store in _dl_add_to_slotinfo in + dlopen, so the generation number read below is for a valid entry. + TODO: remove_slotinfo in dlclose is not synchronized. */ + map = atomic_load_acquire (&listp->slotinfo[cnt].map); if (map == NULL) /* Unused entry. */ continue; + size_t gen = listp->slotinfo[cnt].gen; + if (gen > gen_count) + /* New, concurrently loaded entry. */ + continue; + /* Keep track of the maximum generation number. This might not be the generation counter. */ - assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation)); - maxgen = MAX (maxgen, listp->slotinfo[cnt].gen); + maxgen = MAX (maxgen, gen); dtv[map->l_tls_modid].pointer.val = TLS_DTV_UNALLOCATED; dtv[map->l_tls_modid].pointer.is_static = false; @@ -518,11 +576,14 @@ _dl_allocate_tls_init (void *result) } total += cnt; - if (total >= GL(dl_tls_max_dtv_idx)) + if (total >= dtv_slots) break; - listp = listp->next; - assert (listp != NULL); + /* Synchronize with the release mo store in _dl_add_to_slotinfo + so only initialized slotinfo nodes are looked at. */ + listp = atomic_load_acquire (&listp->next); + if (listp == NULL) + break; } /* The DTV version is up-to-date now. */ @@ -916,7 +977,7 @@ _dl_add_to_slotinfo (struct link_map *l) the first slot. */ assert (idx == 0); - listp = prevp->next = (struct dtv_slotinfo_list *) + listp = (struct dtv_slotinfo_list *) malloc (sizeof (struct dtv_slotinfo_list) + TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo)); if (listp == NULL) @@ -939,9 +1000,15 @@ cannot create TLS data structures")); listp->next = NULL; memset (listp->slotinfo, '\0', TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo)); + /* _dl_allocate_tls_init concurrently walks this list at thread + creation, it must only observe initialized nodes in the list. + See the CONCURRENCY NOTES there. */ + atomic_store_release (&prevp->next, listp); } /* Add the information into the slotinfo data structure. */ - listp->slotinfo[idx].map = l; listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1; + /* Synchronize with the acquire load in _dl_allocate_tls_init. + See the CONCURRENCY NOTES there. */ + atomic_store_release (&listp->slotinfo[idx].map, l); } PATCH 2 ======== diff --git a/elf/dl-tls.c b/elf/dl-tls.c index 073321c..2c9ad2a 100644 --- a/elf/dl-tls.c +++ b/elf/dl-tls.c @@ -571,7 +571,7 @@ _dl_allocate_tls_init (void *result) } total += cnt; - if (total >= dtv_slots) + if (total > dtv_slots) break; /* Synchronize with dl_add_to_slotinfo. */ ** Affects: glibc (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1863162 Title: Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed! To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs