Public bug reported:

When using glibc as part of our NSX product, we are running into the
above mentioned glibc assert case sometimes.

Here's relevant revision information :

Ubuntu - 16.04
glibc version - 2.23.


This is a known issue with resolution identified as per thread link below :

 https://sourceware.org/ml/libc-alpha/2016-01/msg00480.htm and in
addition see Comment 9 in
https://sourceware.org/bugzilla/show_bug.cgi?id=19329.

We have applied this patch in our product and it seems to be working
fine.

Is there a way to upstream these changes and make those available in
standard glibc upstream?

Please let us know.

Here are the two patches:

PATCH1
=============================
diff --git a/elf/dl-open.c b/elf/dl-open.c
index 6f178b3..2b97605 100644
--- a/elf/dl-open.c
+++ b/elf/dl-open.c
@@ -524,9 +524,16 @@ dl_open_worker (void *a)
     }
 
   /* Bump the generation number if necessary.  */
-  if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))
-    _dl_fatal_printf (N_("\
+  if (any_tls)
+    {
+      size_t newgen = GL(dl_tls_generation) + 1;
+      if (__builtin_expect (newgen == 0, 0))
+       _dl_fatal_printf (N_("\
 TLS generation counter wrapped!  Please report this."));
+      /* Synchronize with the load acquire in _dl_allocate_tls_init.
+        See the CONCURRENCY NOTES there in dl-tls.c.  */
+      atomic_store_release (&GL(dl_tls_generation), newgen);
+    }
 
   /* We need a second pass for static tls data, because _dl_update_slotinfo
      must not be run while calls to _dl_add_to_slotinfo are still pending.  */
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index ed13fd9..7184a54 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -443,6 +443,48 @@ _dl_resize_dtv (dtv_t *dtv)
 }
 
 
+/* CONCURRENCY NOTES:
+
+   During dynamic TLS and DTV allocation and setup various objects may be
+   accessed concurrently:
+
+     GL(dl_tls_max_dtv_idx)
+     GL(dl_tls_generation)
+     listp->slotinfo[i].map
+     listp->slotinfo[i].gen
+     listp->next
+
+   where listp is a node in the GL(dl_tls_dtv_slotinfo_list) list.  The public
+   APIs that may access them are
+
+     Writers: dlopen, dlclose and dynamic linker start up code.
+     Readers only: pthread_create and __tls_get_addr (TLS access).
+
+   The writers hold the GL(dl_load_lock), but the readers don't, so atomics
+   should be used when accessing these globals.
+
+   dl_open_worker (called from dlopen) for each loaded module increases
+   GL(dl_tls_max_dtv_idx), sets the link_map of the module up, adds a new
+   slotinfo entry to GL(dl_tls_dtv_slotinfo_list) with the new link_map and
+   the next generation number GL(dl_tls_generation)+1.  Then it increases
+   GL(dl_tls_generation) which sinals that the new slotinfo entries are ready.
+   This last write is release mo so previous writes can be synchronized.
+
+   GL(dl_tls_max_dtv_idx) is always an upper bound of the modids of the ready
+   entries.  The slotinfo list might be shorter than that during dlopen.
+   Entries in the slotinfo list might have gen > GL(dl_tls_generation) and
+   map == NULL.
+
+   _dl_allocate_tls_init is called from pthread_create and it looks through
+   the slotinfo list to do the dynamic TLS and DTV setup for the new thread.
+   It first loads the current GL(dl_tls_generation) with acquire mo and only
+   considers modules up to that generation ignoring any later change to the
+   slotinfo list.
+
+   TODO: Entries might get changed and freed in dlclose without sync.
+   TODO: __tls_get_addr is not yet synchronized with dlopen and dlclose.
+*/
+
 void *
 internal_function
 _dl_allocate_tls_init (void *result)
@@ -455,9 +497,18 @@ _dl_allocate_tls_init (void *result)
   struct dtv_slotinfo_list *listp;
   size_t total = 0;
   size_t maxgen = 0;
-
-  /* Check if the current dtv is big enough.   */
-  if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
+  size_t gen_count;
+  size_t dtv_slots;
+
+  /* Synchronize with the release mo store in dl_open_worker, modules with
+     larger generation number are ignored.  */
+  gen_count = atomic_load_acquire (&GL(dl_tls_generation));
+  /* Check if the current dtv is big enough.  GL(dl_tls_max_dtv_idx) is
+     concurrently modified, but after the release mo store to
+     GL(dl_tls_generation) it always remains a modid upper bound for
+     previously loaded modules so relaxed access is enough.  */
+  dtv_slots = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
+  if (dtv[-1].counter < dtv_slots)
     {
       /* Resize the dtv.  */
       dtv = _dl_resize_dtv (dtv);
@@ -480,18 +531,25 @@ _dl_allocate_tls_init (void *result)
          void *dest;
 
          /* Check for the total number of used slots.  */
-         if (total + cnt > GL(dl_tls_max_dtv_idx))
+         if (total + cnt > dtv_slots)
            break;
 
-         map = listp->slotinfo[cnt].map;
+         /* Synchronize with the release mo store in _dl_add_to_slotinfo in
+            dlopen, so the generation number read below is for a valid entry.
+            TODO: remove_slotinfo in dlclose is not synchronized.  */
+         map = atomic_load_acquire (&listp->slotinfo[cnt].map);
          if (map == NULL)
            /* Unused entry.  */
            continue;
 
+         size_t gen = listp->slotinfo[cnt].gen;
+         if (gen > gen_count)
+           /* New, concurrently loaded entry.  */
+           continue;
+
          /* Keep track of the maximum generation number.  This might
             not be the generation counter.  */
-         assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));
-         maxgen = MAX (maxgen, listp->slotinfo[cnt].gen);
+         maxgen = MAX (maxgen, gen);
 
          dtv[map->l_tls_modid].pointer.val = TLS_DTV_UNALLOCATED;
          dtv[map->l_tls_modid].pointer.is_static = false;
@@ -518,11 +576,14 @@ _dl_allocate_tls_init (void *result)
        }
 
       total += cnt;
-      if (total >= GL(dl_tls_max_dtv_idx))
+      if (total >= dtv_slots)
        break;
 
-      listp = listp->next;
-      assert (listp != NULL);
+      /* Synchronize with the release mo store in _dl_add_to_slotinfo
+        so only initialized slotinfo nodes are looked at.  */
+      listp = atomic_load_acquire (&listp->next);
+      if (listp == NULL)
+       break;
     }
 
   /* The DTV version is up-to-date now.  */
@@ -916,7 +977,7 @@ _dl_add_to_slotinfo (struct link_map *l)
         the first slot.  */
       assert (idx == 0);
 
-      listp = prevp->next = (struct dtv_slotinfo_list *)
+      listp = (struct dtv_slotinfo_list *)
        malloc (sizeof (struct dtv_slotinfo_list)
                + TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
       if (listp == NULL)
@@ -939,9 +1000,15 @@ cannot create TLS data structures"));
       listp->next = NULL;
       memset (listp->slotinfo, '\0',
              TLS_SLOTINFO_SURPLUS * sizeof (struct dtv_slotinfo));
+      /* _dl_allocate_tls_init concurrently walks this list at thread
+        creation, it must only observe initialized nodes in the list.
+        See the CONCURRENCY NOTES there.  */
+      atomic_store_release (&prevp->next, listp);
     }
 
   /* Add the information into the slotinfo data structure.  */
-  listp->slotinfo[idx].map = l;
   listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
+  /* Synchronize with the acquire load in _dl_allocate_tls_init.
+     See the CONCURRENCY NOTES there.  */
+  atomic_store_release (&listp->slotinfo[idx].map, l);
 }


PATCH 2
========
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 073321c..2c9ad2a 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -571,7 +571,7 @@ _dl_allocate_tls_init (void *result)
        }

       total += cnt;
-      if (total >= dtv_slots)
+      if (total > dtv_slots)
        break;

       /* Synchronize with dl_add_to_slotinfo.  */

** Affects: glibc (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1863162

Title:
  Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init:
  Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1863162/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to