Re: ld.so crash second attempt

2015-11-05 Thread Philip Guenther
On Thu, Nov 5, 2015 at 1:00 PM, Peter Hajdu
 wrote:
...
> I've tested the patch on amd64 with a simple sdl2 test and with my
> original tests on both amd64 and i386.  Everything seems to work just
> fine.  Thank you very much for your effort.

And it's now committed.  Thanks again for the nudge!


Philip Guenther



Re: ld.so crash second attempt

2015-11-05 Thread Peter Hajdu
On 03/11/15 at 01:46P, Philip Guenther wrote:
> On Sun, 25 Oct 2015, Peter Hajdu wrote:
> > I try to give it one more attempt with a bit more description about the 
> > bug.
> > 
> > After calling dlclose in _dl_notify_unload_shlib_ group reference counts 
> > are decreased by following the object's grpref-list.  Unfortunately the 
> > references are removed from the list during the graph traversal.
> > 
> > dlclose will run all destructors of the unused objects and tries to 
> > unload all objects reachable from the closed object's child and 
> > grpref-list.  Since the grpref-list references were removed, the unused 
> > destructed object stays in memory as garbage.  Next time when this 
> > object is loaded it is found in the memory and crashes during load.
> > 
> > This patch unloads all unused objects instead of following the closed 
> > object's child and grpref list.
> 
> Thank you for working on this.  After a long rumination, I'd like to 
> propose a different diff, seen below.
> 
> Your diff changes dlclose() to switch from calling _dl_unload_shlib() to a 
> new function _dl_unload_unused().  They both unload refcnt==0 objects: 
> _dl_unload_shlib() finds them by walking the dependency tree from the 
> selected object, while _dl_unload_unused() just scans the entire list.
> 
> So why is the former not sufficient?  As you describe, the problem occurs 
> when a grpref is removed which is the last reference to an object. grprefs 
> are used guarantee that entire load groups are unloaded all at once and 
> not piecemeal.  If later dlopen() adds a dependency to a child in this 
> load group, the entire group will be kept even if that child is the last 
> real link.  When that dependency is added, the later object takes a grpref 
> to the load_object, which is root of the load group.
> 
> As you note _dl_run_all_dtors() releases that grpref, but we still know 
> where it had pointed: to the load_object of some object being released!  
> So we can retain the behavior of _dl_unload_shlib(), but we need to add a 
> check for whether our load_object is now unreferenced.  If so, it 
> previously had a grpref which has been released, so we need to take down 
> the entire load group.
> 
> Thus the diff below.  It works with your test setup (thanks for writing 
> that!), passes regress/libexec/ld.so/, and chrome hasn't choked on it.  
> Can someone who's familiar with the sdl problem case test it there?
> 

Hi Philip,

I've tested the patch on amd64 with a simple sdl2 test and with my
original tests on both amd64 and i386.  Everything seems to work just
fine.  Thank you very much for your effort.

Peter

> 
> Does that make sense?
> 
> 
> Again, thank you for pushing on this.  If what I show here works, it's 
> because your description triggered an "ahhh, wait a moment..." thought.
> 
> 
> Philip
> 
> 
> Index: library.c
> ===
> RCS file: /data/src/openbsd/src/libexec/ld.so/library.c,v
> retrieving revision 1.71
> diff -u -p -r1.71 library.c
> --- library.c 16 Jan 2015 16:18:07 -  1.71
> +++ library.c 3 Nov 2015 09:09:15 -
> @@ -59,9 +59,27 @@ void
>  _dl_unload_shlib(elf_object_t *object)
>  {
>   struct dep_node *n;
> + elf_object_t *load_object = object->load_object;
> +
> + /*
> +  * If our load object has become unreferenced then we lost the
> +  * last group reference to it, so the entire group should be taken
> +  * down.  The current object is somewhere below load_object in
> +  * the child_list tree, so it'll get cleaned up by the recursion.
> +  * That means we can just switch here to the load object.
> +  */
> + if (load_object != object && OBJECT_REF_CNT(load_object) == 0 &&
> + (load_object->status & STAT_UNLOADED) == 0) {
> + DL_DEB(("unload_shlib switched from %s to %s\n",
> + object->load_name, load_object->load_name));
> + object = load_object;
> + goto unload;
> + }
> +
>   DL_DEB(("unload_shlib called on %s\n", object->load_name));
>   if (OBJECT_REF_CNT(object) == 0 &&
>   (object->status & STAT_UNLOADED) == 0) {
> +unload:
>   object->status |= STAT_UNLOADED;
>   TAILQ_FOREACH(n, &object->child_list, next_sib)
>   _dl_unload_shlib(n->data);
> Index: library_mquery.c
> ===
> RCS file: /data/src/openbsd/src/libexec/ld.so/library_mquery.c,v
> retrieving revision 1.49
> diff -u -p -r1.49 library_mquery.c
> --- library_mquery.c  22 Jan 2015 05:48:17 -  1.49
> +++ library_mquery.c  3 Nov 2015 09:10:39 -
> @@ -64,10 +64,27 @@ void
>  _dl_unload_shlib(elf_object_t *object)
>  {
>   struct dep_node *n;
> + elf_object_t *load_object = object->load_object;
> +
> + /*
> +  * If our load object has become unreferenced then we lost the
> +  * last group reference to it, so 

Re: ld.so crash second attempt

2015-11-04 Thread Alf Schlichting
On Tue, 03 Nov 2015 10:46:57 +0100,
Philip Guenther wrote:
> 
> On Sun, 25 Oct 2015, Peter Hajdu wrote:
> > I try to give it one more attempt with a bit more description about the 
> > bug.
> > 
> > After calling dlclose in _dl_notify_unload_shlib_ group reference counts 
> > are decreased by following the object's grpref-list.  Unfortunately the 
> > references are removed from the list during the graph traversal.
> > 
> > dlclose will run all destructors of the unused objects and tries to 
> > unload all objects reachable from the closed object's child and 
> > grpref-list.  Since the grpref-list references were removed, the unused 
> > destructed object stays in memory as garbage.  Next time when this 
> > object is loaded it is found in the memory and crashes during load.
> > 
> > This patch unloads all unused objects instead of following the closed 
> > object's child and grpref list.
> 
> Thank you for working on this.  After a long rumination, I'd like to 
> propose a different diff, seen below.
> 
> Your diff changes dlclose() to switch from calling _dl_unload_shlib() to a 
> new function _dl_unload_unused().  They both unload refcnt==0 objects: 
> _dl_unload_shlib() finds them by walking the dependency tree from the 
> selected object, while _dl_unload_unused() just scans the entire list.
> 
> So why is the former not sufficient?  As you describe, the problem occurs 
> when a grpref is removed which is the last reference to an object. grprefs 
> are used guarantee that entire load groups are unloaded all at once and 
> not piecemeal.  If later dlopen() adds a dependency to a child in this 
> load group, the entire group will be kept even if that child is the last 
> real link.  When that dependency is added, the later object takes a grpref 
> to the load_object, which is root of the load group.
> 
> As you note _dl_run_all_dtors() releases that grpref, but we still know 
> where it had pointed: to the load_object of some object being released!  
> So we can retain the behavior of _dl_unload_shlib(), but we need to add a 
> check for whether our load_object is now unreferenced.  If so, it 
> previously had a grpref which has been released, so we need to take down 
> the entire load group.
> 
> Thus the diff below.  It works with your test setup (thanks for writing 
> that!), passes regress/libexec/ld.so/, and chrome hasn't choked on it.  
> Can someone who's familiar with the sdl problem case test it there?
> 
> 
> Does that make sense?
> 
> 
> Again, thank you for pushing on this.  If what I show here works, it's 
> because your description triggered an "ahhh, wait a moment..." thought.
> 
> 
> Philip
> 

sdl2 (the old, marked-as-broken) port and sdl2 apps start and work
with this patch on amd64 -current.

Thx,
Alf

> 
> Index: library.c
> ===
> RCS file: /data/src/openbsd/src/libexec/ld.so/library.c,v
> retrieving revision 1.71
> diff -u -p -r1.71 library.c
> --- library.c 16 Jan 2015 16:18:07 -  1.71
> +++ library.c 3 Nov 2015 09:09:15 -
> @@ -59,9 +59,27 @@ void
>  _dl_unload_shlib(elf_object_t *object)
>  {
>   struct dep_node *n;
> + elf_object_t *load_object = object->load_object;
> +
> + /*
> +  * If our load object has become unreferenced then we lost the
> +  * last group reference to it, so the entire group should be taken
> +  * down.  The current object is somewhere below load_object in
> +  * the child_list tree, so it'll get cleaned up by the recursion.
> +  * That means we can just switch here to the load object.
> +  */
> + if (load_object != object && OBJECT_REF_CNT(load_object) == 0 &&
> + (load_object->status & STAT_UNLOADED) == 0) {
> + DL_DEB(("unload_shlib switched from %s to %s\n",
> + object->load_name, load_object->load_name));
> + object = load_object;
> + goto unload;
> + }
> +
>   DL_DEB(("unload_shlib called on %s\n", object->load_name));
>   if (OBJECT_REF_CNT(object) == 0 &&
>   (object->status & STAT_UNLOADED) == 0) {
> +unload:
>   object->status |= STAT_UNLOADED;
>   TAILQ_FOREACH(n, &object->child_list, next_sib)
>   _dl_unload_shlib(n->data);
> Index: library_mquery.c
> ===
> RCS file: /data/src/openbsd/src/libexec/ld.so/library_mquery.c,v
> retrieving revision 1.49
> diff -u -p -r1.49 library_mquery.c
> --- library_mquery.c  22 Jan 2015 05:48:17 -  1.49
> +++ library_mquery.c  3 Nov 2015 09:10:39 -
> @@ -64,10 +64,27 @@ void
>  _dl_unload_shlib(elf_object_t *object)
>  {
>   struct dep_node *n;
> + elf_object_t *load_object = object->load_object;
> +
> + /*
> +  * If our load object has become unreferenced then we lost the
> +  * last group reference to it, so the entire group should be taken
> +  * down.  The current object is

Re: ld.so crash second attempt

2015-11-03 Thread Anthony J. Bentley
Hi Philip,

Philip Guenther writes:
> Thus the diff below.  It works with your test setup (thanks for writing 
> that!), passes regress/libexec/ld.so/, and chrome hasn't choked on it.  
> Can someone who's familiar with the sdl problem case test it there?

This patch fixes my sdl2 testcase, mupen64plus. Thanks for looking into
this along with Peter.

-- 
Anthony J. Bentley



Re: ld.so crash second attempt

2015-11-03 Thread Philip Guenther
On Sun, 25 Oct 2015, Peter Hajdu wrote:
> I try to give it one more attempt with a bit more description about the 
> bug.
> 
> After calling dlclose in _dl_notify_unload_shlib_ group reference counts 
> are decreased by following the object's grpref-list.  Unfortunately the 
> references are removed from the list during the graph traversal.
> 
> dlclose will run all destructors of the unused objects and tries to 
> unload all objects reachable from the closed object's child and 
> grpref-list.  Since the grpref-list references were removed, the unused 
> destructed object stays in memory as garbage.  Next time when this 
> object is loaded it is found in the memory and crashes during load.
> 
> This patch unloads all unused objects instead of following the closed 
> object's child and grpref list.

Thank you for working on this.  After a long rumination, I'd like to 
propose a different diff, seen below.

Your diff changes dlclose() to switch from calling _dl_unload_shlib() to a 
new function _dl_unload_unused().  They both unload refcnt==0 objects: 
_dl_unload_shlib() finds them by walking the dependency tree from the 
selected object, while _dl_unload_unused() just scans the entire list.

So why is the former not sufficient?  As you describe, the problem occurs 
when a grpref is removed which is the last reference to an object. grprefs 
are used guarantee that entire load groups are unloaded all at once and 
not piecemeal.  If later dlopen() adds a dependency to a child in this 
load group, the entire group will be kept even if that child is the last 
real link.  When that dependency is added, the later object takes a grpref 
to the load_object, which is root of the load group.

As you note _dl_run_all_dtors() releases that grpref, but we still know 
where it had pointed: to the load_object of some object being released!  
So we can retain the behavior of _dl_unload_shlib(), but we need to add a 
check for whether our load_object is now unreferenced.  If so, it 
previously had a grpref which has been released, so we need to take down 
the entire load group.

Thus the diff below.  It works with your test setup (thanks for writing 
that!), passes regress/libexec/ld.so/, and chrome hasn't choked on it.  
Can someone who's familiar with the sdl problem case test it there?


Does that make sense?


Again, thank you for pushing on this.  If what I show here works, it's 
because your description triggered an "ahhh, wait a moment..." thought.


Philip


Index: library.c
===
RCS file: /data/src/openbsd/src/libexec/ld.so/library.c,v
retrieving revision 1.71
diff -u -p -r1.71 library.c
--- library.c   16 Jan 2015 16:18:07 -  1.71
+++ library.c   3 Nov 2015 09:09:15 -
@@ -59,9 +59,27 @@ void
 _dl_unload_shlib(elf_object_t *object)
 {
struct dep_node *n;
+   elf_object_t *load_object = object->load_object;
+
+   /*
+* If our load object has become unreferenced then we lost the
+* last group reference to it, so the entire group should be taken
+* down.  The current object is somewhere below load_object in
+* the child_list tree, so it'll get cleaned up by the recursion.
+* That means we can just switch here to the load object.
+*/
+   if (load_object != object && OBJECT_REF_CNT(load_object) == 0 &&
+   (load_object->status & STAT_UNLOADED) == 0) {
+   DL_DEB(("unload_shlib switched from %s to %s\n",
+   object->load_name, load_object->load_name));
+   object = load_object;
+   goto unload;
+   }
+
DL_DEB(("unload_shlib called on %s\n", object->load_name));
if (OBJECT_REF_CNT(object) == 0 &&
(object->status & STAT_UNLOADED) == 0) {
+unload:
object->status |= STAT_UNLOADED;
TAILQ_FOREACH(n, &object->child_list, next_sib)
_dl_unload_shlib(n->data);
Index: library_mquery.c
===
RCS file: /data/src/openbsd/src/libexec/ld.so/library_mquery.c,v
retrieving revision 1.49
diff -u -p -r1.49 library_mquery.c
--- library_mquery.c22 Jan 2015 05:48:17 -  1.49
+++ library_mquery.c3 Nov 2015 09:10:39 -
@@ -64,10 +64,27 @@ void
 _dl_unload_shlib(elf_object_t *object)
 {
struct dep_node *n;
+   elf_object_t *load_object = object->load_object;
+
+   /*
+* If our load object has become unreferenced then we lost the
+* last group reference to it, so the entire group should be taken
+* down.  The current object is somewhere below load_object in
+* the child_list tree, so it'll get cleaned up by the recursion.
+* That means we can just switch here to the load object.
+*/
+   if (load_object != object && OBJECT_REF_CNT(load_object) == 0 &&
+   (load_object->status & STAT_UNLOADED) == 0) {
+  

Re: ld.so crash second attempt

2015-10-28 Thread Alf Schlichting
On Sun, 25 Oct 2015 11:01:27 +0100,
Peter Hajdu wrote:
> 
> [1  ]
> Hi,
> 
> I try to give it one more attempt with a bit more description about the
> bug.
> 
> After calling dlclose in _dl_notify_unload_shlib_ group reference counts
> are decreased by following the object's grpref-list.  Unfortunately the
> references are removed from the list during the graph traversal.
> 
> dlclose will run all destructors of the unused objects and tries to
> unload all objects reachable from the closed object's child and
> grpref-list.  Since the grpref-list references were removed, the unused
> destructed object stays in memory as garbage.  Next time when this
> object is loaded it is found in the memory and crashes during load.
> 
> This patch unloads all unused objects instead of following the closed
> object's child and grpref list.
> 
> Best regards,
> Peter Hajdu
> 

Hello,

I can confirm that this works with sdl2 on a maybe 4 week old snapshot
on amd64.
I used that old patch that was flying around to get sdl2 up and
running, now I replaced ld.so with a fresh checkout, applied the
patch and sdl2 still works fine, no other problems observed.

Thanks a lot,

Alf

> [2 ldso_fix.diff ]
> Index: dlfcn.c
> ===
> RCS file: /cvs/src/libexec/ld.so/dlfcn.c,v
> retrieving revision 1.91
> diff -u -p -r1.91 dlfcn.c
> --- dlfcn.c   19 Sep 2015 20:56:47 -  1.91
> +++ dlfcn.c   21 Oct 2015 13:52:46 -
> @@ -302,7 +302,7 @@ _dl_real_close(void *handle)
>   object->opencount--;
>   _dl_notify_unload_shlib(object);
>   _dl_run_all_dtors();
> - _dl_unload_shlib(object);
> + _dl_unload_unused();
>   _dl_cleanup_objects();
>   return (0);
>  }
> Index: library.c
> ===
> RCS file: /cvs/src/libexec/ld.so/library.c,v
> retrieving revision 1.71
> diff -u -p -r1.71 library.c
> --- library.c 16 Jan 2015 16:18:07 -  1.71
> +++ library.c 21 Oct 2015 13:52:46 -
> @@ -74,6 +74,22 @@ _dl_unload_shlib(elf_object_t *object)
>   }
>  }
>  
> +void
> +_dl_unload_unused(void)
> +{
> + elf_object_t *obj, *next;
> +
> + for (obj = _dl_objects->next; obj != NULL; obj = next) {
> + next = obj->next;
> + if (OBJECT_REF_CNT(obj) != 0 || obj->status & STAT_UNLOADED)
> + continue;
> + obj->status |= STAT_UNLOADED;
> + _dl_load_list_free(obj->load_list);
> + _dl_munmap((void *)obj->load_base, obj->load_size);
> + _dl_remove_object(obj);
> + }
> +}
> +
>  elf_object_t *
>  _dl_tryload_shlib(const char *libname, int type, int flags)
>  {
> Index: library_mquery.c
> ===
> RCS file: /cvs/src/libexec/ld.so/library_mquery.c,v
> retrieving revision 1.49
> diff -u -p -r1.49 library_mquery.c
> --- library_mquery.c  22 Jan 2015 05:48:17 -  1.49
> +++ library_mquery.c  21 Oct 2015 13:52:46 -
> @@ -79,6 +79,20 @@ _dl_unload_shlib(elf_object_t *object)
>   }
>  }
>  
> +void
> +_dl_unload_unused(void)
> +{
> + elf_object_t *obj, *next;
> +
> + for (obj = _dl_objects->next; obj != NULL; obj = next) {
> + next = obj->next;
> + if (OBJECT_REF_CNT(obj) != 0 || obj->status & STAT_UNLOADED)
> + continue;
> + obj->status |= STAT_UNLOADED;
> + _dl_load_list_free(obj->load_list);
> + _dl_remove_object(obj);
> + }
> +}
>  
>  elf_object_t *
>  _dl_tryload_shlib(const char *libname, int type, int flags)
> Index: resolve.h
> ===
> RCS file: /cvs/src/libexec/ld.so/resolve.h,v
> retrieving revision 1.73
> diff -u -p -r1.73 resolve.h
> --- resolve.h 19 Sep 2015 20:56:47 -  1.73
> +++ resolve.h 21 Oct 2015 13:52:46 -
> @@ -223,6 +223,7 @@ void _dl_unlink_dlopen(elf_object_t *dep
>  void _dl_notify_unload_shlib(elf_object_t *object);
>  void _dl_unload_shlib(elf_object_t *object);
>  void _dl_unload_dlopen(void);
> +void _dl_unload_unused(void);
>  
>  void _dl_run_all_dtors(void);
>  



Re: ld.so crash second attempt

2015-10-25 Thread Philip Guenther
On Sun, Oct 25, 2015 at 3:01 AM, Peter Hajdu
 wrote:
> I try to give it one more attempt with a bit more description about the
> bug.

I am, slooowly, looking at this.


Philip Guenther



ld.so crash second attempt

2015-10-25 Thread Peter Hajdu
Hi,

I try to give it one more attempt with a bit more description about the
bug.

After calling dlclose in _dl_notify_unload_shlib_ group reference counts
are decreased by following the object's grpref-list.  Unfortunately the
references are removed from the list during the graph traversal.

dlclose will run all destructors of the unused objects and tries to
unload all objects reachable from the closed object's child and
grpref-list.  Since the grpref-list references were removed, the unused
destructed object stays in memory as garbage.  Next time when this
object is loaded it is found in the memory and crashes during load.

This patch unloads all unused objects instead of following the closed
object's child and grpref list.

Best regards,
Peter Hajdu

Index: dlfcn.c
===
RCS file: /cvs/src/libexec/ld.so/dlfcn.c,v
retrieving revision 1.91
diff -u -p -r1.91 dlfcn.c
--- dlfcn.c 19 Sep 2015 20:56:47 -  1.91
+++ dlfcn.c 21 Oct 2015 13:52:46 -
@@ -302,7 +302,7 @@ _dl_real_close(void *handle)
object->opencount--;
_dl_notify_unload_shlib(object);
_dl_run_all_dtors();
-   _dl_unload_shlib(object);
+   _dl_unload_unused();
_dl_cleanup_objects();
return (0);
 }
Index: library.c
===
RCS file: /cvs/src/libexec/ld.so/library.c,v
retrieving revision 1.71
diff -u -p -r1.71 library.c
--- library.c   16 Jan 2015 16:18:07 -  1.71
+++ library.c   21 Oct 2015 13:52:46 -
@@ -74,6 +74,22 @@ _dl_unload_shlib(elf_object_t *object)
}
 }
 
+void
+_dl_unload_unused(void)
+{
+   elf_object_t *obj, *next;
+
+   for (obj = _dl_objects->next; obj != NULL; obj = next) {
+   next = obj->next;
+   if (OBJECT_REF_CNT(obj) != 0 || obj->status & STAT_UNLOADED)
+   continue;
+   obj->status |= STAT_UNLOADED;
+   _dl_load_list_free(obj->load_list);
+   _dl_munmap((void *)obj->load_base, obj->load_size);
+   _dl_remove_object(obj);
+   }
+}
+
 elf_object_t *
 _dl_tryload_shlib(const char *libname, int type, int flags)
 {
Index: library_mquery.c
===
RCS file: /cvs/src/libexec/ld.so/library_mquery.c,v
retrieving revision 1.49
diff -u -p -r1.49 library_mquery.c
--- library_mquery.c22 Jan 2015 05:48:17 -  1.49
+++ library_mquery.c21 Oct 2015 13:52:46 -
@@ -79,6 +79,20 @@ _dl_unload_shlib(elf_object_t *object)
}
 }
 
+void
+_dl_unload_unused(void)
+{
+   elf_object_t *obj, *next;
+
+   for (obj = _dl_objects->next; obj != NULL; obj = next) {
+   next = obj->next;
+   if (OBJECT_REF_CNT(obj) != 0 || obj->status & STAT_UNLOADED)
+   continue;
+   obj->status |= STAT_UNLOADED;
+   _dl_load_list_free(obj->load_list);
+   _dl_remove_object(obj);
+   }
+}
 
 elf_object_t *
 _dl_tryload_shlib(const char *libname, int type, int flags)
Index: resolve.h
===
RCS file: /cvs/src/libexec/ld.so/resolve.h,v
retrieving revision 1.73
diff -u -p -r1.73 resolve.h
--- resolve.h   19 Sep 2015 20:56:47 -  1.73
+++ resolve.h   21 Oct 2015 13:52:46 -
@@ -223,6 +223,7 @@ void _dl_unlink_dlopen(elf_object_t *dep
 void _dl_notify_unload_shlib(elf_object_t *object);
 void _dl_unload_shlib(elf_object_t *object);
 void _dl_unload_dlopen(void);
+void _dl_unload_unused(void);
 
 void _dl_run_all_dtors(void);