Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-21 Thread Dimitry Andric
On 2012-05-21 04:54, David Xu wrote:
...
 As I said, it depends on the order in which the global objects are
 destructed. If the object that deletes current_thread_data_key is
 destructed last, the problem won't happen, but now it is destructed
 too early. I believe no specification says which C++ object should be
 destructed first when the objects live in different compiled modules
 that are then linked together into a shared object (.so file).

Indeed, the order in which global constructors and destructors in
different translation units are called is unspecified.  Depending on
that order is a bug (a.k.a. the static initialization order fiasco).
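
A minimal sketch of the usual remedy, for readers who have not met the problem before. The names event_log() and Tracer are hypothetical and come from neither Qt nor libc. The shared object is constructed on first use and intentionally never destroyed, so it stays valid whichever global destructor happens to run first. The scoped demo only shows the one ordering C++ does guarantee: objects in the same scope are destroyed in reverse order of construction.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shared resource. It is created on first use and deliberately
// leaked, so no destructor running at exit can observe it already destroyed.
std::vector<std::string>& event_log() {
    static std::vector<std::string>* log = new std::vector<std::string>();
    return *log;
}

// Hypothetical object that touches the shared resource in its destructor.
struct Tracer {
    std::string name;
    explicit Tracer(std::string n) : name(std::move(n)) {
        event_log().push_back("ctor " + name);
    }
    ~Tracer() { event_log().push_back("dtor " + name); }
};

// Records construction and destruction of two objects in one scope.
std::vector<std::string> run_scoped_demo() {
    event_log().clear();
    {
        Tracer a("a");
        Tracer b("b");
    }   // b is destroyed first, then a
    return event_log();
}
```

Had event_log() returned a reference to an ordinary global defined in another translation unit, a Tracer global could outlive it, which is the situation described above.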
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-21 Thread Konstantin Belousov
On Mon, May 21, 2012 at 10:54:54AM +0800, David Xu wrote:
 On 2012/5/21 1:24, Konstantin Belousov wrote:
 On Sun, May 20, 2012 at 06:42:35PM +0200, Alberto Villa wrote:
 On Sun, May 20, 2012 at 8:03 AM, David Xulistlog2...@gmail.com  wrote:
 qdbus segfaults on my machine too, I tracked it down, and found the 
 problem
 is in QT,
 it deleted current_thread_data_key,  but it still uses it in some cxa 
 hooks,
   I  applied the
 following patch,  and it works fine.
 Thanks for the analysis David!
 
 I think the bug depends on linking order in QT library ? if the
 qthread_unix.cpp is linked
 as lastest module, the key will be deleted after all cxa hooks run, then 
 it
 will be fine,
 otherwise, it would crash.
 Is this really possible?
 No, I do not think it is possible.
 
 The only possibility for something weird happen is for atexit/__cxa_atexit
 functions to be registered from another atexit function, and then we
 indeed could call the newly registered function too late.
 
 I wonder if the following hack makes any change in the observed behaviour.
 
 diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c
 index 511172a..bab850c 100644
 --- a/lib/libc/stdlib/atexit.c
 +++ b/lib/libc/stdlib/atexit.c
 @@ -72,6 +72,7 @@ struct atexit {
  };
  
  static struct atexit *__atexit;		/* points to head of LIFO stack */
 +static int atexit_gen;
  
  /*
   * Register the function described by 'fptr' to be called at application
 @@ -107,6 +108,7 @@ atexit_register(struct atexit_fn *fptr)
  		__atexit = p;
  	}
  	p->fns[p->ind++] = *fptr;
 +	atexit_gen++;
  	_MUTEX_UNLOCK(&atexit_mutex);
  	return 0;
  }
 @@ -162,7 +164,7 @@ __cxa_finalize(void *dso)
  	struct dl_phdr_info phdr_info;
  	struct atexit *p;
  	struct atexit_fn fn;
 -	int n, has_phdr;
 +	int atexit_gen_prev, n, has_phdr;
  
  	if (dso != NULL)
  		has_phdr = _rtld_addr_phdr(dso, &phdr_info);
 @@ -170,6 +172,8 @@ __cxa_finalize(void *dso)
  		has_phdr = 0;
  
  	_MUTEX_LOCK(&atexit_mutex);
 +retry:
 +	atexit_gen_prev = atexit_gen;
  	for (p = __atexit; p; p = p->next) {
  		for (n = p->ind; --n >= 0;) {
  			if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
 @@ -196,6 +200,8 @@ __cxa_finalize(void *dso)
  			_MUTEX_LOCK(&atexit_mutex);
  		}
  	}
 +	if (atexit_gen_prev != atexit_gen)
 +		goto retry;
  	_MUTEX_UNLOCK(&atexit_mutex);
  	if (dso == NULL)
  		_MUTEX_DESTROY(&atexit_mutex);
 I have tried your patch,  it does not fix the problem. As I said, it is 
 a bug in QT,
 the bug is pthread key current_thread_data_key is deleted by a global 
 C++ object
 too early, other C++ global objects still need this pthread key. The 
 following procedure
 shows how I found the problem:
 
 davidxu@xyf:~%gdb qdbus
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain 
 conditions.
 Type show copying to see the conditions.
 There is absolutely no warranty for GDB.  Type show warranty for details.
 This GDB was configured as i386-marcel-freebsd...(no debugging symbols 
 found)...
 (gdb) break __cxa_finalize
 Function __cxa_finalize not defined.
 Make breakpoint pending on future shared library load? (y or [n]) y
 Breakpoint 1 (__cxa_finalize) pending.
 (gdb) run
 Starting program: /usr/local/bin/qdbus
 (no debugging symbols found)...(no debugging symbols found)...(no 
 debugging symbols found)...(no debugging symbols found)...(no debugging 
 symbols found)...(no debugging symbols found)...(no debugging symbols 
 found)...[New LWP 100077]
 (no debugging symbols found)...(no debugging symbols found)...(no 
 debugging symbols found)...(no debugging symbols found)...(no debugging 
 symbols found)...(no debugging symbols found)...(no debugging symbols 
 found)...(no debugging symbols found)...Breakpoint 2 at 0x2864ac26
 Pending breakpoint __cxa_finalize resolved
 (no debugging symbols found)...[New Thread 29007300 (LWP 100077/qdbus)]
 (no debugging symbols found)...:1.0
  org.gnome.SessionManager
 :1.11
 :1.111
 :1.12
 :1.13
  org.gtk.vfs.Daemon
 :1.143
 :1.15
  org.pulseaudio.Server
 :1.17
  org.gnome.Panel
 :1.18
 :1.19
 :1.20
  org.gtk.Private.HalVolumeMonitor
 :1.21
  org.gtk.Private.GPhoto2VolumeMonitor
 :1.22
 :1.24
  org.gnome.ScreenSaver
 :1.25
 :1.27
 :1.28
 :1.29
 :1.30
 :1.31
  org.gnome.panel.applet.WnckletFactory
 :1.32
 :1.33
 :1.34
 :1.35
  org.gnome.panel.applet.CPUFreqAppletFactory
 :1.36
  org.gnome.panel.applet.NotificationAreaAppletFactory
 :1.37
  org.gnome.panel.applet.MultiLoadAppletFactory
 :1.38
 :1.39
 :1.4
  org.gnome.GConf
 :1.41
  org.gnome.panel.applet.ClockAppletFactory
 :1.49
 :1.5
  org.gnome.SettingsDaemon
 :1.50
 :1.53
 :1.64
 :1.7
  org.freedesktop.secrets
  org.gnome.keyring
 :1.75
  org.gtk.vfs.Metadata
 :1.76
  

Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-21 Thread Gustau Pérez i Querol




Now let me dig into qthread_unix.cpp and see how QThreadData::current()
works:


QThreadData *QThreadData::current()
{
    QThreadData *data = get_thread_data();
    if (!data) {
        void *a;
        if (QInternal::activateCallbacks(QInternal::AdoptCurrentThread, &a)) {
            QThread *adopted = static_cast<QThread*>(a);
            Q_ASSERT(adopted);
            data = QThreadData::get2(adopted);
            set_thread_data(data);
            adopted->d_func()->running = true;
            adopted->d_func()->finished = false;
            static_cast<QAdoptedThread *>(adopted)->init();
        } else {
            data = new QThreadData;
            QT_TRY {
                set_thread_data(data);
                data->thread = new QAdoptedThread(data);
            } QT_CATCH(...) {
                clear_thread_data();
                data->deref();
                data = 0;
                QT_RETHROW;
            }
            data->deref();
        }
        if (!QCoreApplicationPrivate::theMainThread)
            QCoreApplicationPrivate::theMainThread = data->thread;
    }
    return data;
}

It calls get_thread_data(); if that returns NULL, it creates a new
QThreadData (together with a QAdoptedThread object) and registers it as
the current thread's data by calling set_thread_data().

Let's see how get_thread_data() and set_thread_data() work:

static QThreadData *get_thread_data()
{
#ifdef Q_OS_SYMBIAN
    return reinterpret_cast<QThreadData *>(Dll::Tls());
#else
    pthread_once(&current_thread_data_once, create_current_thread_data_key);
    return reinterpret_cast<QThreadData *>(pthread_getspecific(current_thread_data_key));
#endif
}

static void set_thread_data(QThreadData *data)
{
#ifdef Q_OS_SYMBIAN
    qt_symbian_throwIfError(Dll::SetTls(data));
#endif
    pthread_once(&current_thread_data_once, create_current_thread_data_key);
    pthread_setspecific(current_thread_data_key, data);
}


They just use pthread_getspecific() and pthread_setspecific(). The
current_thread_data_key is created only once, guarded by pthread_once().
But as you know, the key has already been deleted by
Q_DESTRUCTOR_FUNCTION(destroy_current_thread_data_key), a global object
that was destructed early. Because pthread_once() never runs the
creation function a second time, the key is not recreated; it is a
stale key.
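
The point about pthread_once() can be checked in isolation. The sketch below is an assumption-laden stand-in, not Qt code: demo_key, demo_once and create_calls are hypothetical names. It creates a key under pthread_once(), deletes it, and then goes through the same guarded path again. The creation function is not called a second time, so the caller is left holding a deleted key. The sketch deliberately does not call pthread_getspecific() on the deleted key, because POSIX leaves that behaviour undefined.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical stand-ins for current_thread_data_key and its once-flag.
static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;
static int create_calls = 0;   // counts how often the key was created

static void create_demo_key() {
    ++create_calls;
    pthread_key_create(&demo_key, nullptr);
}

// Same guard as in get_thread_data(): create the key on first use only.
static void ensure_key() {
    pthread_once(&demo_once, create_demo_key);
}

// Returns the number of key creations over a create / delete / reuse cycle.
int create_calls_after_delete() {
    ensure_key();                    // first use: the key is created
    pthread_key_delete(demo_key);    // what the early destructor does
    ensure_key();                    // once-flag already set: nothing happens
    return create_calls;
}
```

The count stays at one after the delete, which matches the analysis: nothing recreates the key once the destructor function has run.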




  I was able to debug until the point where qthread_unix.cpp spawns a 
new thread because the get_thread_data call returns 0. I was unable to 
reach the full analysis, but now I get it. The explanation seems fine to 
me, thanks.


  What I don't get is why it works in stable. The functions registered
to be executed at exit (atexit_register hasn't changed) get registered
in the same order in both branches (at least I checked them by printing
the two atexit structures when calling exit in both stable and head).
Wouldn't that mean that the problem of deleting
current_thread_data_key should happen in both branches?


   Gus


Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-20 Thread David Xu

On 2012/4/30 22:13, Gustau Pérez i Querol wrote:


  Hi,

  the kde team is seeing some strange problems with the new version
(4.8.1) of devel/dbus-qt4 with current. It does work with stable. I
also suspect that the problem described below is affecting the
experimental cinnamon port (an alternative to gnome3, possible
replacement of gnome2).

  The problem happens with both i386 and amd64 with empty
/etc/malloc.conf and simple /etc/make.conf. Everything compiled with
base gcc (no clang). The kernel was compiled with no debug support,
but I can enable it if needed. There are reports from avi...@freebsd.org
of the same behavior with clang compiled world and kernel and with
MALLOC_PRODUCTION=yes.

 When qdbus starts, it segfaults. The backtrace of the problem with
r234769 can be found here: http://pastebin.com/ryBXtqGF. When starting
the qdbus daemon by hand in a X+twm session, we see it calls calloc
many times and after a fixed number of times segfaults. We see it
segfaults at rb_gen (a quite large macro defined at
$SRC_BASE/contrib/jemalloc/include/jemalloc/internal/rb.h).

 If the daemon is started by hand, I'm able to skip all the calls
qdbus makes to calloc till the one causing the segfault. At that
point, at rb_gen, we don't exactly know what is going on or how to
debug the macro. ktrace dumps are available, but we were unable to find
anything new from them.

  With old versions of current before the jemalloc imports (as of
March 30th) the daemon segfaulted at malloc.c:2426. With revisions
during April 20 to 24th (can be more precise, it was during the
jemalloc imports) the daemon segfaulted at malloc_init. Backtraces are
available if needed, and if necessary I can go back to those revisions
and recompile world+kernel to see how it behaves.

  Any help from freebsd-current@ (perhaps Jason Evans can help us)
will be appreciated. Any additional info, like source revisions, can
be provided. I would like to stress that the experimental
devel/dbus-qt4 works fine with recent stable.

qdbus segfaults on my machine too. I tracked it down and found that the
problem is in Qt: it deletes current_thread_data_key but still uses it
in some cxa hooks. I applied the following patch, and it works fine.

--- qthread_unix.cpp	2012-05-20 13:23:09.0 +0800
+++ qthread_unix_new.cpp	2012-05-20 13:22:45.0 +0800
@@ -156,7 +156,7 @@
 {
     pthread_key_delete(current_thread_data_key);
 }
-Q_DESTRUCTOR_FUNCTION(destroy_current_thread_data_key)
+//Q_DESTRUCTOR_FUNCTION(destroy_current_thread_data_key)
 
 
 // Utility functions for getting, setting and clearing thread specific data.

---
The Q_DESTRUCTOR_FUNCTION macro defines a global C++ object whose
destructor deletes current_thread_data_key, but other cxa hooks still
need the key, so the Qt library ends up crashing.

I think the bug depends on the link order within the Qt library: if
qthread_unix.cpp is linked as the last module, the key is deleted after
all cxa hooks have run and everything is fine; otherwise, it crashes.
This sounds like a bug in Qt.
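
To make the mechanism concrete, here is a small sketch under stated assumptions. RUN_ON_DESTRUCTION is a hypothetical macro written for this illustration and is not Qt's actual definition of Q_DESTRUCTOR_FUNCTION. The boolean key_valid stands in for the pthread key, and the runner object is a local variable rather than a global so that the two orderings can be compared directly.

```cpp
#include <cassert>

// Hypothetical stand-in for a destructor-function macro (not Qt's definition):
// it declares a type whose destructor calls FUNC.
#define RUN_ON_DESTRUCTION(FUNC) \
    struct FUNC##_runner { ~FUNC##_runner() { FUNC(); } }

static bool key_valid = true;                    // models "the pthread key exists"
static void destroy_key() { key_valid = false; }
RUN_ON_DESTRUCTION(destroy_key);

// Models another exit-time hook that still needs the key.
static bool late_hook_sees_valid_key() { return key_valid; }

// The runner is destroyed before the late hook runs, as in the reported crash.
bool hook_after_early_destruction() {
    key_valid = true;
    { destroy_key_runner early; }                // destructor fires at this brace
    return late_hook_sees_valid_key();           // the key is already gone
}

// The runner is destroyed only after the late hook has run.
bool hook_before_late_destruction() {
    key_valid = true;
    destroy_key_runner last;                     // destructor fires on return
    return late_hook_sees_valid_key();           // the key is still usable
}
```

The first function corresponds to the failing case, the second to the case where the key's destructor object happens to be destroyed last.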

Regards,
David Xu


Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-20 Thread Alberto Villa
On Sun, May 20, 2012 at 8:03 AM, David Xu listlog2...@gmail.com wrote:
 qdbus segfaults on my machine too, I tracked it down, and found the problem
 is in QT,
 it deleted current_thread_data_key,  but it still uses it in some cxa hooks,
  I  applied the
 following patch,  and it works fine.

Thanks for the analysis David!

 I think the bug depends on linking order in QT library ? if the
 qthread_unix.cpp is linked
 as lastest module, the key will be deleted after all cxa hooks run, then it
 will be fine,
 otherwise, it would crash.

Is this really possible?
-- 
Alberto Villa, FreeBSD committer avi...@freebsd.org
http://people.FreeBSD.org/~avilla


Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-20 Thread Konstantin Belousov
On Sun, May 20, 2012 at 06:42:35PM +0200, Alberto Villa wrote:
 On Sun, May 20, 2012 at 8:03 AM, David Xu listlog2...@gmail.com wrote:
  qdbus segfaults on my machine too, I tracked it down, and found the problem
  is in QT,
  it deleted current_thread_data_key,  but it still uses it in some cxa hooks,
   I  applied the
  following patch,  and it works fine.
 
 Thanks for the analysis David!
 
  I think the bug depends on linking order in QT library ? if the
  qthread_unix.cpp is linked
  as lastest module, the key will be deleted after all cxa hooks run, then it
  will be fine,
  otherwise, it would crash.
 
 Is this really possible?
No, I do not think it is possible.

The only way for something weird to happen is for atexit/__cxa_atexit
functions to be registered from another atexit function; then we
could indeed call the newly registered function too late.

I wonder if the following hack makes any change in the observed behaviour.

diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c
index 511172a..bab850c 100644
--- a/lib/libc/stdlib/atexit.c
+++ b/lib/libc/stdlib/atexit.c
@@ -72,6 +72,7 @@ struct atexit {
 };
 
 static struct atexit *__atexit;		/* points to head of LIFO stack */
+static int atexit_gen;
 
 /*
  * Register the function described by 'fptr' to be called at application
@@ -107,6 +108,7 @@ atexit_register(struct atexit_fn *fptr)
 		__atexit = p;
 	}
 	p->fns[p->ind++] = *fptr;
+	atexit_gen++;
 	_MUTEX_UNLOCK(&atexit_mutex);
 	return 0;
 }
@@ -162,7 +164,7 @@ __cxa_finalize(void *dso)
 	struct dl_phdr_info phdr_info;
 	struct atexit *p;
 	struct atexit_fn fn;
-	int n, has_phdr;
+	int atexit_gen_prev, n, has_phdr;
 
 	if (dso != NULL)
 		has_phdr = _rtld_addr_phdr(dso, &phdr_info);
@@ -170,6 +172,8 @@ __cxa_finalize(void *dso)
 		has_phdr = 0;
 
 	_MUTEX_LOCK(&atexit_mutex);
+retry:
+	atexit_gen_prev = atexit_gen;
 	for (p = __atexit; p; p = p->next) {
 		for (n = p->ind; --n >= 0;) {
 			if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
@@ -196,6 +200,8 @@ __cxa_finalize(void *dso)
 			_MUTEX_LOCK(&atexit_mutex);
 		}
 	}
+	if (atexit_gen_prev != atexit_gen)
+		goto retry;
 	_MUTEX_UNLOCK(&atexit_mutex);
 	if (dso == NULL)
 		_MUTEX_DESTROY(&atexit_mutex);
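
The idea behind the hack can be sketched outside libc. The following is an illustration only: HandlerTable and demo_order() are hypothetical, there is no locking, and only the generation counter and the rescan are taken from the patch above. Each registration bumps a counter; after a full pass over the handlers the counter is compared with the value seen at the start, and the pass is repeated if a handler registered something new in the meantime.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handler table; only the generation counter mirrors the patch.
struct HandlerTable {
    std::vector<std::function<void()>> handlers;  // called in LIFO order
    int generation = 0;                           // bumped on every registration

    void add(std::function<void()> fn) {
        handlers.push_back(std::move(fn));
        ++generation;
    }

    // Call every handler once; rescan if a handler registered new ones.
    void run_all() {
        int seen;
        do {
            seen = generation;
            for (std::size_t n = handlers.size(); n-- > 0;) {
                if (!handlers[n])
                    continue;                     // already called
                std::function<void()> fn = std::move(handlers[n]);
                handlers[n] = nullptr;            // mark as called before invoking
                fn();                             // may call add() and grow the table
            }
        } while (seen != generation);
    }
};

// Registers two handlers; the second registers a third while it runs.
std::vector<std::string> demo_order() {
    std::vector<std::string> order;
    HandlerTable t;
    t.add([&] { order.push_back("first"); });
    t.add([&] {
        order.push_back("second");
        t.add([&] { order.push_back("late"); });
    });
    t.run_all();
    return order;
}
```

Without the rescan the handler registered during the pass would be skipped, since the loop walks downward from the size the table had when the pass began.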




Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-20 Thread David Xu

On 2012/5/21 1:24, Konstantin Belousov wrote:

On Sun, May 20, 2012 at 06:42:35PM +0200, Alberto Villa wrote:

On Sun, May 20, 2012 at 8:03 AM, David Xulistlog2...@gmail.com  wrote:

qdbus segfaults on my machine too, I tracked it down, and found the problem
is in QT,
it deleted current_thread_data_key,  but it still uses it in some cxa hooks,
  I  applied the
following patch,  and it works fine.

Thanks for the analysis David!


I think the bug depends on linking order in QT library ? if the
qthread_unix.cpp is linked
as lastest module, the key will be deleted after all cxa hooks run, then it
will be fine,
otherwise, it would crash.

Is this really possible?

No, I do not think it is possible.

The only possibility for something weird happen is for atexit/__cxa_atexit
functions to be registered from another atexit function, and then we
indeed could call the newly registered function too late.

I wonder if the following hack makes any change in the observed behaviour.

diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c
index 511172a..bab850c 100644
--- a/lib/libc/stdlib/atexit.c
+++ b/lib/libc/stdlib/atexit.c
@@ -72,6 +72,7 @@ struct atexit {
 };
 
 static struct atexit *__atexit;		/* points to head of LIFO stack */
+static int atexit_gen;
 
 /*
  * Register the function described by 'fptr' to be called at application
@@ -107,6 +108,7 @@ atexit_register(struct atexit_fn *fptr)
 		__atexit = p;
 	}
 	p->fns[p->ind++] = *fptr;
+	atexit_gen++;
 	_MUTEX_UNLOCK(&atexit_mutex);
 	return 0;
 }
@@ -162,7 +164,7 @@ __cxa_finalize(void *dso)
 	struct dl_phdr_info phdr_info;
 	struct atexit *p;
 	struct atexit_fn fn;
-	int n, has_phdr;
+	int atexit_gen_prev, n, has_phdr;
 
 	if (dso != NULL)
 		has_phdr = _rtld_addr_phdr(dso, &phdr_info);
@@ -170,6 +172,8 @@ __cxa_finalize(void *dso)
 		has_phdr = 0;
 
 	_MUTEX_LOCK(&atexit_mutex);
+retry:
+	atexit_gen_prev = atexit_gen;
 	for (p = __atexit; p; p = p->next) {
 		for (n = p->ind; --n >= 0;) {
 			if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
@@ -196,6 +200,8 @@ __cxa_finalize(void *dso)
 			_MUTEX_LOCK(&atexit_mutex);
 		}
 	}
+	if (atexit_gen_prev != atexit_gen)
+		goto retry;
 	_MUTEX_UNLOCK(&atexit_mutex);
 	if (dso == NULL)
 		_MUTEX_DESTROY(&atexit_mutex);

I have tried your patch; it does not fix the problem. As I said, it is
a bug in Qt: the pthread key current_thread_data_key is deleted too
early by a global C++ object, while other C++ global objects still need
this pthread key. The following procedure shows how I found the
problem:

davidxu@xyf:~%gdb qdbus
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.

Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-marcel-freebsd...(no debugging symbols 
found)...

(gdb) break __cxa_finalize
Function __cxa_finalize not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (__cxa_finalize) pending.
(gdb) run
Starting program: /usr/local/bin/qdbus
(no debugging symbols found)...(no debugging symbols found)...(no 
debugging symbols found)...(no debugging symbols found)...(no debugging 
symbols found)...(no debugging symbols found)...(no debugging symbols 
found)...[New LWP 100077]
(no debugging symbols found)...(no debugging symbols found)...(no 
debugging symbols found)...(no debugging symbols found)...(no debugging 
symbols found)...(no debugging symbols found)...(no debugging symbols 
found)...(no debugging symbols found)...Breakpoint 2 at 0x2864ac26

Pending breakpoint __cxa_finalize resolved
(no debugging symbols found)...[New Thread 29007300 (LWP 100077/qdbus)]
(no debugging symbols found)...:1.0
 org.gnome.SessionManager
:1.11
:1.111
:1.12
:1.13
 org.gtk.vfs.Daemon
:1.143
:1.15
 org.pulseaudio.Server
:1.17
 org.gnome.Panel
:1.18
:1.19
:1.20
 org.gtk.Private.HalVolumeMonitor
:1.21
 org.gtk.Private.GPhoto2VolumeMonitor
:1.22
:1.24
 org.gnome.ScreenSaver
:1.25
:1.27
:1.28
:1.29
:1.30
:1.31
 org.gnome.panel.applet.WnckletFactory
:1.32
:1.33
:1.34
:1.35
 org.gnome.panel.applet.CPUFreqAppletFactory
:1.36
 org.gnome.panel.applet.NotificationAreaAppletFactory
:1.37
 org.gnome.panel.applet.MultiLoadAppletFactory
:1.38
:1.39
:1.4
 org.gnome.GConf
:1.41
 org.gnome.panel.applet.ClockAppletFactory
:1.49
:1.5
 org.gnome.SettingsDaemon
:1.50
:1.53
:1.64
:1.7
 org.freedesktop.secrets
 org.gnome.keyring
:1.75
 org.gtk.vfs.Metadata
:1.76
 org.gnome.Terminal.Display_0_0
:1.77
org.freedesktop.DBus
[Switching to Thread 29007300 (LWP 100077/qdbus)]

Breakpoint 2, 0x2864ac26 in 

Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-20 Thread David Xu

On 2012/5/21 10:54, David Xu wrote:

On 2012/5/21 1:24, Konstantin Belousov wrote:

On Sun, May 20, 2012 at 06:42:35PM +0200, Alberto Villa wrote:
On Sun, May 20, 2012 at 8:03 AM, David Xulistlog2...@gmail.com  
wrote:
qdbus segfaults on my machine too, I tracked it down, and found the 
problem

is in QT,
it deleted current_thread_data_key,  but it still uses it in some 
cxa hooks,

  I  applied the
following patch,  and it works fine.

Thanks for the analysis David!


I think the bug depends on linking order in QT library ? if the
qthread_unix.cpp is linked
as lastest module, the key will be deleted after all cxa hooks run, 
then it

will be fine,
otherwise, it would crash.

Is this really possible?

No, I do not think it is possible.

The only possibility for something weird happen is for 
atexit/__cxa_atexit

functions to be registered from another atexit function, and then we
indeed could call the newly registered function too late.

I wonder if the following hack makes any change in the observed 
behaviour.


diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c
index 511172a..bab850c 100644
--- a/lib/libc/stdlib/atexit.c
+++ b/lib/libc/stdlib/atexit.c
@@ -72,6 +72,7 @@ struct atexit {
 };
 
 static struct atexit *__atexit;		/* points to head of LIFO stack */
+static int atexit_gen;
 
 /*
  * Register the function described by 'fptr' to be called at application
@@ -107,6 +108,7 @@ atexit_register(struct atexit_fn *fptr)
 		__atexit = p;
 	}
 	p->fns[p->ind++] = *fptr;
+	atexit_gen++;
 	_MUTEX_UNLOCK(&atexit_mutex);
 	return 0;
 }
@@ -162,7 +164,7 @@ __cxa_finalize(void *dso)
 	struct dl_phdr_info phdr_info;
 	struct atexit *p;
 	struct atexit_fn fn;
-	int n, has_phdr;
+	int atexit_gen_prev, n, has_phdr;
 
 	if (dso != NULL)
 		has_phdr = _rtld_addr_phdr(dso, &phdr_info);
@@ -170,6 +172,8 @@ __cxa_finalize(void *dso)
 		has_phdr = 0;
 
 	_MUTEX_LOCK(&atexit_mutex);
+retry:
+	atexit_gen_prev = atexit_gen;
 	for (p = __atexit; p; p = p->next) {
 		for (n = p->ind; --n >= 0;) {
 			if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
@@ -196,6 +200,8 @@ __cxa_finalize(void *dso)
 			_MUTEX_LOCK(&atexit_mutex);
 		}
 	}
+	if (atexit_gen_prev != atexit_gen)
+		goto retry;
 	_MUTEX_UNLOCK(&atexit_mutex);
 	if (dso == NULL)
 		_MUTEX_DESTROY(&atexit_mutex);
I have tried your patch,  it does not fix the problem. As I said, it 
is a bug in QT,
the bug is pthread key current_thread_data_key is deleted by a global 
C++ object
too early, other C++ global objects still need this pthread key. The 
following procedure

shows how I found the problem:

davidxu@xyf:~%gdb qdbus
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and 
you are
welcome to change it and/or distribute copies of it under certain 
conditions.

Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for 
details.
This GDB was configured as i386-marcel-freebsd...(no debugging 
symbols found)...

(gdb) break __cxa_finalize
Function __cxa_finalize not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (__cxa_finalize) pending.
(gdb) run
Starting program: /usr/local/bin/qdbus
(no debugging symbols found)...(no debugging symbols found)...(no 
debugging symbols found)...(no debugging symbols found)...(no 
debugging symbols found)...(no debugging symbols found)...(no 
debugging symbols found)...[New LWP 100077]
(no debugging symbols found)...(no debugging symbols found)...(no 
debugging symbols found)...(no debugging symbols found)...(no 
debugging symbols found)...(no debugging symbols found)...(no 
debugging symbols found)...(no debugging symbols found)...Breakpoint 2 
at 0x2864ac26

Pending breakpoint __cxa_finalize resolved
(no debugging symbols found)...[New Thread 29007300 (LWP 100077/qdbus)]
(no debugging symbols found)...:1.0
 org.gnome.SessionManager
:1.11
:1.111
:1.12
:1.13
 org.gtk.vfs.Daemon
:1.143
:1.15
 org.pulseaudio.Server
:1.17
 org.gnome.Panel
:1.18
:1.19
:1.20
 org.gtk.Private.HalVolumeMonitor
:1.21
 org.gtk.Private.GPhoto2VolumeMonitor
:1.22
:1.24
 org.gnome.ScreenSaver
:1.25
:1.27
:1.28
:1.29
:1.30
:1.31
 org.gnome.panel.applet.WnckletFactory
:1.32
:1.33
:1.34
:1.35
 org.gnome.panel.applet.CPUFreqAppletFactory
:1.36
 org.gnome.panel.applet.NotificationAreaAppletFactory
:1.37
 org.gnome.panel.applet.MultiLoadAppletFactory
:1.38
:1.39
:1.4
 org.gnome.GConf
:1.41
 org.gnome.panel.applet.ClockAppletFactory
:1.49
:1.5
 org.gnome.SettingsDaemon
:1.50
:1.53
:1.64
:1.7
 org.freedesktop.secrets
 org.gnome.keyring
:1.75
 org.gtk.vfs.Metadata
:1.76
 org.gnome.Terminal.Display_0_0
:1.77
org.freedesktop.DBus
[Switching to Thread 29007300 (LWP 100077/qdbus)]

Breakpoint 2, 0x2864ac26 in __cxa_finalize () from /lib/libc.so.7
(gdb) print 

Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-18 Thread Alberto Villa
On Tue, May 1, 2012 at 8:18 PM, Gustau Pérez i Querol
gpe...@entel.upc.edu wrote:
  So the problem seems to be unrelated to jemalloc or malloc. As the
 experimental 4.8.1 devel/dbus-qt4 port works fine in stable, the problem has
 to do with some differences between head and stable. When we get more hints
 about where the problem is, I will post them in a new thread on freebsd-current@.

Gus has been away for a while, but before disappearing he found a
workaround: building devel/dbus-qt4 with -fno-use-cxa-atexit. So
I had a look around and found this NetBSD bug report:
http://www.archivum.info/fa.netbsd.bugs/2007-12/00070/lib-37654-libc's-atexit_mutex-should-be-fully-recursive.html

Since qdbus crashes after exit(3) here too, that might be an
explanation. Or, at least, something related.

kib@ and kan@ are CCed as per avg@ suggestion.
-- 
Alberto Villa, FreeBSD committer avi...@freebsd.org
http://people.FreeBSD.org/~avilla


Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-18 Thread Konstantin Belousov
On Fri, May 18, 2012 at 07:01:25PM +0200, Alberto Villa wrote:
 On Tue, May 1, 2012 at 8:18 PM, Gustau Pérez i Querol
 gpe...@entel.upc.edu wrote:
   So the problem seems to be not related to jemalloc or malloc. As the
  experimental 4.8.1 devel/dbus-qt4 port works fine in stable, the problem has
  do to with some differences between head and stable. When we get more hints
  where the problem is, I will post them in a new thread in freebsd-current@.
 
 Gus has been away for a while, but before disappearing he found a
 workaround to be building devel/dbus-qt4 with -fno-use-cxa-atexit. So
 I had a look around, and found this NetBSD bug report:
 http://www.archivum.info/fa.netbsd.bugs/2007-12/00070/lib-37654-libc's-atexit_mutex-should-be-fully-recursive.html
 
 Since qdbus crashes after exit(3) here too, that might be an
 explanation. Or, at least, something related.
 
 kib@ and kan@ are CCed as per avg@ suggestion.

You provided zero information.

The reference to NetBSD is completely meaningless: we drop atexit_mutex
when calling registered atexit handlers.

At least bother to provide a useful bug report if you suspect a bug in
the base system and want it fixed.




Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-18 Thread Andriy Gapon
on 18/05/2012 20:01 Alberto Villa said the following:
 On Tue, May 1, 2012 at 8:18 PM, Gustau Pérez i Querol
 gpe...@entel.upc.edu wrote:
  So the problem seems to be not related to jemalloc or malloc. As the
 experimental 4.8.1 devel/dbus-qt4 port works fine in stable, the problem has
 do to with some differences between head and stable. When we get more hints
 where the problem is, I will post them in a new thread in freebsd-current@.
 
 Gus has been away for a while, but before disappearing he found a
 workaround to be building devel/dbus-qt4 with -fno-use-cxa-atexit. So
 I had a look around, and found this NetBSD bug report:
 http://www.archivum.info/fa.netbsd.bugs/2007-12/00070/lib-37654-libc's-atexit_mutex-should-be-fully-recursive.html
 
 Since qdbus crashes after exit(3) here too, that might be an
 explanation. Or, at least, something related.
 
 kib@ and kan@ are CCed as per avg@ suggestion.

Alberto,

you have added new people to the discussion, but unfortunately too little of the
original context is present here...  That is, this email doesn't even include a
description of the actual problem.
Could you please provide the useful context either as a link to a mailing list
archive or in some other equally useful way?

Thank you!
-- 
Andriy Gapon


Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-18 Thread Alberto Villa
On Fri, May 18, 2012 at 11:28 PM, Andriy Gapon a...@freebsd.org wrote:
 you have add new people to the discussion, but unfortunately too little of the
 original context is present here...  That is, this email doesn't even include 
 a
 description of an actual problem.
 Could you please provide the useful context either as a link to a mailing list
 archive or in some other equally useful way?

Sorry, Gmail showed the thread with all the history, but I see that in
the archives it's considered as two different conversations.

Here's the original thread:
http://lists.freebsd.org/pipermail/freebsd-current/2012-April/033547.html

I think I understand that the NetBSD problem is not related to our
case. Also, Gustau told me that he narrowed the problem down to
__pthread_cxa_finalize. He will add new information very soon, anyway.
-- 
Alberto Villa, FreeBSD committer avi...@freebsd.org
http://people.FreeBSD.org/~avilla


Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-18 Thread Konstantin Belousov
On Sat, May 19, 2012 at 12:16:59AM +0200, Alberto Villa wrote:
 On Fri, May 18, 2012 at 11:28 PM, Andriy Gapon a...@freebsd.org wrote:
  you have add new people to the discussion, but unfortunately too little of 
  the
  original context is present here...  That is, this email doesn't even 
  include a
  description of an actual problem.
  Could you please provide the useful context either as a link to a mailing 
  list
  archive or in some other equally useful way?
 
 Sorry, Gmail showed the thread with all the history, but I see that in
 the archives it's considered as two different conversations.
 
 Here's the original thread:
 http://lists.freebsd.org/pipermail/freebsd-current/2012-April/033547.html
 
 I think I understand that the NetBSD problem is not related to our
 case, Also, Gustau told me that he narrowed the problem down to
 __pthread_cxa_finalize. He will add new information very soon, anyway.

Well, there is still not much to read. And http://pastebin.com/ryBXtqGF
shows 'Unknown Paste ID!'.

That said, why do you think that the problem is in the system and not in
the application? The fact that the issue does not manifest itself under
stable/9 is not enough to arrive at this conclusion.




Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-18 Thread Alberto Villa
On Sat, May 19, 2012 at 12:37 AM, Konstantin Belousov
kostik...@gmail.com wrote:
 Well, there is still not much to read. And, http://pastebin.com/ryBXtqGF.
 shows 'Unknown Paste ID!'.

Eh, sorry, Gus will provide updated data.

 That said, why do you think that the problem is in system and not in the
 application ? The fact that the issue does not manifests itself under
 stable/9 is not enough to arrive at this conclusion.

We thought so because it suddenly appeared, but neither Gus nor I are
sure of this. We asked for help because this is affecting the whole Qt
update, and as a kde@ member this is a major concern for me (and many
others, I guess). Whether the issue will be found in the system or in
the application is mostly of no interest.

That said, if there is no information to examine at the moment, let's
just wait for Gus's mail. Sorry for the noise, then.
-- 
Alberto Villa, FreeBSD committer avi...@freebsd.org
http://people.FreeBSD.org/~avilla


Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-18 Thread Konstantin Belousov
On Sat, May 19, 2012 at 12:49:02AM +0200, Alberto Villa wrote:
 On Sat, May 19, 2012 at 12:37 AM, Konstantin Belousov
 kostik...@gmail.com wrote:
  Well, there is still not much to read. And http://pastebin.com/ryBXtqGF
  shows 'Unknown Paste ID!'.
 
 Eh, sorry, Gus will provide updated data.
 
  That said, why do you think that the problem is in the system and not in the
  application? The fact that the issue does not manifest itself under
  stable/9 is not enough to arrive at this conclusion.
 
 We thought so because it suddenly appeared, but neither Gus nor I am
 sure of this. We asked for help because this is affecting the whole Qt
 update, and as a kde@ member this is a major concern for me (and many
 others, I guess). Whether the issue will be found in the system or in
 the application is mostly of no interest.
 
 That said, if there is no information to examine at the moment, let's
 just wait for Gus's mail. Sorry for the noise, then.

How can I reproduce the issue locally? (I do not want to install all of KDE
on my test box.)



Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-18 Thread Alberto Villa
On Sat, May 19, 2012 at 12:52 AM, Konstantin Belousov
kostik...@gmail.com wrote:
 How can I reproduce the issue locally? (I do not want to install all of KDE
 on my test box.)

Just build devel/dbus-qt4 on 10-CURRENT and run qdbus. It should crash
(should you have D-Bus running, which you probably don't have, it
would first print all D-Bus connections and then crash on exit).
-- 
Alberto Villa, FreeBSD committer avi...@freebsd.org
http://people.FreeBSD.org/~avilla


Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-05-01 Thread Gustau Pérez i Querol

On 30/04/2012 21:34, Jason Evans wrote:

On Apr 30, 2012, at 7:13 AM, Gustau Pérez i Querol wrote:

  the kde team is seeing some strange problems with the new version (4.8.1) of 
devel/dbus-qt4 with current. It does work with stable. I also suspect that the 
problem described below is affecting the experimental cinnamon port (an 
alternative to gnome3, possible replacement of gnome2).

  The problem happens on both i386 and amd64 with an empty /etc/malloc.conf and
a simple /etc/make.conf. Everything was compiled with base gcc (no clang). The
kernel was compiled without debug support, but that can be enabled if needed.
There are reports from avi...@freebsd.org of the same behavior with a
clang-compiled world and kernel and with MALLOC_PRODUCTION=yes.

When qdbus starts, it segfaults. The backtrace of the problem with r234769 can
be found here: http://pastebin.com/ryBXtqGF. When starting the qdbus daemon by
hand in an X+twm session, we see it call calloc many times and segfault after a
fixed number of calls. It segfaults at rb_gen (a quite large macro
defined at $SRC_BASE/contrib/jemalloc/include/jemalloc/internal/rb.h).

If the daemon is started by hand, I'm able to skip all the calls qdbus makes to
calloc until the one causing the segfault. At that point, at rb_gen, we don't
know exactly what is going on or how to debug the macro. ktrace dumps are
available, but we were unable to find anything new in them.

  With old versions of current before the jemalloc imports (as of March 30th)
the daemon segfaulted at malloc.c:2426. With revisions from April 20th to 24th
(I can be more precise; it was during the jemalloc imports) the daemon segfaulted
at malloc_init. Backtraces are available if needed, and if necessary I can go
back to those revisions and recompile world+kernel to see how it behaves.

  Any help from freebsd-current@ (perhaps Jason Evans can help us) will be 
appreciated. Any additional info, like source revisions, can be provided. I 
would like to stress that the experimental devel/dbus-qt4 works fine with 
recent stable.

The crash is happening in page run management, so there is some pretty bad 
memory corruption going on by the time of the crash.  If I understand you 
correctly, you have reproduced the crash on a system that does *not* have 
MALLOC_PRODUCTION defined, which means that none of the assertions in jemalloc 
caught the problem.

Adrian Chadd made the excellent suggestion of trying valgrind; it's likely to 
point out the problem almost immediately.  If that doesn't work, the utrace 
functionality in malloc may help you figure out what activity has occurred by 
the time of the crash, and give you a better understanding of what happened to 
memory around the address that is involved in the crash.


   Thanks all for your suggestions. It would appear that devel/dbus-qt4 has
a problem with thread management: the daemon starts a large number of
threads and is eventually terminated due to stack exhaustion.


  Valgrind suggested increasing the stack size; doing so made things
even worse: the qdbus daemon was able to spawn even more threads,
so the machine needed more memory than is physically installed
(that is, it started to use swap).
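
To illustrate the trade-off between stack size and thread count: with POSIX threads, each thread gets its own stack, so a larger stack size means more memory reserved for every thread that is created. The following is a minimal C sketch of requesting an explicit per-thread stack size. It is not taken from qdbus or Qt, the names `worker` and `run_with_stack_size` are hypothetical, and the thread does not say whether Valgrind's hint referred to this per-thread setting or to the main thread's stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: touches both ends of a small stack buffer. */
static void *worker(void *arg)
{
    volatile char scratch[1024];

    scratch[0] = 1;
    scratch[sizeof(scratch) - 1] = 1;
    *(int *)arg = scratch[0] + scratch[sizeof(scratch) - 1];
    return NULL;
}

/*
 * Start one thread with an explicit stack size and wait for it.
 * On success returns 0 and stores the size recorded in the attribute
 * object in *actual_bytes; otherwise returns the error number of the
 * pthread call that failed (for example EINVAL for a size below the
 * platform minimum).
 */
int run_with_stack_size(size_t stack_bytes, size_t *actual_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    int touched = 0;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_attr_getstacksize(&attr, actual_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, &touched);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_join(tid, NULL);
    /* If the join succeeded, the worker must have written its result. */
    assert(rc != 0 || touched == 2);
    return rc;
}
```

For example, `run_with_stack_size(1024 * 1024, &actual)` starts one worker on a 1 MiB stack; a process that keeps creating such threads reserves roughly that much address space for each of them.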


  So the problem seems to be unrelated to jemalloc or malloc. As the
experimental 4.8.1 devel/dbus-qt4 port works fine in stable, the problem
has to do with some difference between head and stable. When we get
more hints about where the problem is, I will post them in a new thread
on freebsd-current@.


  Anyhow, thanks again for your suggestions!

  Gus


Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-04-30 Thread Adrian Chadd
Hi,

Please install valgrind and run the program inside valgrind. See what
kind of errors it generates.



Adrian


Re: RFC: jemalloc: qdbus sigsegv in malloc_init

2012-04-30 Thread Jason Evans
On Apr 30, 2012, at 7:13 AM, Gustau Pérez i Querol wrote:
  the kde team is seeing some strange problems with the new version (4.8.1) of 
 devel/dbus-qt4 with current. It does work with stable. I also suspect that 
 the problem described below is affecting the experimental cinnamon port (an 
 alternative to gnome3, possible replacement of gnome2).
 
  The problem happens on both i386 and amd64 with an empty /etc/malloc.conf and
 a simple /etc/make.conf. Everything was compiled with base gcc (no clang). The
 kernel was compiled without debug support, but that can be enabled if needed.
 There are reports from avi...@freebsd.org of the same behavior with a
 clang-compiled world and kernel and with MALLOC_PRODUCTION=yes.
 
 When qdbus starts, it segfaults. The backtrace of the problem with r234769 can
 be found here: http://pastebin.com/ryBXtqGF. When starting the qdbus daemon
 by hand in an X+twm session, we see it call calloc many times and segfault
 after a fixed number of calls. It segfaults at rb_gen (a quite large
 macro defined at $SRC_BASE/contrib/jemalloc/include/jemalloc/internal/rb.h).
 
 If the daemon is started by hand, I'm able to skip all the calls qdbus makes
 to calloc until the one causing the segfault. At that point, at rb_gen, we
 don't know exactly what is going on or how to debug the macro. ktrace dumps
 are available, but we were unable to find anything new in them.
 
  With old versions of current before the jemalloc imports (as of March 30th)
 the daemon segfaulted at malloc.c:2426. With revisions from April 20th to
 24th (I can be more precise; it was during the jemalloc imports) the daemon
 segfaulted at malloc_init. Backtraces are available if needed, and if
 necessary I can go back to those revisions and recompile world+kernel to see
 how it behaves.
 
  Any help from freebsd-current@ (perhaps Jason Evans can help us) will be 
 appreciated. Any additional info, like source revisions, can be provided. I 
 would like to stress that the experimental devel/dbus-qt4 works fine with 
 recent stable.

The crash is happening in page run management, so there is some pretty bad 
memory corruption going on by the time of the crash.  If I understand you 
correctly, you have reproduced the crash on a system that does *not* have 
MALLOC_PRODUCTION defined, which means that none of the assertions in jemalloc 
caught the problem.

Adrian Chadd made the excellent suggestion of trying valgrind; it's likely to 
point out the problem almost immediately.  If that doesn't work, the utrace 
functionality in malloc may help you figure out what activity has occurred by 
the time of the crash, and give you a better understanding of what happened to 
memory around the address that is involved in the crash.
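
As an illustration of the kind of record an allocation trace provides, here is a minimal C sketch that logs each allocation and free as an (input pointer, size, result pointer) triple and looks up the most recent event involving a given address. This is not jemalloc's utrace implementation; the record layout and the names `trace_event_t`, `traced_malloc`, `traced_free`, and `trace_last_event_for` are assumptions made for the example, and the log is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical trace record; addresses are stored as integers so they
 * can still be compared after the memory has been freed. */
typedef struct {
    uintptr_t p;  /* input pointer (the block being freed), 0 if none */
    size_t    s;  /* requested size, 0 for a free */
    uintptr_t r;  /* result pointer (the block returned), 0 if none */
} trace_event_t;

#define TRACE_CAPACITY 256

static trace_event_t trace_log[TRACE_CAPACITY]; /* ring buffer */
static size_t trace_count;                      /* events ever recorded */

static void trace_record(uintptr_t p, size_t s, uintptr_t r)
{
    trace_event_t *ev = &trace_log[trace_count % TRACE_CAPACITY];

    ev->p = p;
    ev->s = s;
    ev->r = r;
    trace_count++;
}

/* malloc wrapper that logs the request size and the returned address. */
void *traced_malloc(size_t size)
{
    void *result = malloc(size);

    trace_record(0, size, (uintptr_t)result);
    return result;
}

/* free wrapper that logs the address before releasing it. */
void traced_free(void *ptr)
{
    trace_record((uintptr_t)ptr, 0, 0);
    free(ptr);
}

/* Return the most recent retained event that returned or consumed
 * addr, or NULL if the log holds no such event. */
const trace_event_t *trace_last_event_for(uintptr_t addr)
{
    size_t kept = trace_count < TRACE_CAPACITY ? trace_count : TRACE_CAPACITY;

    assert(kept <= TRACE_CAPACITY);
    if (addr == 0)
        return NULL;
    for (size_t i = 0; i < kept; i++) {
        const trace_event_t *ev =
            &trace_log[(trace_count - 1 - i) % TRACE_CAPACITY];

        if (ev->p == addr || ev->r == addr)
            return ev;
    }
    return NULL;
}
```

Given the address reported in a crash, `trace_last_event_for` answers whether the block was last handed out or last freed, which is the question a trace of allocator activity is meant to help with.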

Jason