Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On 2012-05-21 04:54, David Xu wrote:
> ...
> As I said, it depends on the order in which the global objects are destructed. If the object that deletes current_thread_data_key is destructed last, the problem won't happen, but now it is destructed too early. I believe no specification says which C++ object should be destructed first when the objects live in different compiled modules that are then linked together into a shared object (.so file).

Indeed, the order in which global constructors or destructors in different translation units are called is not specified. Depending on that order is a bug (a.k.a. the static initialization order fiasco).

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
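To make the ordering dependence concrete, here is a minimal sketch. It is not Qt code: KeyOwner and Client are hypothetical stand-ins for the object that deletes current_thread_data_key and for a later cleanup hook that still needs the key. It uses block scope, where destruction order is well defined (reverse of construction), to show both outcomes that unordered global destructors could produce.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the global object whose destructor deletes the key.
struct KeyOwner {
    bool& alive;
    explicit KeyOwner(bool& flag) : alive(flag) { alive = true; }
    ~KeyOwner() { alive = false; }  // plays the role of pthread_key_delete()
};

// Hypothetical stand-in for another global object that needs the key
// while it is being destroyed.
struct Client {
    const bool& key_alive;
    std::vector<std::string>& log;
    Client(const bool& flag, std::vector<std::string>& out)
        : key_alive(flag), log(out) {}
    ~Client() {
        log.push_back(key_alive ? "key still valid" : "key already deleted");
    }
};

// Owner constructed first, so it is destroyed last: the client is safe.
std::vector<std::string> owner_constructed_first() {
    std::vector<std::string> log;
    bool alive = false;
    {
        KeyOwner owner(alive);
        Client client(alive, log);
    }
    return log;
}

// Owner constructed last, so it is destroyed first: the client sees a dead key.
std::vector<std::string> owner_constructed_last() {
    std::vector<std::string> log;
    bool alive = false;
    {
        Client client(alive, log);
        KeyOwner owner(alive);
    }
    return log;
}
```

With globals spread over several object files there is no block scope to pin the order down, so either outcome is possible.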
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
Now let me dig into qthread_unix.cpp and see how QThreadData::current() works:

    QThreadData *QThreadData::current()
    {
        QThreadData *data = get_thread_data();
        if (!data) {
            void *a;
            if (QInternal::activateCallbacks(QInternal::AdoptCurrentThread, &a)) {
                QThread *adopted = static_cast<QThread*>(a);
                Q_ASSERT(adopted);
                data = QThreadData::get2(adopted);
                set_thread_data(data);
                adopted->d_func()->running = true;
                adopted->d_func()->finished = false;
                static_cast<QAdoptedThread *>(adopted)->init();
            } else {
                data = new QThreadData;
                QT_TRY {
                    set_thread_data(data);
                    data->thread = new QAdoptedThread(data);
                } QT_CATCH(...) {
                    clear_thread_data();
                    data->deref();
                    data = 0;
                    QT_RETHROW;
                }
                data->deref();
            }
            if (!QCoreApplicationPrivate::theMainThread)
                QCoreApplicationPrivate::theMainThread = data->thread;
        }
        return data;
    }

It calls get_thread_data(). If that returns NULL, it creates a new thread object and tries to install it as the current thread data by calling set_thread_data(). Let's see how get_thread_data() and set_thread_data() work:

    static QThreadData *get_thread_data()
    {
    #ifdef Q_OS_SYMBIAN
        return reinterpret_cast<QThreadData *>(Dll::Tls());
    #else
        pthread_once(&current_thread_data_once, create_current_thread_data_key);
        return reinterpret_cast<QThreadData *>(pthread_getspecific(current_thread_data_key));
    #endif
    }

    static void set_thread_data(QThreadData *data)
    {
    #ifdef Q_OS_SYMBIAN
        qt_symbian_throwIfError(Dll::SetTls(data));
    #endif
        pthread_once(&current_thread_data_once, create_current_thread_data_key);
        pthread_setspecific(current_thread_data_key, data);
    }

They just use pthread_getspecific and pthread_setspecific. The current_thread_data_key is created only once, guarded by pthread_once(). But as you know, the key has already been deleted by Q_DESTRUCTOR_FUNCTION(destroy_current_thread_data_key), a global object that was destructed early. Because pthread_once() will not run the creation routine a second time, the key is never recreated: it is a stale key.
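One way to picture the stale-key hazard, and a possible guard against it, is the sketch below. It is an illustration only, not Qt's code and not the patch applied in this thread: the namespace sketch, key_valid and the helper names are all made up for the example. The idea is that once the key has been deleted, late callers get a null result instead of handing a deleted key to pthread_getspecific() or pthread_setspecific(), which POSIX leaves undefined.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>

namespace sketch {  // hypothetical names, for illustration only

pthread_key_t data_key;
pthread_once_t data_once = PTHREAD_ONCE_INIT;
std::atomic<bool> key_valid{false};

// Runs at most once per process, exactly like the pthread_once() guard in
// get_thread_data(); it is NOT re-run after the key has been deleted.
void create_key() {
    if (pthread_key_create(&data_key, nullptr) == 0)
        key_valid = true;
}

// Counterpart of destroy_current_thread_data_key(), but it records that the
// key is gone so that later callers can notice.
void destroy_key() {
    if (key_valid.exchange(false))
        pthread_key_delete(data_key);
}

// Returns the per-thread pointer, or nullptr if the key no longer exists.
void* get_data() {
    pthread_once(&data_once, create_key);
    return key_valid ? pthread_getspecific(data_key) : nullptr;
}

// Stores the per-thread pointer; reports failure once the key is deleted.
bool set_data(void* p) {
    pthread_once(&data_once, create_key);
    return key_valid && pthread_setspecific(data_key, p) == 0;
}

}  // namespace sketch
```

A caller would use sketch::set_data() and sketch::get_data() as usual; after sketch::destroy_key() both fail cleanly rather than touching the deleted key. Whether a late caller can cope with a null result is a separate design question that this sketch does not answer.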
I was able to debug up to the point where qthread_unix.cpp spawns a new thread because the get_thread_data() call returns 0. I was unable to complete the analysis, but now I get it. The explanation seems fine to me, thanks.

What I don't get is why it works in stable. The functions registered to be executed at exit (atexit_register hasn't changed) get registered in the same order in both branches (at least, I checked them by printing the two atexit structures when calling exit in both stable and head). Wouldn't that mean that the problem of deleting current_thread_data_key should happen in both branches?

Gus
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On 2012/4/30 22:13, Gustau Pérez i Querol wrote:
> Hi,
>
> the kde team is seeing some strange problems with the new version (4.8.1) of devel/dbus-qt4 with current. It does work with stable. I also suspect that the problem described below is affecting the experimental cinnamon port (an alternative to gnome3, possible replacement of gnome2).
>
> The problem happens with both i386 and amd64 with empty /etc/malloc.conf and simple /etc/make.conf. Everything compiled with base gcc (no clang). The kernel was compiled with no debug support, but it can be enabled if needed. There are reports from avi...@freebsd.org of the same behavior with clang compiled world and kernel and with MALLOC_PRODUCTION=yes.
>
> When qdbus starts, it segfaults. The backtrace of the problem with r234769 can be found here: http://pastebin.com/ryBXtqGF. When starting the qdbus daemon by hand in an X+twm session, we see it calls calloc many times and after a fixed number of times segfaults. We see it segfaults at rb_gen (a quite large macro defined at $SRC_BASE/contrib/jemalloc/include/jemalloc/internal/rb.h).
>
> If the daemon is started by hand, I'm able to skip all the calls qdbus makes to calloc till the one causing the segfault. At that point, at rb_gen, we don't exactly know what is going on or how to debug the macro. Ktraces are available, but we were unable to find anything new from them.
>
> With old versions of current before the jemalloc imports (as of March 30th) the daemon segfaulted at malloc.c:2426. With revisions during April 20 to 24th (can be more precise, it was during the jemalloc imports) the daemon segfaulted at malloc_init. Bts are available if needed, and if necessary I can go back to those revisions and recompile world+kernel to see its behavior.
>
> Any help from freebsd-current@ (perhaps Jason Evans can help us) will be appreciated. Any additional info, like source revisions, can be provided. I would like to stress that the experimental devel/dbus-qt4 works fine with recent stable.

qdbus segfaults on my machine too. I tracked it down and found that the problem is in Qt: it deletes current_thread_data_key but still uses it in some cxa hooks. I applied the following patch, and it works fine.

--- qthread_unix.cpp	2012-05-20 13:23:09.0 +0800
+++ qthread_unix_new.cpp	2012-05-20 13:22:45.0 +0800
@@ -156,7 +156,7 @@
 {
     pthread_key_delete(current_thread_data_key);
 }
-Q_DESTRUCTOR_FUNCTION(destroy_current_thread_data_key)
+//Q_DESTRUCTOR_FUNCTION(destroy_current_thread_data_key)
 
 
 // Utility functions for getting, setting and clearing thread specific data.
---

Q_DESTRUCTOR_FUNCTION defines a global C++ object whose destructor deletes current_thread_data_key, but other cxa hooks still need the key. So, in the end, the Qt library crashes.

I think the bug depends on the linking order in the Qt library: if qthread_unix.cpp is linked as the last module, the key will be deleted after all cxa hooks have run and everything will be fine; otherwise, it will crash. This sounds like a bug in Qt.

Regards,
David Xu
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On Sun, May 20, 2012 at 8:03 AM, David Xu listlog2...@gmail.com wrote:
> qdbus segfaults on my machine too, I tracked it down, and found the problem is in QT, it deleted current_thread_data_key, but it still uses it in some cxa hooks, I applied the following patch, and it works fine.

Thanks for the analysis David!

> I think the bug depends on linking order in QT library ? if the qthread_unix.cpp is linked as lastest module, the key will be deleted after all cxa hooks run, then it will be fine, otherwise, it would crash.

Is this really possible?

-- 
Alberto Villa, FreeBSD committer avi...@freebsd.org
http://people.FreeBSD.org/~avilla
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On Sun, May 20, 2012 at 06:42:35PM +0200, Alberto Villa wrote:
> On Sun, May 20, 2012 at 8:03 AM, David Xu listlog2...@gmail.com wrote:
> > qdbus segfaults on my machine too, I tracked it down, and found the problem is in QT, it deleted current_thread_data_key, but it still uses it in some cxa hooks, I applied the following patch, and it works fine.
>
> Thanks for the analysis David!
>
> > I think the bug depends on linking order in QT library ? if the qthread_unix.cpp is linked as lastest module, the key will be deleted after all cxa hooks run, then it will be fine, otherwise, it would crash.
>
> Is this really possible?

No, I do not think it is possible. The only possibility for something weird to happen is for atexit/__cxa_atexit functions to be registered from another atexit function, and then we indeed could call the newly registered function too late.

I wonder if the following hack makes any change in the observed behaviour.

diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c
index 511172a..bab850c 100644
--- a/lib/libc/stdlib/atexit.c
+++ b/lib/libc/stdlib/atexit.c
@@ -72,6 +72,7 @@ struct atexit {
 };
 
 static struct atexit *__atexit;	/* points to head of LIFO stack */
+static int atexit_gen;
 
 /*
  * Register the function described by 'fptr' to be called at application
@@ -107,6 +108,7 @@ atexit_register(struct atexit_fn *fptr)
 		__atexit = p;
 	}
 	p->fns[p->ind++] = *fptr;
+	atexit_gen++;
 	_MUTEX_UNLOCK(&atexit_mutex);
 	return 0;
 }
@@ -162,7 +164,7 @@ __cxa_finalize(void *dso)
 	struct dl_phdr_info phdr_info;
 	struct atexit *p;
 	struct atexit_fn fn;
-	int n, has_phdr;
+	int atexit_gen_prev, n, has_phdr;
 
 	if (dso != NULL)
 		has_phdr = _rtld_addr_phdr(dso, &phdr_info);
@@ -170,6 +172,8 @@ __cxa_finalize(void *dso)
 		has_phdr = 0;
 
 	_MUTEX_LOCK(&atexit_mutex);
+retry:
+	atexit_gen_prev = atexit_gen;
 	for (p = __atexit; p; p = p->next) {
 		for (n = p->ind; --n >= 0;) {
 			if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
@@ -196,6 +200,8 @@ __cxa_finalize(void *dso)
 			_MUTEX_LOCK(&atexit_mutex);
 		}
 	}
+	if (atexit_gen_prev != atexit_gen)
+		goto retry;
 	_MUTEX_UNLOCK(&atexit_mutex);
 	if (dso == NULL)
 		_MUTEX_DESTROY(&atexit_mutex);
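The idea behind the hack can be shown outside libc with a small, single-threaded sketch. It assumes nothing about the real atexit.c beyond what the patch shows: ExitHandlers, add() and run_all() are hypothetical names, and the mutex handling of the real code is left out. A generation counter is bumped on every registration, and the LIFO walk is repeated until a full pass completes without new registrations.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, single-threaded model of a handler stack with a generation
// counter; it mirrors the retry logic of the patch, not the libc data layout.
class ExitHandlers {
public:
    // Register a handler and note that the set of handlers has changed.
    void add(std::function<void()> fn) {
        handlers_.push_back(std::move(fn));
        ++generation_;
    }

    // Run handlers in LIFO order. If a handler registers more handlers while
    // we are walking the stack, walk it again so the newcomers are not missed.
    void run_all() {
        unsigned seen;
        do {
            seen = generation_;
            for (std::size_t n = handlers_.size(); n-- > 0;) {
                if (!handlers_[n])
                    continue;  // slot already run (like ATEXIT_FN_EMPTY)
                std::function<void()> fn = std::move(handlers_[n]);
                handlers_[n] = nullptr;  // mark as run before calling it
                fn();                    // may call add()
            }
        } while (seen != generation_);
    }

private:
    std::vector<std::function<void()>> handlers_;
    unsigned generation_ = 0;
};
```

For example, if handler A registers a new handler C while it runs, a single pass would skip C because C sits above the current position in the stack; the second pass picks it up. As reported later in this thread, the hack did not change the qdbus behaviour, which points away from late registration as the cause.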
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On 2012/5/21 10:54, David Xu wrote:
> On 2012/5/21 1:24, Konstantin Belousov wrote:
>> On Sun, May 20, 2012 at 06:42:35PM +0200, Alberto Villa wrote:
>>> On Sun, May 20, 2012 at 8:03 AM, David Xu listlog2...@gmail.com wrote:
>>>> qdbus segfaults on my machine too, I tracked it down, and found the problem is in QT, it deleted current_thread_data_key, but it still uses it in some cxa hooks, I applied the following patch, and it works fine.
>>>
>>> Thanks for the analysis David!
>>>
>>>> I think the bug depends on linking order in QT library ? if the qthread_unix.cpp is linked as lastest module, the key will be deleted after all cxa hooks run, then it will be fine, otherwise, it would crash.
>>>
>>> Is this really possible?
>>
>> No, I do not think it is possible. The only possibility for something weird happen is for atexit/__cxa_atexit functions to be registered from another atexit function, and then we indeed could call the newly registered function too late.
>>
>> I wonder if the following hack makes any change in the observed behaviour.
>>
>> diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c
>> index 511172a..bab850c 100644
>> --- a/lib/libc/stdlib/atexit.c
>> +++ b/lib/libc/stdlib/atexit.c
>> @@ -72,6 +72,7 @@ struct atexit {
>>  };
>>  
>>  static struct atexit *__atexit;	/* points to head of LIFO stack */
>> +static int atexit_gen;
>>  
>>  /*
>>   * Register the function described by 'fptr' to be called at application
>> @@ -107,6 +108,7 @@ atexit_register(struct atexit_fn *fptr)
>>  		__atexit = p;
>>  	}
>>  	p->fns[p->ind++] = *fptr;
>> +	atexit_gen++;
>>  	_MUTEX_UNLOCK(&atexit_mutex);
>>  	return 0;
>>  }
>> @@ -162,7 +164,7 @@ __cxa_finalize(void *dso)
>>  	struct dl_phdr_info phdr_info;
>>  	struct atexit *p;
>>  	struct atexit_fn fn;
>> -	int n, has_phdr;
>> +	int atexit_gen_prev, n, has_phdr;
>>  
>>  	if (dso != NULL)
>>  		has_phdr = _rtld_addr_phdr(dso, &phdr_info);
>> @@ -170,6 +172,8 @@ __cxa_finalize(void *dso)
>>  		has_phdr = 0;
>>  
>>  	_MUTEX_LOCK(&atexit_mutex);
>> +retry:
>> +	atexit_gen_prev = atexit_gen;
>>  	for (p = __atexit; p; p = p->next) {
>>  		for (n = p->ind; --n >= 0;) {
>>  			if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
>> @@ -196,6 +200,8 @@ __cxa_finalize(void *dso)
>>  			_MUTEX_LOCK(&atexit_mutex);
>>  		}
>>  	}
>> +	if (atexit_gen_prev != atexit_gen)
>> +		goto retry;
>>  	_MUTEX_UNLOCK(&atexit_mutex);
>>  	if (dso == NULL)
>>  		_MUTEX_DESTROY(&atexit_mutex);
>
> I have tried your patch, it does not fix the problem. As I said, it is a bug in QT, the bug is pthread key current_thread_data_key is deleted by a global C++ object too early, other C++ global objects still need this pthread key. The following procedure shows how I found the problem:
>
> davidxu@xyf:~%gdb qdbus
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd"...(no debugging symbols found)...
> (gdb) break __cxa_finalize
> Function "__cxa_finalize" not defined.
> Make breakpoint pending on future shared library load? (y or [n]) y
> Breakpoint 1 (__cxa_finalize) pending.
> (gdb) run
> Starting program: /usr/local/bin/qdbus
> (no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...[New LWP 100077]
> (no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...Breakpoint 2 at 0x2864ac26
> Pending breakpoint "__cxa_finalize" resolved
> (no debugging symbols found)...[New Thread 29007300 (LWP 100077/qdbus)]
> (no debugging symbols found)...
> :1.0 org.gnome.SessionManager :1.11 :1.111 :1.12 :1.13 org.gtk.vfs.Daemon
> :1.143 :1.15 org.pulseaudio.Server :1.17 org.gnome.Panel :1.18 :1.19
> :1.20 org.gtk.Private.HalVolumeMonitor
> :1.21 org.gtk.Private.GPhoto2VolumeMonitor :1.22
> :1.24 org.gnome.ScreenSaver :1.25 :1.27 :1.28 :1.29 :1.30
> :1.31 org.gnome.panel.applet.WnckletFactory :1.32 :1.33 :1.34
> :1.35 org.gnome.panel.applet.CPUFreqAppletFactory
> :1.36 org.gnome.panel.applet.NotificationAreaAppletFactory
> :1.37 org.gnome.panel.applet.MultiLoadAppletFactory :1.38 :1.39
> :1.4 org.gnome.GConf
> :1.41 org.gnome.panel.applet.ClockAppletFactory :1.49
> :1.5 org.gnome.SettingsDaemon :1.50 :1.53 :1.64
> :1.7 org.freedesktop.secrets org.gnome.keyring
> :1.75 org.gtk.vfs.Metadata
> :1.76 org.gnome.Terminal.Display_0_0 :1.77 org.freedesktop.DBus
> [Switching to Thread 29007300 (LWP 100077/qdbus)]
>
> Breakpoint 2, 0x2864ac26 in __cxa_finalize () from /lib/libc.so.7
> (gdb) print
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On Tue, May 1, 2012 at 8:18 PM, Gustau Pérez i Querol gpe...@entel.upc.edu wrote:
> So the problem seems to be not related to jemalloc or malloc. As the experimental 4.8.1 devel/dbus-qt4 port works fine in stable, the problem has to do with some differences between head and stable. When we get more hints where the problem is, I will post them in a new thread in freebsd-current@.

Gus has been away for a while, but before disappearing he found a workaround: building devel/dbus-qt4 with -fno-use-cxa-atexit. So I had a look around, and found this NetBSD bug report:
http://www.archivum.info/fa.netbsd.bugs/2007-12/00070/lib-37654-libc's-atexit_mutex-should-be-fully-recursive.html

Since qdbus crashes after exit(3) here too, that might be an explanation. Or, at least, something related.

kib@ and kan@ are CCed as per avg@ suggestion.

-- 
Alberto Villa, FreeBSD committer avi...@freebsd.org
http://people.FreeBSD.org/~avilla
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On Fri, May 18, 2012 at 07:01:25PM +0200, Alberto Villa wrote:
> On Tue, May 1, 2012 at 8:18 PM, Gustau Pérez i Querol gpe...@entel.upc.edu wrote:
> > So the problem seems to be not related to jemalloc or malloc. As the experimental 4.8.1 devel/dbus-qt4 port works fine in stable, the problem has do to with some differences between head and stable. When we get more hints where the problem is, I will post them in a new thread in freebsd-current@.
>
> Gus has been away for a while, but before disappearing he found a workaround to be building devel/dbus-qt4 with -fno-use-cxa-atexit. So I had a look around, and found this NetBSD bug report:
> http://www.archivum.info/fa.netbsd.bugs/2007-12/00070/lib-37654-libc's-atexit_mutex-should-be-fully-recursive.html
>
> Since qdbus crashes after exit(3) here too, that might be an explanation. Or, at least, something related.
>
> kib@ and kan@ are CCed as per avg@ suggestion.

You provided zero information. The reference to NetBSD is completely meaningless: we drop atexit_mutex when calling registered atexit handlers.

At least bother to provide a useful bug report if you suspect a bug in the base system and want it fixed.
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
on 18/05/2012 20:01 Alberto Villa said the following:
> On Tue, May 1, 2012 at 8:18 PM, Gustau Pérez i Querol gpe...@entel.upc.edu wrote:
> > So the problem seems to be not related to jemalloc or malloc. As the experimental 4.8.1 devel/dbus-qt4 port works fine in stable, the problem has do to with some differences between head and stable. When we get more hints where the problem is, I will post them in a new thread in freebsd-current@.
>
> Gus has been away for a while, but before disappearing he found a workaround to be building devel/dbus-qt4 with -fno-use-cxa-atexit. So I had a look around, and found this NetBSD bug report:
> http://www.archivum.info/fa.netbsd.bugs/2007-12/00070/lib-37654-libc's-atexit_mutex-should-be-fully-recursive.html
>
> Since qdbus crashes after exit(3) here too, that might be an explanation. Or, at least, something related.
>
> kib@ and kan@ are CCed as per avg@ suggestion.

Alberto,

you have added new people to the discussion, but unfortunately too little of the original context is present here... That is, this email doesn't even include a description of the actual problem. Could you please provide the useful context, either as a link to a mailing list archive or in some other equally useful way?

Thank you!
-- 
Andriy Gapon
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On Fri, May 18, 2012 at 11:28 PM, Andriy Gapon a...@freebsd.org wrote:
> you have add new people to the discussion, but unfortunately too little of the original context is present here... That is, this email doesn't even include a description of an actual problem. Could you please provide the useful context either as a link to a mailing list archive or in some other equally useful way?

Sorry, Gmail showed the thread with all the history, but I see that in the archives it's considered as two different conversations. Here's the original thread:
http://lists.freebsd.org/pipermail/freebsd-current/2012-April/033547.html

I think I understand that the NetBSD problem is not related to our case. Also, Gustau told me that he narrowed the problem down to __pthread_cxa_finalize. He will add new information very soon, anyway.

-- 
Alberto Villa, FreeBSD committer avi...@freebsd.org
http://people.FreeBSD.org/~avilla
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On Sat, May 19, 2012 at 12:16:59AM +0200, Alberto Villa wrote:
> On Fri, May 18, 2012 at 11:28 PM, Andriy Gapon a...@freebsd.org wrote:
> > you have add new people to the discussion, but unfortunately too little of the original context is present here... That is, this email doesn't even include a description of an actual problem. Could you please provide the useful context either as a link to a mailing list archive or in some other equally useful way?
>
> Sorry, Gmail showed the thread with all the history, but I see that in the archives it's considered as two different conversations. Here's the original thread:
> http://lists.freebsd.org/pipermail/freebsd-current/2012-April/033547.html
>
> I think I understand that the NetBSD problem is not related to our case, Also, Gustau told me that he narrowed the problem down to __pthread_cxa_finalize. He will add new information very soon, anyway.

Well, there is still not much to read. And http://pastebin.com/ryBXtqGF. shows 'Unknown Paste ID!'.

That said, why do you think that the problem is in the system and not in the application? The fact that the issue does not manifest itself under stable/9 is not enough to arrive at this conclusion.
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On Sat, May 19, 2012 at 12:37 AM, Konstantin Belousov kostik...@gmail.com wrote:
> Well, there is still not much to read. And, http://pastebin.com/ryBXtqGF. shows 'Unknown Paste ID!'.

Eh, sorry, Gus will provide updated data.

> That said, why do you think that the problem is in system and not in the application ? The fact that the issue does not manifests itself under stable/9 is not enough to arrive at this conclusion.

We thought so because it suddenly appeared, but neither Gus nor I are sure of this. We asked for help because this is affecting the whole Qt update, and as a kde@ member this is a major concern for me (and many others, I guess). Whether the issue will be found in the system or in the application is mostly of no interest.

That said, if there is no information to examine at the moment, let's just wait for Gus's mail. Sorry for the noise, then.

-- 
Alberto Villa, FreeBSD committer avi...@freebsd.org
http://people.FreeBSD.org/~avilla
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On Sat, May 19, 2012 at 12:49:02AM +0200, Alberto Villa wrote:
> On Sat, May 19, 2012 at 12:37 AM, Konstantin Belousov kostik...@gmail.com wrote:
> > Well, there is still not much to read. And, http://pastebin.com/ryBXtqGF. shows 'Unknown Paste ID!'.
>
> Eh, sorry, Gus will provide updated data.
>
> > That said, why do you think that the problem is in system and not in the application ? The fact that the issue does not manifests itself under stable/9 is not enough to arrive at this conclusion.
>
> We thought it because it suddenly appeared, but neither me nor Gus are sure of this. We asked for help because this is affecting the whole Qt update, and as a kde@ member this is a major concern for me (and many others, I guess). Whether the issue will be found in the system or in the application is mostly of no interest.
>
> That said, if there is no information to examine at the moment, let's just wait for Gus mail. Sorry for the noise, then.

How can I reproduce the issue locally? (I do not want to install all of KDE on my test box.)
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On Sat, May 19, 2012 at 12:52 AM, Konstantin Belousov kostik...@gmail.com wrote:
> How can I reproduce the issue locally? (I do not want to install all of KDE on my test box.)

Just build devel/dbus-qt4 on 10-CURRENT and run qdbus. It should crash (if you have D-Bus running, which you probably don't, it will first print all D-Bus connections and then crash on exit).

--
Alberto Villa, FreeBSD committer
avi...@freebsd.org
http://people.FreeBSD.org/~avilla
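The reproduction steps above can be sketched as a small shell script. This is only a sketch under assumptions that are not from the thread: the ports tree is taken to live in /usr/ports (overridable through `PORTSDIR`), the `install clean` targets are the usual ports targets, and `DRY_RUN` and `run` are hypothetical helpers added here so that the script only prints the commands by default.

```shell
#!/bin/sh
# Sketch: reproduce the qdbus crash on 10-CURRENT.
# Assumptions (not from the thread): ports tree in /usr/ports and
# qdbus reachable through PATH once the port is installed.
# DRY_RUN and run() are hypothetical helpers: with DRY_RUN=1 (the
# default) commands are only printed; set DRY_RUN=0 to execute them.

PORTSDIR="${PORTSDIR:-/usr/ports}"

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce_qdbus_crash() {
    # Build and install the port named in the thread.
    run make -C "${PORTSDIR}/devel/dbus-qt4" install clean
    # Per the thread, qdbus should then crash (on exit if D-Bus is running).
    run qdbus
}

reproduce_qdbus_crash
```

Review the printed commands first, then rerun with `DRY_RUN=0` on a 10-CURRENT test box.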
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On 30/04/2012 21:34, Jason Evans wrote:
> On Apr 30, 2012, at 7:13 AM, Gustau Pérez i Querol wrote:
>> The KDE team is seeing some strange problems with the new version (4.8.1) of devel/dbus-qt4 on current. It does work on stable. I also suspect that the problem described below is affecting the experimental cinnamon port (an alternative to gnome3, a possible replacement for gnome2).
>>
>> The problem happens on both i386 and amd64 with an empty /etc/malloc.conf and a simple /etc/make.conf. Everything was compiled with base gcc (no clang). The kernel was compiled with no debug support, but it can be enabled if needed. There are reports from avi...@freebsd.org of the same behavior with a clang-compiled world and kernel and with MALLOC_PRODUCTION=yes.
>>
>> When qdbus starts, it segfaults. The backtrace of the problem with r234769 can be found here: http://pastebin.com/ryBXtqGF. When starting the qdbus daemon by hand in an X+twm session, we see that it calls calloc many times and segfaults after a fixed number of calls. It segfaults in rb_gen (a quite large macro defined in $SRC_BASE/contrib/jemalloc/include/jemalloc/internal/rb.h). If the daemon is started by hand, I'm able to skip all the calls qdbus makes to calloc up to the one causing the segfault. At that point, in rb_gen, we don't know exactly what is going on or how to debug the macro. Ktrace dumps are available, but we were unable to find anything new in them.
>>
>> With old versions of current before the jemalloc imports (as of March 30th), the daemon segfaulted at malloc.c:2426. With revisions from April 20th to 24th (I can be more precise; it was during the jemalloc imports), the daemon segfaulted in malloc_init. Backtraces are available if needed, and if necessary I can go back to those revisions and recompile world+kernel to see its behavior.
>>
>> Any help from freebsd-current@ (perhaps Jason Evans can help us) will be appreciated. Any additional info, like source revisions, can be provided. I would like to stress that the experimental devel/dbus-qt4 works fine with recent stable.
>
> The crash is happening in page run management, so there is some pretty bad memory corruption going on by the time of the crash. If I understand you correctly, you have reproduced the crash on a system that does *not* have MALLOC_PRODUCTION defined, which means that none of the assertions in jemalloc caught the problem. Adrian Chadd made the excellent suggestion of trying valgrind; it's likely to point out the problem almost immediately. If that doesn't work, the utrace functionality in malloc may help you figure out what activity has occurred by the time of the crash, and give you a better understanding of what happened to memory around the address that is involved in the crash.

Thanks all for your suggestions. It appears that devel/dbus-qt4 has a problem with thread management: the daemon starts a large number of threads and is eventually killed due to stack exhaustion. Valgrind suggested increasing the stack size, but doing so made things even worse: the qdbus daemon was able to spawn even more threads, causing the machine to need more memory than was physically available (that is, it started to swap).

So the problem does not seem to be related to jemalloc or malloc. As the experimental 4.8.1 devel/dbus-qt4 port works fine on stable, the problem has to do with some difference between head and stable. When we get more hints about where the problem is, I will post them in a new thread on freebsd-current@.

Anyhow, thanks again for your suggestions!

Gus
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
Hi,

Please install valgrind and run the program inside valgrind. See what kind of errors it generates.

Adrian
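A run of the kind suggested above might look like the following sketch. It assumes valgrind is already installed and qdbus is in PATH; the specific options and the log file name are illustrative choices, not something prescribed in the thread, and `DRY_RUN` and `run` are hypothetical helpers so that the script only prints the command by default.

```shell
#!/bin/sh
# Sketch: run qdbus under valgrind's default tool (memcheck).
# DRY_RUN and run() are hypothetical helpers: with DRY_RUN=1 (the
# default) the command is only printed; set DRY_RUN=0 to execute it.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

valgrind_qdbus() {
    # --track-origins=yes reports where uninitialised values came from;
    # --num-callers=40 records deeper stack traces than the default;
    # --log-file keeps the report separate from qdbus' own output.
    run valgrind --track-origins=yes --num-callers=40 \
        --log-file=qdbus.valgrind.log qdbus
}

valgrind_qdbus
```

The first invalid read or write in the resulting log is usually the most informative entry, since later errors are often consequences of earlier corruption.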
Re: RFC: jemalloc: qdbus sigsegv in malloc_init
On Apr 30, 2012, at 7:13 AM, Gustau Pérez i Querol wrote:
> The KDE team is seeing some strange problems with the new version (4.8.1) of devel/dbus-qt4 on current. It does work on stable. I also suspect that the problem described below is affecting the experimental cinnamon port (an alternative to gnome3, a possible replacement for gnome2).
>
> The problem happens on both i386 and amd64 with an empty /etc/malloc.conf and a simple /etc/make.conf. Everything was compiled with base gcc (no clang). The kernel was compiled with no debug support, but it can be enabled if needed. There are reports from avi...@freebsd.org of the same behavior with a clang-compiled world and kernel and with MALLOC_PRODUCTION=yes.
>
> When qdbus starts, it segfaults. The backtrace of the problem with r234769 can be found here: http://pastebin.com/ryBXtqGF. When starting the qdbus daemon by hand in an X+twm session, we see that it calls calloc many times and segfaults after a fixed number of calls. It segfaults in rb_gen (a quite large macro defined in $SRC_BASE/contrib/jemalloc/include/jemalloc/internal/rb.h). If the daemon is started by hand, I'm able to skip all the calls qdbus makes to calloc up to the one causing the segfault. At that point, in rb_gen, we don't know exactly what is going on or how to debug the macro. Ktrace dumps are available, but we were unable to find anything new in them.
>
> With old versions of current before the jemalloc imports (as of March 30th), the daemon segfaulted at malloc.c:2426. With revisions from April 20th to 24th (I can be more precise; it was during the jemalloc imports), the daemon segfaulted in malloc_init. Backtraces are available if needed, and if necessary I can go back to those revisions and recompile world+kernel to see its behavior.
>
> Any help from freebsd-current@ (perhaps Jason Evans can help us) will be appreciated. Any additional info, like source revisions, can be provided. I would like to stress that the experimental devel/dbus-qt4 works fine with recent stable.

The crash is happening in page run management, so there is some pretty bad memory corruption going on by the time of the crash. If I understand you correctly, you have reproduced the crash on a system that does *not* have MALLOC_PRODUCTION defined, which means that none of the assertions in jemalloc caught the problem.

Adrian Chadd made the excellent suggestion of trying valgrind; it's likely to point out the problem almost immediately. If that doesn't work, the utrace functionality in malloc may help you figure out what activity has occurred by the time of the crash, and give you a better understanding of what happened to memory around the address that is involved in the crash.

Jason
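The utrace suggestion above could be tried roughly as follows. This is a sketch under assumptions: it relies on jemalloc honouring `MALLOC_CONF=utrace:true` and on `ktrace -t u` selecting userland trace records, which should be checked against the jemalloc(3) and ktrace(1) manual pages of the installed system. The trace file name, `DRY_RUN` and `run` are hypothetical and only here for illustration; by default the commands are printed, not executed.

```shell
#!/bin/sh
# Sketch: record jemalloc's utrace(2) events for qdbus and dump them.
# DRY_RUN and run() are hypothetical helpers: with DRY_RUN=1 (the
# default) commands are only printed; set DRY_RUN=0 to execute them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

trace_malloc_activity() {
    # Ask malloc to emit a utrace record per allocator call (assumed
    # option name: utrace) and capture only userland traces (-t u).
    run env MALLOC_CONF=utrace:true ktrace -t u -f qdbus.ktrace qdbus
    # Print the recorded events in order.
    run kdump -f qdbus.ktrace
}

trace_malloc_activity
```

Searching the dump for the faulting address, or for addresses close to it, gives the allocation and free history leading up to the crash.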