Public bug reported: (I wrote this bug report with a lot of "newbie" info regarding all the stuff I did to analyze it, I'm including this info in here so that if other people find this bug maybe someone can learn something from it)
Today my entire X session hung when I was playing some songs in totem. Since the keyboard wasn't responding and the screen wasn't being updated I logged in through ssh from another machine. Once logged in I launched "top" and there I saw "totem" and "pulseaudio" both taking abount 1 percent CPU each constantly. I then killed pulse audio and this for some reason caused totem to start spinning with 100% CPU consumption while slowly also growing in VIRT size (it was both being stuck in a loop and it was also allocating more and more memory). At this point I launched gdb on totem to see why it was taking 100% CPU. Anyway, what I did was that I ran "info threads" inside gdb to see which threads where standing still in blocking functions like "select", "poll" and "pthread_cond_wait" etc. In this case it turned out that only one thread was really alive and I used "thread 3" to move to this thread in gdb. The stack I saw in there looked like this: #0 0x00007fb1251ddf92 in free () from /lib/libc.so.6 #1 0x00007fb12d24f189 in gst_object_get_path_string (object=<value optimized out>) at gstobject.c:1108 #2 0x00007fb12d264de8 in gst_element_message_full (element=0x7fb10c01b000, type=GST_MESSAGE_ERROR, domain=3398, code=1, text=0x163c880 "Disconnected: Connection terminated", debug=0x0, file=0x7fb11ca4a2a5 "pulsesink.c", function=0x7fb11ca4a950 "gst_pulsesink_write", line=536) at gstelement.c:1665 #3 0x00007fb11ca460bd in gst_pulsesink_write (asink=<value optimized out>, data=0x7fb114148020, length=<value optimized out>) at pulsesink.c:556 #4 0x00007fb129ade348 in ?? () from /usr/lib/libgstaudio-0.10.so.0 #5 0x00007fb1255372a4 in ?? () from /usr/lib/libglib-2.0.so.0 #6 0x00007fb12933b3ea in start_thread () from /lib/libpthread.so.0 #7 0x00007fb125248c6d in clone () from /lib/libc.so.6 #8 0x0000000000000000 in ?? () It seems that clone calls start_thread which then calls some unknown function "??" inside libglib-2.0.so.0 and that this unknown function "??" calls another unknown function inside the library libgstaudio-0.10.so.0 Clearly this was going to be hard with a lot of unknown functions in the stacktrace so I quit gdb and I installed debug symbols for these libraries. To install debug libraries I needed to figure out what package installed the file libgstaudio-0.10.so.0 onto my system. For this I used the following command: dpkg -S libgstaudio-0.10.so.0 It prints that this particular file was installed through the libgstreamer-plugins-base0.10-0 package and thus I need to find debug symbols for this package. While trying to debug something else a long time ago I had previous completed step 1) from this guide (see below). And this meant that I had all the blah-dbgsym packages available already and I could install debug symbols using normal "apt-get install something-dbgsym". Guide how to make it possible to install yadayada-dbgsym packages in ubuntu: https://wiki.ubuntu.com/DebuggingProgramCrash So anyway I just typed "sudo apt-get install libgstreamer-plugins-base" and then I pressed TAB too see what the TAB completion options was. It printed stuff like: libgstreamer-plugins-base0.10-0 libgstreamer-plugins-base0.10-0-dbgsym libgstreamer-plugins-base0.10-dev Clearly, the package that I was looking for was "libgstreamer-plugins- base0.10-0-dbgsym" so I installed this package. Then I did the same thing for libglib-2.0.so.0 which led to to install the libglib2.0-0-dbgsym package. By this time the VIRT size was already like 9GB and I only got 8GB so the system started to swap to the harddrive and there was just no way I could save a core and attach to the bugzilla report etc :-) Anyway, I quickly launched gdb again. I looked at "info threads" and I tried to find the thread that was running crazy in a loop. I used "thread N" to switch to it. Now, because I had debug symbols I got this really nice stacktrace with no unknown function names: #0 0x00007fb1251db102 in _IO_str_overflow_internal () from /lib/libc.so.6 #1 0x00007fb1251da116 in _IO_default_xsputn_internal () from /lib/libc.so.6 #2 0x00007fb1251acb1d in vfprintf () from /lib/libc.so.6 #3 0x00007fb125261340 in __vasprintf_chk () from /lib/libc.so.6 #4 0x00007fb125542deb in IA__g_vasprintf (string=0x43dacce0, format=<value optimized out>, args=<value optimized out>) at /usr/include/bits/stdio2.h:199 #5 0x00007fb12552f6e1 in IA__g_strdup_printf (format=0xfbad8000 <Address 0xfbad8000 out of bounds>) at /build/buildd/glib2.0-2.18.1/glib/gstrfuncs.c:218 #6 0x00007fb12d264e08 in gst_element_message_full (element=0x7fb10c01b000, type=GST_MESSAGE_ERROR, domain=3398, code=1, text=<value optimized out>, debug=0x0, file=0x7fb11ca4a2a5 "pulsesink.c", function=0x7fb11ca4a950 "gst_pulsesink_write", line=536) at gstelement.c:1670 #7 0x00007fb11ca460bd in gst_pulsesink_write (asink=<value optimized out>, data=0x7fb114148020, length=<value optimized out>) at pulsesink.c:556 #8 0x00007fb129ade348 in audioringbuffer_thread_func (buf=0x1bc9440) at gstaudiosink.c:226 #9 0x00007fb1255372a4 in g_thread_create_proxy (data=0x7fb1141d3780) at /build/buildd/glib2.0-2.18.1/glib/gthread.c:635 #10 0x00007fb12933b3ea in start_thread () from /lib/libpthread.so.0 #11 0x00007fb125248c6d in clone () from /lib/libc.so.6 #12 0x0000000000000000 in ?? () I also did a more in depth "bt full" and I found three stack frames that was more interesting than the others. Look here: #6 0x00007fb12d264e08 in gst_element_message_full (element=0x7fb10c01b000, type=GST_MESSAGE_ERROR, domain=3398, code=1, text=<value optimized out>, debug=0x0, file=0x7fb11ca4a2a5 "pulsesink.c", function=0x7fb11ca4a950 "gst_pulsesink_write", line=536) at gstelement.c:1670 gerror = <value optimized out> name = <value optimized out> sent_text = (gchar *) 0x163c880 "Disconnected: Connection terminated" sent_debug = <value optimized out> message = <value optimized out> __PRETTY_FUNCTION__ = "gst_element_message_full" #7 0x00007fb11ca460bd in gst_pulsesink_write (asink=<value optimized out>, data=0x7fb114148020, length=<value optimized out>) at pulsesink.c:556 __txt = (gchar *) 0x163c880 "Disconnected: Connection terminated" __dbg = (gchar *) 0x0 l = <value optimized out> pulsesink = <value optimized out> sum = <value optimized out> __FUNCTION__ = "gst_pulsesink_write" #8 0x00007fb129ade348 in audioringbuffer_thread_func (buf=0x1bc9440) at gstaudiosink.c:226 written = 105 left = 3528 len = 3528 readptr = (guint8 *) 0x7fb114148020 "" readseg = 4 sink = (GstAudioSink *) 0x7fb10c01b000 writefunc = (WriteFunc) 0x7fb11ca45ee0 <gst_pulsesink_write> __FUNCTION__ = "audioringbuffer_thread_func" It looks like the program was calling gst_pulsesink_write to write some data into pulse audio, however there was some error condition which caused a call to gst_element_message_full. Since I just abruptly _killed_ pulseaudio while totem was still running (and using it) so that might have something to do with it right? :P Anyway, this is theory is confirmed by the local variables in the stackframe for gst_pulsesink_write. Specifically it says: sent_text = (gchar *) 0x163c880 "Disconnected: Connection terminated" How the question becomes: if there is a reasonable error message coming back saying that the connection to pulseaudio has been terminated... then why would totem just keep trying to write data to pulseaudio? Seems like pretty insane behavior, instead totem should stop trying to send audio data to pulse and instead it should show some error message. Anyway, to continue debugging I put breakpoints on some of these functions to figure out how deep in the stack the loop was. For example I used "break function_name" put breakpoints on: * gst_element_message_full * gst_pulsesink_write * audioringbuffer_thread_func It just seemed unlikely that there should be an infinite loop bug inside libc.so so I didn't bother with those functions. Anyway, then I resumed gdb by typing "c" and I immediately hit a breakpoint. I continued it again and again and basically the pattern was that both the functions gst_element_message_full and gst_pulsesink_write were being continuously hit. However, I never hit the breakpoint on "audioringbuffer_thread_func" so evidently this function didn't exit. At this point it was clear that a loop inside audioringbuffer_thread_func was responsible for continously calling gst_pulsesink_write which in turn continously called gst_element_message_full. Time to look at some code. The first thing in the code I wanted to investigate was the function called audioringbuffer_thread_func which according to my theory should have an infinite loop. I googled for gstreamer and found it's homepage. Went to the developer section and used cvs to fetch the code for "gstreamer". I went into the directory created by cvs and I did "grep -r audioringbuffer_thread_func *" which actually found nothing :-( My guess was that this code probably was located in the plugin and from the two first stacktraces I remembered that this function was actually in /usr/lib/libgstaudio-0.10.so.0 which meant that it was in the libgstreamer-plugins-base0.10-0 package. Luckily gstreamer lists several modules inside it's CVS and one is the for the base plugins. So I ended up re-running CVS pointing it to the gst-plugins-base module and then I got a new directory full of source code. When I re-ran the command "grep -r audioringbuffer_thread_func *" on this code it found a match inside gstaudiosink.c (which is of course were gdb said we'd find it even though gdb was not smart enough to say in what folder and not in what source repository I should look for it). Anyway, I fired up my trusty old gedit and boom there was the code for audioringbuffer_thread_func(). After look at the code for audioringbuffer_thread_func() for a while I noted that there wasn't any call to gst_pulsesink_write() in there. Still, gdb clearly said that the function audioringbuffer_thread_func() had called into the function gst_pulsesink_write(). At this point the problem could have been that I had the wrong version of the source file because the source file might have been changed between now and whenever ubuntu took their copy of the source and created the package for it. If I had used the "apt-get source blah" command I would have gotten the exact right source for the relevant package but why make life too easy. However, there is actually a call to writefunc() inside the function audioringbuffer_thread_func() and a few lines above that it's seen that writefunc is just a function pointer so it's pretty likely. That the function pointer writefunc pointed to the function gst_pulsesink_write() when my code was running. From looking at audioringbuffer_thread_func() it's also clear that there is some kind of error handling in here because if the number of written bytes is less than zero then execution will at least break out of the inner while loop. Okay, fair enough. What about the code inside gst_pulsesink_write()? I ran "grep -r gst_pulsesink_write *" which returned nothing both in the gst-plugins-base and also in gstreamer itself. Probably the pulse audio plugin for gstreamer was not included in the main source repository nor in the base plugins. So I went off to google for the source repository that held the gst_pulsesink_write code and eventually managed to figure out that it's in the gst-plugins-good package. There is probably some smarter way to figure that out though. For this code I found that "grep -r gst_pulsesink_write *" returns a match inside the file pulsesink.c which is exactly where gdb said I would find it. So, I looked at gst_pulsesink_write(). It has a while loop nested inside a for loop. There is also some macros used, called CHECK_DEAD_GOTO and GST_ELEMENT_ERROR. At this point I needed to know what the definition of CHECK_DEAD_GOTO was but the editor that I was using (gedit) does not have a right-click function for "Go to definition" (maybe there is some gedit plugin for it, I dunno). So instead, what I did was that I dropped into a terminal, navigated to the cvs directory for gst-plugins-good and then I ran "cscope -R". This is a very old but useful text mode tool for doing "Go to definition". By pressing TAB it's possible to toggle between the upper and the lower panels. I just pressed down arrow once so that I ended up in the "Find global definition" filed and then I entered CHECK_DEAD_GOTO. This shows a list of all the files where CHECK_DEAD_GOTO is defined and I wanted the definition from pulsesink.c of course. Now it seems that basically CHECK_DEAD_GOTO just checks if there was an error and if there was then ut jumps to the designated label which in this case was "unlock_and_fail". Just from the name it actually sounds like "check dead" could be the error that you end up with when pulseaudio it dead. So what happens if "check dead" is raised and the code jumps to "unlock_and_fail" ? Well, the code just unlocks the pulseaudio mainloop and then it returns zero. ehm, wait what?! It returns zero? Hmm, but won't that make the parent function _continue_ calling gst_pulsesink_write again and again like there is no tomorrow? .... smells like some kind of misunderstanding? Okay so now we sort of understand what causes the bug. Now we need to talk to some gstreamer dev and ask the the return code policy here actually is. What should the gstreamer plugin return to make gstreamer understand that there is no point in sending more data to the plugin? Looking at what the other plugins do would also be a good starting point. Another thing is that we didn't actually understand why X.org sort of hung in the first place but at least we found out why totem was frozen taking 100% CPU. ** Affects: gst-plugins-good Importance: Unknown Status: Unknown ** Affects: gst-plugins-good0.10 (Ubuntu) Importance: Undecided Status: New ** Bug watch added: GNOME Bug Tracker #554771 http://bugzilla.gnome.org/show_bug.cgi?id=554771 ** Also affects: gst-plugins-good via http://bugzilla.gnome.org/show_bug.cgi?id=554771 Importance: Unknown Status: Unknown -- totem spins CPU 100% while allocating (infinitely) because pulseaudio process is not available https://bugs.launchpad.net/bugs/277296 You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs