Public bug reported:

(I wrote this bug report with a lot of "newbie" info regarding all the
stuff I did to analyze it, I'm including this info in here so that if
other people find this bug maybe someone can learn something from it)

Today my entire X session hung when I was playing some songs in totem. Since 
the keyboard wasn't responding and the screen wasn't being updated I logged in 
through ssh from another machine. Once logged in I launched "top" and there I 
saw "totem" and "pulseaudio" both taking abount 1 percent CPU each constantly. 
I then killed pulse audio and this for some reason caused totem to start 
spinning with 100% CPU
consumption while slowly also growing in VIRT size (it was both being stuck in 
a loop and it was also allocating more and more memory). At this point I 
launched gdb on totem to see why it was taking 100% CPU. 

Anyway, what I did was that I ran "info threads" inside gdb to see which
threads where standing still in blocking functions like "select", "poll"
and "pthread_cond_wait" etc. In this case it turned out that only one
thread was really alive and I used "thread 3" to move to this thread in
gdb. The stack I saw in there looked like this:

#0  0x00007fb1251ddf92 in free () from /lib/libc.so.6
#1  0x00007fb12d24f189 in gst_object_get_path_string (object=<value optimized 
out>) at gstobject.c:1108
#2  0x00007fb12d264de8 in gst_element_message_full (element=0x7fb10c01b000, 
type=GST_MESSAGE_ERROR, domain=3398, code=1, 
    text=0x163c880 "Disconnected: Connection terminated", debug=0x0, 
file=0x7fb11ca4a2a5 "pulsesink.c", 
    function=0x7fb11ca4a950 "gst_pulsesink_write", line=536) at 
gstelement.c:1665
#3  0x00007fb11ca460bd in gst_pulsesink_write (asink=<value optimized out>, 
data=0x7fb114148020, length=<value optimized out>) at pulsesink.c:556
#4  0x00007fb129ade348 in ?? () from /usr/lib/libgstaudio-0.10.so.0
#5  0x00007fb1255372a4 in ?? () from /usr/lib/libglib-2.0.so.0
#6  0x00007fb12933b3ea in start_thread () from /lib/libpthread.so.0
#7  0x00007fb125248c6d in clone () from /lib/libc.so.6
#8  0x0000000000000000 in ?? ()

It seems that clone calls start_thread which then calls some unknown
function "??" inside libglib-2.0.so.0 and that this unknown function
"??" calls another unknown function inside the library
libgstaudio-0.10.so.0 Clearly this was going to be hard with a lot of
unknown functions in the stacktrace so I quit gdb and I installed debug
symbols for these libraries.

To install debug libraries I needed to figure out what package installed
the file libgstaudio-0.10.so.0 onto my system. For this I used the
following command:

dpkg -S libgstaudio-0.10.so.0

It prints that this particular file was installed through the
libgstreamer-plugins-base0.10-0 package and thus I need to find debug
symbols for this package.

While trying to debug something else a long time ago I had previous completed 
step 1) from this guide (see below). And this meant that I had all the 
blah-dbgsym packages available already and I could install debug symbols using 
normal "apt-get install something-dbgsym". Guide how to make it possible to 
install yadayada-dbgsym packages in ubuntu:
https://wiki.ubuntu.com/DebuggingProgramCrash

So anyway I just typed "sudo apt-get install libgstreamer-plugins-base" and 
then I pressed TAB too see what the TAB completion options was. It printed 
stuff like:
libgstreamer-plugins-base0.10-0         libgstreamer-plugins-base0.10-0-dbgsym  
libgstreamer-plugins-base0.10-dev

Clearly, the package that I was looking for was "libgstreamer-plugins-
base0.10-0-dbgsym" so I installed this package. Then I did the same
thing for libglib-2.0.so.0 which led to to install the
libglib2.0-0-dbgsym package.

By this time the VIRT size was already like 9GB and I only got 8GB so
the system started to swap to the harddrive and there was just no way I
could save a core and attach to the bugzilla report etc :-)

Anyway, I quickly launched gdb again. I looked at "info threads" and I
tried to find the thread that was running crazy in a loop. I used
"thread N" to switch to it. Now, because I had debug symbols I got this
really nice stacktrace with no unknown function names:

#0  0x00007fb1251db102 in _IO_str_overflow_internal () from /lib/libc.so.6
#1  0x00007fb1251da116 in _IO_default_xsputn_internal () from /lib/libc.so.6
#2  0x00007fb1251acb1d in vfprintf () from /lib/libc.so.6
#3  0x00007fb125261340 in __vasprintf_chk () from /lib/libc.so.6
#4  0x00007fb125542deb in IA__g_vasprintf (string=0x43dacce0, format=<value 
optimized out>, args=<value optimized out>)
    at /usr/include/bits/stdio2.h:199
#5  0x00007fb12552f6e1 in IA__g_strdup_printf (format=0xfbad8000 <Address 
0xfbad8000 out of bounds>)
    at /build/buildd/glib2.0-2.18.1/glib/gstrfuncs.c:218
#6  0x00007fb12d264e08 in gst_element_message_full (element=0x7fb10c01b000, 
type=GST_MESSAGE_ERROR, domain=3398, code=1, text=<value optimized out>, 
    debug=0x0, file=0x7fb11ca4a2a5 "pulsesink.c", function=0x7fb11ca4a950 
"gst_pulsesink_write", line=536) at gstelement.c:1670
#7  0x00007fb11ca460bd in gst_pulsesink_write (asink=<value optimized out>, 
data=0x7fb114148020, length=<value optimized out>) at pulsesink.c:556
#8  0x00007fb129ade348 in audioringbuffer_thread_func (buf=0x1bc9440) at 
gstaudiosink.c:226
#9  0x00007fb1255372a4 in g_thread_create_proxy (data=0x7fb1141d3780) at 
/build/buildd/glib2.0-2.18.1/glib/gthread.c:635
#10 0x00007fb12933b3ea in start_thread () from /lib/libpthread.so.0
#11 0x00007fb125248c6d in clone () from /lib/libc.so.6
#12 0x0000000000000000 in ?? ()

I also did a more in depth "bt full" and I found three stack frames that
was more interesting than the others. Look here:

#6  0x00007fb12d264e08 in gst_element_message_full (element=0x7fb10c01b000, 
type=GST_MESSAGE_ERROR, domain=3398, code=1, text=<value optimized out>, 
    debug=0x0, file=0x7fb11ca4a2a5 "pulsesink.c", function=0x7fb11ca4a950 
"gst_pulsesink_write", line=536) at gstelement.c:1670
        gerror = <value optimized out>
        name = <value optimized out>
        sent_text = (gchar *) 0x163c880 "Disconnected: Connection terminated"
        sent_debug = <value optimized out>
        message = <value optimized out>
        __PRETTY_FUNCTION__ = "gst_element_message_full"
#7  0x00007fb11ca460bd in gst_pulsesink_write (asink=<value optimized out>, 
data=0x7fb114148020, length=<value optimized out>) at pulsesink.c:556
        __txt = (gchar *) 0x163c880 "Disconnected: Connection terminated"
        __dbg = (gchar *) 0x0
        l = <value optimized out>
        pulsesink = <value optimized out>
        sum = <value optimized out>
        __FUNCTION__ = "gst_pulsesink_write"
#8  0x00007fb129ade348 in audioringbuffer_thread_func (buf=0x1bc9440) at 
gstaudiosink.c:226
        written = 105
        left = 3528
        len = 3528
        readptr = (guint8 *) 0x7fb114148020 ""
        readseg = 4
        sink = (GstAudioSink *) 0x7fb10c01b000
        writefunc = (WriteFunc) 0x7fb11ca45ee0 <gst_pulsesink_write>
        __FUNCTION__ = "audioringbuffer_thread_func"

It looks like the program was calling gst_pulsesink_write to write some
data into pulse audio, however there was some error condition which
caused a call to gst_element_message_full. Since I just abruptly
_killed_ pulseaudio while totem was still running (and using it) so that
might have something to do with it right? :P Anyway, this is theory is
confirmed by the local variables in the stackframe for
gst_pulsesink_write. Specifically it says:

sent_text = (gchar *) 0x163c880 "Disconnected: Connection terminated"

How the question becomes: if there is a reasonable error message coming
back saying that the connection to pulseaudio has been terminated...
then why would totem just keep trying to write data to pulseaudio? Seems
like pretty insane behavior, instead totem should stop trying to send
audio data to pulse and instead it should show some error message.

Anyway, to continue debugging I put breakpoints on some of these
functions to figure out how deep in the stack the loop was. For example
I used "break function_name" put breakpoints on:

* gst_element_message_full
* gst_pulsesink_write
* audioringbuffer_thread_func

It just seemed unlikely that there should be an infinite loop bug inside
libc.so so I didn't bother with those functions. Anyway, then I resumed
gdb by typing "c" and I immediately hit a breakpoint. I continued it
again and again and basically the pattern was that both the functions
gst_element_message_full and gst_pulsesink_write were being continuously
hit. However, I never hit the breakpoint on
"audioringbuffer_thread_func" so evidently this function didn't exit.

At this point it was clear that a loop inside
audioringbuffer_thread_func was responsible for continously calling
gst_pulsesink_write which in turn continously called
gst_element_message_full. Time to look at some code.

The first thing in the code I wanted to investigate was the function
called audioringbuffer_thread_func which according to my theory should
have an infinite loop. I googled for gstreamer and found it's homepage.
Went to the developer section and used cvs to fetch the code for
"gstreamer". I went into the directory created by cvs and I did "grep -r
audioringbuffer_thread_func *" which actually found nothing :-( My guess
was that this code probably was located in the plugin and from the two
first stacktraces I remembered that this function was actually in
/usr/lib/libgstaudio-0.10.so.0 which meant that it was in the
libgstreamer-plugins-base0.10-0 package. Luckily gstreamer lists several
modules inside it's CVS and one is the for the base plugins. So I ended
up re-running CVS pointing it to the gst-plugins-base module and then I
got a new directory full of source code. When I re-ran the command "grep
-r audioringbuffer_thread_func *" on this code it found a match inside
gstaudiosink.c (which is of course were gdb said we'd find it even
though gdb was not smart enough to say in what folder and not in what
source repository I should look for it). Anyway, I fired up my trusty
old gedit and boom there was the code for audioringbuffer_thread_func().

After look at the code for audioringbuffer_thread_func() for a while I
noted that there wasn't any call to gst_pulsesink_write() in there.
Still, gdb clearly said that the function audioringbuffer_thread_func()
had called into the function gst_pulsesink_write(). At this point the
problem could have been that I had the wrong version of the source file
because the source file might have been changed between now and whenever
ubuntu took their copy of the source and created the package for it. If
I had used the "apt-get source blah" command I would have gotten the
exact right source for the relevant package but why make life too easy.

However, there is actually a call to writefunc() inside the function
audioringbuffer_thread_func() and a few lines above that it's seen that
writefunc is just a function pointer so it's pretty likely. That the
function pointer writefunc pointed to the function gst_pulsesink_write()
when my code was running. From looking at audioringbuffer_thread_func()
it's also clear that there is some kind of error handling in here
because if the number of written bytes is less than zero then execution
will at least break out of the inner while loop.

Okay, fair enough. What about the code inside gst_pulsesink_write()? I
ran "grep -r gst_pulsesink_write *" which returned nothing both in the
gst-plugins-base and also in gstreamer itself. Probably the pulse audio
plugin for gstreamer was not included in the main source repository nor
in the base plugins. So I went off to google for the source repository
that held the gst_pulsesink_write code and eventually managed to figure
out that it's in the gst-plugins-good package. There is probably some
smarter way to figure that out though. For this code I found that "grep
-r gst_pulsesink_write *" returns a match inside the file pulsesink.c
which is exactly where gdb said I would find it.

So, I looked at gst_pulsesink_write(). It has a while loop nested inside
a for loop. There is also some macros used, called CHECK_DEAD_GOTO and
GST_ELEMENT_ERROR.

At this point I needed to know what the definition of CHECK_DEAD_GOTO
was but the editor that I was using (gedit) does not have a right-click
function for "Go to definition" (maybe there is some gedit plugin for
it, I dunno). So instead, what I did was that I dropped into a terminal,
navigated to the cvs directory for gst-plugins-good and then I ran
"cscope -R". This is a very old but useful text mode tool for doing "Go
to definition". By pressing TAB it's possible to toggle between the
upper and the lower panels. I just pressed down arrow once so that I
ended up in the "Find global definition" filed and then I entered
CHECK_DEAD_GOTO.  This shows a list of all the files where
CHECK_DEAD_GOTO is defined and I wanted the definition from pulsesink.c
of course. Now it seems that basically CHECK_DEAD_GOTO just checks if
there was an error and if there was then ut jumps to the designated
label which in this case was "unlock_and_fail". Just from the name it
actually sounds like "check dead" could be the error that you end up
with when pulseaudio it dead.

So what happens if "check dead" is raised and the code jumps to
"unlock_and_fail" ?   Well, the code just unlocks the pulseaudio
mainloop and then it returns zero.

ehm, wait what?! It returns zero? Hmm, but won't that make the parent
function _continue_ calling gst_pulsesink_write again and again like
there is no tomorrow? .... smells like some kind of misunderstanding?

Okay so now we sort of understand what causes the bug. Now we need to
talk to some gstreamer dev and ask the the return code policy here
actually is. What should the gstreamer plugin return to make gstreamer
understand that there is no point in sending more data to the plugin?
Looking at what the other plugins do would also be a good starting
point.

Another thing is that we didn't actually understand why X.org sort of
hung in the first place but at least we found out why totem was frozen
taking 100% CPU.

** Affects: gst-plugins-good
     Importance: Unknown
         Status: Unknown

** Affects: gst-plugins-good0.10 (Ubuntu)
     Importance: Undecided
         Status: New

** Bug watch added: GNOME Bug Tracker #554771
   http://bugzilla.gnome.org/show_bug.cgi?id=554771

** Also affects: gst-plugins-good via
   http://bugzilla.gnome.org/show_bug.cgi?id=554771
   Importance: Unknown
       Status: Unknown

-- 
totem spins CPU 100% while allocating (infinitely) because pulseaudio process 
is not available
https://bugs.launchpad.net/bugs/277296
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Reply via email to