Re: More 16 vs 24 bpp profiling

2007-11-17 Thread Bernardo Innocenti
On 09/11/2007 01:32 PM, Bernardo Innocenti wrote:

> The 16bpp codepath has to be broken somewhere if
> it takes twice the time to copy half the bits :-)

It strikes me that we don't see any time spent in
pixman_fill_mmx(), even though it's not inlinable.

For some reason, pixman thinks it cannot accelerate
16bpp fills with MMX, at least on the Geode.

Might be worth investigating...

-- 
   //  Bernardo Innocenti - http://www.codewiz.org/
 \X/ One Laptop Per Child - http://www.laptop.org/
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: More 16 vs 24 bpp profiling

2007-09-12 Thread Bernardo Innocenti
On 09/12/2007 11:40 AM, Marco Pesenti Gritti wrote:
> On 9/12/07, Jordan Crouse <[EMAIL PROTECTED]> wrote:
>> On 12/09/07 17:21 +0200, Marco Pesenti Gritti wrote:
>>> Yeah, this is the current check:
>>>
>>> (strcmp(vendor, "AuthenticAMD") == 0 ||
>>>  strcmp(vendor, "Geode by NSC") == 0))
>>> I think Jordan mentioned that the LX is not using "Geode by NSC" anymore.
>> We should be using AuthenticAMD now.  Check /proc/cpuinfo to make sure.
> 
> Yeah, that's correct.
> 
> I guess it would still be worth making sure MMX is actually enabled
> in cairo; other parts of that check might be failing...

On visual inspection, the pixman source looks right.

I still haven't had time to install a verbose build of it to
make sure it's doing the right thing.  Anybody should feel free
to beat me to it (and publish their findings here).

-- 
   //  Bernardo Innocenti - http://www.codewiz.org/
 \X/ One Laptop Per Child - http://www.laptop.org/


Re: More 16 vs 24 bpp profiling

2007-09-12 Thread Marco Pesenti Gritti
On 9/12/07, Jordan Crouse <[EMAIL PROTECTED]> wrote:
> On 12/09/07 17:21 +0200, Marco Pesenti Gritti wrote:
> > On 9/12/07, Dan Williams <[EMAIL PROTECTED]> wrote:
> > > On Tue, 2007-09-11 at 14:19 -0400, Bernardo Innocenti wrote:
> > > > On 09/11/2007 01:32 PM, Bernardo Innocenti wrote:
> > > >
> > > > > The 16bpp codepath has to be broken somewhere if
> > > > > it takes twice the time to copy half the bits :-)
> > > >
> > > > It strikes me that we don't see any time spent in
> > > > pixman_fill_mmx(), even though it's not inlinable.
> > > >
> > > > For some reason, pixman thinks it cannot accelerate
> > > > 16bpp fills with MMX, at least on the Geode.
> > > >
> > > > Might be worth investigating...
> > >
> > > We did have to patch the MMX check in pixman long ago, maybe that got
> > > broken somehow?  There were actually two, an update to the cpu flags and
> > > also a strcmp() on the processor ID that had to be fixed to get pixman
> > > to detect MMX capability on the geode.
> > >
> >
> > Yeah, this is the current check:
> >
> > (strcmp(vendor, "AuthenticAMD") == 0 ||
> >  strcmp(vendor, "Geode by NSC") == 0))
>
> > I think Jordan mentioned that the LX is not using "Geode by NSC" anymore.
>
> We should be using AuthenticAMD now.  Check /proc/cpuinfo to make sure.
>

Yeah, that's correct.

I guess it would still be worth making sure MMX is actually enabled
in cairo; other parts of that check might be failing...

Marco


Re: More 16 vs 24 bpp profiling

2007-09-12 Thread Jordan Crouse
On 12/09/07 17:21 +0200, Marco Pesenti Gritti wrote:
> On 9/12/07, Dan Williams <[EMAIL PROTECTED]> wrote:
> > On Tue, 2007-09-11 at 14:19 -0400, Bernardo Innocenti wrote:
> > > On 09/11/2007 01:32 PM, Bernardo Innocenti wrote:
> > >
> > > > The 16bpp codepath has to be broken somewhere if
> > > > it takes twice the time to copy half the bits :-)
> > >
> > > It strikes me that we don't see any time spent in
> > > pixman_fill_mmx(), even though it's not inlinable.
> > >
> > > For some reason, pixman thinks it cannot accelerate
> > > 16bpp fills with MMX, at least on the Geode.
> > >
> > > Might be worth investigating...
> >
> > We did have to patch the MMX check in pixman long ago, maybe that got
> > broken somehow?  There were actually two, an update to the cpu flags and
> > also a strcmp() on the processor ID that had to be fixed to get pixman
> > to detect MMX capability on the geode.
> >
> 
> Yeah, this is the current check:
> 
> (strcmp(vendor, "AuthenticAMD") == 0 ||
>  strcmp(vendor, "Geode by NSC") == 0))

> I think Jordan mentioned that the LX is not using "Geode by NSC" anymore.

We should be using AuthenticAMD now.  Check /proc/cpuinfo to make sure.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.




Re: More 16 vs 24 bpp profiling

2007-09-12 Thread Marco Pesenti Gritti
On 9/12/07, Dan Williams <[EMAIL PROTECTED]> wrote:
> On Tue, 2007-09-11 at 14:19 -0400, Bernardo Innocenti wrote:
> > On 09/11/2007 01:32 PM, Bernardo Innocenti wrote:
> >
> > > The 16bpp codepath has to be broken somewhere if
> > > it takes twice the time to copy half the bits :-)
> >
> > It strikes me that we don't see any time spent in
> > pixman_fill_mmx(), even though it's not inlinable.
> >
> > For some reason, pixman thinks it cannot accelerate
> > 16bpp fills with MMX, at least on the Geode.
> >
> > Might be worth investigating...
>
> We did have to patch the MMX check in pixman long ago, maybe that got
> broken somehow?  There were actually two, an update to the cpu flags and
> also a strcmp() on the processor ID that had to be fixed to get pixman
> to detect MMX capability on the geode.
>

Yeah, this is the current check:

(strcmp(vendor, "AuthenticAMD") == 0 ||
 strcmp(vendor, "Geode by NSC") == 0))

I think Jordan mentioned that the LX is not using "Geode by NSC" anymore.
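The vendor string in that check is what the CPU reports, which is also what /proc/cpuinfo shows in its vendor_id field. A minimal sketch of the same test, with illustrative helper names (pixman itself does this in C, reading the string via cpuid rather than /proc):

```python
# Sketch only: parse a /proc/cpuinfo-style text for vendor_id and apply
# the same comparison quoted in the thread.  parse_vendor/mmx_vendor_ok
# are hypothetical names, not pixman's internals.
def parse_vendor(cpuinfo_text):
    for line in cpuinfo_text.splitlines():
        if line.startswith("vendor_id"):
            return line.split(":", 1)[1].strip()
    return ""

def mmx_vendor_ok(vendor):
    # The Geode LX now reports "AuthenticAMD"; older Geodes reported
    # "Geode by NSC".
    return vendor in ("AuthenticAMD", "Geode by NSC")

lx = "processor\t: 0\nvendor_id\t: AuthenticAMD\nflags\t\t: fpu mmx\n"
print(mmx_vendor_ok(parse_vendor(lx)))  # → True
```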

Marco


Re: More 16 vs 24 bpp profiling

2007-09-12 Thread Dan Williams
On Tue, 2007-09-11 at 14:19 -0400, Bernardo Innocenti wrote:
> On 09/11/2007 01:32 PM, Bernardo Innocenti wrote:
> 
> > The 16bpp codepath has to be broken somewhere if
> > it takes twice the time to copy half the bits :-)
> 
> It strikes me that we don't see any time spent in
> pixman_fill_mmx(), even though it's not inlinable.
> 
> For some reason, pixman thinks it cannot accelerate
> 16bpp fills with MMX, at least on the Geode.
> 
> Might be worth investigating...

We did have to patch the MMX check in pixman long ago, maybe that got
broken somehow?  There were actually two, an update to the cpu flags and
also a strcmp() on the processor ID that had to be fixed to get pixman
to detect MMX capability on the geode.

Dan




Re: More 16 vs 24 bpp profiling

2007-09-11 Thread Bernardo Innocenti
On 09/11/2007 01:44 PM, Jordan Crouse wrote:
> On 11/09/07 13:03 -0400, Bernardo Innocenti wrote:
>> NOTE ALEPH: I think we stopped development in the xf86-amd-devel
>> repo some time ago.  The correct driver nowadays would be the
>> fd.o one.  Jordan, do you confirm this?
> 
> I cannot.  OLPC should always and forever more use the xf86-amd-devel 
> tree.  The fd.o tree is for the rest of the world.

Sorry, then.  You told me that some time ago, but now from
the git logs I got the impression that development was
happening on the fd.o tree only.

This reminds me I still have to test Ajax's dcon hack.
At the moment, I still have several higher-priority tasks
assigned to me.  So, would you or aleph like to give
it a shot?

Last time I tried, there was display corruption.  The
logic for checking damage seems weird and is worth checking.

Details are here:
  https://dev.laptop.org/ticket/1412

-- 
   //  Bernardo Innocenti - http://www.codewiz.org/
 \X/ One Laptop Per Child - http://www.laptop.org/


Re: More 16 vs 24 bpp profiling

2007-09-11 Thread Dan Williams
On Tue, 2007-09-11 at 13:03 -0400, Bernardo Innocenti wrote:
> (adding xorg-devel@ on Cc)
> 
> On 09/11/2007 11:29 AM, Jordan Crouse wrote:
> > On 11/09/07 13:05 +0200, Stefano Fedrigo wrote:
> >> I've done some more profiling on the 16 vs. 24 bpp issue.
> >> This time I used this test:
> >> https://dev.laptop.org/git?p=sugar;a=blob;f=tests/graphics/hipposcalability.py
> >>
> >> A simple speed test: I measured the time required to scroll down and up
> >> one time all the generated list.  Not extremely accurate, but I repeated the
> >> test a few times with consistent results (+- 0.5 secs).  Mean times:
> >>
> >> xserver 1.4
> >> 16 bpp: 37.9
> >> 24 bpp: 40.7
> >>
> >> xserver 1.3
> >> 16: 46.4
> >> 24: 50.1
> >>
> >> At 24 bpp we're a little slower.  1.3 is 20% slower than 1.4. The pixman
> >> migration patch makes the difference: 1.3 spends most of that 20% in memcpy().
> >>
> >> The oprofile reports are from xserver 1.4.  I don't see much difference
> >> between 16 and 24, except that at 24 bpp, less time is spent in pixman and more
> >> in amd_drv.  At 16 bpp pixman_fill() takes twice the time.
> >>
> >> Unfortunately without a working callgraph it's not very clear to me what's
> >> happening in amd_drv.  At 24bpp gp_wait_until_idle() takes twice the time...
> > 
> > What can we do to fix this?  I would really like to know who is calling
> > gp_wait_until_idle().
> 
> I think the invocation in lx_get_source_color() can safely
> go away, as exaGetPixmapFirstPixel() has always done
> correct locking even in 1.3.
> 
> > But because the 1x1 source pixmap used as the solid color is
> still being uploaded to the framebuffer, I'd expect
> exaGetPixmapFirstPixel() to indirectly call the driver
> download hook and, thus, stall the GPU anyway.
> 
> If this tiny pixmap was at least reused, the second
> time it would be already in system memory.  And it
> seems that Cairo is trying to cache patterns in the CR.
> 
> Problem is, many GTK widgets like to create a new CR on every
> > repaint event, thus rendering the cache quite ineffective for a
> > typical workload of a window with several small widgets in it.
> > But I only stumbled through the caching code a few months ago while
> > debugging something else, so I may very well be mistaken.
> 
> On git's master, Michel Dänzer has recently been pushing
> a long run of EXA performance patches.  I've had only a
> > quick glance, but it seems they may cure some of these problems:
> 
> $ git-log 8cfcf9..e8093e | git-shortlog
> Michel Dänzer (14):
>   EXA: Track valid bits in Sys and FB separately.
>   Add DamagePendingRegion.
>   EXA: Support partial migration of pixmap contents between Sys and FB.
>   EXA: Hide pixmap pointer outside of exaPrepare/FinishAccess whenever possible.
>   EXA: Improvements for trapezoids and triangles.
>   EXA: exaImageGlyphBlt improvements.
>   EXA: Improvements for 1x1 pixmaps.
>   EXA: RENDER improvements.
>   EXA: Remove superfluous manual damage tracking.
>   EXA: exaGetImage improvements.
>   EXA: exa(Shm)PutImage improvements.
>   EXA: Use exaShmPutImage for pushing glyphs to scratch pixmap in exaGlyphs.
>   EXA: exaFillRegion{Solid,Tiled} improvements.
>   EXA: Exclude bits that will be overwritten from migration in exaCopyNtoN.
> 
> 
> Aleph, I guess it may be useful to re-run tests after applying
> these patches.  In case merging them on the 1.4 branch happens
> to be difficult, using the code from master should be ok.
> They don't seem to have diverged too much, yet.
> 
> 
> > Also, I think we're spending way too much time in
> > gp_color_bitmap_to_screen_blt() - is there any way we
> > can get more in-depth profiling in that one function?
> 
> Good idea!
> 
> Meanwhile, I looked at gp_color_bitmap_to_screen_blt() and it
> seems we're issuing a separate blit per horizontal line of the
> source data.   That is correct for the general case, where the
> destination width may not match the source pitch.
> 
> However, when we invoke gp_color_bitmap_to_screen_blt() for
> uploads, I'd expect the destination buffer to match the source,
> so a single blit would work.
> 
> If my guess is right, special-casing "pitch == width*bpp"
> would be a big win.  Does anyone mind adding an ErrorF()?
> 
> 
> NOTE ALEPH: I think we stopped development in the xf86-amd-devel
> repo some time ago.  The correct driver nowadays would be the
> fd.o one.  Jordan, do you confirm this?

Would be nice to know, since I'll need to know where to get the sources
for the RPM updates from...  But ISTR mail about this to xorg-devel@
last month, so I think you're right.

Dan




Re: More 16 vs 24 bpp profiling

2007-09-11 Thread Jordan Crouse
On 11/09/07 13:03 -0400, Bernardo Innocenti wrote:
> 
> NOTE ALEPH: I think we stopped development in the xf86-amd-devel
> repo some time ago.  The correct driver nowadays would be the
> fd.o one.  Jordan, do you confirm this?

I cannot.  OLPC should always and forever more use the xf86-amd-devel 
tree.  The fd.o tree is for the rest of the world.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.




Re: More 16 vs 24 bpp profiling

2007-09-11 Thread Bernardo Innocenti
On 09/11/2007 07:05 AM, Stefano Fedrigo wrote:

> At 16 bpp pixman_fill() takes twice the time.

This is just stupid!

The 16bpp codepath has to be broken somewhere if
it takes twice the time to copy half the bits :-)

-- 
   //  Bernardo Innocenti - http://www.codewiz.org/
 \X/ One Laptop Per Child - http://www.laptop.org/


Re: More 16 vs 24 bpp profiling

2007-09-11 Thread Bernardo Innocenti
(adding xorg-devel@ on Cc)

On 09/11/2007 11:29 AM, Jordan Crouse wrote:
> On 11/09/07 13:05 +0200, Stefano Fedrigo wrote:
>> I've done some more profiling on the 16 vs. 24 bpp issue.
>> This time I used this test:
>> https://dev.laptop.org/git?p=sugar;a=blob;f=tests/graphics/hipposcalability.py
>>
>> A simple speed test: I measured the time required to scroll down and up
>> one time all the generated list.  Not extremely accurate, but I repeated the
>> test a few times with consistent results (+- 0.5 secs).  Mean times:
>>
>> xserver 1.4
>> 16 bpp: 37.9
>> 24 bpp: 40.7
>>
>> xserver 1.3
>> 16: 46.4
>> 24: 50.1
>>
>> At 24 bpp we're a little slower.  1.3 is 20% slower than 1.4. The pixman
>> migration patch makes the difference: 1.3 spends most of that 20% in memcpy().
>>
>> The oprofile reports are from xserver 1.4.  I don't see much difference
>> between 16 and 24, except that at 24 bpp, less time is spent in pixman and more
>> in amd_drv.  At 16 bpp pixman_fill() takes twice the time.
>>
>> Unfortunately without a working callgraph it's not very clear to me what's
>> happening in amd_drv.  At 24bpp gp_wait_until_idle() takes twice the time...
> 
> What can we do to fix this?  I would really like to know who is calling
> gp_wait_until_idle().

I think the invocation in lx_get_source_color() can safely
go away, as exaGetPixmapFirstPixel() has always done
correct locking even in 1.3.

But because the 1x1 source pixmap used as the solid color is
still being uploaded to the framebuffer, I'd expect
exaGetPixmapFirstPixel() to indirectly call the driver
download hook and, thus, stall the GPU anyway.

If this tiny pixmap were at least reused, the second
time it would already be in system memory.  And it
seems that Cairo is trying to cache patterns in the CR.

Problem is, many GTK widgets like to create a new CR on every
repaint event, thus rendering the cache quite ineffective for a
typical workload of a window with several small widgets in it.
But I only stumbled through the caching code a few months ago while
debugging something else, so I may very well be mistaken.
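The cache-lifetime point above can be illustrated abstractly (this models no real Cairo or GTK API, just the lifetime problem):

```python
# Abstract sketch: a solid-pattern cache that lives on the drawing
# context is useless when a fresh context is created per repaint.
class Context:
    def __init__(self):
        self.pattern_cache = {}
        self.misses = 0

    def get_solid_pattern(self, color):
        if color not in self.pattern_cache:
            self.misses += 1          # would trigger a pixmap upload
            self.pattern_cache[color] = object()
        return self.pattern_cache[color]

# New context per repaint -> every lookup misses.
misses_per_repaint = []
for repaint in range(3):
    cr = Context()
    cr.get_solid_pattern("white")
    misses_per_repaint.append(cr.misses)
print(misses_per_repaint)  # → [1, 1, 1]

# Long-lived context -> only the first lookup misses.
cr = Context()
for repaint in range(3):
    cr.get_solid_pattern("white")
print(cr.misses)  # → 1
```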

On git's master, Michel Dänzer has recently been pushing
a long run of EXA performance patches.  I've had only a
quick glance, but it seems they may cure some of these problems:

$ git-log 8cfcf9..e8093e | git-shortlog
Michel Dänzer (14):
  EXA: Track valid bits in Sys and FB separately.
  Add DamagePendingRegion.
  EXA: Support partial migration of pixmap contents between Sys and FB.
  EXA: Hide pixmap pointer outside of exaPrepare/FinishAccess whenever possible.
  EXA: Improvements for trapezoids and triangles.
  EXA: exaImageGlyphBlt improvements.
  EXA: Improvements for 1x1 pixmaps.
  EXA: RENDER improvements.
  EXA: Remove superfluous manual damage tracking.
  EXA: exaGetImage improvements.
  EXA: exa(Shm)PutImage improvements.
  EXA: Use exaShmPutImage for pushing glyphs to scratch pixmap in exaGlyphs.
  EXA: exaFillRegion{Solid,Tiled} improvements.
  EXA: Exclude bits that will be overwritten from migration in exaCopyNtoN.


Aleph, I guess it may be useful to re-run tests after applying
these patches.  In case merging them on the 1.4 branch happens
to be difficult, using the code from master should be ok.
They don't seem to have diverged too much, yet.


> Also, I think we're spending way too much time in
> gp_color_bitmap_to_screen_blt() - is there any way we
> can get more in-depth profiling in that one function?

Good idea!

Meanwhile, I looked at gp_color_bitmap_to_screen_blt() and it
seems we're issuing a separate blit per horizontal line of the
source data.  That is correct for the general case, where the
destination width may not match the source pitch.

However, when we invoke gp_color_bitmap_to_screen_blt() for
uploads, I'd expect the destination buffer to match the source,
so a single blit would work.

If my guess is right, special-casing "pitch == width*bpp"
would be a big win.  Does anyone mind adding an ErrorF()?
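The proposed special case is easy to sketch (hypothetical helper, not the driver's actual code): when the source pitch equals width times bytes per pixel, the rows are contiguous and one blit can cover them all.

```python
# Sketch of the suggested fast path for uploads.
def blits_needed(width, height, bpp_bytes, pitch):
    if pitch == width * bpp_bytes:
        return 1        # rows are contiguous: a single blit
    return height       # general case: one blit per scanline

# A 1200x900 16bpp upload with a padded pitch takes 900 blits...
print(blits_needed(1200, 900, 2, 2048))  # → 900
# ...but only 1 when the pitch is tight (pitch == width*bpp).
print(blits_needed(1200, 900, 2, 2400))  # → 1
```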


NOTE ALEPH: I think we stopped development in the xf86-amd-devel
repo some time ago.  The correct driver nowadays would be the
fd.o one.  Jordan, do you confirm this?

-- 
   //  Bernardo Innocenti - http://www.codewiz.org/
 \X/ One Laptop Per Child - http://www.laptop.org/



Re: More 16 vs 24 bpp profiling

2007-09-11 Thread Stefano Fedrigo
Jordan Crouse wrote:
>> Unfortunately without a working callgraph it's not very clear to me what's
>> happening in amd_drv.  At 24bpp gp_wait_until_idle() takes twice the time...
> 
> What can we do to fix this?  I would really like to know who is calling
> gp_wait_until_idle().

The problem seems to be that, on the XO, oprofile can only use a timer
event that doesn't allow accurate stack tracing: the process stack is not
accessible inside the interrupt used for sampling.
I hope there is some way to solve this, because in kernel code call
tracing works, provided the kernel is compiled with frame pointers.
I haven't had time to look at this in depth; perhaps selecting a different
interrupt source, if possible, would be enough to get callgraphs in
userspace code.


Re: More 16 vs 24 bpp profiling

2007-09-11 Thread Jordan Crouse
On 11/09/07 13:05 +0200, Stefano Fedrigo wrote:
> I've done some more profiling on the 16 vs. 24 bpp issue.
> This time I used this test:
> https://dev.laptop.org/git?p=sugar;a=blob;f=tests/graphics/hipposcalability.py
> 
> A simple speed test: I measured the time required to scroll down and up
> one time all the generated list.  Not extremely accurate, but I repeated the
> test a few times with consistent results (+- 0.5 secs).  Mean times:
> 
> xserver 1.4
> 16 bpp: 37.9
> 24 bpp: 40.7
> 
> xserver 1.3
> 16: 46.4
> 24: 50.1
> 
> At 24 bpp we're a little slower.  1.3 is 20% slower than 1.4. The pixman
> migration patch makes the difference: 1.3 spends most of that 20% in memcpy().
> 
> The oprofile reports are from xserver 1.4.  I don't see much difference
> between 16 and 24, except that at 24 bpp, less time is spent in pixman and more
> in amd_drv.  At 16 bpp pixman_fill() takes twice the time.
> 
> Unfortunately without a working callgraph it's not very clear to me what's
> happening in amd_drv.  At 24bpp gp_wait_until_idle() takes twice the time...

What can we do to fix this?  I would really like to know who is calling
gp_wait_until_idle().

Also, I think we're spending way too much time in
gp_color_bitmap_to_screen_blt() - is there any way we can get more in-depth
profiling in that one function?

Jordan




More 16 vs 24 bpp profiling

2007-09-11 Thread Stefano Fedrigo
I've done some more profiling on the 16 vs. 24 bpp issue.
This time I used this test:
https://dev.laptop.org/git?p=sugar;a=blob;f=tests/graphics/hipposcalability.py

A simple speed test: I measured the time required to scroll all the way
down and back up through the generated list once.  Not extremely accurate,
but I repeated the test a few times with consistent results (+- 0.5 secs).
Mean times:

xserver 1.4
16 bpp: 37.9
24 bpp: 40.7

xserver 1.3
16: 46.4
24: 50.1
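The measurement procedure above can be sketched as a tiny harness; `simulate_scroll` here is just a placeholder for driving the hipposcalability test, not anything from the actual setup:

```python
# Minimal timing harness: run the same scripted workload several times
# and report the mean and the spread (max - min) across runs.
import time

def simulate_scroll():
    sum(range(100000))  # placeholder for scrolling the generated list

def measure(workload, runs=3):
    times = []
    for _ in range(runs):
        start = time.monotonic()
        workload()
        times.append(time.monotonic() - start)
    mean = sum(times) / len(times)
    spread = max(times) - min(times)
    return mean, spread

mean, spread = measure(simulate_scroll)
print(mean >= 0.0, spread >= 0.0)  # → True True
```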

At 24 bpp we're a little slower.  1.3 is 20% slower than 1.4. The pixman
migration patch makes the difference: 1.3 spends most of that 20% in memcpy().

The oprofile reports are from xserver 1.4.  I don't see much difference
between 16 and 24, except that at 24 bpp, less time is spent in pixman and more
in amd_drv.  At 16 bpp pixman_fill() takes twice the time.
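To pin down the "20%" figure, the slowdown can be recomputed from the mean times above (strictly, 1.3 comes out ~22% slower at 16 bpp and ~23% at 24 bpp):

```python
# Recompute the 1.3-vs-1.4 slowdown from the reported mean times.
times = {"1.4": {"16": 37.9, "24": 40.7},
         "1.3": {"16": 46.4, "24": 50.1}}

slowdown_16 = times["1.3"]["16"] / times["1.4"]["16"] - 1
slowdown_24 = times["1.3"]["24"] / times["1.4"]["24"] - 1
print(round(slowdown_16 * 100), round(slowdown_24 * 100))  # → 22 23
```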

Unfortunately without a working callgraph it's not very clear to me what's
happening in amd_drv.  At 24bpp gp_wait_until_idle() takes twice the time...

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
  TIMER:0|
  samples|  %|
--
 1767 44.4305 Xorg
	  TIMER:0|
	  samples|  %|
	--
	  883 49.9717 libpixman-1.so.0.9.5
	  307 17.3741 amd_drv.so
	  224 12.6769 Xorg
	  160  9.0549 libc-2.6.so
	  143  8.0928 libexa.so
	   42  2.3769 libfb.so
	6  0.3396 libextmod.so
	1  0.0566 anon (tgid:2307 range:0xb7ee6000-0xb7ee7000)
	1  0.0566 kbd_drv.so
 1740 43.7516 python
	  TIMER:0|
	  samples|  %|
	--
	  587 33.7356 libcairo.so.2.11.5
	  245 14.0805 libpython2.5.so.1.0
	  185 10.6322 libc-2.6.so
	  155  8.9080 libgobject-2.0.so.0.1200.13
	  106  6.0920 libglib-2.0.so.0.1200.13
	   94  5.4023 libpangoft2-1.0.so.0.1600.4
	   78  4.4828 libpthread-2.6.so
	   78  4.4828 libpango-1.0.so.0.1600.4
	   64  3.6782 libhippocanvas-1.so.0.0.0
	   26  1.4943 libX11.so.6.2.0
	   19  1.0920 libxcb.so.1.0.0
	   18  1.0345 libXrender.so.1.3.0
	   15  0.8621 libpangocairo-1.0.so.0.1600.4
	   12  0.6897 libm-2.6.so
	   11  0.6322 hippo.so
	   10  0.5747 _cairo.so
	9  0.5172 libxcb-xlib.so.0.0.0
	6  0.3448 _gobject.so
	5  0.2874 libgthread-2.0.so.0.1200.13
	5  0.2874 libgdk-x11-2.0.so.0.1000.14
	5  0.2874 libgtk-x11-2.0.so.0.1000.14
	3  0.1724 libfreetype.so.6.3.15
	2  0.1149 anon (tgid:2390 range:0xb7f9f000-0xb7fa)
	1  0.0575 libX11.so.6.2.0
	1  0.0575 timemodule.so
  440 11.0636 vmlinux

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples  %image name   app name symbol name
359   9.0269  libpixman-1.so.0.9.5 Xorg pixman_rasterize_edges
326   8.1971  vmlinux  vmlinux  default_idle
245   6.1604  libpixman-1.so.0.9.5 Xorg pixman_fill
245   6.1604  libpython2.5.so.1.0  python   (no symbols)
162   4.0734  amd_drv.so   Xorg gp_color_bitmap_to_screen_blt
155   3.8974  libgobject-2.0.so.0.1200.13 python   (no symbols)
113   2.8413  amd_drv.so   Xorg gp_wait_until_idle
106   2.6653  libglib-2.0.so.0.1200.13 python   (no symbols)
94    2.3636  libpangoft2-1.0.so.0.1600.4 python   (no symbols)
78    1.9613  libpango-1.0.so.0.1600.4 python   (no symbols)
64    1.6093  libhippocanvas-1.so.0.0.0 python   (no symbols)
60    1.5087  libcairo.so.2.11.5   python   _PointDistanceSquaredToSegment
56    1.4081  libc-2.6.so  Xorg _int_malloc
50    1.2572  libc-2.6.so  python   _int_malloc
47    1.1818  libc-2.6.so  python   memcpy
40    1.0058  libc-2.6.so  Xorg memcpy
37    0.9303  libpixman-1.so.0.9.5 Xorg __divdi3
34    0.8549  libpthread-2.6.so    python   pthread_mutex_lock
32    0.8046  libc-2.6.so  python   msort_with_tmp
30    0.7543  libcairo.so.2.11.5   python   _cairo_bentley_ottmann_tessellate_polygon
27    0.6789  Xorg Xorg __divdi3
27    0.6789  libcairo.so.2.11.5   python   _cairo_skip_list_insert
26    0.6538  libpixman-1.so.0.9.5 Xorg _pixman_edge_tMultiInit
25    0.6286  libpixman-1.so.0.9.5 Xorg pixman_sample_floor_y
23    0.5783  libc-2.6.so  Xorg _int_free
23    0.5783  libcairo.so.2.11.5   python   _cairo_bo