Re: More 16 vs 24 bpp profiling
On 09/12/2007 11:40 AM, Marco Pesenti Gritti wrote:
> On 9/12/07, Jordan Crouse <[EMAIL PROTECTED]> wrote:
>> On 12/09/07 17:21 +0200, Marco Pesenti Gritti wrote:
>>> Yeah, this is the current check:
>>>
>>>   (strcmp(vendor, "AuthenticAMD") == 0 ||
>>>    strcmp(vendor, "Geode by NSC") == 0))
>>>
>>> I think Jordan mentioned that the LX is not using "Geode by NSC" anymore.
>>
>> We should be using AuthenticAMD now. Check /proc/cpuinfo to make sure.
>
> Yeah, that's correct.
>
> I guess it would still be worth making sure MMX is actually enabled
> in cairo; other parts of that check might be failing...

Visually, the pixman source looks right, but I haven't had time yet to
install a verbose build of it to make sure it's doing the right thing.
Anybody should feel free to beat me to it (and publish their findings
here).

--
 // Bernardo Innocenti - http://www.codewiz.org/
\X/  One Laptop Per Child - http://www.laptop.org/
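[Editor's note: a minimal sketch of the kind of temporary instrumentation
such a "verbose build" could carry. The function name, parameters, and
placement are hypothetical; the real check lives inside pixman's CPU
detection code. The point is to print each sub-condition so a run on the
XO shows exactly which one fails.]

  #include <stdio.h>
  #include <string.h>

  /* Hypothetical instrumented version of pixman's MMX check. */
  static int detect_mmx_verbose(const char *vendor, unsigned int edx_features)
  {
      int vendor_ok = strcmp(vendor, "AuthenticAMD") == 0 ||
                      strcmp(vendor, "Geode by NSC") == 0;
      int mmx_flag  = (edx_features & (1u << 23)) != 0;  /* CPUID EDX bit 23 = MMX */

      fprintf(stderr, "pixman: vendor='%s' vendor_ok=%d mmx_flag=%d\n",
              vendor, vendor_ok, mmx_flag);

      return vendor_ok && mmx_flag;
  }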
Re: More 16 vs 24 bpp profiling
On 9/12/07, Jordan Crouse <[EMAIL PROTECTED]> wrote:
> On 12/09/07 17:21 +0200, Marco Pesenti Gritti wrote:
>> [...]
>>
>> Yeah, this is the current check:
>>
>>   (strcmp(vendor, "AuthenticAMD") == 0 ||
>>    strcmp(vendor, "Geode by NSC") == 0))
>>
>> I think Jordan mentioned that the LX is not using "Geode by NSC" anymore.
>
> We should be using AuthenticAMD now. Check /proc/cpuinfo to make sure.

Yeah, that's correct.

I guess it would still be worth making sure MMX is actually enabled in
cairo; other parts of that check might be failing...

Marco
Re: More 16 vs 24 bpp profiling
On 12/09/07 17:21 +0200, Marco Pesenti Gritti wrote:
> On 9/12/07, Dan Williams <[EMAIL PROTECTED]> wrote:
>> [...]
>>
>> We did have to patch the MMX check in pixman long ago; maybe that got
>> broken somehow? There were actually two fixes: an update to the cpu
>> flags, and a strcmp() on the processor ID that had to be corrected to
>> get pixman to detect MMX capability on the Geode.
>
> Yeah, this is the current check:
>
>   (strcmp(vendor, "AuthenticAMD") == 0 ||
>    strcmp(vendor, "Geode by NSC") == 0))
>
> I think Jordan mentioned that the LX is not using "Geode by NSC" anymore.

We should be using AuthenticAMD now. Check /proc/cpuinfo to make sure.

Jordan

--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
Re: More 16 vs 24 bpp profiling
On 9/12/07, Dan Williams <[EMAIL PROTECTED]> wrote:
> On Tue, 2007-09-11 at 14:19 -0400, Bernardo Innocenti wrote:
>> [...]
>>
>> It strikes me that we don't see any time spent in
>> pixman_fill_mmx(), even though it's not inlinable.
>>
>> For some reason, pixman thinks it cannot accelerate
>> 16bpp fills with MMX, at least on the Geode.
>>
>> Might be worth investigating...
>
> We did have to patch the MMX check in pixman long ago; maybe that got
> broken somehow? There were actually two fixes: an update to the cpu
> flags, and a strcmp() on the processor ID that had to be corrected to
> get pixman to detect MMX capability on the Geode.

Yeah, this is the current check:

  (strcmp(vendor, "AuthenticAMD") == 0 ||
   strcmp(vendor, "Geode by NSC") == 0))

I think Jordan mentioned that the LX is not using "Geode by NSC" anymore.

Marco
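[Editor's note: for reference, the vendor string being compared here
comes from CPUID leaf 0, concatenated from EBX, EDX, ECX in that order.
A self-contained sketch of that probe for 32-bit x86 follows; it is
illustrative only and not pixman's actual detection code.]

  #include <stdint.h>
  #include <string.h>

  /* Read the 12-character CPU vendor string via CPUID leaf 0. */
  static void get_cpu_vendor(char vendor[13])
  {
      uint32_t a = 0, b, c, d;

      __asm__ volatile ("cpuid"
                        : "+a" (a), "=b" (b), "=c" (c), "=d" (d));

      memcpy(vendor + 0, &b, 4);   /* EBX: e.g. "Auth" */
      memcpy(vendor + 4, &d, 4);   /* EDX: e.g. "enti" */
      memcpy(vendor + 8, &c, 4);   /* ECX: e.g. "cAMD" */
      vendor[12] = '\0';
  }

Per Jordan's note above, a Geode LX should now report "AuthenticAMD",
matching the first arm of the check; older NSC parts reported
"Geode by NSC", hence the second arm.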
Re: More 16 vs 24 bpp profiling
On Tue, 2007-09-11 at 14:19 -0400, Bernardo Innocenti wrote:
> On 09/11/2007 01:32 PM, Bernardo Innocenti wrote:
>
>> The 16bpp codepath has to be broken somewhere if
>> it takes twice the time to copy half the bits :-)
>
> It strikes me that we don't see any time spent in
> pixman_fill_mmx(), even though it's not inlinable.
>
> For some reason, pixman thinks it cannot accelerate
> 16bpp fills with MMX, at least on the Geode.
>
> Might be worth investigating...

We did have to patch the MMX check in pixman long ago; maybe that got
broken somehow? There were actually two fixes: an update to the cpu
flags, and a strcmp() on the processor ID that had to be corrected to
get pixman to detect MMX capability on the Geode.

Dan
Re: More 16 vs 24 bpp profiling
On 09/11/2007 01:32 PM, Bernardo Innocenti wrote:

> The 16bpp codepath has to be broken somewhere if
> it takes twice the time to copy half the bits :-)

It strikes me that we don't see any time spent in
pixman_fill_mmx(), even though it's not inlinable.

For some reason, pixman thinks it cannot accelerate
16bpp fills with MMX, at least on the Geode.

Might be worth investigating...

--
 // Bernardo Innocenti - http://www.codewiz.org/
\X/  One Laptop Per Child - http://www.laptop.org/
Re: More 16 vs 24 bpp profiling
On 09/11/2007 01:44 PM, Jordan Crouse wrote:
> On 11/09/07 13:03 -0400, Bernardo Innocenti wrote:
>> NOTE ALEPH: I think we stopped development in the xf86-amd-devel
>> repo some time ago. The correct driver nowadays would be the
>> fd.o one. Jordan, do you confirm this?
>
> I cannot. OLPC should always and forever more use the xf86-amd-devel
> tree. The fd.o tree is for the rest of the world.

Sorry, then. You did tell me that some time ago, but from the git logs
I got the impression that development was happening on the fd.o tree
only.

This reminds me that I still have to test Ajax's dcon hack. At this
time I have several higher-priority tasks assigned to me, so would you
or aleph like to give it a shot?

Last time I tried, there was display corruption. The damage-checking
logic seems weird and is worth a look. Details are here:

  https://dev.laptop.org/ticket/1412

--
 // Bernardo Innocenti - http://www.codewiz.org/
\X/  One Laptop Per Child - http://www.laptop.org/
Re: More 16 vs 24 bpp profiling
On Tue, 2007-09-11 at 13:03 -0400, Bernardo Innocenti wrote:
> [...]
>
> NOTE ALEPH: I think we stopped development in the xf86-amd-devel
> repo some time ago. The correct driver nowadays would be the
> fd.o one. Jordan, do you confirm this?

It would be nice to know, since I'll need to know where to get the
sources for the RPM updates from... but ISTR mail about this on
xorg-devel@ last month, so I think you're right.

Dan
Re: More 16 vs 24 bpp profiling
On 11/09/07 13:03 -0400, Bernardo Innocenti wrote:
> NOTE ALEPH: I think we stopped development in the xf86-amd-devel
> repo some time ago. The correct driver nowadays would be the
> fd.o one. Jordan, do you confirm this?

I cannot. OLPC should always and forever more use the xf86-amd-devel
tree. The fd.o tree is for the rest of the world.

Jordan

--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.
Re: More 16 vs 24 bpp profiling
On 09/11/2007 07:05 AM, Stefano Fedrigo wrote:

> At 16 bpp pixman_fill() takes twice the time.

This is just stupid! The 16bpp codepath has to be broken somewhere if
it takes twice the time to copy half the bits :-)

--
 // Bernardo Innocenti - http://www.codewiz.org/
\X/  One Laptop Per Child - http://www.laptop.org/
Re: More 16 vs 24 bpp profiling
(adding xorg-devel@ on Cc)

On 09/11/2007 11:29 AM, Jordan Crouse wrote:
> On 11/09/07 13:05 +0200, Stefano Fedrigo wrote:
>> [...]
>>
>> Unfortunately, without a working callgraph it's not very clear to me
>> what's happening in amd_drv. At 24bpp gp_wait_until_idle() takes
>> twice the time...
>
> What can we do to fix this? I would really like to know who is calling
> gp_wait_until_idle().

I think the invocation in lx_get_source_color() can safely go away, as
exaGetPixmapFirstPixel() has always done correct locking, even in 1.3.

But because the 1x1 source pixmap used as a solid color is still being
uploaded to the framebuffer, I'd expect exaGetPixmapFirstPixel() to
indirectly call the driver download hook and, thus, stall the GPU
anyway.

If this tiny pixmap were at least reused, the second time around it
would already be in system memory. And it seems that Cairo is trying
to cache patterns in the CR.

Problem is, many GTK widgets like to create a new CR on every repaint
event, thus rendering the cache quite ineffective for a typical
workload of a window with several small widgets in it. But I only
stumbled into the caching code a few months ago while debugging
something else, so I may very well be mistaken.

On git's master, Michel Dänzer has recently been pushing a long run of
EXA performance patches. I've only had a quick glance, but it seems
they may cure this:

$ git-log 8cfcf9..e8093e | git-shortlog
Michel Dänzer (14):
      EXA: Track valid bits in Sys and FB separately.
      Add DamagePendingRegion.
      EXA: Support partial migration of pixmap contents between Sys and FB.
      EXA: Hide pixmap pointer outside of exaPrepare/FinishAccess whenever possible.
      EXA: Improvements for trapezoids and triangles.
      EXA: exaImageGlyphBlt improvements.
      EXA: Improvements for 1x1 pixmaps.
      EXA: RENDER improvements.
      EXA: Remove superfluous manual damage tracking.
      EXA: exaGetImage improvements.
      EXA: exa(Shm)PutImage improvements.
      EXA: Use exaShmPutImage for pushing glyphs to scratch pixmap in exaGlyphs.
      EXA: exaFillRegion{Solid,Tiled} improvements.
      EXA: Exclude bits that will be overwritten from migration in exaCopyNtoN.

Aleph, I guess it may be useful to re-run the tests after applying
these patches. In case merging them onto the 1.4 branch happens to be
difficult, using the code from master should be OK; they don't seem to
have diverged too much yet.

> Also, I think we're spending way too much time in
> gp_color_bitmap_to_screen_blt() - is there any way we
> can get more in-depth profiling in that one function?

Good idea!

Meanwhile, I looked at gp_color_bitmap_to_screen_blt() and it seems
we're issuing a separate blit per horizontal line of the source data.
That is correct for the general case, where the destination width may
not match the source pitch.

However, when we invoke gp_color_bitmap_to_screen_blt() for uploads,
I'd expect the destination buffer to match the source, so a single
blit would work.

If my guess is right, special-casing "pitch == width*bpp" would be a
big win. Does anyone mind adding an ErrorF()?

NOTE ALEPH: I think we stopped development in the xf86-amd-devel repo
some time ago. The correct driver nowadays would be the fd.o one.
Jordan, do you confirm this?

--
 // Bernardo Innocenti - http://www.codewiz.org/
\X/  One Laptop Per Child - http://www.laptop.org/
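[Editor's note: to make the special case suggested above concrete, here
is a minimal sketch of the proposed fast path. It is illustrative only:
upload_rect is a hypothetical helper, not the actual amd_drv code, and
memcpy stands in for the hardware blit.]

  #include <stdint.h>
  #include <string.h>

  /* Upload a rectangle of 'height' rows, each 'width_bytes' wide. */
  static void upload_rect(uint8_t *dst, int dst_pitch,
                          const uint8_t *src, int src_pitch,
                          int width_bytes, int height)
  {
      if (src_pitch == width_bytes && dst_pitch == width_bytes) {
          /* Rows are densely packed on both sides, so the whole
           * rectangle is one contiguous buffer: a single blit. */
          memcpy(dst, src, (size_t)width_bytes * height);
      } else {
          /* General case: pitches differ from the row width, so copy
           * one scanline per iteration -- effectively what
           * gp_color_bitmap_to_screen_blt() does for every upload today. */
          for (int y = 0; y < height; y++)
              memcpy(dst + (size_t)y * dst_pitch,
                     src + (size_t)y * src_pitch,
                     width_bytes);
      }
  }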
Re: More 16 vs 24 bpp profiling
Jordan Crouse wrote:
>> Unfortunately, without a working callgraph it's not very clear to me
>> what's happening in amd_drv. At 24bpp gp_wait_until_idle() takes
>> twice the time...
>
> What can we do to fix this? I would really like to know who is
> calling gp_wait_until_idle().

The problem seems to be that, on the XO, oprofile can only use an
event timer that doesn't allow accurate stack tracing: the process
stack is not accessible inside the interrupt used for sampling.

I hope there is some way to solve this, because call tracing does work
in kernel code, provided the kernel is compiled with frame pointers.
I haven't had time to look at this in depth; perhaps selecting a
different interrupt source, if that's possible, would be enough to get
callgraphs for userspace code.
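[Editor's note: for background on why frame pointers matter here, a
frame-pointer-based profiler reconstructs the call chain at each sample
by walking the chain of saved frame pointers. A minimal sketch of that
walk follows; it assumes 32-bit x86 code compiled without
-fomit-frame-pointer, and real unwinders add many more sanity checks.]

  #include <stdio.h>
  #include <stdint.h>

  /* Walk the saved frame-pointer chain, as a sampling profiler would.
   * If any frame on the stack was built without frame pointers, the
   * chain is broken and the walk stops (or wanders off), which is
   * exactly the userspace problem described above. */
  static void backtrace_fp(int max_depth)
  {
      uintptr_t *fp = (uintptr_t *)__builtin_frame_address(0);

      for (int depth = 0; fp != NULL && depth < max_depth; depth++) {
          printf("#%d  return address %p\n", depth, (void *)fp[1]);

          uintptr_t *next = (uintptr_t *)fp[0];  /* caller's saved frame pointer */
          if (next <= fp)  /* the stack grows down; a bogus link means stop */
              break;
          fp = next;
      }
  }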
Re: More 16 vs 24 bpp profiling
On 11/09/07 13:05 +0200, Stefano Fedrigo wrote:
> [...]
>
> The oprofile reports are from xserver 1.4. I don't see much
> difference between 16 and 24, except that at 24 bpp less time is
> spent in pixman and more in amd_drv. At 16 bpp pixman_fill() takes
> twice the time.
>
> Unfortunately, without a working callgraph it's not very clear to me
> what's happening in amd_drv. At 24bpp gp_wait_until_idle() takes
> twice the time...

What can we do to fix this? I would really like to know who is calling
gp_wait_until_idle().

Also, I think we're spending way too much time in
gp_color_bitmap_to_screen_blt() - is there any way we can get more
in-depth profiling in that one function?

Jordan
More 16 vs 24 bpp profiling
I've done some more profiling on the 16 vs. 24 bpp issue.
This time I used this test:
https://dev.laptop.org/git?p=sugar;a=blob;f=tests/graphics/hipposcalability.py

A simple speed test: I measured the time required to scroll all the way
down and back up through the generated list once. Not extremely
accurate, but I repeated the test a few times with consistent results
(+- 0.5 secs). Mean times:

xserver 1.4
  16 bpp: 37.9
  24 bpp: 40.7

xserver 1.3
  16 bpp: 46.4
  24 bpp: 50.1

At 24 bpp we're a little slower, and 1.3 is 20% slower than 1.4. The
pixman migration patch makes the difference: 1.3 spends most of that
20% in memcpy().

The oprofile reports below are from xserver 1.4. I don't see much
difference between 16 and 24, except that at 24 bpp less time is spent
in pixman and more in amd_drv. At 16 bpp pixman_fill() takes twice the
time.

Unfortunately, without a working callgraph it's not very clear to me
what's happening in amd_drv. At 24bpp gp_wait_until_idle() takes twice
the time...

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt

  TIMER:0|
  samples|      %|
------------------
     1767 44.4305 Xorg
          TIMER:0|
          samples|      %|
        ------------------
              883 49.9717 libpixman-1.so.0.9.5
              307 17.3741 amd_drv.so
              224 12.6769 Xorg
              160  9.0549 libc-2.6.so
              143  8.0928 libexa.so
               42  2.3769 libfb.so
                6  0.3396 libextmod.so
                1  0.0566 anon (tgid:2307 range:0xb7ee6000-0xb7ee7000)
                1  0.0566 kbd_drv.so
     1740 43.7516 python
          TIMER:0|
          samples|      %|
        ------------------
              587 33.7356 libcairo.so.2.11.5
              245 14.0805 libpython2.5.so.1.0
              185 10.6322 libc-2.6.so
              155  8.9080 libgobject-2.0.so.0.1200.13
              106  6.0920 libglib-2.0.so.0.1200.13
               94  5.4023 libpangoft2-1.0.so.0.1600.4
               78  4.4828 libpthread-2.6.so
               78  4.4828 libpango-1.0.so.0.1600.4
               64  3.6782 libhippocanvas-1.so.0.0.0
               26  1.4943 libX11.so.6.2.0
               19  1.0920 libxcb.so.1.0.0
               18  1.0345 libXrender.so.1.3.0
               15  0.8621 libpangocairo-1.0.so.0.1600.4
               12  0.6897 libm-2.6.so
               11  0.6322 hippo.so
               10  0.5747 _cairo.so
                9  0.5172 libxcb-xlib.so.0.0.0
                6  0.3448 _gobject.so
                5  0.2874 libgthread-2.0.so.0.1200.13
                5  0.2874 libgdk-x11-2.0.so.0.1000.14
                5  0.2874 libgtk-x11-2.0.so.0.1000.14
                3  0.1724 libfreetype.so.6.3.15
                2  0.1149 anon (tgid:2390 range:0xb7f9f000-0xb7fa)
                1  0.0575 libX11.so.6.2.0
                1  0.0575 timemodule.so
      440 11.0636 vmlinux

CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt

samples  %       image name                    app name  symbol name
    359  9.0269  libpixman-1.so.0.9.5          Xorg      pixman_rasterize_edges
    326  8.1971  vmlinux                       vmlinux   default_idle
    245  6.1604  libpixman-1.so.0.9.5          Xorg      pixman_fill
    245  6.1604  libpython2.5.so.1.0           python    (no symbols)
    162  4.0734  amd_drv.so                    Xorg      gp_color_bitmap_to_screen_blt
    155  3.8974  libgobject-2.0.so.0.1200.13   python    (no symbols)
    113  2.8413  amd_drv.so                    Xorg      gp_wait_until_idle
    106  2.6653  libglib-2.0.so.0.1200.13      python    (no symbols)
     94  2.3636  libpangoft2-1.0.so.0.1600.4   python    (no symbols)
     78  1.9613  libpango-1.0.so.0.1600.4      python    (no symbols)
     64  1.6093  libhippocanvas-1.so.0.0.0     python    (no symbols)
     60  1.5087  libcairo.so.2.11.5            python    _PointDistanceSquaredToSegment
     56  1.4081  libc-2.6.so                   Xorg      _int_malloc
     50  1.2572  libc-2.6.so                   python    _int_malloc
     47  1.1818  libc-2.6.so                   python    memcpy
     40  1.0058  libc-2.6.so                   Xorg      memcpy
     37  0.9303  libpixman-1.so.0.9.5          Xorg      __divdi3
     34  0.8549  libpthread-2.6.so             python    pthread_mutex_lock
     32  0.8046  libc-2.6.so                   python    msort_with_tmp
     30  0.7543  libcairo.so.2.11.5            python    _cairo_bentley_ottmann_tessellate_polygon
     27  0.6789  Xorg                          Xorg      __divdi3
     27  0.6789  libcairo.so.2.11.5            python    _cairo_skip_list_insert
     26  0.6538  libpixman-1.so.0.9.5          Xorg      _pixman_edge_tMultiInit
     25  0.6286  libpixman-1.so.0.9.5          Xorg      pixman_sample_floor_y
     23  0.5783  libc-2.6.so                   Xorg      _int_free
     23  0.5783  libcairo.so.2.11.5            python    _cairo_bo