On Wed, 2006-11-29 at 09:24 -0500, Xan Lopez wrote on cairo list:
>
> Anyway, thanks to the nice guys from opened-hand I received the patch
> that fixes the symbol resolving brokenness of my oprofiler, so I can
> now provide sane profiles. I'm attaching profiles for cairo, pango and
> pangocairo from a timetext run.

Can you attach a merged profile too (one for all libraries; oprofile
does that, I believe)? With separate profiles it's not as easy to see
what's going on without guessing.

> An immediate comment from the pangocairo profile: is the glyph_extents
> cache working? it's taking a lot of CPU time and we are exposing the
> same ~20 characters all over again (this might be getting
> borderline off-topic for this list, now I think about it. If I'm
> bothering someone please just tell). I'm attaching the profiles
> because I've not yet received the fd.o web space, sorry about it.

On Wed, 2006-11-29 at 10:44 -0500, Xan Lopez wrote on gtk-i18n-list:
> Hey,
>
> In a follow up of my profiling experiments[1] I tried to figure out
> why pango_cairo_fc_font_get_glyph_extents was appearing so high in the
> charts. I'm using the timetext[2] program, which exposes a 67
> (including whitespace) character long string with 22 different
> characters as much as possible in one minute. Once I checked that the
> cache was actually working I modified the function to just compute the
> extents of the first character received and then use that value all
> the time (yeah, weird experiments are fun). I hoped this would make
> the function drop in the profile, but it did nothing! Puzzled, I
> decided to count how many times this function was being called. As it
> turns out, for 292 expose events of a 67 character long GtkLabel we
> are calling it 56,191 times. That's almost 3 times per character per
> expose event, 4 if you ignore whitespace. Does it sound right?

OK, I'm finally happy enough to reply to this.

First, the string is 64 chars long, and your 56,191 should really be
56,192. That is exactly 3 pango_font_get_glyph_extents() calls per
character per expose, plus 2 initial ones per character:

  56,192 = 292 * 64 * 3 + 64 * 2

The initial ones come from the shaper and PangoLayout, and their
results are cached in PangoLayout. The 3-per-draw came as a surprise
to me.
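
For anyone who wants to double-check the call count above, here is a
small Python sketch of the arithmetic. The constant and function names
are made up for illustration; none of them come from Pango or timetext.

```python
# Sanity check of the call count quoted above.
# All names are illustrative; nothing here is Pango or timetext API.
EXPOSES = 292        # expose events in the counted run
CHARS = 64           # length of the label string, whitespace included
CALLS_PER_DRAW = 3   # glyph-extents calls per character per expose
INITIAL_CALLS = 2    # one-time calls per character (shaper + PangoLayout)


def expected_calls(exposes, chars, per_draw, initial):
    """Per-expose calls plus the one-time calls made while laying out."""
    return exposes * chars * per_draw + chars * initial


total = expected_calls(EXPOSES, CHARS, CALLS_PER_DRAW, INITIAL_CALLS)
print(total)  # 56192
```

Changing CALLS_PER_DRAW shows what the total would be for the same run
if fewer calls were made per character per expose.
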
I used to think that the only place we made those calls was
pango_renderer_draw_layout_line, and that I had nuked that in favor of
a new API call, pango_glyph_string_get_width(). Here:

  http://cvs.gnome.org/viewcvs/pango/pango/pango-layout.c?r1=1.176&r2=1.177
  http://cvs.gnome.org/viewcvs/pango/pango/pango-layout.c?r1=1.177&r2=1.178

So I expected Pango HEAD to make one fewer such call per glyph. But
that was not the case. Further investigation showed that it was
actually working correctly there, but that an innocent-looking bugfix
of mine was causing another place in PangoLayout to make an extra call:

  http://cvs.gnome.org/viewcvs/pango/pango/pango-layout.c?r1=1.189&r2=1.190

So, I fixed that today:

  http://cvs.gnome.org/viewcvs/pango/pango/pango-layout.c?r1=1.193&r2=1.194

and that brings us down to 2 get_glyph_extents() calls per glyph per
expose.

The remaining two are apparently generated by the PangoLayoutIter that
pango-renderer.c creates to render the layout. Looking into those, one
is done in update_run() to cache run_logical_rect. Of that cached
value, we only use the width (unless the user asks for the run
logical_rect, of course), so I switched that one over to
pango_glyph_string_get_width() too, just now:

  http://cvs.gnome.org/viewcvs/pango/pango/pango-layout.c?r1=1.194&r2=1.195

and now there's only 1 get_glyph_extents() call per glyph per expose.

That one is a bit harder to fix, unless one caches per-line or
per-item extents. There's actually a patch for this already:

  http://bugzilla.gnome.org/show_bug.cgi?id=135683

The problem is, PangoLayout has API that gives away pointers to
internal structures, such that you can modify the glyph widths, and
that is legitimate use. For example, Firefox+Pango uses that to justify
lines. So I was under the impression that we cannot meaningfully cache
much, until I figured, well: we can cache, until the user asks for that
evil pointer! So, unless we have handed out a pointer to internal
stuff, we can cache.
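
The "cache until the pointer escapes" idea can be sketched roughly as
follows. This is a toy Python model, not Pango code: the CachedLine
class, its methods and the measure_count counter are all hypothetical,
and summing a list of widths merely stands in for the real per-glyph
extents calls.

```python
class CachedLine:
    """Toy stand-in for a layout line; not Pango's real data structure."""

    def __init__(self, glyph_widths):
        self._widths = list(glyph_widths)
        self._cached_width = None
        self._pointer_handed_out = False
        self.measure_count = 0  # how many times we had to re-measure

    def _measure(self):
        # Stands in for the expensive per-glyph extents calls.
        self.measure_count += 1
        return sum(self._widths)

    def width(self):
        if self._pointer_handed_out:
            # The caller may have changed widths behind our back,
            # so a cached value can no longer be trusted.
            return self._measure()
        if self._cached_width is None:
            self._cached_width = self._measure()
        return self._cached_width

    def glyph_widths(self):
        """The 'evil pointer': exposes internal, mutable state."""
        self._pointer_handed_out = True
        self._cached_width = None
        return self._widths


line = CachedLine([10, 12, 8])
for _ in range(3):
    line.width()
print(line.measure_count)  # 1: measured once, then served from cache

widths = line.glyph_widths()  # internal list escapes to the caller
widths[0] = 20                # e.g. a caller justifying the line
print(line.width())           # 40: re-measured, sees the modified width
```

The flag is deliberately one-way in this sketch: once the internal list
has been handed out there is no way to know when the caller stops
modifying it, so every later query re-measures.
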
And even when we have given the pointer away, we can cache within a
single pango operation. So, I'm going to look more into this and come
up with a sensible scheme that doesn't introduce regressions. With
that in place, it may make sense to revert some of my
pango_glyph_string_get_width() uses above and use the cached values;
maybe not. Dunno.

Anyway, that's all for now. Some numbers for timetext.c [2]:

Before:
  Drawn label 112412 times.
  Average time spent drawing (in seconds): 0.000105

After:
  Drawn label 123871 times.
  Average time spent drawing (in seconds): 0.000063

So, in this case, the expose time improved by 40% (0.000105 s down to
0.000063 s per draw), and overall performance improved by 10% (112412
up to 123871 labels drawn in the same time). I expect (much) higher
numbers for the latter on tiny gadgets, but can't tell unless I'm
offered one ;-).

Offtopic: I use a small toy of mine to write /probes/ that can do cool
things with the library calls being made, without having to
compile/install pango all the time. It's a tiny script called bprobe,
available from GNOME CVS. For example, this is the probe I used to
figure out what's going on:

  http://cvs.gnome.org/viewcvs/*checkout*/bprobe/probes/pango-glyph_extents.probe?rev=HEAD

However, this only works because Pango doesn't have the machinery to
avoid PLT usage for local symbols. That's another thing to look into;
it may bring measurable performance (and size) improvements on smaller
devices.

> [1]http://lists.freedesktop.org/archives/cairo/2006-November/008628.html
> [2]http://folks.o-hand.com/jorn/pango-benchmarks/timetext.c

Cheers,
-- 
behdad
http://behdad.org/

"Those who would give up Essential Liberty to purchase a little
Temporary Safety, deserve neither Liberty nor Safety."
        -- Benjamin Franklin, 1759
_______________________________________________
Performance-list mailing list
Performance-list@gnome.org
http://mail.gnome.org/mailman/listinfo/performance-list