At 02:26 AM 10/16/03 -0700, Peter Kirk wrote:
You can never tell whether something is going to be a "performance
issue" -- not just "measurably slower," but actually affecting
usability -- until you do some profiling.  Guessing does no good.

Well, did the people who wrote this in the standard do some profiling, or did they just guess? There should be no place in a standard for statements which are just guesses.

Oh, don't we just love making categorical statements today.


Scripts where the issue is actually expected to matter include Arabic and Hebrew. Both of those scripts require the Bidi algorithm to be run (in addition to all the other rendering-related tasks). There are two phases to that algorithm: level assignment and reversal. Assigning levels is a linear process, but reversal depends on both the input size and the number of levels. So it's essentially O(N × m), where m is a small, not-quite-constant factor (roughly the number of embedding levels in play).
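
To make the O(N × m) claim concrete, here is a minimal sketch (Python, just for illustration) of the reversal phase as described in UAX #9 rule L2. It assumes the per-character embedding levels have already been assigned; the function name is mine, not anything from the standard's reference code.

    def reverse_by_levels(chars, levels):
        """One pass per level, from the highest level down to the lowest
        odd level, each pass linear in the input: roughly O(N*m) overall."""
        chars = list(chars)
        odd_levels = [lvl for lvl in levels if lvl % 2 == 1]
        if not odd_levels:
            return chars  # purely left-to-right line, nothing to reverse
        for level in range(max(levels), min(odd_levels) - 1, -1):
            i = 0
            while i < len(chars):
                if levels[i] >= level:
                    j = i
                    while j < len(chars) and levels[j] >= level:
                        j += 1
                    chars[i:j] = reversed(chars[i:j])  # reverse this run in place
                    i = j
                else:
                    i += 1
        return chars

For instance, reverse_by_levels('abcABC', [0, 0, 0, 1, 1, 1]) yields list('abcCBA'): only the single level-1 run gets reversed, in one pass.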

Arabic would need positional shaping in addition to the bidi algorithm.
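
To give a feel for what that extra pass costs, here is a deliberately simplified toy sketch of positional form selection; it is not real shaping-engine code. The joining_type and forms callables are hypothetical stand-ins for the data in ArabicShaping.txt and a font's glyph substitution tables, and transparent characters (combining marks) are ignored for brevity.

    def pick_positional_forms(text, joining_type, forms):
        """One more linear pass, with a constant amount of work per character."""
        shaped = []
        for i, ch in enumerate(text):
            joins_prev = i > 0 and joining_type(text[i - 1]) in ('D', 'C', 'L')
            joins_next = i + 1 < len(text) and joining_type(text[i + 1]) in ('D', 'C', 'R')
            jt = joining_type(ch)
            if jt == 'D':                    # dual-joining: all four forms possible
                if joins_prev and joins_next:
                    form = 'medial'
                elif joins_prev:
                    form = 'final'
                elif joins_next:
                    form = 'initial'
                else:
                    form = 'isolated'
            elif jt == 'R':                  # right-joining: final or isolated only
                form = 'final' if joins_prev else 'isolated'
            else:                            # non-joining, join-causing, etc.
                form = 'isolated'
            shaped.append(forms(ch, form))
        return shaped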

Normalization has mapping and reordering phases. The reordering is O(n log n), where n is the length of a combining sequence; realistically, n is a small number. The rest of the algorithm is O(N), with N the length of the input. For NFC there are both a decomposition and a composition phase, so the number of steps per character is not as trivial as a strcpy, but then again, neither is bidi.
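
For the reordering phase, here is a minimal sketch in Python using the real unicodedata module; the function name and the sort-based formulation are mine. Each run of non-starters is stable-sorted by canonical combining class, which is exactly where the O(n log n) term comes from.

    import unicodedata

    def canonical_reorder(decomposed):
        """Canonical reordering, assuming the input is already fully decomposed."""
        out, run = [], []
        for ch in decomposed:
            if unicodedata.combining(ch) == 0:   # starter: flush the pending run
                run.sort(key=unicodedata.combining)
                out.extend(run)
                run = []
                out.append(ch)
            else:                                # non-starter: collect into the run
                run.append(ch)
        run.sort(key=unicodedata.combining)
        out.extend(run)
        return ''.join(out)

In practice unicodedata.normalize('NFD', s) already does the decomposition and this reordering in one call; the sketch just makes the per-sequence sort visible.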

The rest of rendering has to map characters to glyphs, add glyph extents, calculate line breaks, determine glyph positions, and finally rasterize outlines and copy bits. (When rendering to PDF, the last two steps would be slightly different). That's a whole lot of passes over the data as well, many of them with a non-trivial number of steps per input character.

Given this context, it's more than an educated guess that normalization at rendering time will not dominate the performance, particularly not when optimized.

Even for pure ASCII data (which never needs normalization), the rendering & display tasks will take more steps per character than a normalization quick check (especially one optimized for ASCII input ;-).
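
For what it's worth, that ASCII fast path is nearly a one-liner. A minimal sketch in Python (str.isascii and unicodedata.is_normalized are real standard-library calls, but they need Python 3.7 and 3.8 or later, respectively):

    import unicodedata

    def is_nfc(text):
        """NFC quick check with an ASCII fast path: every ASCII string is
        already in NFC, so a single cheap scan avoids any table lookups."""
        if text.isascii():
            return True
        return unicodedata.is_normalized('NFC', text)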

Therefore, I regard the statement in the text of the standard as quite defensible (if understood in context) and as better supported than a mere 'guess'. It's a well-educated guess, probably even a Ph.D. guess.

However, if someone has measurements from a well-tuned system, it would be nice to know some realistic values for the relative cost of normalization and display.

A./


