|
Here's an update on the timings of various 16 bit string
compares.
Test File
u_strcmp u_strcmpCodePointOrder
wcscmp (Windows C Lib function)
---------------------------------------------------------------
Asian Names
37
40
36
Latin Names
56
59
60
Times are in ns on a 866 MHz Pentium III, running Windows
2000.
The test is a binary search within a list of roughly 10000
names.
The bottom line is that the cost to compare UTF-16 strings in
code point order is very small - the 3 ns is close to being
in the noise for these measurements.
The implication for the UTF-8S debate, as best I understand
it, is that solving the UTF-8 / UTF-16 sort order
incompatibility by comparing the UTF-16 strings in code
point order would be perfectly viable from a performance
standpoint. (UTF-8 strings naturally compare in code point
order)
-----
The differences from the times quoted in Mark's note, below,
are these:
1. Loop overhead is removed from the measurement.
(The loop overhead is
actually quite a bit smaller than the
difference in numbers would seem
to suggest, because of processor cache
interactions. For the new numbers,
the string data, because of the structure
of the test, will always be
in L1 cache. With the old
numbers, data cache misses are significant,
making the old results perhaps more
representative real use.)
2. I tweaked the ICU code to make it a tiny bit
faster.
|
- Cost of code point order comparison Mark Davis
- Andy Heninger

