Re: ISO 8859-11 (Thai) cross-mapping table

John Cowan Tue, 08 Oct 2002 06:08:32 -0700

Elliotte Rusty Harold scripsit:

> The Verifier class has a similar issue, though there it's a case of 
> determining whether or not any given character is a legal XML 
> character/name character/name-start character/ etc. This is done with 
> a trick introduced in JDOM where the code looks like this:


The underlying data structure here is called a "range table", and is
a list of ranges in codepoint order, expressed thus:

        start of first range
        end of first range + 1
        start of second range
        end of second range + 1

etc. etc.  What you are doing is equivalent to a linear search over
this range followed by loop unrolling.  However, you can do better,
especially in complex cases, with a *binary* search over the range
followed by loop unrolling.  The trick here is that if the binary
search returns an even value, it succeeds; an odd value fails.

But I agree that testing ASCII first is wise.  Mozilla bypasses
its general algorithm in the ASCII/Latin-1 case, providing a fast
256-element lookup table and then an extremely compact (but somewhat
slower than the above) data structure for the rest of Unicode.

-- 
John Cowan                                [EMAIL PROTECTED]
At times of peril or dubitation,          http://www.ccil.org/~cowan
Perform swift circular ambulation,        http://www.reutershealth.com
With loud and high-pitched ululation.

Re: ISO 8859-11 (Thai) cross-mapping table

Reply via email to