On Fri, Sep 11, 2009 at 6:21 PM, Donovan Jimenez <djime...@conduit-it.com> wrote:
> Is it possible (and would it even help) to normalize all strings with
> regards to surrogate pairs at indexing time instead?
Already done, in a way... there's only one way to represent a character outside the BMP in UTF-16 (which is the in-memory encoding used by Java String): a single high/low surrogate pair. Unless I misunderstood what you meant by normalization.

-Yonik
http://www.lucidimagination.com
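A small Java sketch of that point, using only `java.lang.Character` and `String`. The class name `SurrogateDemo`, its helper methods, and the choice of U+1D11E (MUSICAL SYMBOL G CLEF) as the sample character are all illustrative assumptions. It shows that a code point above U+FFFF always becomes exactly one surrogate pair, and that every such pair decodes back to the same code point. It only covers the surrogate-pair question. It says nothing about Unicode normalization forms such as NFC, which are a separate matter.

```java
public class SurrogateDemo {

    // Hypothetical helper: the UTF-16 code units Java uses for one code point.
    static char[] utf16Units(int codePoint) {
        return Character.toChars(codePoint);
    }

    // Hypothetical helper: checks that every code point outside the BMP
    // encodes to exactly one high/low surrogate pair and decodes back to itself.
    static boolean allSupplementaryRoundTrip() {
        for (int cp = Character.MIN_SUPPLEMENTARY_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            char[] u = utf16Units(cp);
            if (u.length != 2
                    || !Character.isHighSurrogate(u[0])
                    || !Character.isLowSurrogate(u[1])
                    || Character.toCodePoint(u[0], u[1]) != cp) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int clef = 0x1D11E; // MUSICAL SYMBOL G CLEF, above U+FFFF
        char[] units = utf16Units(clef);
        // prints "U+1D11E -> D834 DD1E"
        System.out.printf("U+%X -> %04X %04X%n", clef, (int) units[0], (int) units[1]);

        // Two different ways of building the string give the same char sequence.
        String fromLiteral = "\uD834\uDD1E";
        String fromCodePoint = new String(Character.toChars(clef));
        System.out.println(fromLiteral.equals(fromCodePoint)); // true

        // length() counts UTF-16 code units; codePointCount() counts characters.
        System.out.println(fromLiteral.length()); // 2
        System.out.println(fromLiteral.codePointCount(0, fromLiteral.length())); // 1

        System.out.println(allSupplementaryRoundTrip()); // true
    }
}
```

Because the mapping is fixed, two Java Strings holding the same supplementary character already have identical `char` sequences. No extra surrogate "normalization" step would change them.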