Up until now, Scintilla has only supported Unicode's Basic Multilingual Plane (BMP) and no other planes. This hasn't been a big problem since the other planes are mostly used for historical characters. Characters in the other planes require 4-byte sequences in UTF-8 and 2-code-unit sequences (surrogate pairs) in UTF-16.
An example file using other planes is at http://www.scintilla.org/nonBMP.txt. There is a shareware font called Code2001 that contains these characters; it is available from http://www.code2000.net/

The modification mostly concentrates on Windows, where UTF-16 is the system encoding, so surrogates (the 2-code-unit sequences) have to be considered when converting to or from UTF-16 for system calls. Most other platforms are UTF-8 based and already handle 1-, 2-, and 3-byte characters, so adding 4-byte sequences was fairly simple.

The files UniConversion.cxx and UniConversion.h encapsulate most of the changes, although the function names exposed by UniConversion have changed, with "UCS2" replaced by "UTF16". Platform maintainers should check their implementations for code that converts between UTF-8 and UCS-2; since the UniConversion function names have changed, this should be reasonably obvious. If the name change makes maintenance of other platforms difficult, I'll swap back to the old names even though they are less accurate.

Committed to CVS. Only tested on Windows XP so far.

Neil

_______________________________________________
Scintilla-interest mailing list
http://mailman.lyra.org/mailman/listinfo/scintilla-interest
