Until now, Scintilla has supported only Unicode's Basic Multilingual
Plane (BMP). This hasn't been a big problem since the other planes are
mostly used for historical characters. Characters in the other planes
require 4 byte sequences in UTF-8 and 2 code unit sequences in UTF-16.
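
  As a rough illustration of those two encodings, here is a minimal
sketch of how a code point above U+FFFF maps to a UTF-16 surrogate
pair and to a 4 byte UTF-8 sequence. The helper names ToSurrogatePair
and ToUTF8FourBytes are made up for this example and are not part of
Scintilla.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not Scintilla's API.

// Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate
// pair, returned as {lead, trail}.
std::array<std::uint16_t, 2> ToSurrogatePair(std::uint32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const std::uint32_t v = cp - 0x10000;  // 20 significant bits remain
    return {
        static_cast<std::uint16_t>(0xD800 + (v >> 10)),    // top 10 bits
        static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF))   // low 10 bits
    };
}

// Encode the same range of code points as a 4 byte UTF-8 sequence.
std::array<std::uint8_t, 4> ToUTF8FourBytes(std::uint32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    return {
        static_cast<std::uint8_t>(0xF0 | (cp >> 18)),
        static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)),
        static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)),
        static_cast<std::uint8_t>(0x80 | (cp & 0x3F))
    };
}
```

  For example, U+1D11E (MUSICAL SYMBOL G CLEF) becomes the code units
D834 DD1E in UTF-16 and the bytes F0 9D 84 9E in UTF-8.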

  An example file using other planes is at:
http://www.scintilla.org/nonBMP.txt
  There is a shareware font called Code2001 that contains these
characters and is available from
http://www.code2000.net/

  The modification mostly concentrates on Windows, where UTF-16 is the
system encoding, so surrogates (the 2 code unit sequences) have to be
considered when converting to or from UTF-16 for system calls. Most
other platforms are UTF-8 based and already handle 1, 2, and 3 byte
characters, so adding 4 byte sequences was fairly simple. The files
UniConversion.cxx and UniConversion.h encapsulate most of the changes.
Note that the function names exposed by UniConversion have changed,
with "UCS2" replaced by "UTF16".
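
  To show what considering surrogates means in a conversion, here is
a minimal sketch of a UTF-16 to UTF-8 converter. It is not the code in
UniConversion.cxx; the name SketchUTF8FromUTF16 and the use of
std::u16string are assumptions made for this example. Passing unpaired
surrogates through as 3 byte sequences is also just one possible
choice.

```cpp
#include <cstddef>
#include <string>

// Illustrative only: a hypothetical converter, not Scintilla's code.
std::string SketchUTF8FromUTF16(const std::u16string &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); i++) {
        char32_t cp = in[i];
        // A lead surrogate followed by a trail surrogate forms one
        // code point outside the BMP; consume both code units.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            i++;
        }
        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            // Unpaired surrogates also land here (lenient choice).
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes, from a pair
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

  A converter that treats each UTF-16 code unit separately would
instead emit two 3 byte sequences for every surrogate pair, which is
not valid UTF-8.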

  Platform maintainers should check their implementations for code
that converts between UTF-8 and UCS-2. Since the UniConversion
function names have changed, this should be reasonably obvious. If the
name change makes maintenance of other platforms difficult, I'll swap
back to the old names even though they are less accurate.

  Committed to CVS. Only tested on Windows XP so far.

  Neil
_______________________________________________
Scintilla-interest mailing list
[email protected]
http://mailman.lyra.org/mailman/listinfo/scintilla-interest
