*sigh* I was just completely wrong in my last post and I think we need to take away
my CVS write access for a week or so as punishment. ;)
Previously I'd stated that there's only one SCSU encoding for a given string. It
turns out (now that I've read the SCSU spec) that you can apply varying degrees of
SCSU compression, so there are really many possible encodings. SCSU encoding/decoding
also requires a really complicated algorithm.
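
To make the "many possible encodings" point concrete, here is a rough Python sketch (not real encoder code, and nothing like a full implementation). It assumes my reading of the spec is right on three points: bytes 0x20-0x7F pass through as ASCII in the initial single-byte mode, the SQU tag (0x0E) quotes one big-endian UTF-16 code unit, and the SCU tag (0x0F) switches to Unicode mode. The function name decode_scsu_subset is made up for illustration, and it handles only that tiny subset.

```python
SQU = 0x0E  # single-byte mode tag: quote one UTF-16 code unit (2 bytes follow)
SCU = 0x0F  # single-byte mode tag: switch to Unicode (UTF-16BE) mode


def decode_scsu_subset(data: bytes) -> str:
    """Decode a tiny SCSU subset: ASCII pass-through, SQU, and SCU.

    Hypothetical helper for illustration only. Windows, other tags,
    and surrogate pairs are deliberately not handled.
    """
    out = []
    i = 0
    unicode_mode = False
    while i < len(data):
        b = data[i]
        if unicode_mode:
            # In Unicode mode, high bytes 0xE0-0xF2 are tags; not supported here.
            if 0xE0 <= b <= 0xF2 or i + 1 >= len(data):
                raise ValueError("unsupported or truncated Unicode-mode input")
            out.append(chr((b << 8) | data[i + 1]))
            i += 2
        elif b == SCU:
            unicode_mode = True
            i += 1
        elif b == SQU:
            if i + 2 >= len(data):
                raise ValueError("truncated SQU sequence")
            out.append(chr((data[i + 1] << 8) | data[i + 2]))
            i += 3
        elif 0x20 <= b <= 0x7F:
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} is outside this sketch's subset")
    return "".join(out)


# Three different byte sequences that should all be valid SCSU for "abc".
candidates = [
    bytes([0x61, 0x62, 0x63]),                          # plain single-byte mode
    bytes([SQU, 0x00, 0x61, 0x62, 0x63]),               # quote 'a', then ASCII
    bytes([SCU, 0x00, 0x61, 0x00, 0x62, 0x00, 0x63]),   # all in Unicode mode
]

decoded = [decode_scsu_subset(c) for c in candidates]
```

If the assumptions hold, all three byte strings decode to the same text, so comparing SCSU output byte for byte can't be used to tell whether two strings are equal.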
I've also been looking at a Unicode compression algorithm from IBM called BOCU that
DOES have only 1 compressed form per string and has a much simpler algorithm.
I'm really not even sure we can get worthwhile space savings from SCSU or BOCU plus
ZIP/LZSS, so I'm going to retract this from the task list until I can get SCSU & BOCU
encoders implemented in Perl to do some tests.
--Chris