New paper on Unicode compression

Doug Ewell Wed, 31 Dec 2003 12:25:45 -0800

I'm pleased to announce the release of my new paper, "A survey of
Unicode compression":


http://users.adelphia.net/~dewell/compression.html

This 21-page paper is a moderately technical discussion of the various
ways in which Unicode text can be compressed for storage and
interchange.  Several different approaches are examined and evaluated.
Specific topics include:

* UTF-16, UTF-8, and 8-bit legacy character sets
* the Unicode "compression formats," SCSU and BOCU-1
* general-purpose compression algorithms (RLE, Huffman, LZW)
* using multiple compression techniques together
* using canonical equivalence to improve compression
* a detailed description of a SCSU encoder
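The first topic, how much the choice of encoding form alone affects size, can be shown with a minimal sketch. It is not taken from the paper. It uses only Python's built-in codecs and measures the same text in UTF-8, UTF-16 (without a byte order mark), and an 8-bit legacy set chosen by the caller. The function name `encoded_sizes` and the sample strings are illustrative assumptions. SCSU and BOCU-1 are not shown because Python does not ship codecs for them.

```python
# Illustrative sketch (not from the paper): compare the encoded size of the
# same text in UTF-8, UTF-16, and an 8-bit legacy character set.

def encoded_sizes(text: str, legacy: str) -> dict[str, int]:
    """Return the byte length of `text` in several encodings.

    `legacy` is a hypothetical caller-chosen 8-bit codec name that must be
    able to represent `text` (e.g. "iso8859_5" for Cyrillic).
    UTF-16 is measured without a byte order mark ("utf-16-le").
    """
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-le")),
        legacy: len(text.encode(legacy)),
    }


if __name__ == "__main__":
    samples = [("Hello, world", "latin-1"), ("Привет, мир", "iso8859_5")]
    for sample, codec in samples:
        print(sample, encoded_sizes(sample, codec))
```

For the Russian sample, UTF-8 needs 20 bytes (two per Cyrillic letter), UTF-16 needs 22, and ISO 8859-5 needs only 11. This size gap is what the Unicode compression formats try to close without giving up the full Unicode repertoire.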

Although the paper assumes a basic understanding of Unicode, it explains
certain terms related to Unicode and information theory.  No complicated
mathematical theory is included.  The paper is intended for anyone
interested in the details of Unicode compression, not just programmers,
although the sample SCSU encoder will probably be of interest only to
programmers.

It can be read in HTML format directly at the URL given above, or
downloaded in either Adobe PDF or Microsoft Word format (zipped or
unzipped).

Enjoy,

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/