I'm pleased to announce the release of my new paper, "A survey of Unicode compression":

http://users.adelphia.net/~dewell/compression.html

This 21-page paper is a moderately technical discussion of the various ways in which Unicode text can be compressed for storage and interchange. Several different approaches are examined and evaluated. Specific topics include:

* UTF-16, UTF-8, and 8-bit legacy character sets
* the Unicode "compression formats," SCSU and BOCU-1
* general-purpose compression algorithms (RLE, Huffman, LZW)
* using multiple compression techniques together
* using canonical equivalence to improve compression
* a detailed description of a SCSU encoder

Although it assumes a basic understanding of Unicode, certain terms related to Unicode and information theory are explained. No complicated mathematical theory is included.

The paper is intended for anyone interested in the details of Unicode compression, not just programmers, although the sample SCSU encoder will probably be of interest only to programmers.

It's available in HTML format, directly from the URL given above, or can be downloaded in either Adobe PDF or Microsoft Word format (zipped or unzipped).

Enjoy,

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

