"Martin Duerst" <[EMAIL PROTECTED]> wrote: > Character-based compression schemes have been suggested by others. > But this is not necessary, you can take any generic data compression > method (e.g. zip,...) and it will compress very efficiently.
Or you can take SCSU-encoded data and apply the generic compression method to that, and it will probably compress even better. > The big advantage of using generic data compression is that > it's already available widely, and in some cases (e.g. modem dialup, > some operating systems, web browsers) it is already built in. > The main disadvantage is that it's not as efficient as specialized > methods for very short pieces of data. All of this is absolutely correct. But juuitchan had asked for "an extension of UTF-8" that would eliminate the specific redundancy of having all (or most) of the characters come from the same contiguous block. This is exactly what SCSU does best. Question: Are there really all-kana documents in the real world (other than children's books)? Or is this one of those exercises like writing an English-language novel without the letter E? -Doug Ewell Fullerton, California
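Real SCSU is more elaborate than this (dynamic window selection, tag
bytes for switching windows, a separate Unicode mode), but a toy sketch
of the windowing idea, with the hiragana window and the sample string
chosen purely for illustration, might look like:

    import zlib

    WINDOW_BASE = 0x3040  # hiragana block; a real SCSU encoder picks
                          # and switches windows dynamically

    def toy_encode(text):
        out = bytearray()
        for ch in text:
            cp = ord(ch)
            if cp < 0x80:                       # ASCII passes through
                out.append(cp)
            elif 0 <= cp - WINDOW_BASE < 0x80:  # window char -> 1 byte
                out.append(0x80 + cp - WINDOW_BASE)
            else:
                raise ValueError("outside the toy window")
        return bytes(out)

    kana = "\u3053\u3093\u306b\u3061\u306f" * 50  # repeated hiragana
    utf8 = kana.encode("utf-8")   # 3 bytes per kana: 750 bytes
    toy  = toy_encode(kana)       # 1 byte per kana: 250 bytes

    print(len(utf8), len(zlib.compress(utf8)))
    print(len(toy),  len(zlib.compress(toy)))

The point is that once a window covering the block is active, each kana
costs one byte instead of the three it costs in UTF-8, and a generic
compressor like zlib can then squeeze the one-byte-per-character stream
further still.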
Question: Are there really all-kana documents in the real world (other
than children's books)? Or is this one of those exercises like writing
an English-language novel without the letter E?

-Doug Ewell
 Fullerton, California