"Martin Duerst" <[EMAIL PROTECTED]> wrote: > Character-based compression schemes have been suggested by others. > But this is not necessary, you can take any generic data compression > method (e.g. zip,...) and it will compress very efficiently.
Or you can take SCSU-encoded data and apply the generic compression method to that, and it will probably compress even better. > The big advantage of using generic data compression is that > it's already available widely, and in some cases (e.g. modem dialup, > some operating systems, web browsers) it is already built in. > The main disadvantage is that it's not as efficient as specialized > methods for very short pieces of data. All of this is absolutely correct. But juuitchan had asked for "an extension of UTF-8" that would eliminate the specific redundancy of having all (or most) of the characters come from the same contiguous block. This is exactly what SCSU does best. Question: Are there really all-kana documents in the real world (other than children's books)? Or is this one of those exercises like writing an English-language novel without the letter E? -Doug Ewell Fullerton, California
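Real SCSU is more elaborate than this (dynamic window selection, tag
bytes for switching windows, a separate Unicode mode), but a toy sketch
of the windowing idea, with the hiragana window and the sample string
chosen purely for illustration, might look like:

    import zlib

    WINDOW_BASE = 0x3040  # hiragana block; a real SCSU encoder picks
                          # and switches windows dynamically

    def toy_encode(text):
        out = bytearray()
        for ch in text:
            cp = ord(ch)
            if cp < 0x80:                       # ASCII passes through
                out.append(cp)
            elif 0 <= cp - WINDOW_BASE < 0x80:  # window char -> 1 byte
                out.append(0x80 + cp - WINDOW_BASE)
            else:
                raise ValueError("outside the toy window")
        return bytes(out)

    kana = "\u3053\u3093\u306b\u3061\u306f" * 50  # repeated hiragana
    utf8 = kana.encode("utf-8")   # 3 bytes per kana: 750 bytes
    toy  = toy_encode(kana)       # 1 byte per kana: 250 bytes

    print(len(utf8), len(zlib.compress(utf8)))
    print(len(toy),  len(zlib.compress(toy)))

The point is that once a window covering the block is active, each kana
costs one byte instead of the three it costs in UTF-8, and a generic
compressor like zlib can then squeeze the one-byte-per-character stream
further still.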
Question: Are there really all-kana documents in the real world (other
than children's books)? Or is this one of those exercises like writing
an English-language novel without the letter E?

-Doug Ewell
 Fullerton, California