Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

> So SCSU and BOCU-* formats are NOT general purpose compressors. As
> they are defined only in terms of stream of Unicode code points, they
> are assumed to follow the conformance clauses of Unicode. As they
> recognize their input as Unicode text, they can recognize canonical
> equivalence, and thus this creates an opportunity for them to consider
> if a (de)normalization or de/re-composition would result in higher
> compression (interestingly, the composition exclusion could be
> reconsidered in the case of BOCU-1 and SCSU compressed streams,
> provided that the decompression to code points will redecompose the
> excluded compositions).

I have to say, if there's a flaw in Philippe's logic here, I don't see
it.  Anyone?
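
For anyone who wants to poke at the idea, here is a rough Python sketch of
what "choosing among canonically equivalent forms before compressing" might
look like. It is only an illustration under stated assumptions: the helper
names are hypothetical, the code uses the standard `unicodedata` module, and
code point count stands in as a crude proxy for compressed size because no
SCSU or BOCU-1 encoder is assumed to be available.

```python
import unicodedata


def canonical_forms(text):
    """Return the NFC and NFD forms of text; both are canonically equivalent to it."""
    return unicodedata.normalize("NFC", text), unicodedata.normalize("NFD", text)


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def shorter_equivalent(text):
    """Pick the canonical form with fewer code points.

    Assumption: code point count is used here only as a stand-in for
    compressed size; a real SCSU or BOCU-1 encoder could rank the forms
    differently.
    """
    nfc, nfd = canonical_forms(text)
    return nfc if len(nfc) <= len(nfd) else nfd


# "resume" with combining acute accents, i.e. fully decomposed input.
sample = "re\u0301sume\u0301"
chosen = shorter_equivalent(sample)

# U+0958 is a composition exclusion: NFC leaves it decomposed as U+0915 U+093C.
# A compressor could emit the single precomposed code point internally,
# provided the decompressor redecomposes it on output.
excluded = "\u0958"
redecomposed = unicodedata.normalize("NFC", excluded)
```

The point of the last two lines is that the round trip through the excluded
composition still yields text canonically equivalent to the original, which
is the condition Philippe describes.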

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

