On 25/11/2003 16:38, Doug Ewell wrote:
> Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>
> > So SCSU and BOCU-* formats are NOT general purpose compressors. As
> > they are defined only in terms of streams of Unicode code points, they
> > are assumed to follow the conformance clauses of Unicode. As they
> > recognize their input as Unicode text, they can recognize canonical
> > equivalence, and thus this creates an opportunity for them to consider
> > if a (de)normalization or de/re-composition would result in higher
> > compression (interestingly, the composition exclusion could be
> > reconsidered in the case of BOCU-1 and SCSU compressed streams,
> > provided that the decompression to code points will redecompose the
> > excluded compositions).
>
> I have to say, if there's a flaw in Philippe's logic here, I don't see
> it. Anyone?
>
> -Doug Ewell
> Fullerton, California
> http://users.adelphia.net/~dewell/

Yes, the compressor can make any canonically equivalent change, not just
composing composition exclusions but reordering combining marks in
different classes. The only flaw I see is that the compressor does not
have to undo these changes on decompression; at least no other process
is allowed to rely on it having done so.
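
To make the two kinds of change concrete, here is a rough sketch using Python's standard unicodedata module. It is not part of SCSU or BOCU-1, and the helper name canonically_equivalent is made up for this illustration. It treats two strings as canonically equivalent when their NFD forms match, then checks a reordering of marks in different combining classes and a precomposed character that is on the composition exclusion list (U+0958).

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: equal NFD forms means canonically equivalent."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Marks in different combining classes may be reordered:
# U+0307 DOT ABOVE has class 230, U+0323 DOT BELOW has class 220.
above_then_below = "q\u0307\u0323"
below_then_above = "q\u0323\u0307"
assert above_then_below != below_then_above
assert canonically_equivalent(above_then_below, below_then_above)

# Marks in the same class (both 230 here) may NOT be reordered.
assert not canonically_equivalent("q\u0301\u0300", "q\u0300\u0301")

# U+0958 DEVANAGARI LETTER QA is a composition exclusion: NFC leaves it
# decomposed as U+0915 U+093C, yet the single code point is equivalent.
composed = "\u0958"
decomposed = unicodedata.normalize("NFC", composed)
assert decomposed == "\u0915\u093c"
assert canonically_equivalent(composed, decomposed)
```

A compressor that picked the one-code-point form would hand back text that is only canonically equivalent to its input, not identical, unless it chose to undo the change on decompression.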
--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/