I'm pretty sure it depends on whether you regard a text document as a sequence of characters, or as a sequence of glyphs. (Er - I mean "default grapheme clusters" of course). Regarded as a sequence of characters, normalisation changes that sequence. But regarded as a sequence of glyphs, normalisation leaves the sequence unchanged. So a compression algorithm could legitimately claim to be "lossless" if it did normalisation but operated at the glyph level.
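To make that concrete, here is a minimal Python sketch (just an illustration, using the standard `unicodedata` module): the NFD and NFC forms of the same word are different codepoint sequences, yet they represent the same sequence of grapheme clusters.

```python
import unicodedata

# "café" with a decomposed accent: 'e' followed by U+0301 COMBINING ACUTE ACCENT
s_nfd = "cafe\u0301"

# NFC normalisation composes 'e' + U+0301 into the single codepoint U+00E9
s_nfc = unicodedata.normalize("NFC", s_nfd)

assert s_nfc == "caf\u00e9"          # precomposed form
assert s_nfd != s_nfc                # the codepoint sequences differ...
assert len(s_nfd) == 5               # 5 codepoints decomposed
assert len(s_nfc) == 4               # 4 codepoints composed
# ...but both render as "café": 4 default grapheme clusters either way
```

So a compressor that normalised its input would be lossy at the codepoint level but lossless at the grapheme level.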
I'm pretty sure you DON'T need to preserve the byte-stream bit for bit. For example, at the byte level, I see no reason to preserve invalid encoding sequences, and at the codepoint level I see no reason to preserve non-character codepoints. So - at the glyph level - we only need to preserve glyphs, no? It all depends on how the compression algorithm describes itself.
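For the invalid-sequence point, a small sketch (again just illustrative): a decoder that replaces an ill-formed UTF-8 sequence with U+FFFD has already discarded byte-level information, and nobody calls that a defect of the text model.

```python
# A byte stream ending in a lone continuation byte 0x80, which is
# not a valid UTF-8 sequence on its own.
data = b"caf\xc3\xa9 \x80"

# Decoding with errors="replace" substitutes U+FFFD REPLACEMENT CHARACTER
# for the invalid byte; the original byte value is not preserved.
text = data.decode("utf-8", errors="replace")

assert text == "caf\u00e9 \ufffd"
```

Once the invalid byte is gone, no byte-for-bit round trip is possible, so "lossless" can only be judged against whatever level (bytes, codepoints, glyphs) the algorithm claims to preserve.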
I think this might go wrong for "tailored grapheme clusters", but I don't know much about them.
Jill

