Re: Unicode, SMS and year 2012

Doug Ewell Sat, 28 Apr 2012 11:48:37 -0700

<anbu at peoplestring dot com> wrote:

Document encoded in SCSU or BOCU-1, given that the document contains
only ASCII characters, may appear corrupt on a system that doesn't
recognise SCSU or BOCU-1.

This is the curious point of view that ASCII compatibility (or transparency) is a bad thing. It does not apply to BOCU-1, which is not ASCII-transparent.

Documents encoded in *any* format are likely to appear corrupt on a system that doesn't recognize the encoding. They are guaranteed to appear corrupt if character boundaries do not align with byte boundaries, which is what you propose here.

01100001100101011001101110100101010110011000101010100101011101110101

If I'm going to use a variable-length, non-byte-aligned encoding, where there is no chance of realigning in case of a flipped or dropped bit (which seems to be of great concern to many people), I might as well go ahead and use a Huffman or LZ type of encoding (or a combination, like DEFLATE).
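
To make the realignment point concrete, here is a rough Python sketch that flips one bit in UTF-8 and one bit in a bit-packed, variable-length code. The packed code (`COMMON`, `encode_toy`, `decode_toy`) is a hypothetical toy invented for this illustration; it is not the encoding proposed in this thread, and it only assumes a 1-bit flag followed by a 5-bit or 16-bit payload.

```python
# Toy comparison: one flipped bit in UTF-8 versus one flipped bit in a
# bit-packed, variable-length code. The packed code is hypothetical and
# exists only for this illustration.

COMMON = " abcdefghijklmnopqrstuvwxyz.,!?'"  # 32 symbols, so a 5-bit index


def encode_toy(text: str) -> str:
    """Encode to a string of '0'/'1' characters with no byte alignment."""
    out = []
    for ch in text:
        idx = COMMON.find(ch)
        if idx >= 0:
            out.append("0" + format(idx, "05b"))       # short form: 6 bits
        else:
            out.append("1" + format(ord(ch), "016b"))  # long form: 17 bits (BMP only)
    return "".join(out)


def decode_toy(bits: str) -> str:
    """Decode the toy code, stopping quietly at a truncated final symbol."""
    out = []
    pos = 0
    while pos < len(bits):
        width = 5 if bits[pos] == "0" else 16
        payload = bits[pos + 1 : pos + 1 + width]
        if len(payload) < width:
            break
        value = int(payload, 2)
        out.append(COMMON[value] if width == 5 else chr(value))
        pos += 1 + width
    return "".join(out)


def flip(bits: str, i: int) -> str:
    """Return a copy of the bit string with bit i inverted."""
    return bits[:i] + ("1" if bits[i] == "0" else "0") + bits[i + 1 :]


text = "hello world"

# UTF-8: flip the top bit of the second byte, then decode leniently.
raw = bytearray(text.encode("utf-8"))
raw[1] ^= 0x80
utf8_result = raw.decode("utf-8", errors="replace")

# Toy code: flip the first bit (the length flag) of the second symbol.
toy_result = decode_toy(flip(encode_toy(text), 6))

print(ascii(utf8_result))  # damage stays confined to the one character
print(ascii(toy_result))   # every symbol after the flipped flag is misread
```

In UTF-8 the decoder finds the next character boundary at the very next byte, so only the damaged character is lost. In the toy code the flipped flag changes the length of one symbol, and nothing in the format tells the decoder where the following symbols begin.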

Is this the same encoding you were proposing a little over a year ago, or an outgrowth of the same ideas?


--
Doug Ewell | Thornton, Colorado, USA

http://www.ewellic.org | @DougEwell
