Re: UTF-8 Error Handling

Markus Scherer Fri, 28 Feb 2003 13:42:24 -0800

Yung-Fong Tang wrote:

Same thing for JIS X 0208 (a two-byte, and only two-byte, character set, not a variable-length character set). If I am processing an ISO-2022-JP message in JIS X 0208 mode and I get 0x24 0xa8, I know the boundary of that problem is 16 bits, not 8 bits or 32 bits.


Not true. You don't know if
- a byte was dropped before or after 0x24
  -> the first sequence is only 1 byte
- a byte was corrupted to become 0xa8
  -> the sequence consists of two bytes
- a wild combination of multiple errors

With a single-unit encoding, you can always assume that an illegal unit is a one-unit error. With any multi-unit encoding, you can only guess.
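
To make the ambiguity concrete, here is a minimal sketch in Python. It is an illustration only, not how any real converter works: the function name decode, the skip parameter, and the sample byte strings are all made up for this example, and the legality check is simplified to "both bytes in 0x21..0x7E". Two differently damaged streams begin with the same bytes 0x24 0x22 0x24 0xa8, and a different resynchronization guess is right for each.

```python
def is_legal(b):
    # Simplified assumption: in 7-bit JIS X 0208 mode every byte is 0x21..0x7E.
    return 0x21 <= b <= 0x7E

def decode(data, skip):
    """Split data into 2-byte characters (shown as hex strings).

    On an illegal or truncated pair, emit "??" and discard `skip` bytes.
    `skip` is the decoder's guess about how wide the damage is.
    """
    out, i = [], 0
    while i < len(data):
        pair = data[i:i + 2]
        if len(pair) == 2 and is_legal(pair[0]) and is_legal(pair[1]):
            out.append(pair.hex())
            i += 2
        else:
            out.append("??")
            i += skip
    return out

# Case A: one byte corrupted (0x28 became 0xa8); nothing was lost.
# Original characters: 2422 2428 2426 242a
observed_a = bytes.fromhex("2422 24a8 2426 242a")

# Case B: the byte after 0x24 was dropped and the next lead byte was
# corrupted (0x28 became 0xa8).
# Original characters: 2422 2426 2824 242a
observed_b = bytes.fromhex("2422 24a8 2424 2a")

# The decoder sees the same four bytes at the point of failure.
same_prefix = observed_a[:4] == observed_b[:4]

a_skip2 = decode(observed_a, 2)  # ['2422', '??', '2426', '242a']  resynchronized
a_skip3 = decode(observed_a, 3)  # ['2422', '??', '2624', '??']    bogus 2624
b_skip2 = decode(observed_b, 2)  # ['2422', '??', '2424', '??']    bogus 2424
b_skip3 = decode(observed_b, 3)  # ['2422', '??', '242a']          resynchronized
```

Treating the error as exactly 16 bits wide recovers case A but invents a character in case B. Discarding three bytes does the reverse. Nothing in the bytes at the failure point says which case applies.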

markus

--
Opinions expressed here may not reflect my company's positions unless otherwise noted.
