Kenneth Whistler wrote:
> Think of it this way. Does anyone expect the ASCII standard to tell, in detail, what a process should or should not do if it receives data which purports to be ASCII, but which contains an 0x80 byte in it? All the ASCII standard can really do is tell you that 0x80 is not defined in ASCII, and a conformant process shall not interpret 0x80 as an ASCII character. Beyond that, it is up to the software engineers to figure out who goofed up in mislabelling or corrupting the data, and what the process receiving the bad data should do about it.

That is not a good comparison. ASCII is a single-byte character code standard. When I get a 0x80 in an ASCII string, I know where the boundary is: the whole 8 bits of that 0x80 byte are bad. The scope is not the first 3 bits, nor 9 bits, but exactly those 8 bits of data. I cannot tell whether the rest of the data is good or bad, but I know ASCII is 8 bits and 8 bits only.
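As a minimal illustration of that point (a Python sketch added here, not part of any standard): in data that claims to be ASCII, every byte with the high bit set is invalid by itself, so the error boundary is always exactly one byte.

    # Scan purported ASCII data; any byte >= 0x80 is undefined in ASCII,
    # and the "bad unit" is always exactly that single byte.
    def find_ascii_errors(data: bytes):
        return [(i, b) for i, b in enumerate(data) if b >= 0x80]

    print(find_ascii_errors(b"abc\x80def"))   # [(3, 128)] -- one byte, one error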
The same is true for JIS X 0208 (a two-byte, and only two-byte, character set, not a variable-length character set). If I am processing an ISO-2022-JP message in JIS X 0208 mode and I get 0x24 0xA8, I know the boundary of that problem is 16 bits, not 8 bits nor 32 bits.
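To make the two-byte boundary concrete, here is a simplified sketch (my own illustration; escape-sequence handling and the exact table of assigned code points are omitted). Inside the JIS X 0208 mode of an ISO-2022-JP stream, bytes are consumed strictly in pairs and each byte of a pair must stay in the 7-bit range 0x21-0x7E, so a pair like 0x24 0xA8 is bad as one 16-bit unit.

    # Simplified check inside JIS X 0208 mode of an ISO-2022-JP stream
    # (escape sequences / mode switching omitted; only the byte-range
    # rule 0x21..0x7E is checked, not the full code point table).
    def find_jis0208_errors(data: bytes):
        errors = []
        for i in range(0, len(data) - 1, 2):
            hi, lo = data[i], data[i + 1]
            if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
                # The bad unit is the whole 16-bit pair, never 8 or 32 bits.
                errors.append((i, bytes([hi, lo])))
        return errors

    print(find_jis0208_errors(b"\x24\x22\x24\xa8\x24\x24"))
    # [(2, b'$\xa8')] -- the 0x24 0xA8 pair is the problem, as one 16-bit unit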
When you deal with encodings that need state (ISO-2022, ISO-2022-JP, etc.) or variable-length encodings (Shift_JIS, Big5, UTF-8), the situation is different.
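A small Python sketch of why (again my own illustration, not anything mandated by a standard): in UTF-8 the lead byte announces how many bytes should follow, so when a sequence goes wrong partway through, a decoder has to decide whether the bad unit is one byte, the bytes consumed so far, or the whole announced length, and different decoders make different choices.

    # 0xE3 promises a 3-byte sequence; 0x81 is a valid continuation byte,
    # but 0x41 ('A') is not -- so how big is the ill-formed unit?
    bad = b"\xe3\x81\x41"
    print(bad.decode("utf-8", errors="replace"))
    # CPython replaces the maximal subpart 0xE3 0x81 with a single U+FFFD
    # and resumes at 'A'; other decoders may emit one U+FFFD per byte or
    # simply stop, which is exactly the ambiguity fixed-width codes avoid.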

