Erik van der Poel wrote:
> Frank da Cruz wrote:
> > The irony is, when using ISO 2022 character-set designation and invocation,
> > you have to handle the escape sequences first to know if you're in UTF-8.
> > Therefore, this pushes the burden onto the end-user to preconfigure their
> > emulator for UTF-8 if that is what is being used, when ideally this should
> > happen automatically and transparently.
> 
> I may be misunderstanding the above, but ISO 2022 says:
> 
>   ESC 2/5 F shall mean that the other coding system uses
>   ESC 2/5 4/0 to return;
> 
>   ESC 2/5 2/15 F shall mean that the other coding system
>   does not use ESC 2/5 4/0 to return (it may have an alternative
>   means to return or none at all).
> 
> Registration number 196 is for UTF-8 without implementation level, and
> its escape sequence is ESC 2/5 4/7. I believe that ISO 2022 was designed
> that way so that a decoder that does not know UTF-8 (or any other coding
> system invoked by ESC 2/5 F) could simply "skip" the octets in that
> encoding until it gets to the octets ESC 2/5 4/0.
> 
> This means that it does not need to decode UTF-8 just to find the escape
> sequence ESC 2/5 4/0. UTF-8 does not do anything special with characters
> below U+0080 anyway (they're just single-byte ASCII, and every byte of a
> multibyte character is 0x80 or above), so it works, no?
> 
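The skipping behaviour described above can be illustrated with a minimal Python sketch. It assumes a stream that switches to UTF-8 only with ESC 2/5 4/7 (ESC % G) and returns only with ESC 2/5 4/0 (ESC % @); the ESC 2/5 2/15 F form is not handled. The function name split_utf8_segments and the segment labels are hypothetical. The sketch never decodes the UTF-8 octets. It only searches for the return sequence, which works because ESC (0x1B) cannot occur inside a UTF-8 multibyte character.

```python
from typing import List, Tuple

ESC = 0x1B
# ESC 2/5 4/7 (ESC % G): switch to UTF-8 without implementation level (reg. 196)
ENTER_UTF8 = bytes([ESC, 0x25, 0x47])
# ESC 2/5 4/0 (ESC % @): return to ISO 2022
RETURN_2022 = bytes([ESC, 0x25, 0x40])


def split_utf8_segments(data: bytes) -> List[Tuple[str, bytes]]:
    """Split a byte stream into ISO 2022 segments and undecoded 'other' segments.

    Octets between ESC % G and ESC % @ are passed through untouched; finding
    the return sequence is a plain byte search, not a UTF-8 decode.
    """
    segments: List[Tuple[str, bytes]] = []
    pos = 0
    while pos < len(data):
        start = data.find(ENTER_UTF8, pos)
        if start == -1:
            # No further switch: the rest of the stream stays in ISO 2022.
            segments.append(("iso2022", data[pos:]))
            break
        if start > pos:
            segments.append(("iso2022", data[pos:start]))
        body_start = start + len(ENTER_UTF8)
        end = data.find(RETURN_2022, body_start)
        if end == -1:
            # No return sequence seen yet: everything left belongs to UTF-8.
            segments.append(("other", data[body_start:]))
            break
        segments.append(("other", data[body_start:end]))
        pos = end + len(RETURN_2022)
    return segments


stream = b"abc" + ENTER_UTF8 + "h\u00e9llo".encode("utf-8") + RETURN_2022 + b"xyz"
print(split_utf8_segments(stream))
# [('iso2022', b'abc'), ('other', b'h\xc3\xa9llo'), ('iso2022', b'xyz')]
```

A decoder that does not know UTF-8 could simply discard the "other" segments, which is the behaviour the ESC 2/5 F convention is meant to allow.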
Yes, but I was thinking more about the ISO 2022 invocation features than the
designation ones:  SS2 and SS3 are C1 controls, and LS2, LS3, LS1R, LS2R, and
LS3R are escape sequences that lock G1, G2, or G3 into GL or GR.
The situation *could* arise where these would be used prior to announcing
(or switching to) UTF-8.  In this case, the end-user would have to configure
the software in advance to know whether the incoming byte stream is UTF-8.
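The ambiguity can be seen in a minimal Python sketch, assuming an 8-bit ISO 2022 stream in which SS2 and SS3 arrive as the single C1 bytes 0x8E and 0x8F (rather than the 7-bit forms ESC 4/14 and ESC 4/15). The function describe and its labels are hypothetical. The same two bytes receive different meanings depending only on what the software was told in advance.

```python
from typing import List

# 8-bit (C1) forms of the single shifts; the 7-bit forms are ESC 4/14 and ESC 4/15.
C1_NAMES = {0x8E: "SS2", 0x8F: "SS3"}


def describe(data: bytes, assume_utf8: bool) -> List[str]:
    """Label each byte under one preconfigured assumption about the stream."""
    labels = []
    for b in data:
        if b < 0x80:
            labels.append("ASCII")
        elif assume_utf8:
            # In UTF-8, 0x80-0xBF can only be continuation bytes.
            labels.append("continuation" if b <= 0xBF else "non-continuation")
        elif b <= 0x9F:
            # In an 8-bit ISO 2022 stream, 0x80-0x9F is the C1 control area.
            labels.append(C1_NAMES.get(b, "C1"))
        else:
            # 0xA0-0xFF: the GR area.
            labels.append("GR")
    return labels


# U+018E encodes in UTF-8 as C6 8E; its second byte is also the C1 code for SS2.
sample = "\u018e".encode("utf-8")
print(describe(sample, assume_utf8=True))   # ['non-continuation', 'continuation']
print(describe(sample, assume_utf8=False))  # ['GR', 'SS2']
```

Nothing in the two bytes themselves says which reading is right, which is why the choice has to be configured before the stream arrives.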

Not a big deal; just an illustration of what happens when we can't use the
normal layering.

- Frank
