RE: Generating U+FFFD when there's no content between ISO-2022-JP escape sequences

2020-08-17 Thread Shawn Steele via Unicode
IMO, encodings, particularly ones depending on state such as this, may have multiple ways to output the same, or similar, sequences. Which means that pretty much any time an encoding transforms data, any previous security or other validation-style checks are no longer valid and any

Re: Generating U+FFFD when there's no content between ISO-2022-JP escape sequences

2020-08-17 Thread Harriet Riddle via Unicode
In terms of deployed ISO-2022-JP encoders which don't follow WHATWG behaviour, here's Python's (apparently contributed to Python by one Hye-Shik Chang):

    >>> "a¥bc~¥d".encode("iso-2022-jp")
    b'a\x1b(J\\\x1b(Bbc~\x1b(J\\\x1b(Bd'

This is so far as I can tell valid per the RFC (and of course ECMA-35
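
The encoder output quoted in this message can be checked with a short round trip. This is only a sketch using CPython's built-in "iso-2022-jp" codec; the variable names are illustrative, and the expected bytes are copied from the message rather than derived from any specification.

```python
# Round-trip sketch for the example string quoted in the message.
text = "a\u00a5bc~\u00a5d"  # "a¥bc~¥d"

encoded = text.encode("iso-2022-jp")

# Bytes as quoted in the message: each yen sign is wrapped in
# ESC ( J (switch to JIS X 0201 Roman) and ESC ( B (back to ASCII).
quoted = b"a\x1b(J\\\x1b(Bbc~\x1b(J\\\x1b(Bd"

decoded = encoded.decode("iso-2022-jp")

print(encoded == quoted)       # compares against the bytes from the message
print(encoded.count(b"\x1b"))  # number of escape sequences emitted
print(decoded == text)         # whether the round trip is lossless
```

The point of the example is that the encoder switches character sets per character instead of staying in one state, so the same text costs four escape sequences here.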

Re: Generating U+FFFD when there's no content between ISO-2022-JP escape sequences

2020-08-17 Thread Henri Sivonen via Unicode
Sorry about the delay. There is now https://www.unicode.org/L2/L2020/20202-empty-iso-2022-jp.pdf

On Mon, Dec 10, 2018 at 1:14 PM Mark Davis ☕️ wrote:
>
> I tend to agree with your analysis that emitting U+FFFD when there is no
> content between escapes in "shifting" encodings like ISO-2022-JP
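
To make "no content between escapes" concrete, here is a minimal sketch that reports where one ISO-2022-JP escape sequence is immediately followed by another. The helper name `find_empty_escape_pairs` and the set of escape sequences it recognises are assumptions made for illustration; this is a plain byte scan, not the WHATWG decoder algorithm and not any library's API.

```python
import re
from typing import List

# Escape sequences assumed for this sketch:
# ESC ( B, ESC ( J, ESC ( I, ESC $ @, ESC $ B.
_ESCAPE = rb"\x1b(?:\([BJI]|\$[@B])"

# An escape sequence directly followed by another one. The lookahead keeps
# the second sequence unconsumed, so a run of three reports two offsets.
_EMPTY_PAIR = re.compile(_ESCAPE + rb"(?=" + _ESCAPE + rb")")


def find_empty_escape_pairs(data: bytes) -> List[int]:
    """Return byte offsets of escape sequences with no content after them."""
    return [m.start() for m in _EMPTY_PAIR.finditer(data)]


# The encoder output quoted earlier in the thread has content after every escape.
print(find_empty_escape_pairs(b"a\x1b(J\\\x1b(Bbc~\x1b(J\\\x1b(Bd"))  # []

# Switching to JIS X 0208 and straight back to ASCII encodes nothing.
print(find_empty_escape_pairs(b"a\x1b$B\x1b(Bb"))  # [1]
```

A decoder that emits U+FFFD for such input would do so at the positions this scan reports; a decoder that silently accepts them would produce the same text as if the escapes were absent.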