IMO, encodings, particularly ones depending on state such as this, may have
multiple ways to output the same, or similar, sequences. Which means that
pretty much any time an encoding transforms data, any previous security or other
validation-style checks are no longer valid and any
In terms of deployed ISO-2022-JP encoders which don't follow WHATWG behaviour,
here's Python's (apparently contributed to Python by one Hye-Shik Chang):
>>> "a¥bc~¥d".encode("iso-2022-jp")
b'a\x1b(J\\\x1b(Bbc~\x1b(J\\\x1b(Bd'
This is, so far as I can tell, valid per the RFC (and of course ECMA-35
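
For anyone who wants to poke at this locally, here is a minimal sketch using only Python's built-in "iso-2022-jp" codec. It assumes CPython's codec behaves as in the session quoted above, and the variable names are purely illustrative. It round-trips the quoted string and shows two different byte sequences that decode to the same text, which is the reason checks made on one side of the transformation do not carry over to the other.

```python
# Sketch only: relies on CPython's built-in "iso-2022-jp" codec.
# "\u00a5" is U+00A5 YEN SIGN, the character used in the quoted session.
text = "a\u00a5bc~\u00a5d"

encoded = text.encode("iso-2022-jp")
# The yen sign forces a switch to JIS X 0201 Roman (ESC ( J) and back
# to ASCII (ESC ( B) so that the tilde can be emitted as plain ASCII.
print(encoded)

# Decoding the encoder's own output should give the original string back.
decoded = encoded.decode("iso-2022-jp")
print(decoded == text)

# Two different byte sequences for the same text: the second carries a
# redundant "designate ASCII" escape while the decoder is already in ASCII.
plain = b"abc"
redundant = b"\x1b(Babc"
same_text = plain.decode("iso-2022-jp") == redundant.decode("iso-2022-jp")
print(same_text)
```

A byte-level filter that inspected only one of those two sequences would say nothing about the other, even though both end up as the same string after decoding.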
Sorry about the delay. There is now
https://www.unicode.org/L2/L2020/20202-empty-iso-2022-jp.pdf
On Mon, Dec 10, 2018 at 1:14 PM Mark Davis ☕️ wrote:
>
> I tend to agree with your analysis that emitting U+FFFD when there is no
> content between escapes in "shifting" encodings like ISO-2022-JP