Ian Hickson wrote:
The way that IE and Firefox handle bytes with values greater than 0x7F
when a file is labelled as being encoded as ASCII differs -- IE ignores
the 8th bit, and only looks at the first seven bits, whereas Firefox
treats bytes in the range 0x80 to 0xFF as being encoded as Windows-1252.
This leads to security bugs, wherein the two browsers can interpret the same
byte stream differently (in particular, what looks like <script></script> to
IE might look like something quite different to Firefox).
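
A minimal sketch of that divergence, in Python (my choice of language here;
the byte values and the alert() payload are made up for illustration, not
taken from any real exploit):

    payload = b"<scr\xe9pt>alert(1)</scr\xe9pt>"

    # Strategy attributed to IE above: drop the 8th bit of every byte.
    ie_view = "".join(chr(b & 0x7F) for b in payload)

    # Strategy attributed to Firefox above: decode 0x80-0xFF as Windows-1252.
    firefox_view = payload.decode("cp1252")

    print(ie_view)       # <script>alert(1)</script>  -- script markup
    print(firefox_view)  # <scrépt>alert(1)</scrépt>  -- inert text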
I believe the ASCII specification should have defined how to convert any
random byte stream into characters, including bytes that aren't in the
range 0-127. That it didn't means that every language that allows ASCII
has to define how to handle it, which is an abstraction violation, and
results in different specs having different rules. In many cases, the
layers above ASCII didn't define this, and we've ended up with very real
security problems, such as the example above.
Now, in the case of ASCII, doing this would be trivial -- e.g. just say that
all bytes that aren't in the range 0x00 - 0x7F must be treated as 0x3F, and
that producers must not use bytes that aren't in the table. But yes, it
should be in the ASCII spec.
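
Continuing the sketch above, that rule is easy to state in code (again
Python, purely illustrative; the function name is mine):

    def decode_ascii(data: bytes) -> str:
        # Proposed rule: any byte outside 0x00-0x7F is treated as 0x3F ('?').
        return "".join(chr(b) if b <= 0x7F else "?" for b in data)

    print(decode_ascii(b"<scr\xe9pt>alert(1)</scr\xe9pt>"))
    # <scr?pt>alert(1)</scr?pt> -- the same result in every consumer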
Your assumption seems to be that there's a single "good" way to define
this error handling. I disagree with that.
For instance, for XML, sending non-ASCII characters when the declared
encoding is US-ASCII is a fatal error, and I definitely want it to stay
that way.
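
As an illustration (Python's standard xml.etree used here as a stand-in for
any conforming XML processor; the document bytes are made up):

    import xml.etree.ElementTree as ET

    doc = b'<?xml version="1.0" encoding="US-ASCII"?><p>caf\xe9</p>'

    try:
        ET.fromstring(doc)
    except ET.ParseError as err:
        # The 0xE9 byte is not US-ASCII, so a conforming processor
        # reports a fatal error and stops processing.
        print("fatal error:", err)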
BR, Julian