Ian Hickson wrote:
The way that IE and Firefox handle bytes with values greater than 0x7F when a file is labelled as being encoded as ASCII differs: IE ignores the eighth bit and looks only at the low seven bits, whereas Firefox treats bytes in the range 0x80 to 0xFF as Windows-1252. This leads to security bugs, wherein the two browsers interpret the same byte sequence differently (in particular, what looks like <script></script> to IE might look like something quite different to Firefox).
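For example, the byte 0xBC has 0x3C ('<') as its low seven bits but maps to '¼' in Windows-1252, so the same bytes read as markup under one interpretation and as harmless text under the other. A quick sketch in Python (the payload bytes are hypothetical, chosen purely for illustration):

    # Hypothetical payload: 0xBC/0xBE have low seven bits equal to '<'/'>'
    # but decode to '¼'/'¾' under Windows-1252.
    payload = b"\xbcscript\xbesteal()\xbc/script\xbe"

    # IE-style reading: keep only the low seven bits of every byte.
    ie_view = bytes(b & 0x7F for b in payload).decode("ascii")

    # Firefox-style reading: treat the whole stream as Windows-1252.
    firefox_view = payload.decode("windows-1252")

    print(ie_view)       # <script>steal()</script>
    print(firefox_view)  # ¼script¾steal()¼/script¾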

I believe the ASCII specification should have defined how to convert any random byte stream into characters, including bytes outside the range 0-127. Because it didn't, every language that allows ASCII has to define how to handle such bytes itself, which is an abstraction violation and results in different specs having different rules. In many cases the layers above ASCII didn't define this, and we've ended up with very real security problems, such as the example above.

Now, in the case of ASCII, doing this would be trivial: for example, just say that any byte outside the range 0x00-0x7F must be treated as 0x3F, and that producers must not emit bytes that aren't in the ASCII table. But yes, it should be in the ASCII spec.
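Such a rule would be a one-liner to implement; a sketch of the proposed behaviour in Python (the function name is just for illustration):

    def decode_ascii_lenient(data: bytes) -> str:
        # Proposed rule: any byte outside 0x00-0x7F is treated as 0x3F ('?').
        return "".join(chr(b) if b <= 0x7F else "?" for b in data)

    print(decode_ascii_lenient(b"caf\xe9"))  # caf?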

Your assumption seems to be that there's a single "good" way to define this error handling. I disagree with that.

For instance, for XML, sending non-ASCII characters when the declared encoding is US-ASCII is a fatal error, and I definitely want it to stay that way.
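Assuming an expat-based parser such as Python's xml.etree.ElementTree, a single stray byte in a document labelled us-ascii is rejected outright rather than silently reinterpreted:

    import xml.etree.ElementTree as ET

    # One non-ASCII byte (0xE9) in a document that claims to be US-ASCII.
    doc = b'<?xml version="1.0" encoding="us-ascii"?><p>caf\xe9</p>'

    try:
        ET.fromstring(doc)
    except ET.ParseError as err:
        print("fatal error, as XML requires:", err)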

BR, Julian