On Wed, 18 Apr 2012 15:40:33 +0200, Glenn Maynard <[email protected]> wrote:
"This is a decoder error" seems odd; it's descriptive language ("this
thing must be made true") rather than imperative ("do this thing").
I'd suggest the imperative language "Emit a decoder error" and "Emit an
encoder error".
Yes. Awesome suggestion; implemented.
"If code point is equal or greater than lower boundary" is more naturally
"greater than or equal to" (and "less than or equal to"). That said,
this would be much clearer with interval syntax:
"If code point is in the range [*lower boundary*, 0x10FFFF] and is not in
the range [0xD800, 0xDFFF], emit code point (and continue)."
which I think is easier to read, and also makes it clear that the "0xD800
to 0xDFFF" is a closed interval (0xD800 and 0xDFFF are included).
Then we'd first have to introduce interval syntax to the English language.
We could do that I suppose in the Terminology section if you think it
would be better.
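
For illustration, the proposed range condition can be written as two
closed-interval checks. This is only a sketch: the function name and
signature are hypothetical and not taken from the specification; only the
constants 0x10FFFF, 0xD800 and 0xDFFF come from the text quoted above.

```python
def is_emittable(code_point: int, lower_boundary: int) -> bool:
    """Return True if code_point may be emitted under the quoted rule.

    Hypothetical helper, not spec text. Both intervals are closed,
    so the endpoints themselves are included.
    """
    # code point is in the range [lower boundary, 0x10FFFF]
    in_range = lower_boundary <= code_point <= 0x10FFFF
    # code point is in the surrogate range [0xD800, 0xDFFF]
    is_surrogate = 0xD800 <= code_point <= 0xDFFF
    return in_range and not is_surrogate
```

With a lower boundary of 0x800, 0xD7FF and 0xE000 pass while 0xD800 and
0xDFFF do not, which is the closed-interval reading described above.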
An encoder contains one or more encoder error points. Unless stated
otherwise the encoder is terminated at that point.
Encoding form data, at least, doesn't abort on the first error; any
unrepresentable codepoints are encoded as &#x1234;. (It would sure be
nice if encoding to non-Unicode-based encodings would just *always* use
that syntax for non-ASCII, so the encoders could be dropped, but I guess
that would trigger bugs in pages that are currently masked...) Is there
any encoding path in browsers that does give up on the first error?
It has been proposed for the API.
And in URLs you do not get "&#...;" (though in WebKit you do) but you get
"?" (IE at the network layer, Opera earlier on) or the UTF-8
representation (Gecko is totally weird).
Maybe we should align URLs with <form> here and use "&#...;" throughout if
that is compatible with content. Probably deserves a discussion in its
own thread.
I do not know any cases beyond URLs, <form>, and the proposed API that
require an encoder in the platform.
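
As a rough illustration of the three encoder error behaviours mentioned
above (terminate on the first error, substitute "?", substitute
"&#...;"), here is a sketch using Python's built-in codec error handlers
as stand-ins. It is an analogy only, not how any browser implements this;
the function names and the choice of ISO-8859-1 are assumptions made for
the example.

```python
from typing import Optional


def encode_terminating(s: str, label: str = "iso-8859-1") -> Optional[bytes]:
    """Give up at the first unrepresentable code point (None signals the error)."""
    try:
        return s.encode(label, errors="strict")
    except UnicodeEncodeError:
        return None


def encode_question_mark(s: str, label: str = "iso-8859-1") -> bytes:
    """Replace each unrepresentable code point with "?"."""
    return s.encode(label, errors="replace")


def encode_char_ref(s: str, label: str = "iso-8859-1") -> bytes:
    """Replace each unrepresentable code point with a decimal "&#...;" reference."""
    return s.encode(label, errors="xmlcharrefreplace")
```

For example, U+20AC is not representable in ISO-8859-1, so
encode_char_ref("a\u20acb") yields b"a&#8364;b", while
encode_question_mark gives b"a?b" and encode_terminating gives None.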
--
Anne van Kesteren
http://annevankesteren.nl/