Ian Hickson wrote:
> On Mon, 22 Dec 2008, Edward Z. Yang wrote:
>> "in the range 0x0000 to 0x0008, U+000B, U+000E to 0x001F, 0x007F to 
>> 0x009F, 0xD800 to 0xDFFF , 0xFDD0 to 0xFDDFin the range 0x0000 to 
>> 0x0008, U+000B, U+000E to 0x001F, 0x007F to 0x009F, 0xD800 to 0xDFFF, 
>> 0xFDD0 to 0xFDDF"
>>
>> U+000B is not a range.
> 
> While this is technically true, I don't really see a better way to phrase 
> this that isn't verbose (e.g. "ranges and codepoints" or some such).
> 
> If it helps, consider the whole set of subranges and code points to be a 
> single discontinuous range, hence the use of the singular "range". :-)

The spec made me do a double-take when I read it (since it fairly clearly
separates ranges from code points). Also, I messed up the copy-paste while
quoting, so the text I cited is not actually what's there. The actual text is:

> in the ranges U+0001 to U+0008, U+000B, U+000E to U+001F, U+007F to
> U+009F, U+D800 to U+DFFF, U+FDD0 to U+FDDF, and characters U+FFFE...

It seems fairly clear to me that U+000B should be moved to the list of
characters (at the cost of the nice ordering), or we should collapse
ranges/characters into one "range".

> On Tue, 23 Dec 2008, Edward Z. Yang wrote:
> You're still checking the next input character at that point, so "P" is 
> still the "next input character", so the next six are "PUBLIC".
> 
> At least, that's how I'm defending what the spec says. :-)

The spec is pretty unambiguous about this:

> The next input character is the first character in the input stream that has 
> not yet been consumed. Initially, the next input character is the first 
> character in the input.

and, at the beginning of the section:

> Consume the next input character:

So, the spec is wrong.

> In practice I think having the text be clear ("PUBLIC") is less confusing 
> than having it be pedantic ("P" and "UBLIC" or "this and the next five" or 
> some such). It's not like people are going to assume the spec is allowing 
> "XPUBLIC" or "*PUBLIC" and so forth, right?

I understand this consideration, and there are several ways we could go
about doing this. I think the easiest would be to un-consume a
character, and then perform the checks, and then reconsume the character.
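
A rough sketch of that idea in Python, assuming a hypothetical InputStream
class (consume, unconsume and next_matches are names invented for this
example, not taken from the spec, and the comparison here is a plain exact
match):

```python
class InputStream:
    """Hypothetical input stream; all names here are illustrative only."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0  # index of the next input character

    def consume(self) -> str:
        """Consume and return the next input character ("" at end of input)."""
        ch = self.text[self.pos:self.pos + 1]
        self.pos += len(ch)
        return ch

    def unconsume(self) -> None:
        """Step back so the last consumed character is next again."""
        if self.pos > 0:
            self.pos -= 1

    def next_matches(self, word: str) -> bool:
        """Check whether the unconsumed input starts with word."""
        return self.text.startswith(word, self.pos)


def sees_public(stream: InputStream) -> bool:
    stream.consume()    # the section starts by consuming a character
    stream.unconsume()  # un-consume it, so it is the next input character again
    if stream.next_matches("PUBLIC"):
        for _ in "PUBLIC":
            stream.consume()  # consume all six characters of the keyword
        return True
    stream.consume()    # no match: reconsume the character and carry on
    return False
```

Written this way, "the next six characters" really does mean "PUBLIC",
because nothing has been consumed at the moment the check runs.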

As for people making this mistake... well, you're looking at one. :-)

Cheers,
Edward

(accidentally emailed only Ian; re-sending to WHATWG list)
