Ian Hickson wrote:
> On Mon, 22 Dec 2008, Edward Z. Yang wrote:
>> "in the range 0x0000 to 0x0008, U+000B, U+000E to 0x001F, 0x007F to
>> 0x009F, 0xD800 to 0xDFFF , 0xFDD0 to 0xFDDFin the range 0x0000 to
>> 0x0008, U+000B, U+000E to 0x001F, 0x007F to 0x009F, 0xD800 to 0xDFFF,
>> 0xFDD0 to 0xFDDF"
>>
>> U+000B is not a range.
>
> While this is technically true, I don't really see a better way to phrase
> this that isn't verbose (e.g. "ranges and codepoints" or some such).
>
> If it helps, consider the whole set of subranges and code points to be a
> single discontinuous range, hence the use of the singular "range". :-)
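To make the "single discontinuous range" reading concrete, here is a minimal Python sketch (the names `QUOTED_RANGES` and `in_quoted_ranges` are hypothetical, not from the spec) that stores the whole set as inclusive (start, end) pairs, so U+000B is simply a range whose start and end coincide. It uses the ranges as the spec actually words them (U+0001 to U+0008, U+000B, U+000E to U+001F, U+007F to U+009F, U+D800 to U+DFFF, U+FDD0 to U+FDDF); the spec's list goes on with further characters starting at U+FFFE, and those are left out here rather than guessed.

```python
# Inclusive (start, end) pairs. A lone code point such as U+000B is just a
# degenerate range whose start equals its end.
# NOTE: only the quoted ranges are listed; the spec's list continues with
# "characters U+FFFE..." and those are deliberately omitted here.
QUOTED_RANGES = [
    (0x0001, 0x0008),
    (0x000B, 0x000B),  # single code point expressed as a one-element range
    (0x000E, 0x001F),
    (0x007F, 0x009F),
    (0xD800, 0xDFFF),
    (0xFDD0, 0xFDDF),
]


def in_quoted_ranges(code_point: int) -> bool:
    """Return True if code_point falls inside any of the quoted ranges."""
    return any(start <= code_point <= end for start, end in QUOTED_RANGES)


if __name__ == "__main__":
    for cp in (0x0009, 0x000B, 0x000C, 0xFDD5):
        print(f"U+{cp:04X}", in_quoted_ranges(cp))
```

Treating every entry as a pair keeps the membership test uniform, which is the practical upside of calling the whole set one "range".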
The spec made me double-take when I read it (since it fairly clearly separates ranges from code points). Also, I messed up the copy-paste while quoting, so the text I cited is not actually what's there; it's:

> in the ranges U+0001 to U+0008, U+000B, U+000E to U+001F, U+007F to
> U+009F, U+D800 to U+DFFF, U+FDD0 to U+FDDF, and characters U+FFFE...

It seems fairly clear to me that either U+000B should be moved to the list of characters (at the cost of the nice ordering), or we should collapse ranges and characters into one "range".

> On Tue, 23 Dec 2008, Edward Z. Yang wrote:
> You're still checking the next input character at that point, so "P" is
> still the "next input character", so the next six are "PUBLIC".
>
> At least, that's how I'm defending what the spec says. :-)

The spec is pretty unambiguous about this:

> The next input character is the first character in the input stream that has
> not yet been consumed. Initially, the next input character is the first
> character in the input.

and, at the beginning of the section:

> Consume the next input character:

So, the spec is wrong.

> In practice I think having the text be clear ("PUBLIC") is less confusing
> than having it be pedantic ("P" and "UBLIC" or "this and the next five" or
> some such). It's not like people are going to assume the spec is allowing
> "XPUBLIC" or "*PUBLIC" and so forth, right?

I understand this consideration, and there are several ways we could go about this. I think the easiest would be to un-consume a character, perform the checks, and then reconsume the character. As for people making this mistake... well, you're looking at one. :-)

Cheers,
Edward

(accidentally emailed only Ian; re-sending to WHATWG list)
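A minimal Python sketch of the consume/lookahead issue, under stated assumptions: `InputStream`, `literal_check` and `unconsume_check` are hypothetical names, the cursor `pos` models the spec's "next input character", and the keyword comparison ignores case. `literal_check` follows the wording literally (consume a character, then look at the next six characters); `unconsume_check` follows the un-consume, check, reconsume approach proposed in the reply.

```python
class InputStream:
    """Hypothetical minimal input stream; pos indexes the next input character."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def consume(self) -> str:
        """Consume and return the next input character ("" at end of input)."""
        ch = self.text[self.pos:self.pos + 1]
        self.pos += len(ch)
        return ch

    def unconsume(self) -> None:
        """Step back one character so it becomes the next input character again."""
        self.pos -= 1

    def next_chars(self, n: int) -> str:
        """Peek at the next n characters without consuming them."""
        return self.text[self.pos:self.pos + n]


def literal_check(stream: InputStream) -> bool:
    # Literal reading: the state first consumes a character (e.g. "P"), so the
    # "next six characters" start one position later (e.g. "UBLIC ").
    stream.consume()
    return stream.next_chars(6).upper() == "PUBLIC"


def unconsume_check(stream: InputStream) -> bool:
    # Proposed fix: un-consume the character, perform the check, then
    # reconsume the character so the stream ends up where it was.
    stream.consume()
    stream.unconsume()
    matched = stream.next_chars(6).upper() == "PUBLIC"
    stream.consume()  # reconsume
    return matched


if __name__ == "__main__":
    for text in ('PUBLIC "x"', "XPUBLIC"):
        print(text, literal_check(InputStream(text)), unconsume_check(InputStream(text)))
```

With `PUBLIC "x"`, the literal reading fails to match because it sees `UBLIC `, while the un-consume version matches; with `XPUBLIC` the results flip, which is exactly the "XPUBLIC" reading mentioned in the thread.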