Re: [whatwg] Parse errors for invalid characters
On Sat, 7 Sep 2013, Geoffrey Sneddon wrote:
>> [...] this seems ... cumbersome ... to implement in a conformance checker. Which reminds me, does
>>
>> # Conformance checkers must report at least one parse error
>> # condition to the user if one or more parse error conditions exist
>> # in the document and must not report parse error conditions if none
>> # exist in the document. Conformance checkers may report more than
>> # one parse error condition if more than one parse error condition
>> # exists in the document.
>>
>> mean validator.nu and Firefox view source are non-conforming because they do nothing about document.write()? I think we should exempt conformance checkers from scripts instead.
>
> They already are. From the Conformance classes section:
>
>     Conformance checkers must check that the input document conforms when parsed without a browsing context (meaning that no scripts are run, and that the parser's scripting flag is disabled), and should also check that the input document conforms when parsed with a browsing context in which scripts execute, and that the scripts never cause non-conforming states to occur other than transiently during script execution itself. (This is only a SHOULD and not a MUST requirement because it has been proven to be impossible. [COMPUTABLE])

Right.

> (I feel like pedanting and pointing out this is untrue: it has not been proven impossible to do, it has been proven impossible to do in general.

I'm not sure what the distinction is here.

> It wouldn't be that hard to design a conformance checker to check <html><script>document.write("<p>")</script>.)

It wouldn't be very useful to have a conformance checker only check that literal string, and as soon as you start allowing more things, the complexity becomes astronomically high very quickly. But I'm all in favour of conformance checkers checking these things as much as possible.

> On the other hand, a JS console can reasonably report parse errors from script, so the parse errors are still worthwhile to have.

Right.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] Parse errors for invalid characters
On Thu, 5 Sep 2013, Geoffrey Sneddon wrote:
> The phrasing content section states:
>
>     Text nodes and attribute values must consist of Unicode characters, must not contain U+0000 characters, must not contain permanently undefined Unicode characters (noncharacters), and must not contain control characters other than space characters.
>
> And the pre-processing the input-stream section states:
>
>     Any occurrences of any characters in the ranges U+0001 to U+0008, U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse errors. These are all control characters or permanently undefined Unicode characters (noncharacters).
>
> Note the first uses "Unicode characters", the second "characters"; the former excludes surrogates as a conformance requirement. Note that every disallowed non-surrogate character is a parse error. Therefore, it would make sense to make surrogates parse errors.

Done.

> It should be noted that they can only occur in the input stream if they come from script (as they cannot be decoded from the input byte stream as the decoders will never emit a surrogate).

Done.

--
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

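As a rough illustration of the list quoted in this message, the check could be sketched as a single predicate over code points. This is a minimal sketch in Python, not the spec's algorithm: the function name `is_preprocessing_parse_error` and the constant names are hypothetical, the surrogate range (U+D800 to U+DFFF) is included on the assumption that the change proposed in this thread is applied, and U+0000 is left out on the assumption that the parser reports it separately.

```python
# Sketch only. The ranges are taken from the pre-processing text quoted in
# the thread; every name below is a hypothetical choice for illustration.

CONTROL_RANGES = [
    (0x0001, 0x0008),
    (0x000E, 0x001F),
    (0x007F, 0x009F),
]
NONCHARACTER_RANGE = (0xFDD0, 0xFDEF)
# Assumption: surrogates are flagged too, as proposed in this thread.
SURROGATE_RANGE = (0xD800, 0xDFFF)


def is_preprocessing_parse_error(cp: int) -> bool:
    """Return True if code point cp falls in the quoted list or is a surrogate."""
    if cp == 0x000B:
        return True
    for lo, hi in CONTROL_RANGES + [NONCHARACTER_RANGE, SURROGATE_RANGE]:
        if lo <= cp <= hi:
            return True
    # The remaining entries are the last two code points of each plane:
    # their low 16 bits are FFFE or FFFF.
    return cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE
```

Writing the per-plane entries as a bit test rather than 34 literals is only a convenience; a table of literals would match the quoted text more directly.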
Re: [whatwg] Parse errors for invalid characters
On 06/09/2013 04:05, Kang-Hao (Kenny) Lu wrote:
> (2013/09/06 6:08), Geoffrey Sneddon wrote:
>> The phrasing content section states:
>>
>>     Text nodes and attribute values must consist of Unicode characters, must not contain U+0000 characters, must not contain permanently undefined Unicode characters (noncharacters), and must not contain control characters other than space characters. This specification includes extra constraints on the exact value of Text nodes and attribute values depending on their precise context.
>>
>> And the pre-processing the input-stream section states:
>>
>>     Any occurrences of any characters in the ranges U+0001 to U+0008, U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse errors. These are all control characters or permanently undefined Unicode characters (noncharacters).
>>
>> Note the first uses "Unicode characters", the second "characters"; the former excludes surrogates as a conformance requirement. Note that every disallowed non-surrogate character is a parse error.
>
> Except U+0000 or am I missing something?

This is handled inline in the parser, as noted in the preprocessing section. It sometimes gets passed through as U+0000, sometimes gets changed to U+FFFD, sometimes gets ignored, but always creates a parser error.

>> Therefore, it would make sense to make surrogates parse errors. It should be noted that they can only occur in the input stream if they come from script (as they cannot be decoded from the input byte stream as the decoders will never emit a surrogate).
>
> which means that this seems ... cumbersome ... to implement in a conformance checker.
> Which reminds me, does
>
> # Conformance checkers must report at least one parse error
> # condition to the user if one or more parse error conditions exist
> # in the document and must not report parse error conditions if none
> # exist in the document. Conformance checkers may report more than
> # one parse error condition if more than one parse error condition
> # exists in the document.
>
> mean validator.nu and Firefox view source are non-conforming because they do nothing about document.write()? I think we should exempt conformance checkers from scripts instead.

They already are. From the Conformance classes section:

    Conformance checkers must check that the input document conforms when parsed without a browsing context (meaning that no scripts are run, and that the parser's scripting flag is disabled), and should also check that the input document conforms when parsed with a browsing context in which scripts execute, and that the scripts never cause non-conforming states to occur other than transiently during script execution itself. (This is only a SHOULD and not a MUST requirement because it has been proven to be impossible. [COMPUTABLE])

(I feel like pedanting and pointing out this is untrue: it has not been proven impossible to do, it has been proven impossible to do in general. It wouldn't be that hard to design a conformance checker to check <html><script>document.write("<p>")</script>.)

On the other hand, a JS console can reasonably report parse errors from script, so the parse errors are still worthwhile to have.

/Geoffrey.

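The point that surrogates can only reach the input stream from script, never from the byte-stream decoder, can be sketched roughly as follows. This is a loose analogy under stated assumptions: Python's built-in UTF-8 codec stands in for the decoders the spec refers to, `contains_surrogate` is a hypothetical helper, and the string `from_script` stands in for an argument a script might pass to document.write().

```python
def contains_surrogate(text: str) -> bool:
    """True if text holds any code point in U+D800..U+DFFF."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


# Bytes that would spell U+D800 if surrogates were encodable in UTF-8.
raw = b"<p>\xed\xa0\x80</p>"

try:
    raw.decode("utf-8")  # the strict decoder rejects the sequence outright
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The lenient decoder substitutes U+FFFD, so no surrogate reaches the caller.
decoded = raw.decode("utf-8", errors="replace")

# A string built in script can hold a lone surrogate directly.
from_script = "<p>\ud800</p>"

print(strict_ok, contains_surrogate(decoded), contains_surrogate(from_script))
```

A checker that only sees the decoded byte stream would therefore never meet a surrogate, which is consistent with the remark that the parse error matters mainly to tools that observe script output, such as a JS console.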
Re: [whatwg] Parse errors for invalid characters
(2013/09/06 6:08), Geoffrey Sneddon wrote:
> The phrasing content section states:
>
>     Text nodes and attribute values must consist of Unicode characters, must not contain U+0000 characters, must not contain permanently undefined Unicode characters (noncharacters), and must not contain control characters other than space characters. This specification includes extra constraints on the exact value of Text nodes and attribute values depending on their precise context.
>
> And the pre-processing the input-stream section states:
>
>     Any occurrences of any characters in the ranges U+0001 to U+0008, U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse errors. These are all control characters or permanently undefined Unicode characters (noncharacters).
>
> Note the first uses "Unicode characters", the second "characters"; the former excludes surrogates as a conformance requirement. Note that every disallowed non-surrogate character is a parse error.

Except U+0000 or am I missing something?

> Therefore, it would make sense to make surrogates parse errors. It should be noted that they can only occur in the input stream if they come from script (as they cannot be decoded from the input byte stream as the decoders will never emit a surrogate).

which means that this seems ... cumbersome ... to implement in a conformance checker. Which reminds me, does

# Conformance checkers must report at least one parse error
# condition to the user if one or more parse error conditions exist
# in the document and must not report parse error conditions if none
# exist in the document. Conformance checkers may report more than
# one parse error condition if more than one parse error condition
# exists in the document.

mean validator.nu and Firefox view source are non-conforming because they do nothing about document.write()? I think we should exempt conformance checkers from scripts instead.

Cheers,
Kenny
--
Web Specialist, Opera Sphinx Game Force, Oupeng Browser, Beijing
Try Oupeng: http://www.oupeng.com/