On 3/3/2010 7:06 PM, Philip Taylor wrote:
On Wed, Mar 3, 2010 at 10:55 AM, Brett Zamir <[email protected]> wrote:
On 3/2/2010 6:54 PM, Ian Hickson wrote:
On Tue, 2 Mar 2010, Elliotte Rusty Harold wrote:
Briefly, it seems that <? causes the parser to go into the bogus comment
state, which is fair enough. (I wouldn't really recommend that anyone
use processing instructions in HTML syntax anyway.) However, the parser
comes out of that state at the first >. Because processing instructions
can contain > and terminate only at the two-character sequence ?>, this
could cause PI processing to terminate early, leaving a lot more error
handling and a confused parser state in the text yet to come.
In HTML4, PIs ended at the first >, not at ?>. "<?target data>" is the
syntax of PIs when the SGML options used by HTML4 are applied.
In any case, the parser in HTML5 is based on what browsers do, which is
also to terminate at the first >. It's unlikely that we can change that,
given backwards-compatibility needs.
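A minimal Python sketch may make the two termination rules discussed above concrete. It is a simplified model, not the actual HTML5 tokenizer: the function names `html5_bogus_comment` and `xml_pi` are hypothetical, and only the case where `<?` sits at index `start` is handled. Under the HTML4/HTML5 rule the construct ends at the first `>` and becomes a comment whose data begins with the `?`; under the XML rule a PI ends only at `?>`.

```python
def html5_bogus_comment(text: str, start: int) -> tuple[str, int]:
    """Simplified model of how HTML5 handles '<?' at text[start].

    The '?' switches the tokenizer into the bogus comment state, which ends
    at the first '>' (or at end of input). Returns (comment_data, next_index).
    """
    data_start = start + 1  # comment data starts at the '?'
    end = text.find(">", data_start)
    if end == -1:  # no '>': the rest of the input becomes the comment
        return text[data_start:], len(text)
    return text[data_start:end], end + 1


def xml_pi(text: str, start: int) -> tuple[str, int]:
    """XML rule: a PI at text[start] ends only at the two characters '?>'."""
    end = text.find("?>", start + 2)
    if end == -1:
        raise ValueError("unterminated processing instruction")
    return text[start + 2:end], end + 2


doc = "<?target x > y ?><p>after</p>"

data, nxt = html5_bogus_comment(doc, 0)
print(repr(data), repr(doc[nxt:]))  # '?target x ' ' y ?><p>after</p>'

data, nxt = xml_pi(doc, 0)
print(repr(data), repr(doc[nxt:]))  # 'target x > y ' '<p>after</p>'
```

With the first-> rule, the leftover " y ?>" ends up as stray text before <p>, which is the early termination Elliotte describes; the XML rule resumes cleanly at <p>.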
Are there really a lot of folks out there depending on old HTML4-style
processing instructions not being broken?
Yes, e.g. a load of pages like
http://www.forex.com.cn/html/2008-01/821561.htm (to pick one example
at random) say:
<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />
and don't have the string "?>" anywhere.
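The compatibility concern can be sketched the same way. This is again a simplified model with hypothetical function names: `end_at_pi_close` assumes an XML-style ?> rule that, lacking a ?>, recovers by running to end of input. Since pages like the one cited contain no ?>, such a rule would swallow everything after the xml:namespace line, while the first-> rule resumes right after the />.

```python
def end_at_first_gt(text: str, start: int) -> int:
    """HTML4/HTML5-style: the '<?' construct ends just after the first '>'."""
    end = text.find(">", start)
    return len(text) if end == -1 else end + 1


def end_at_pi_close(text: str, start: int) -> int:
    """Hypothetical XML-style rule: ends after '?>', else runs to end of input."""
    end = text.find("?>", start + 2)
    return len(text) if end == -1 else end + 2


page = ('<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />'
        '<p>Page content</p>')

print(repr(page[end_at_first_gt(page, 0):]))  # '<p>Page content</p>'
print(repr(page[end_at_pi_close(page, 0):]))  # ''
```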
OK, fair enough. But while it is great that HTML5 seeks to be
transitional and backwards compatible, HTML5 (thankfully) already breaks
compatibility for the sake of XML compatibility (e.g., localName or
getElementsByTagNameNS). It seems to me that there should still be a
role for eventually transitioning into something that is: more
full-featured in a fundamental, language-neutral way (e.g., supporting a
fuller subset of XML's features, such as external entities and, yes,
XML-style processing instructions); extensible, including the ability to
include XML from other namespaces, which may also encourage or rely on
their own XML processing instructions, for those who wish to experiment
with or supplement the standard HTML behavior; and more harmonious and
compatible with a simpler syntax (i.e., XML's), even if the more complex
syntax is more prominent and continues to be supported indefinitely.
Brett