Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
Yehuda,

Can you help clarify here whether jQuery's behavior is intentional (i.e. use cases drive the need for executability), or if it's a side-effect of the implementation?

On Fri, Jun 1, 2012 at 1:27 PM, Ryosuke Niwa rn...@webkit.org wrote:
> On Thu, May 31, 2012 at 7:55 AM, Henri Sivonen hsivo...@iki.fi wrote:
>> On Sat, May 19, 2012 at 6:29 AM, Ryosuke Niwa rn...@webkit.org wrote:
>>> There appears to be a consensus to use document.parse (which is fine with me), so I would like to double-check which behavior we're picking. IMO, the only sane choice is to unset the already-started flag since doing otherwise implies script elements parsed by document.parse won't be executed when inserted into a document.
>>
>> I was expecting document.parse() to make scripts unexecutable. Are there use cases for creating executable scripts using this facility?
>
> jQuery appears to let script elements run: http://jsfiddle.net/kB8Fp/2/
>
> Also, we're talking about using the same algorithm for template element. I would like script elements inside my template to run.
>
> - Ryosuke
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Jun 8, 2012, at 11:03 AM, Rafael Weinstein rafa...@google.com wrote:
> Yehuda, Can you help clarify here whether jQuery's behavior is intentional (i.e. use cases drive the need for executability), or if it's a side-effect of the implementation?

I can't speak for jQuery, but in Prototype.js, this behaviour is intentional.

--tobie
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Wed, Jun 6, 2012 at 6:49 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
> Flip-flopping is irrelevant.

It's irrelevant in the sense of flip-flopping being bad in and of itself. Changing one's mind is okay and it's good to acknowledge past mistakes. However, in many cases if one's mind is changed after interoperable implementations have been made available to Web authors, it may be worse to try to fix past mistakes instead of just acknowledging them as mistakes and trying to not make more mistakes like that in the future.

> Only what is good for authors is.

I believe in this case not changing the way SVG script content tokenizes would be best for authors.

> If deployed content would break as a result of a change, we either find a new way to accommodate the desired change, or drop it. But we need the compat data about that breakage before we can claim that it will occur.

Support Existing Content is not the only relevant Design Principle here. Degrade Gracefully is also relevant. I believe it would be bad for authors if SVG-in-HTML content tokenized subtly differently in the long tail of old (old when viewed from the future) browsers which is likely to include IE9 and, at this rate, IE10 and also a mix of versions of the Android stock browser for a long time. (Also possibly various left-over versions of Firefox, Safari and Opera.) Past data suggests that IE is updated slowly and that the Android stock browser typically doesn't get updated at all (until it is abandoned by switching to another browser or by switching to another device).

Arguments about it being okay to violate the Degrade Gracefully principle because the future is longer than the past (so it's always worthwhile to make things better for the future) would apply to pretty much all breaking changes to the Web platform and have the same problems this time as when the argument is applied to other breaking changes to the Web platform.

> The SVGWG would like to make things as good for authors as possible. Past positions don't matter, except insofar as the history of their effects on the specs persists.

The reason why I care about correcting recounts of past SVG working group opinion on this topic is that I think it's better for learning from mistakes if the learning is based on the truth of what happened.

--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Thu, 7 Jun 2012, Henri Sivonen wrote:
> I believe in this case not changing the way SVG script content tokenizes would be best for authors.

For what it's worth, I agree with Henri here.

In my experience, spec churn is the number two way of making a spec fail. I think it's better to have something that works consistently everywhere than to have things work differently across different browsers and even different versions of the same browser. That's the effect of spec churn. It also has the effect of putting test suites in unclear states, which is especially bad for test suites that have been copied into browser vendors' development environments (especially if they don't realise the spec has changed), and more subtly it has the effect of making developers more reluctant to be first adopters, since they start feeling first adopters have to pay a higher cost, and it makes authors feel like the specs aren't really worth anything because they keep changing.

Plus, of course, there's the opportunity cost: making a minor improvement means we're spending lots of resources (speccing, implementing, testing, documenting, advocating) that we could instead be spending on making something else a _lot_ better.

(The number one way of making a spec fail is to ignore backwards compatibility, of course. Which in a way is the same thing, just on a larger scale.)

--
Ian Hickson               U+1047E                )\._.,--,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Fri, Jun 1, 2012 at 10:25 AM, Jonas Sicking jo...@sicking.cc wrote:
>> I think the SVG working group should learn to stand by its past mistakes. Not standing by them in the sense of thinking the past mistakes are great but in the sense of not causing further disturbances by flip-flopping.
>
> For what it's worth, I've not seen any flip-flopping on this. Over the years that I've asked the SVG WG the detailed question on if they prefer to have the parsing model for scripts in SVG-in-HTML I've consistently gotten the answer that they prefer this.

At the time when SVG parsing was being added to text/html, vocal members of the SVG working group were adamant that parsing should work the same as for XML so that output from existing tools that had XML serializers could be copied and pasted into text/html in a text editor. Suggestions went as far as insisting a full XML parser be embedded inside the HTML parser.

For [citation needed], see e.g. Requirement 1 in http://lists.w3.org/Archives/Public/public-html/2009Mar/0216.html (not the only place where the requirement was expressed but the first one I found when searching the archives) and requirements 1 and 2 as well as the first sentence under Summary in http://dev.w3.org/SVG/proposals/svg-html/svg-html-proposal.html .

> I'm also not sure how this is at all relevant here given that we should do what's best for authors, even when we learn over time what's best for authors.

At this point, what's best for authors includes considerations of consistent behavior across already-deployed browsers (including IE9, soon IE10 and the Android stock browser) and future browsers.

--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Wed, Jun 6, 2012 at 3:42 AM, Henri Sivonen hsivo...@iki.fi wrote:
> On Fri, Jun 1, 2012 at 10:25 AM, Jonas Sicking jo...@sicking.cc wrote:
>>> I think the SVG working group should learn to stand by its past mistakes. Not standing by them in the sense of thinking the past mistakes are great but in the sense of not causing further disturbances by flip-flopping.
>>
>> For what it's worth, I've not seen any flip-flopping on this. Over the years that I've asked the SVG WG the detailed question on if they prefer to have the parsing model for scripts in SVG-in-HTML I've consistently gotten the answer that they prefer this.
>
> At the time when SVG parsing was being added to text/html, vocal members of the SVG working group were adamant that parsing should work the same as for XML so that output from existing tools that had XML serializers could be copied and pasted into text/html in a text editor. Suggestions went as far as insisting a full XML parser be embedded inside the HTML parser.
>
> For [citation needed], see e.g. Requirement 1 in http://lists.w3.org/Archives/Public/public-html/2009Mar/0216.html (not the only place where the requirement was expressed but the first one I found when searching the archives) and requirements 1 and 2 as well as the first sentence under Summary in http://dev.w3.org/SVG/proposals/svg-html/svg-html-proposal.html .
>
>> I'm also not sure how this is at all relevant here given that we should do what's best for authors, even when we learn over time what's best for authors.
>
> At this point, what's best for authors includes considerations of consistent behavior across already-deployed browsers (including IE9, soon IE10 and the Android stock browser) and future browsers.

Considering compatible behavior is indeed part of what's best for authors, but we shouldn't extend that to blanket denials of the possibility of change.

Flip-flopping is irrelevant. Only what is good for authors is. If deployed content would break as a result of a change, we either find a new way to accommodate the desired change, or drop it. But we need the compat data about that breakage before we can claim that it will occur.

The SVGWG would like to make things as good for authors as possible. Past positions don't matter, except insofar as the history of their effects on the specs persists. Compat breakage is painful, but so are manifestly hard-to-use incompatibilities between HTML and SVG. Let's fix those as much as we can.

~TJ
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Wed, Jun 6, 2012 at 3:42 AM, Henri Sivonen hsivo...@iki.fi wrote:
> On Fri, Jun 1, 2012 at 10:25 AM, Jonas Sicking jo...@sicking.cc wrote:
>>> I think the SVG working group should learn to stand by its past mistakes. Not standing by them in the sense of thinking the past mistakes are great but in the sense of not causing further disturbances by flip-flopping.
>>
>> For what it's worth, I've not seen any flip-flopping on this. Over the years that I've asked the SVG WG the detailed question on if they prefer to have the parsing model for scripts in SVG-in-HTML I've consistently gotten the answer that they prefer this.
>
> At the time when SVG parsing was being added to text/html, vocal members of the SVG working group were adamant that parsing should work the same as for XML so that output from existing tools that had XML serializers could be copied and pasted into text/html in a text editor. Suggestions went as far as insisting a full XML parser be embedded inside the HTML parser.
>
> For [citation needed], see e.g. Requirement 1 in http://lists.w3.org/Archives/Public/public-html/2009Mar/0216.html (not the only place where the requirement was expressed but the first one I found when searching the archives) and requirements 1 and 2 as well as the first sentence under Summary in http://dev.w3.org/SVG/proposals/svg-html/svg-html-proposal.html .

Indeed. But every time I asked specifically about the parsing of script issue, I got the answer that it was more important that script-markup could be moved between HTML and SVG-in-HTML.

>> I'm also not sure how this is at all relevant here given that we should do what's best for authors, even when we learn over time what's best for authors.
>
> At this point, what's best for authors includes considerations of consistent behavior across already-deployed browsers (including IE9, soon IE10 and the Android stock browser) and future browsers.

I think that's a matter of opinion.

/ Jonas
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Thu, May 31, 2012 at 7:55 AM, Henri Sivonen hsivo...@iki.fi wrote:
> On Sat, May 19, 2012 at 6:29 AM, Ryosuke Niwa rn...@webkit.org wrote:
>> There appears to be a consensus to use document.parse (which is fine with me), so I would like to double-check which behavior we're picking. IMO, the only sane choice is to unset the already-started flag since doing otherwise implies script elements parsed by document.parse won't be executed when inserted into a document.
>
> I was expecting document.parse() to make scripts unexecutable. Are there use cases for creating executable scripts using this facility?

jQuery appears to let script elements run: http://jsfiddle.net/kB8Fp/2/

Also, we're talking about using the same algorithm for template element. I would like script elements inside my template to run.

- Ryosuke
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
This seems sensible. I've updated the WebKit patch to do exactly this: https://bugs.webkit.org/show_bug.cgi?id=84646

It appears that the details of the proposal are now sorted out. I'll start a new thread describing the full API semantics.

On Fri, May 18, 2012 at 8:29 PM, Ryosuke Niwa rn...@webkit.org wrote:
> Not that I want to start another round of bike-shedding, but there is one clear distinction between innerHTML and createContextualFragment, which is that innerHTML sets the already-started flag on parsed script elements but createContextualFragment does not (or rather it unsets it after the fragment parsing algorithm has run). See http://html5.org/specs/dom-parsing.html#dom-range-createcontextualfragment
>
> There appears to be a consensus to use document.parse (which is fine with me), so I would like to double-check which behavior we're picking. IMO, the only sane choice is to unset the already-started flag since doing otherwise implies script elements parsed by document.parse won't be executed when inserted into a document.
>
> While we can change the behavior for template elements, I would rather have the same behavior between all 3 APIs (createContextualFragment, parse, and template element) and let innerHTML be the outlier for legacy reasons.
>
> (Note: I intend to fix the bug in WebKit that the already-started flag isn't unmarked in createContextualFragment).
>
> - Ryosuke
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
Not that I want to start another round of bike-shedding, but there is one clear distinction between innerHTML and createContextualFragment, which is that innerHTML sets the already-started flag on parsed script elements but createContextualFragment does not (or rather it unsets it after the fragment parsing algorithm has run). See http://html5.org/specs/dom-parsing.html#dom-range-createcontextualfragment

There appears to be a consensus to use document.parse (which is fine with me), so I would like to double-check which behavior we're picking. IMO, the only sane choice is to unset the already-started flag since doing otherwise implies script elements parsed by document.parse won't be executed when inserted into a document.

While we can change the behavior for template elements, I would rather have the same behavior between all 3 APIs (createContextualFragment, parse, and template element) and let innerHTML be the outlier for legacy reasons.

(Note: I intend to fix the bug in WebKit that the already-started flag isn't unmarked in createContextualFragment).

- Ryosuke
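
To make the already-started distinction above concrete, here is a toy JavaScript model. It is only a sketch under stated assumptions: `ScriptNode`, `parseLikeInnerHTML`, `parseLikeContextualFragment` and `insertIntoDocument` are hypothetical names invented for illustration, and real browsers keep this flag inside the parser and the script-preparation steps rather than on a plain object.

```javascript
// Toy model only: not a browser API. All names here are hypothetical.
class ScriptNode {
  constructor(source) {
    this.source = source;
    this.alreadyStarted = false; // models the "already started" flag
    this.runCount = 0;           // how many times the script "executed"
  }
}

// innerHTML-style fragment parsing: parsed scripts keep the flag set.
function parseLikeInnerHTML(sources) {
  return sources.map((source) => {
    const node = new ScriptNode(source);
    node.alreadyStarted = true;
    return node;
  });
}

// createContextualFragment-style parsing: same parse, then the flag is unset.
function parseLikeContextualFragment(sources) {
  const nodes = parseLikeInnerHTML(sources);
  for (const node of nodes) {
    node.alreadyStarted = false;
  }
  return nodes;
}

// Insertion into a document: a flagged script is skipped, an unflagged one
// is marked as started and runs once.
function insertIntoDocument(nodes) {
  for (const node of nodes) {
    if (node.alreadyStarted) continue;
    node.alreadyStarted = true;
    node.runCount += 1;
  }
  return nodes;
}
```

In this model the question for document.parse is simply which of the two parse functions it should resemble: the first yields scripts that never run on insertion, the second yields scripts that run exactly once.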
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Wed, May 16, 2012 at 4:52 PM, Rafael Weinstein rafa...@google.com wrote:
> On Wed, May 16, 2012 at 4:49 PM, Jonas Sicking jo...@sicking.cc wrote:
>> On Wed, May 16, 2012 at 4:29 PM, Rafael Weinstein rafa...@google.com wrote:
>>> Ok. I think I'm convinced on all points. I've uploaded a WebKit patch which implements what we've agreed on here: https://bugs.webkit.org/show_bug.cgi?id=84646
>>>
>>> I'm happy to report that this patch is nicer than the queued-token approach. Good call, Henri.
>>>
>>> On Tue, May 15, 2012 at 9:39 PM, Yehuda Katz wyc...@gmail.com wrote:
>>>> Yehuda Katz (ph) 718.877.1325
>>>>
>>>> On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen hsivo...@iki.fi wrote:
>>>>> On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com wrote:
>>>>>> Issue 1: How to handle tokens which precede the first start tag
>>>>>>
>>>>>> Options:
>>>>>> a) Queue them, and then later run them through tree construction once the implied context element has been picked
>>>>>> b) Create a new insertion mode like "waiting for context element", which probably ignores end tags and doctype and inserts character tokens and comments. Once the implied context element is picked, reset the insertion mode appropriately, and proceed normally.
>>>>>
>>>>> I prefer b).
>>>>
>>>> I like b as well. I assume it means that the "waiting for context element" insertion mode would keep scanning until the ambiguity was resolved, and then enter the appropriate insertion mode. Am I misunderstanding?
>>>
>>> I think what Yehuda is getting at here is that there are a handful of tags which are allowed to appear anywhere, so it doesn't make sense to resolve the ambiguity based on their identity. I talked with Tab about this, and happily, that set seems to be style, script, meta, link.
>>>
>>> Happily, because this means that the new ImpliedContext insertion mode can handle start tags as follows (code from the above patch):
>>>
>>>     if (token.name() == styleTag || token.name() == scriptTag || token.name() == metaTag || token.name() == linkTag) {
>>>         processStartTagForInHead(token); // process following the rules for the "in head" insertion mode
>>>         return;
>>>     }
>>>     m_fragmentContext.setContextTag(getImpliedContextTag(token.name())); // set the context element
>>>     resetInsertionModeAppropriately(); // reset the insertion mode appropriately
>>>     processStartTag(token); // reprocess the token
>>
>> So if I understand things correctly, that would mean that:
>>
>>     document.parse("parsed as text<script>parsed as script content</script><tr><td>table content</td></tr>");
>>
>> would return a fragment like:
>>
>>     #fragment
>>         #text parsed as text
>>         script
>>             #text parsed as script content
>>         tr
>>             td
>>                 #text table content

Note that I added an explicit test case for this:

    #data
    parse as text<script>parse as <span>script</span></script><tr><td>table content</td></tr>
    #errors
    #document-fragment
    #document
    | parse as text
    | script
    |   parse as <span>script</span>
    | tr
    |   td
    |     table content

>> Is this correct? The important part here is that the contents of the script element are parsed according to the rules which normally apply when parsing scripts?
>>
>> (That of course leaves the terrible situation that script parsing is vastly different in HTML and SVG, but that's a bad problem that already exists)
>
> Yes. Exactly.
>
>> / Jonas
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
Ok. I think I'm convinced on all points. I've uploaded a WebKit patch which implements what we've agreed on here: https://bugs.webkit.org/show_bug.cgi?id=84646

I'm happy to report that this patch is nicer than the queued-token approach. Good call, Henri.

On Tue, May 15, 2012 at 9:39 PM, Yehuda Katz wyc...@gmail.com wrote:
> Yehuda Katz (ph) 718.877.1325
>
> On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen hsivo...@iki.fi wrote:
>> On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com wrote:
>>> Issue 1: How to handle tokens which precede the first start tag
>>>
>>> Options:
>>> a) Queue them, and then later run them through tree construction once the implied context element has been picked
>>> b) Create a new insertion mode like "waiting for context element", which probably ignores end tags and doctype and inserts character tokens and comments. Once the implied context element is picked, reset the insertion mode appropriately, and proceed normally.
>>
>> I prefer b).
>
> I like b as well. I assume it means that the "waiting for context element" insertion mode would keep scanning until the ambiguity was resolved, and then enter the appropriate insertion mode. Am I misunderstanding?

I think what Yehuda is getting at here is that there are a handful of tags which are allowed to appear anywhere, so it doesn't make sense to resolve the ambiguity based on their identity. I talked with Tab about this, and happily, that set seems to be style, script, meta, link.

Happily, because this means that the new ImpliedContext insertion mode can handle start tags as follows (code from the above patch):

    if (token.name() == styleTag || token.name() == scriptTag || token.name() == metaTag || token.name() == linkTag) {
        processStartTagForInHead(token); // process following the rules for the "in head" insertion mode
        return;
    }
    m_fragmentContext.setContextTag(getImpliedContextTag(token.name())); // set the context element
    resetInsertionModeAppropriately(); // reset the insertion mode appropriately
    processStartTag(token); // reprocess the token

>> I'm assuming the use case for this stuff isn't that authors throw random stuff at the API and then insert the result somewhere. I expect authors to pass string literals or somewhat cooked string literals to the API knowing where they're going to insert the result but not telling the insertion point to the API as a matter of convenience. If you know you are planning to insert stuff as a child of tbody, don't start your string literal with stuff that would tokenize as characters!
>>
>> (Firefox currently does not have the capability to queue tokens. Speculative parsing in Firefox is not based on queuing tokens. See https://developer.mozilla.org/en/Gecko/HTML_parser_threading for the details.)
>>
>>> Issue 2: How to infer a non-HTML implied context element
>>>
>>> Options:
>>> a) By tagName alone. When multiple namespaces match, prefer HTML, and then either SVG or MathML (possibly on a per-tagName basis)
>>> b) Also inspect attributes for tagNames which may be in multiple namespaces
>>
>> AFAICT, the case where this really matters (if my assumptions about use cases are right) is <a>. (Fragment parsing makes scripts useless anyway by setting their already started flag, authors probably shouldn't be adding styles by parsing <style>, both HTML and SVG <font> are considered harmful and cross-browser support for Content MathML is far off on the horizon.) So I prefer a), possibly with <a>-specific elaborations if we can come up with some. Generic solutions seem to involve more complexity. For example, if we supported a generic attribute for forcing SVG interpretation, would it put us on a slippery slope to support it when it appears on tokens that aren't the first start tag token in a contextless fragment parse?
>>
>>> Issue 3: What form does the API take
>>>
>>> a) Document.innerHTML
>>> b) document.parse()
>>> c) document.createDocumentFragment()
>>
>> I prefer b) because:
>> * It doesn't involve creating the fragment as a separate step.
>> * It doesn't need to be foolishly consistent with the HTML vs. XML design errors of innerHTML.
>> * It's shorter than document.createDocumentFragment().
>> * Unlike innerHTML, it is a method, so we can add more arguments later (or right away) to refine its behavior.
>>
>> --
>> Henri Sivonen
>> hsivo...@iki.fi
>> http://hsivonen.iki.fi/
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Wed, May 16, 2012 at 4:29 PM, Rafael Weinstein rafa...@google.com wrote:
> Ok. I think I'm convinced on all points. I've uploaded a WebKit patch which implements what we've agreed on here: https://bugs.webkit.org/show_bug.cgi?id=84646
>
> I'm happy to report that this patch is nicer than the queued-token approach. Good call, Henri.
>
> On Tue, May 15, 2012 at 9:39 PM, Yehuda Katz wyc...@gmail.com wrote:
>> Yehuda Katz (ph) 718.877.1325
>>
>> On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen hsivo...@iki.fi wrote:
>>> On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com wrote:
>>>> Issue 1: How to handle tokens which precede the first start tag
>>>>
>>>> Options:
>>>> a) Queue them, and then later run them through tree construction once the implied context element has been picked
>>>> b) Create a new insertion mode like "waiting for context element", which probably ignores end tags and doctype and inserts character tokens and comments. Once the implied context element is picked, reset the insertion mode appropriately, and proceed normally.
>>>
>>> I prefer b).
>>
>> I like b as well. I assume it means that the "waiting for context element" insertion mode would keep scanning until the ambiguity was resolved, and then enter the appropriate insertion mode. Am I misunderstanding?
>
> I think what Yehuda is getting at here is that there are a handful of tags which are allowed to appear anywhere, so it doesn't make sense to resolve the ambiguity based on their identity. I talked with Tab about this, and happily, that set seems to be style, script, meta, link.
>
> Happily, because this means that the new ImpliedContext insertion mode can handle start tags as follows (code from the above patch):
>
>     if (token.name() == styleTag || token.name() == scriptTag || token.name() == metaTag || token.name() == linkTag) {
>         processStartTagForInHead(token); // process following the rules for the "in head" insertion mode
>         return;
>     }
>     m_fragmentContext.setContextTag(getImpliedContextTag(token.name())); // set the context element
>     resetInsertionModeAppropriately(); // reset the insertion mode appropriately
>     processStartTag(token); // reprocess the token

So if I understand things correctly, that would mean that:

    document.parse("parsed as text<script>parsed as script content</script><tr><td>table content</td></tr>");

would return a fragment like:

    #fragment
        #text parsed as text
        script
            #text parsed as script content
        tr
            td
                #text table content

Is this correct? The important part here is that the contents of the script element are parsed according to the rules which normally apply when parsing scripts?

(That of course leaves the terrible situation that script parsing is vastly different in HTML and SVG, but that's a bad problem that already exists)

/ Jonas
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Wed, May 16, 2012 at 4:49 PM, Jonas Sicking jo...@sicking.cc wrote:
> On Wed, May 16, 2012 at 4:29 PM, Rafael Weinstein rafa...@google.com wrote:
>> Ok. I think I'm convinced on all points. I've uploaded a WebKit patch which implements what we've agreed on here: https://bugs.webkit.org/show_bug.cgi?id=84646
>>
>> I'm happy to report that this patch is nicer than the queued-token approach. Good call, Henri.
>>
>> On Tue, May 15, 2012 at 9:39 PM, Yehuda Katz wyc...@gmail.com wrote:
>>> Yehuda Katz (ph) 718.877.1325
>>>
>>> On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen hsivo...@iki.fi wrote:
>>>> On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com wrote:
>>>>> Issue 1: How to handle tokens which precede the first start tag
>>>>>
>>>>> Options:
>>>>> a) Queue them, and then later run them through tree construction once the implied context element has been picked
>>>>> b) Create a new insertion mode like "waiting for context element", which probably ignores end tags and doctype and inserts character tokens and comments. Once the implied context element is picked, reset the insertion mode appropriately, and proceed normally.
>>>>
>>>> I prefer b).
>>>
>>> I like b as well. I assume it means that the "waiting for context element" insertion mode would keep scanning until the ambiguity was resolved, and then enter the appropriate insertion mode. Am I misunderstanding?
>>
>> I think what Yehuda is getting at here is that there are a handful of tags which are allowed to appear anywhere, so it doesn't make sense to resolve the ambiguity based on their identity. I talked with Tab about this, and happily, that set seems to be style, script, meta, link.
>>
>> Happily, because this means that the new ImpliedContext insertion mode can handle start tags as follows (code from the above patch):
>>
>>     if (token.name() == styleTag || token.name() == scriptTag || token.name() == metaTag || token.name() == linkTag) {
>>         processStartTagForInHead(token); // process following the rules for the "in head" insertion mode
>>         return;
>>     }
>>     m_fragmentContext.setContextTag(getImpliedContextTag(token.name())); // set the context element
>>     resetInsertionModeAppropriately(); // reset the insertion mode appropriately
>>     processStartTag(token); // reprocess the token
>
> So if I understand things correctly, that would mean that:
>
>     document.parse("parsed as text<script>parsed as script content</script><tr><td>table content</td></tr>");
>
> would return a fragment like:
>
>     #fragment
>         #text parsed as text
>         script
>             #text parsed as script content
>         tr
>             td
>                 #text table content
>
> Is this correct? The important part here is that the contents of the script element are parsed according to the rules which normally apply when parsing scripts?
>
> (That of course leaves the terrible situation that script parsing is vastly different in HTML and SVG, but that's a bad problem that already exists)

Yes. Exactly.

> / Jonas
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Wed, May 16, 2012 at 4:52 PM, Rafael Weinstein rafa...@google.com wrote:
> On Wed, May 16, 2012 at 4:49 PM, Jonas Sicking jo...@sicking.cc wrote:
>> On Wed, May 16, 2012 at 4:29 PM, Rafael Weinstein rafa...@google.com wrote:
>>> Ok. I think I'm convinced on all points. I've uploaded a WebKit patch which implements what we've agreed on here: https://bugs.webkit.org/show_bug.cgi?id=84646
>>>
>>> I'm happy to report that this patch is nicer than the queued-token approach. Good call, Henri.
>>>
>>> On Tue, May 15, 2012 at 9:39 PM, Yehuda Katz wyc...@gmail.com wrote:
>>>> Yehuda Katz (ph) 718.877.1325
>>>>
>>>> On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen hsivo...@iki.fi wrote:
>>>>> On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com wrote:
>>>>>> Issue 1: How to handle tokens which precede the first start tag
>>>>>>
>>>>>> Options:
>>>>>> a) Queue them, and then later run them through tree construction once the implied context element has been picked
>>>>>> b) Create a new insertion mode like "waiting for context element", which probably ignores end tags and doctype and inserts character tokens and comments. Once the implied context element is picked, reset the insertion mode appropriately, and proceed normally.
>>>>>
>>>>> I prefer b).
>>>>
>>>> I like b as well. I assume it means that the "waiting for context element" insertion mode would keep scanning until the ambiguity was resolved, and then enter the appropriate insertion mode. Am I misunderstanding?
>>>
>>> I think what Yehuda is getting at here is that there are a handful of tags which are allowed to appear anywhere, so it doesn't make sense to resolve the ambiguity based on their identity. I talked with Tab about this, and happily, that set seems to be style, script, meta, link.
>>>
>>> Happily, because this means that the new ImpliedContext insertion mode can handle start tags as follows (code from the above patch):
>>>
>>>     if (token.name() == styleTag || token.name() == scriptTag || token.name() == metaTag || token.name() == linkTag) {
>>>         processStartTagForInHead(token); // process following the rules for the "in head" insertion mode
>>>         return;
>>>     }
>>>     m_fragmentContext.setContextTag(getImpliedContextTag(token.name())); // set the context element
>>>     resetInsertionModeAppropriately(); // reset the insertion mode appropriately
>>>     processStartTag(token); // reprocess the token
>>
>> So if I understand things correctly, that would mean that:
>>
>>     document.parse("parsed as text<script>parsed as script content</script><tr><td>table content</td></tr>");
>>
>> would return a fragment like:
>>
>>     #fragment
>>         #text parsed as text
>>         script
>>             #text parsed as script content
>>         tr
>>             td
>>                 #text table content
>>
>> Is this correct? The important part here is that the contents of the script element are parsed according to the rules which normally apply when parsing scripts?
>>
>> (That of course leaves the terrible situation that script parsing is vastly different in HTML and SVG, but that's a bad problem that already exists)
>
> Yes. Exactly.

That leaves the question of whether the contents of the script should be parsed as an HTML script or an SVG script. The same question applies to style.

Of course, ideally we would make the two parse the same way, but so far I've not been successful in convincing people here that that's a good idea.

/ Jonas
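
For readers following the start-tag handling discussed in this message, the selection of the implied context element can be sketched in JavaScript. This is a simplified model, not the WebKit code: the token shape, the `pickImpliedContext` function, the contents of the `IMPLIED_CONTEXT` table and the fallback to `body` are all assumptions made for illustration. Only the set of context-neutral tags (style, script, meta, link) comes from the thread.

```javascript
// Tags that may appear anywhere; per the thread they never pick the context.
const CONTEXT_NEUTRAL_TAGS = new Set(["style", "script", "meta", "link"]);

// Hypothetical mapping from first "real" start tag to implied context element.
// The actual table in the patch is not shown in the thread.
const IMPLIED_CONTEXT = new Map([
  ["tr", "tbody"],
  ["td", "tr"],
  ["th", "tr"],
  ["tbody", "table"],
  ["thead", "table"],
  ["tfoot", "table"],
  ["caption", "table"],
  ["colgroup", "table"],
  ["col", "colgroup"],
  ["option", "select"],
]);

function getImpliedContextTag(tagName) {
  // Assumption: anything not listed parses fine with a body context.
  return IMPLIED_CONTEXT.has(tagName) ? IMPLIED_CONTEXT.get(tagName) : "body";
}

// tokens: array of { type, name } objects, where type is one of
// "characters", "comment", "startTag", "endTag" or "doctype".
function pickImpliedContext(tokens) {
  for (const token of tokens) {
    // Text, comments, end tags and doctypes do not resolve the ambiguity.
    if (token.type !== "startTag") continue;
    // style/script/meta/link are handled in place and scanning continues.
    if (CONTEXT_NEUTRAL_TAGS.has(token.name)) continue;
    return getImpliedContextTag(token.name);
  }
  return "body"; // assumption: nothing decided, so use the default context
}
```

Fed a token stream for the example string above (characters, a script element, then tr), this model skips the leading text and the script and settles on `tbody`, which is why the `tr` ends up as a direct child of the fragment.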
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com wrote:

Issue 1: How to handle tokens which precede the first start tag

Options:
a) Queue them, and then later run them through tree construction once the implied context element has been picked
b) Create a new insertion mode like "waiting for context element", which probably ignores end tags and doctype and inserts character tokens and comments. Once the implied context element is picked, reset the insertion mode appropriately, and proceed normally.

I prefer b). I'm assuming the use case for this stuff isn't that authors throw random stuff at the API and then insert the result somewhere. I expect authors to pass string literals or somewhat cooked string literals to the API knowing where they're going to insert the result but not telling the insertion point to the API as a matter of convenience. If you know you are planning to insert stuff as a child of tbody, don't start your string literal with stuff that would tokenize as characters!

(Firefox currently does not have the capability to queue tokens. Speculative parsing in Firefox is not based on queuing tokens. See https://developer.mozilla.org/en/Gecko/HTML_parser_threading for the details.)

Issue 2: How to infer a non-HTML implied context element

Options:
a) By tagName alone. When multiple namespaces match, prefer HTML, and then either SVG or MathML (possibly on a per-tagName basis)
b) Also inspect attributes for tagNames which may be in multiple namespaces

AFAICT, the case where this really matters (if my assumptions about use cases are right) is <a>. (Fragment parsing makes scripts useless anyway by setting their already started flag, authors probably shouldn't be adding styles by parsing <style>, both HTML and SVG <font> are considered harmful and cross-browser support for Content MathML is far off on the horizon.) So I prefer a) possibly with <a>-specific elaborations if we can come up with some.

Generic solutions seem to involve more complexity. For example, if we supported a generic attribute for forcing SVG interpretation, would it put us on a slippery slope to support it when it appears on tokens that aren't the first start tag token in a contextless fragment parse?

Issue 3: What form does the API take
a) Document.innerHTML
b) document.parse()
c) document.createDocumentFragment()

I prefer b) because:
* It doesn't involve creating the fragment as a separate step.
* It doesn't need to be foolishly consistent with the HTML vs. XML design errors of innerHTML.
* It's shorter than document.createDocumentFragment().
* Unlike innerHTML, it is a method, so we can add more arguments later (or right away) to refine its behavior.

--
Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
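To make Issue 2 option a) concrete, here is a rough sketch of namespace inference by tag name alone, with HTML winning whenever a name exists in more than one namespace. The name sets are illustrative and far from complete, and `inferNamespaceByTagName` is a hypothetical helper, not part of any proposal text.

```javascript
// Illustrative, incomplete name sets; a real implementation would need
// the full element lists for each namespace.
const SVG_ONLY_NAMES = new Set(["svg", "rect", "circle", "path", "g"]);
const MATHML_ONLY_NAMES = new Set(["math", "mi", "mo", "mn", "mrow"]);

// Option a): decide by tag name alone, preferring HTML on any overlap.
function inferNamespaceByTagName(tagName) {
  const name = tagName.toLowerCase();
  if (SVG_ONLY_NAMES.has(name)) return "svg";
  if (MATHML_ONLY_NAMES.has(name)) return "mathml";
  // Names shared between namespaces (a, script, style, font, title) and
  // all unknown names resolve to HTML under the "prefer HTML" rule.
  return "html";
}
```

Under this rule a fragment starting with a rect start tag is treated as SVG, but one starting with an a start tag is always HTML, which is why that element is the case that would need a specific elaboration.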
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
On Tue, May 15, 2012 at 12:46 PM, Henri Sivonen hsivo...@iki.fi wrote:

On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com wrote:

Issue 1: How to handle tokens which precede the first start tag

Options:
a) Queue them, and then later run them through tree construction once the implied context element has been picked
b) Create a new insertion mode like "waiting for context element", which probably ignores end tags and doctype and inserts character tokens and comments. Once the implied context element is picked, reset the insertion mode appropriately, and proceed normally.

I prefer b). I'm assuming the use case for this stuff isn't that authors throw random stuff at the API and then insert the result somewhere. I expect authors to pass string literals or somewhat cooked string literals to the API knowing where they're going to insert the result but not telling the insertion point to the API as a matter of convenience.

Exactly correct. That's the jQuery use-case exactly, which we're trying to solve. I'm totally fine with b) as well.

Issue 2: How to infer a non-HTML implied context element

Options:
a) By tagName alone. When multiple namespaces match, prefer HTML, and then either SVG or MathML (possibly on a per-tagName basis)
b) Also inspect attributes for tagNames which may be in multiple namespaces

AFAICT, the case where this really matters (if my assumptions about use cases are right) is <a>. (Fragment parsing makes scripts useless anyway by setting their already started flag, authors probably shouldn't be adding styles by parsing <style>, both HTML and SVG <font> are considered harmful and cross-browser support for Content MathML is far off on the horizon.)

Yup, your assumptions are correct as far as I know.

So I prefer a) possibly with <a>-specific elaborations if we can come up with some. Generic solutions seem to involve more complexity. For example, if we supported a generic attribute for forcing SVG interpretation, would it put us on a slippery slope to support it when it appears on tokens that aren't the first start tag token in a contextless fragment parse?

That wouldn't make sense, though. If you see <p> foo <a svg><rect /></a>, you're already in an HTML context, and the svg stuff has to be wrapped in an <svg> to form the airtight namespace seal. But still, @svg is kinda a hacky solution. Shrug.

Issue 3: What form does the API take
a) Document.innerHTML
b) document.parse()
c) document.createDocumentFragment()

I prefer b) because:
* It doesn't involve creating the fragment as a separate step.
* It doesn't need to be foolishly consistent with the HTML vs. XML design errors of innerHTML.
* It's shorter than document.createDocumentFragment().
* Unlike innerHTML, it is a method, so we can add more arguments later (or right away) to refine its behavior.

Possibly a second argument to force parsing in a particular way when it's ambiguous would be useful.

~TJ
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
Yehuda Katz

On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen hsivo...@iki.fi wrote:

On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com wrote:

Issue 1: How to handle tokens which precede the first start tag

Options:
a) Queue them, and then later run them through tree construction once the implied context element has been picked
b) Create a new insertion mode like "waiting for context element", which probably ignores end tags and doctype and inserts character tokens and comments. Once the implied context element is picked, reset the insertion mode appropriately, and proceed normally.

I prefer b).

I like b as well. I assume it means that the "waiting for context element" insertion mode would keep scanning until the ambiguity was resolved, and then enter the appropriate insertion mode. Am I misunderstanding?

I'm assuming the use case for this stuff isn't that authors throw random stuff at the API and then insert the result somewhere. I expect authors to pass string literals or somewhat cooked string literals to the API knowing where they're going to insert the result but not telling the insertion point to the API as a matter of convenience. If you know you are planning to insert stuff as a child of tbody, don't start your string literal with stuff that would tokenize as characters!

(Firefox currently does not have the capability to queue tokens. Speculative parsing in Firefox is not based on queuing tokens. See https://developer.mozilla.org/en/Gecko/HTML_parser_threading for the details.)

Issue 2: How to infer a non-HTML implied context element

Options:
a) By tagName alone. When multiple namespaces match, prefer HTML, and then either SVG or MathML (possibly on a per-tagName basis)
b) Also inspect attributes for tagNames which may be in multiple namespaces

AFAICT, the case where this really matters (if my assumptions about use cases are right) is <a>. (Fragment parsing makes scripts useless anyway by setting their already started flag, authors probably shouldn't be adding styles by parsing <style>, both HTML and SVG <font> are considered harmful and cross-browser support for Content MathML is far off on the horizon.) So I prefer a) possibly with <a>-specific elaborations if we can come up with some.

Generic solutions seem to involve more complexity. For example, if we supported a generic attribute for forcing SVG interpretation, would it put us on a slippery slope to support it when it appears on tokens that aren't the first start tag token in a contextless fragment parse?

Issue 3: What form does the API take
a) Document.innerHTML
b) document.parse()
c) document.createDocumentFragment()

I prefer b) because:
* It doesn't involve creating the fragment as a separate step.
* It doesn't need to be foolishly consistent with the HTML vs. XML design errors of innerHTML.
* It's shorter than document.createDocumentFragment().
* Unlike innerHTML, it is a method, so we can add more arguments later (or right away) to refine its behavior.

--
Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
Ok,

So from the previous threads, there appear to be three issues to resolve, and I'll list the options that I've noted. I'll follow up with my perspective of pros/cons and ask others to do the same. Please point out options or issues that I've missed.

Issue 1: How to handle tokens which precede the first start tag

Options:
a) Queue them, and then later run them through tree construction once the implied context element has been picked
b) Create a new insertion mode like "waiting for context element", which probably ignores end tags and doctype and inserts character tokens and comments. Once the implied context element is picked, reset the insertion mode appropriately, and proceed normally.

---

Issue 2: How to infer a non-HTML implied context element

Options:
a) By tagName alone. When multiple namespaces match, prefer HTML, and then either SVG or MathML (possibly on a per-tagName basis)
b) Also inspect attributes for tagNames which may be in multiple namespaces
c) Allow for inline namespacing of elements which would normally inherit the proper namespace from <svg> or <math>
d) Somewhat orthogonal, but later allow for <template> to have an optional context attribute (e.g. <template context=svg>), which explicitly picks a context element.
e) Some combination of the above.

---

Issue 3: What form does the API take
a) Document.innerHTML
b) document.parse()
c) document.createDocumentFragment()
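As an illustration of Issue 1 option b), the token handling before the first start tag might look roughly like the sketch below. The token shape (`type`, `name`, `data`) and the function name are assumptions made for this example; it only models which tokens are kept or dropped and where normal processing would resume.

```javascript
// Option b): before any start tag is seen, end tags and doctypes are
// ignored, while character and comment tokens are inserted directly.
function consumeUntilFirstStartTag(tokens) {
  const inserted = [];
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    switch (token.type) {
      case "startTag":
        // The context element would be picked from token.name here; the
        // remaining tokens go through the regular insertion modes.
        return { inserted, contextDecidedBy: token.name, resumeAt: i };
      case "character":
      case "comment":
        inserted.push(token);
        break;
      case "endTag":
      case "doctype":
        break; // ignored in this mode
      default:
        throw new TypeError(`unknown token type: ${token.type}`);
    }
  }
  // No start tag at all: everything insertable has already been inserted.
  return { inserted, contextDecidedBy: null, resumeAt: tokens.length };
}
```

The point of the design is that nothing has to be buffered: tokens are dealt with as they arrive, at the cost of leading text being inserted before the context element is known.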
Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out
Ok, so I have some preferences, but they are *mild* preferences and any permutation of the options below is acceptable to me.

On Fri, May 11, 2012 at 12:04 PM, Rafael Weinstein rafa...@google.com wrote:

Ok,

So from the previous threads, there appear to be three issues to resolve, and I'll list the options that I've noted. I'll follow up with my perspective of pros/cons and ask others to do the same. Please point out options or issues that I've missed.

Issue 1: How to handle tokens which precede the first start tag

Options:
a) Queue them, and then later run them through tree construction once the implied context element has been picked

I like option (a) because, for any input, you always get the same output as if you had applied it via innerHTML to the appropriate implied context element. E.g.

    myHTMLElement.innerHTML = "foo<body>bar";
    myDocumentFragment.innerHTML = "foo<body>bar";
    myHTMLElement.innerHTML == myDocumentFragment.innerHTML; // true

Also, most browsers are already speculatively tokenizing ahead for resource preloading purposes, so the implementation complexity isn't especially daunting.

b) Create a new insertion mode like "waiting for context element", which probably ignores end tags and doctype and inserts character tokens and comments. Once the implied context element is picked, reset the insertion mode appropriately, and proceed normally.

---

Issue 2: How to infer a non-HTML implied context element

Options:
a) By tagName alone. When multiple namespaces match, prefer HTML, and then either SVG or MathML (possibly on a per-tagName basis)
b) Also inspect attributes for tagNames which may be in multiple namespaces
c) Allow for inline namespacing of elements which would normally inherit the proper namespace from <svg> or <math>
d) Somewhat orthogonal, but later allow for <template> to have an optional context attribute (e.g. <template context=svg>), which explicitly picks a context element.
e) Some combination of the above.

I have a mild preference for not getting too fancy here. I'll mostly stay out of this and let those who know more about SVG and MathML sort it out, but I'll just note that the main concern cited for needing this to work is with web components, and there I think Hixie's idea of a declarative context element (e.g. option (d) above) is a nice addition and might alleviate the need to get fancy.

---

Issue 3: What form does the API take
a) Document.innerHTML
b) document.parse()
c) document.createDocumentFragment()

I'm torn here between (a) and (b). I like the familiarity of (a), but agree with Henri's point about the namespace of the owner document being a downside. I don't like (c) for stylistic reasons.
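For comparison, Issue 1 option (a) can be sketched as a small queue-and-replay loop. Everything here is hypothetical scaffolding: `pickContext` and `treeConstruct` stand in for the real context inference and tree construction, the token shape is invented for the example, and falling back to "body" when no start tag appears is an assumption.

```javascript
// Option (a): hold every token seen before the first start tag, then run
// the held tokens through tree construction once the context is known.
function parseWithQueuedTokens(tokens, pickContext, treeConstruct) {
  const queue = [];
  const output = [];
  let contextTag = null;
  for (const token of tokens) {
    if (contextTag !== null) {
      output.push(treeConstruct(contextTag, token));
      continue;
    }
    queue.push(token);
    if (token.type !== "startTag") continue;
    // First start tag: the context is now known, so flush the queue in
    // the original token order.
    contextTag = pickContext(token.name);
    for (const queued of queue) output.push(treeConstruct(contextTag, queued));
    queue.length = 0;
  }
  // Assumption: input without any start tag is parsed as a child of body.
  if (contextTag === null) {
    for (const queued of queue) output.push(treeConstruct("body", queued));
  }
  return output;
}
```

Because the queued tokens are processed under the same context element as everything after them, the result matches what innerHTML on that context element would produce, which is the property argued for above.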