Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-06-08 Thread Rafael Weinstein
Yehuda,

Can you help clarify here whether jQuery's behavior is intentional
(i.e. use cases drive the need for executability), or if it's a
side-effect of the implementation?

On Fri, Jun 1, 2012 at 1:27 PM, Ryosuke Niwa rn...@webkit.org wrote:
 On Thu, May 31, 2012 at 7:55 AM, Henri Sivonen hsivo...@iki.fi wrote:

 On Sat, May 19, 2012 at 6:29 AM, Ryosuke Niwa rn...@webkit.org wrote:
  There appears to be a consensus to use document.parse (which is fine
  with me), so I would like to double-check which behavior we're picking.
  IMO, the only sane choice is to unset the already-started flag since
  doing otherwise implies script elements parsed by document.parse won't
  be executed when inserted into a document.

 I was expecting document.parse() to make scripts unexecutable. Are
 there use cases for creating executable scripts using this facility?


 jQuery appears to let script elements run: http://jsfiddle.net/kB8Fp/2/

 Also, we're talking about using the same algorithm for template element.
 I would like script elements inside my template to run.

 - Ryosuke




Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-06-08 Thread Tobie Langel
On Jun 8, 2012, at 11:03 AM, Rafael Weinstein rafa...@google.com wrote:

 Yehuda,
 
 Can you help clarify here whether jQuery's behavior is intentional
 (i.e. use cases drive the need for executability), or if it's a
 side-effect of the implementation?

I can't speak for jQuery, but in Prototype.js, this behaviour is intentional.

--tobie



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-06-07 Thread Henri Sivonen
On Wed, Jun 6, 2012 at 6:49 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 Flip-flopping is irrelevant.

It's irrelevant in the sense of flip-flopping being bad in and of
itself. Changing one's mind is okay and it's good to acknowledge past
mistakes. However, in many cases if one's mind is changed after
interoperable implementations have been made available to Web authors,
it may be worse to try to fix past mistakes instead of just
acknowledging them as mistakes and trying to not make more mistakes
like that in the future.

 Only what is good for authors is.

I believe in this case not changing the way SVG script content
tokenizes would be best for authors.

 If
 deployed content would break as a result of a change, we either find a
 new way to accommodate the desired change, or drop it.  But we need
 the compat data about that breakage before we can claim that it will
 occur.

Support Existing Content is not the only relevant Design Principle
here. Degrade Gracefully is also relevant. I believe it would be bad
for authors if SVG-in-HTML content tokenized subtly differently in the
long tail of old (old when viewed from the future) browsers which is
likely to include IE9 and, at this rate, IE10 and also a mix of
versions of the Android stock browser for a long time. (Also possibly
various left-over versions of Firefox, Safari and Opera.)

Past data suggests that IE is updated slowly and that the Android
stock browser typically doesn't get updated at all (until it is
abandoned by switching to another browser or by switching to another
device).

Arguments about it being okay to violate the Degrade Gracefully
principle because the future is longer than the past (so it's always
worthwhile to make things better for the future) would apply to
pretty much all breaking changes to the Web platform and have the same
problems this time as when the argument is applied to other breaking
changes to the Web platform.

 The SVGWG would like to make things as good for authors as possible.
 Past positions don't matter, except insofar as the history of their
 effects on the specs persists.

The reason why I care about correcting recounts of past SVG working
group opinion on this topic is that I think it's better for learning
from mistakes if the learning is based on the truth of what happened.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-06-07 Thread Ian Hickson
On Thu, 7 Jun 2012, Henri Sivonen wrote:
 
 I believe in this case not changing the way SVG script content tokenizes 
 would be best for authors.

For what it's worth, I agree with Henri here. In my experience, spec churn 
is the number two way of making a spec fail. I think it's better to have 
something that works consistently everywhere than to have things work 
differently across different browsers and even different versions of the 
same browser. That's the effect of spec churn. It also has the effect of 
putting test suites in unclear states, which is especially bad for test 
suites that have been copied into browser vendors' development 
environments (especially if they don't realise the spec has changed), and 
more subtly it has the effect of making developers more reluctant to be 
first adopters, since they start feeling first adopters have to pay a 
higher cost, and it makes authors feel like the specs aren't really worth 
anything because they keep changing. Plus, of course, there's the 
opportunity cost: making a minor improvement means we're spending lots of 
resources (speccing, implementing, testing, documenting, advocating) 
that we could instead be spending on making something else a _lot_ better.

(The number one way of making a spec fail is to ignore backwards 
compatibility, of course. Which in a way is the same thing, just on a 
larger scale.)

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-06-06 Thread Henri Sivonen
On Fri, Jun 1, 2012 at 10:25 AM, Jonas Sicking jo...@sicking.cc wrote:
 I think the SVG working group should learn to stand by its past
 mistakes. Not standing by them in the sense of thinking the past
 mistakes are great but in the sense of not causing further
 disturbances by flip-flopping.

 For what it's worth, I've not seen any flip-flopping on this. Over
 the years that I've asked the SVG WG the detailed question on if they
 prefer to have the parsing model for scripts in SVG-in-HTML I've
 consistently gotten the answer that they prefer this.

At the time when SVG parsing was being added to text/html, vocal
members of the SVG working group were adamant that parsing should work
the same as for XML so that output from existing tools that had XML
serializers could be copied and pasted into text/html in a text
editor. Suggestions went as far as insisting a full XML parser be
embedded inside the HTML parser.

For [citation needed], see e.g. Requirement 1 in
http://lists.w3.org/Archives/Public/public-html/2009Mar/0216.html (not
the only place where the requirement was expressed but the first one I
found when searching the archives) and requirements 1 and 2 as well as
the first sentence under Summary in
http://dev.w3.org/SVG/proposals/svg-html/svg-html-proposal.html .

 I'm also not sure how this is at all relevant here given that we
 should do what's best for authors, even when we learn over time what's
 best for authors.

At this point, what's best for authors includes considerations of
consistent behavior across already-deployed browsers (including IE9,
soon IE10 and the Android stock browser) and future browsers.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-06-06 Thread Tab Atkins Jr.
On Wed, Jun 6, 2012 at 3:42 AM, Henri Sivonen hsivo...@iki.fi wrote:
 On Fri, Jun 1, 2012 at 10:25 AM, Jonas Sicking jo...@sicking.cc wrote:
 I think the SVG working group should learn to stand by its past
 mistakes. Not standing by them in the sense of thinking the past
 mistakes are great but in the sense of not causing further
 disturbances by flip-flopping.

 For what it's worth, I've not seen any flip-flopping on this. Over
 the years that I've asked the SVG WG the detailed question on if they
 prefer to have the parsing model for scripts in SVG-in-HTML I've
 consistently gotten the answer that they prefer this.

 At the time when SVG parsing was being added to text/html, vocal
 members of the SVG working group were adamant that parsing should work
 the same as for XML so that output from existing tools that had XML
 serializers could be copied and pasted into text/html in a text
 editor. Suggestions went as far as insisting a full XML parser be
 embedded inside the HTML parser.

 For [citation needed], see e.g. Requirement 1 in
 http://lists.w3.org/Archives/Public/public-html/2009Mar/0216.html (not
 the only place where the requirement was expressed but the first one I
 found when searching the archives) and requirements 1 and 2 as well as
 the first sentence under Summary in
 http://dev.w3.org/SVG/proposals/svg-html/svg-html-proposal.html .

 I'm also not sure how this is at all relevant here given that we
 should do what's best for authors, even when we learn over time what's
 best for authors.

 At this point, what's best for authors includes considerations of
 consistent behavior across already-deployed browsers (including IE9,
 soon IE10 and the Android stock browser) and future browsers.

Considering compatible behavior is indeed part of what's best for
authors, but we shouldn't extend that to blanket denials of the
possibility of change.

Flip-flopping is irrelevant.  Only what is good for authors is.  If
deployed content would break as a result of a change, we either find a
new way to accommodate the desired change, or drop it.  But we need
the compat data about that breakage before we can claim that it will
occur.

The SVGWG would like to make things as good for authors as possible.
Past positions don't matter, except insofar as the history of their
effects on the specs persists.  Compat breakage is painful, but so is
manifestly hard-to-use incompatibilities between HTML and SVG.  Let's
fix those as much as we can.

~TJ



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-06-06 Thread Jonas Sicking
On Wed, Jun 6, 2012 at 3:42 AM, Henri Sivonen hsivo...@iki.fi wrote:
 On Fri, Jun 1, 2012 at 10:25 AM, Jonas Sicking jo...@sicking.cc wrote:
 I think the SVG working group should learn to stand by its past
 mistakes. Not standing by them in the sense of thinking the past
 mistakes are great but in the sense of not causing further
 disturbances by flip-flopping.

 For what it's worth, I've not seen any flip-flopping on this. Over
 the years that I've asked the SVG WG the detailed question on if they
 prefer to have the parsing model for scripts in SVG-in-HTML I've
 consistently gotten the answer that they prefer this.

 At the time when SVG parsing was being added to text/html, vocal
 members of the SVG working group were adamant that parsing should work
 the same as for XML so that output from existing tools that had XML
 serializers could be copied and pasted into text/html in a text
 editor. Suggestions went as far as insisting a full XML parser be
 embedded inside the HTML parser.

 For [citation needed], see e.g. Requirement 1 in
 http://lists.w3.org/Archives/Public/public-html/2009Mar/0216.html (not
 the only place where the requirement was expressed but the first one I
 found when searching the archives) and requirements 1 and 2 as well as
 the first sentence under Summary in
 http://dev.w3.org/SVG/proposals/svg-html/svg-html-proposal.html .

Indeed. But every time I asked specifically about the parsing of
script issue, I got the answer that it was more important that
script-markup could be moved between HTML and SVG-in-HTML.

 I'm also not sure how this is at all relevant here given that we
 should do what's best for authors, even when we learn over time what's
 best for authors.

 At this point, what's best for authors includes considerations of
 consistent behavior across already-deployed browsers (including IE9,
 soon IE10 and the Android stock browser) and future browsers.

I think that's a matter of opinion.

/ Jonas



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-06-01 Thread Ryosuke Niwa
On Thu, May 31, 2012 at 7:55 AM, Henri Sivonen hsivo...@iki.fi wrote:

  On Sat, May 19, 2012 at 6:29 AM, Ryosuke Niwa rn...@webkit.org wrote:
  There appears to be a consensus to use document.parse (which is fine
  with me), so I would like to double-check which behavior we're picking.
  IMO, the only sane choice is to unset the already-started flag since
  doing otherwise implies script elements parsed by document.parse won't
  be executed when inserted into a document.

 I was expecting document.parse() to make scripts unexecutable. Are
 there use cases for creating executable scripts using this facility?


jQuery appears to let script elements run: http://jsfiddle.net/kB8Fp/2/

Also, we're talking about using the same algorithm for template element.
I would like script elements inside my template to run.

- Ryosuke


Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-05-24 Thread Rafael Weinstein
This seems sensible. I've updated the WebKit patch to do exactly this:

https://bugs.webkit.org/show_bug.cgi?id=84646

It appears that the details of the proposal are now sorted out. I'll
start a new thread describing the full API & semantics.

On Fri, May 18, 2012 at 8:29 PM, Ryosuke Niwa rn...@webkit.org wrote:
 Not that I want to start another round of bike-shedding, but there is one
 clear distinction between innerHTML and createDocumentFragment, which is
 that innerHTML sets the already-started flag on parsed script elements
 but createDocumentFragment does not (or rather it unsets it after the
 fragment parsing algorithm has run).
 See http://html5.org/specs/dom-parsing.html#dom-range-createcontextualfragment

 There appears to be a consensus to use document.parse (which is fine with
 me), so I would like to double-check which behavior we're picking. IMO, the
 only sane choice is to unset the already-started flag since doing otherwise
 implies script elements parsed by document.parse won't be executed when
 inserted into a document.

 While we can change the behavior for template elements, I would rather have
 the same behavior between all 3 APIs (createDocumentFragment, parse, and
 template element) and let innerHTML be the outlier for legacy reasons.

 (Note: I intend to fix the bug in WebKit that already-started flag isn't
 unmarked in createDocumentFragment).

 - Ryosuke




Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-05-18 Thread Ryosuke Niwa
Not that I want to start another round of bike-shedding, but there is one
clear distinction between innerHTML and createDocumentFragment, which is
that innerHTML sets the already-started flag on parsed script elements
but createDocumentFragment does not (or rather it unsets it after the
fragment parsing algorithm has run). See
http://html5.org/specs/dom-parsing.html#dom-range-createcontextualfragment

There appears to be a consensus to use document.parse (which is fine with
me), so I would like to double-check which behavior we're picking. IMO, the
only sane choice is to unset the already-started flag since doing otherwise
implies script elements parsed by document.parse won't be executed when
inserted into a document.
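
To make the difference concrete, here is a minimal C++ sketch of the two behaviors. It is only a model under stated assumptions, not WebKit code: ScriptElement, parseFragment and insertIntoDocument are hypothetical names invented for illustration, and the unsetFlag parameter stands for the choice being discussed (the section linked above describes unsetting the flag for Range.createContextualFragment).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a script element; only the flag matters here.
struct ScriptElement {
    bool alreadyStarted = false;
    bool executed = false;
};

// Models fragment parsing: the parser marks every script it creates as
// "already started". When unsetFlag is true (the behavior proposed for
// document.parse), the flag is cleared once parsing has finished.
std::vector<ScriptElement> parseFragment(std::size_t scriptCount, bool unsetFlag) {
    std::vector<ScriptElement> scripts(scriptCount);
    for (ScriptElement& script : scripts)
        script.alreadyStarted = true;
    if (unsetFlag) {
        for (ScriptElement& script : scripts)
            script.alreadyStarted = false;
    }
    return scripts;
}

// Models insertion into a document: a script runs only if it has not
// already been started.
void insertIntoDocument(std::vector<ScriptElement>& scripts) {
    for (ScriptElement& script : scripts) {
        if (script.alreadyStarted)
            continue; // innerHTML-style scripts stay inert
        script.alreadyStarted = true;
        script.executed = true;
    }
}
```

In this model, scripts from parseFragment(n, false) never run after insertIntoDocument, which is the innerHTML outcome, while scripts from parseFragment(n, true) do run.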

While we can change the behavior for template elements, I would rather have
the same behavior between all 3 APIs (createDocumentFragment, parse, and
template element) and let innerHTML be the outlier for legacy reasons.

(Note: I intend to fix the bug in WebKit that already-started flag isn't
unmarked in createDocumentFragment).

- Ryosuke


Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-05-17 Thread Rafael Weinstein
On Wed, May 16, 2012 at 4:52 PM, Rafael Weinstein rafa...@google.com wrote:
 On Wed, May 16, 2012 at 4:49 PM, Jonas Sicking jo...@sicking.cc wrote:
 On Wed, May 16, 2012 at 4:29 PM, Rafael Weinstein rafa...@google.com wrote:
 Ok. I think I'm convinced on all points.

 I've uploaded a webkit patch which implements what we've agreed on here:

 https://bugs.webkit.org/show_bug.cgi?id=84646

 I'm happy to report that this patch is nicer than the queued-token
 approach. Good call, Henri.

 On Tue, May 15, 2012 at 9:39 PM, Yehuda Katz wyc...@gmail.com wrote:

 Yehuda Katz
 (ph) 718.877.1325


 On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen hsivo...@iki.fi wrote:

 On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com
 wrote:
  Issue 1: How to handle tokens which precede the first start tag
 
  Options:
  a) Queue them, and then later run them through tree construction once
  the implied context element has been picked
 
  b) Create a new insertion mode like "waiting for context element", which
  probably ignores end tags and doctype and inserts character tokens and
  comments. Once the implied context element is picked, reset the
  insertion mode appropriately, and proceed normally.

 I prefer b).


 I like b as well. I assume it means that the waiting for context element
 insertion mode would keep scanning until the ambiguity was resolved, and
 then enter the appropriate insertion mode. Am I misunderstanding?

 I think what Yehuda is getting at here is that there are a handful of
 tags which are allowed to appear anywhere, so it doesn't make sense to
 resolve the ambiguity based on their identity.

 I talked with Tab about this, and happily, that set seems to be
 <style>, <script>, <meta>, & <link>. Happily, because this means that
 the new ImpliedContext insertion mode can handle start tags as
 follows (code from the above patch):

 if (token.name() == styleTag
     || token.name() == scriptTag
     || token.name() == metaTag
     || token.name() == linkTag) {
     // Process following the rules for the "in head" insertion mode.
     processStartTagForInHead(token);
     return;
 }

 // Set the context element.
 m_fragmentContext.setContextTag(getImpliedContextTag(token.name()));
 // Reset the insertion mode appropriately.
 resetInsertionModeAppropriately();
 // Reprocess the token.
 processStartTag(token);

 So if I understand things correctly, that would mean that:

 document.parse("parsed as text<script>parsed as script content</script><tr><td>table content</td></tr>");

 would return a fragment like:
 #fragment
  #text parsed as text
  <script>
    #text parsed as script content
  <tr>
    <td>
      #text table content

Note that I added an explicit test case for this:

#data
parse as text<script>parse as <span>script</span></script><tr><td>table content</td></tr>
#errors
#document-fragment
#document
| "parse as text"
| <script>
|   "parse as <span>script</span>"
| <tr>
|   <td>
|     "table content"



 Is this correct? The important part here is that the contents of the
 script element is parsed according to the rules which normally apply
 when parsing scripts?

 (That of course leaves the terrible situation that script parsing is
 vastly different in HTML and SVG, but that's a bad problem that
 already exists)

 Yes. Exactly.


 / Jonas



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-05-16 Thread Rafael Weinstein
Ok. I think I'm convinced on all points.

I've uploaded a webkit patch which implements what we've agreed on here:

https://bugs.webkit.org/show_bug.cgi?id=84646

I'm happy to report that this patch is nicer than the queued-token
approach. Good call, Henri.

On Tue, May 15, 2012 at 9:39 PM, Yehuda Katz wyc...@gmail.com wrote:

 Yehuda Katz
 (ph) 718.877.1325


 On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen hsivo...@iki.fi wrote:

 On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com
 wrote:
  Issue 1: How to handle tokens which precede the first start tag
 
  Options:
  a) Queue them, and then later run them through tree construction once
  the implied context element has been picked
 
  b) Create a new insertion mode like "waiting for context element", which
  probably ignores end tags and doctype and inserts character tokens and
  comments. Once the implied context element is picked, reset the
  insertion mode appropriately, and proceed normally.

 I prefer b).


 I like b as well. I assume it means that the waiting for context element
 insertion mode would keep scanning until the ambiguity was resolved, and
 then enter the appropriate insertion mode. Am I misunderstanding?

I think what Yehuda is getting at here is that there are a handful of
tags which are allowed to appear anywhere, so it doesn't make sense to
resolve the ambiguity based on their identity.

I talked with Tab about this, and happily, that set seems to be
<style>, <script>, <meta>, & <link>. Happily, because this means that
the new ImpliedContext insertion mode can handle start tags as
follows (code from the above patch):

if (token.name() == styleTag
    || token.name() == scriptTag
    || token.name() == metaTag
    || token.name() == linkTag) {
    // Process following the rules for the "in head" insertion mode.
    processStartTagForInHead(token);
    return;
}

// Set the context element.
m_fragmentContext.setContextTag(getImpliedContextTag(token.name()));
// Reset the insertion mode appropriately.
resetInsertionModeAppropriately();
// Reprocess the token.
processStartTag(token);
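
For readers without the patch at hand, the following self-contained C++ sketch shows how the context could end up being picked. It is an illustration under assumptions, not the patch itself: Token, TokenType, isAllowedAnywhere and resolveImpliedContext are hypothetical names, and the tag-to-context table in impliedContextTag is a guess at what getImpliedContextTag might return. The sketch only tracks which context element gets chosen and does no tree construction.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Token kinds the tree builder can receive from the tokenizer.
enum class TokenType { Character, Comment, StartTag, EndTag, Doctype };

struct Token {
    TokenType type;
    std::string data; // tag name, or text for characters and comments
};

// Hypothetical mapping from the first disambiguating start tag to an
// implied context element; the real table in the patch may differ.
std::string impliedContextTag(const std::string& name) {
    if (name == "td" || name == "th") return "tr";
    if (name == "tr") return "tbody";
    if (name == "tbody" || name == "thead" || name == "tfoot"
        || name == "caption" || name == "colgroup") return "table";
    if (name == "col") return "colgroup";
    if (name == "option" || name == "optgroup") return "select";
    return "body";
}

// Tags that may appear anywhere and so do not resolve the ambiguity.
bool isAllowedAnywhere(const std::string& name) {
    return name == "style" || name == "script"
        || name == "meta" || name == "link";
}

// Scans tokens as the "waiting for context element" mode would and
// returns the context tag chosen by the first disambiguating start tag,
// or an empty string if the input never resolves the ambiguity.
std::string resolveImpliedContext(const std::vector<Token>& tokens) {
    for (const Token& token : tokens) {
        switch (token.type) {
        case TokenType::Character:
        case TokenType::Comment:
            break; // would be inserted as-is; context still unknown
        case TokenType::EndTag:
        case TokenType::Doctype:
            break; // ignored in this mode
        case TokenType::StartTag:
            if (isAllowedAnywhere(token.data))
                break; // handled by the "in head" rules; keep waiting
            return impliedContextTag(token.data);
        }
    }
    return "";
}
```

With this table, leading text followed by a script element and then a tr start tag resolves to a tbody context, so neither the text nor the script decides the outcome.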




 I'm assuming the use case for this stuff isn't that authors throw
 random stuff at the API and then insert the result somewhere. I expect
 authors to pass string literals or somewhat cooked string literals to
 the API knowing where they're going to insert the result but not
 telling the insertion point to the API as a matter of convenience.

 If you know you are planning to insert stuff as a child of tbody,
 don't start your string literal with stuff that would tokenize as
 characters!

 (Firefox currently does not have the capability to queue tokens.
 Speculative parsing in Firefox is not based on queuing tokens. See
 https://developer.mozilla.org/en/Gecko/HTML_parser_threading for the
 details.)

  Issue 2: How to infer a non-HTML implied context element
 
  Options:
  a) By tagName alone. When multiple namespaces match, prefer HTML, and
  then either SVG or MathML (possibly on a per-tagName basis)
 
  b) Also inspect attributes for tagNames which may be in multiple
  namespaces

 AFAICT, the case where this really matters (if my assumptions about
 use cases are right) is <a>. (Fragment parsing makes scripts useless
 anyway by setting their already started flag, authors probably
 shouldn't be adding styles by parsing <style>, both HTML and SVG
 <font> are considered harmful and cross-browser support for Content
 MathML is far off on the horizon.)

 So I prefer a) possibly with <a>-specific elaborations if we can come
 up with some. Generic solutions seem to involve more complexity. For
 example, if we supported a generic attribute for forcing SVG
 interpretation, would it put us on a slippery slope to support it when
 it appears on tokens that aren't the first start tag token in a
 contextless fragment parse?

  Issue 3: What form does the API take
 
  a) Document.innerHTML
 
  b) document.parse()
 
  c) document.createDocumentFragment()

 I prefer b) because:
  * It doesn't involve creating the fragment as a separate step.
  * It doesn't need to be foolishly consistent with the HTML vs. XML
 design errors of innerHTML.
  * It's shorter than document.createDocumentFragment().
  * Unlike innerHTML, it is a method, so we can add more arguments
 later (or right away) to refine its behavior.

 --
 Henri Sivonen
 hsivo...@iki.fi
 http://hsivonen.iki.fi/





Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-05-16 Thread Jonas Sicking
On Wed, May 16, 2012 at 4:29 PM, Rafael Weinstein rafa...@google.com wrote:
 Ok. I think I'm convinced on all points.

 I've uploaded a webkit patch which implements what we've agreed on here:

 https://bugs.webkit.org/show_bug.cgi?id=84646

 I'm happy to report that this patch is nicer than the queued-token
 approach. Good call, Henri.

 On Tue, May 15, 2012 at 9:39 PM, Yehuda Katz wyc...@gmail.com wrote:

 Yehuda Katz
 (ph) 718.877.1325


 On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen hsivo...@iki.fi wrote:

 On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com
 wrote:
  Issue 1: How to handle tokens which precede the first start tag
 
  Options:
  a) Queue them, and then later run them through tree construction once
  the implied context element has been picked
 
  b) Create a new insertion mode like "waiting for context element", which
  probably ignores end tags and doctype and inserts character tokens and
  comments. Once the implied context element is picked, reset the
  insertion mode appropriately, and proceed normally.

 I prefer b).


 I like b as well. I assume it means that the waiting for context element
 insertion mode would keep scanning until the ambiguity was resolved, and
 then enter the appropriate insertion mode. Am I misunderstanding?

 I think what Yehuda is getting at here is that there are a handful of
 tags which are allowed to appear anywhere, so it doesn't make sense to
 resolve the ambiguity based on their identity.

 I talked with Tab about this, and happily, that set seems to be
 <style>, <script>, <meta>, & <link>. Happily, because this means that
 the new ImpliedContext insertion mode can handle start tags as
 follows (code from the above patch):

 if (token.name() == styleTag
     || token.name() == scriptTag
     || token.name() == metaTag
     || token.name() == linkTag) {
     // Process following the rules for the "in head" insertion mode.
     processStartTagForInHead(token);
     return;
 }

 // Set the context element.
 m_fragmentContext.setContextTag(getImpliedContextTag(token.name()));
 // Reset the insertion mode appropriately.
 resetInsertionModeAppropriately();
 // Reprocess the token.
 processStartTag(token);

So if I understand things correctly, that would mean that:

document.parse("parsed as text<script>parsed as script content</script><tr><td>table content</td></tr>");

would return a fragment like:
#fragment
  #text parsed as text
  <script>
    #text parsed as script content
  <tr>
    <td>
      #text table content

Is this correct? The important part here is that the contents of the
script element is parsed according to the rules which normally apply
when parsing scripts?

(That of course leaves the terrible situation that script parsing is
vastly different in HTML and SVG, but that's a bad problem that
already exists)

/ Jonas



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-05-16 Thread Rafael Weinstein
On Wed, May 16, 2012 at 4:49 PM, Jonas Sicking jo...@sicking.cc wrote:
 On Wed, May 16, 2012 at 4:29 PM, Rafael Weinstein rafa...@google.com wrote:
 Ok. I think I'm convinced on all points.

 I've uploaded a webkit patch which implements what we've agreed on here:

 https://bugs.webkit.org/show_bug.cgi?id=84646

 I'm happy to report that this patch is nicer than the queued-token
 approach. Good call, Henri.

 On Tue, May 15, 2012 at 9:39 PM, Yehuda Katz wyc...@gmail.com wrote:

 Yehuda Katz
 (ph) 718.877.1325


 On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen hsivo...@iki.fi wrote:

 On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com
 wrote:
  Issue 1: How to handle tokens which precede the first start tag
 
  Options:
  a) Queue them, and then later run them through tree construction once
  the implied context element has been picked
 
  b) Create a new insertion mode like "waiting for context element", which
  probably ignores end tags and doctype and inserts character tokens and
  comments. Once the implied context element is picked, reset the
  insertion mode appropriately, and proceed normally.

 I prefer b).


 I like b as well. I assume it means that the waiting for context element
 insertion mode would keep scanning until the ambiguity was resolved, and
 then enter the appropriate insertion mode. Am I misunderstanding?

 I think what Yehuda is getting at here is that there are a handful of
 tags which are allowed to appear anywhere, so it doesn't make sense to
 resolve the ambiguity based on their identity.

 I talked with Tab about this, and happily, that set seems to be
 <style>, <script>, <meta>, & <link>. Happily, because this means that
 the new ImpliedContext insertion mode can handle start tags as
 follows (code from the above patch):

 if (token.name() == styleTag
     || token.name() == scriptTag
     || token.name() == metaTag
     || token.name() == linkTag) {
     // Process following the rules for the "in head" insertion mode.
     processStartTagForInHead(token);
     return;
 }

 // Set the context element.
 m_fragmentContext.setContextTag(getImpliedContextTag(token.name()));
 // Reset the insertion mode appropriately.
 resetInsertionModeAppropriately();
 // Reprocess the token.
 processStartTag(token);

 So if I understand things correctly, that would mean that:

 document.parse("parsed as text<script>parsed as script content</script><tr><td>table content</td></tr>");

 would return a fragment like:
 #fragment
  #text parsed as text
  <script>
    #text parsed as script content
  <tr>
    <td>
      #text table content

 Is this correct? The important part here is that the contents of the
 script element is parsed according to the rules which normally apply
 when parsing scripts?

 (That of course leaves the terrible situation that script parsing is
 vastly different in HTML and SVG, but that's a bad problem that
 already exists)

Yes. Exactly.


 / Jonas



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-05-16 Thread Jonas Sicking
On Wed, May 16, 2012 at 4:52 PM, Rafael Weinstein rafa...@google.com wrote:
 On Wed, May 16, 2012 at 4:49 PM, Jonas Sicking jo...@sicking.cc wrote:
 On Wed, May 16, 2012 at 4:29 PM, Rafael Weinstein rafa...@google.com wrote:
 Ok. I think I'm convinced on all points.

 I've uploaded a webkit patch which implements what we've agreed on here:

 https://bugs.webkit.org/show_bug.cgi?id=84646

 I'm happy to report that this patch is nicer than the queued-token
 approach. Good call, Henri.

 On Tue, May 15, 2012 at 9:39 PM, Yehuda Katz wyc...@gmail.com wrote:

 Yehuda Katz
 (ph) 718.877.1325


 On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen hsivo...@iki.fi wrote:

 On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com
 wrote:
  Issue 1: How to handle tokens which precede the first start tag
 
  Options:
  a) Queue them, and then later run them through tree construction once
  the implied context element has been picked
 
  b) Create a new insertion mode like "waiting for context element", which
  probably ignores end tags and doctype and inserts character tokens and
  comments. Once the implied context element is picked, reset the
  insertion mode appropriately, and proceed normally.

 I prefer b).


 I like b as well. I assume it means that the waiting for context element
 insertion mode would keep scanning until the ambiguity was resolved, and
 then enter the appropriate insertion mode. Am I misunderstanding?

 I think what Yehuda is getting at here is that there are a handful of
 tags which are allowed to appear anywhere, so it doesn't make sense to
 resolve the ambiguity based on their identity.

 I talked with Tab about this, and happily, that set seems to be
 <style>, <script>, <meta>, & <link>. Happily, because this means that
 the new ImpliedContext insertion mode can handle start tags as
 follows (code from the above patch):

 if (token.name() == styleTag
     || token.name() == scriptTag
     || token.name() == metaTag
     || token.name() == linkTag) {
     // Process following the rules for the "in head" insertion mode.
     processStartTagForInHead(token);
     return;
 }

 // Set the context element.
 m_fragmentContext.setContextTag(getImpliedContextTag(token.name()));
 // Reset the insertion mode appropriately.
 resetInsertionModeAppropriately();
 // Reprocess the token.
 processStartTag(token);

 So if I understand things correctly, that would mean that:

 document.parse("parsed as text<script>parsed as script content</script>" +
                "<tr><td>table content</td></tr>");

 would return a fragment like:
 #fragment
  #text "parsed as text"
  <script>
    #text "parsed as script content"
  <tr>
    <td>
      #text "table content"

 Is this correct? The important part here is that the contents of the
 script element are parsed according to the rules which normally apply
 when parsing scripts?

 (That of course leaves the terrible situation that script parsing is
 vastly different in HTML and SVG, but that's a bad problem that
 already exists)

 Yes. Exactly.

That leaves the question of whether the contents of the <script> should
be parsed as an HTML script or an SVG script. The same question applies
to <style>.

Of course, ideally we would make the two parse the same way, but so
far I've not been successful in convincing people here that that's a
good idea.

/ Jonas
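
The start-tag handling quoted above can be illustrated with a small
standalone sketch. It is not the WebKit patch: impliedContextTagFor() is a
hypothetical stand-in for getImpliedContextTag(), and its tag-to-context
table is an assumption made only for illustration. What it shows is that
<style>, <script>, <meta> and <link> are skipped when picking the context,
and the first other start tag decides it.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Tags allowed almost anywhere; they reveal nothing about the intended context.
bool isContextNeutralTag(const std::string& name) {
    static const std::unordered_set<std::string> neutral = {
        "style", "script", "meta", "link"};
    return neutral.count(name) != 0;
}

// Hypothetical stand-in for getImpliedContextTag(): maps the first
// context-revealing start tag to the element it is assumed to live in.
// The table is an illustrative assumption, not taken from the patch.
std::string impliedContextTagFor(const std::string& name) {
    static const std::unordered_map<std::string, std::string> table = {
        {"tr", "tbody"},      {"td", "tr"},          {"th", "tr"},
        {"tbody", "table"},   {"thead", "table"},    {"tfoot", "table"},
        {"caption", "table"}, {"colgroup", "table"}, {"col", "colgroup"},
        {"option", "select"}, {"optgroup", "select"}};
    auto it = table.find(name);
    return it == table.end() ? std::string("body") : it->second;
}

// Walks the start tags in document order and returns the implied context
// chosen by the first non-neutral one, or "" if every tag was neutral.
std::string resolveImpliedContext(const std::vector<std::string>& startTags) {
    for (const std::string& name : startTags) {
        if (isContextNeutralTag(name))
            continue;  // would be processed as "in head"; context stays undecided
        return impliedContextTagFor(name);
    }
    return std::string();
}
```

Under these assumptions, the start tags of the example string (script, tr,
td) resolve to a tbody context, because the script tag is skipped and the
tr decides.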



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-05-15 Thread Henri Sivonen
On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com wrote:
 Issue 1: How to handle tokens which precede the first start tag

 Options:
 a) Queue them, and then later run them through tree construction once
 the implied context element has been picked

 b) Create a new insertion mode like "waiting for context element", which
 probably ignores end tags and doctype and inserts character tokens and
 comments. Once the implied context element is picked, reset the
 insertion mode appropriately, and proceed normally.

I prefer b).

I'm assuming the use case for this stuff isn't that authors throw
random stuff at the API and then insert the result somewhere. I expect
authors to pass string literals or somewhat cooked string literals to
the API knowing where they're going to insert the result but not
telling the insertion point to the API as a matter of convenience.

If you know you are planning to insert stuff as a child of <tbody>,
don't start your string literal with stuff that would tokenize as
characters!

(Firefox currently does not have the capability to queue tokens.
Speculative parsing in Firefox is not based on queuing tokens. See
https://developer.mozilla.org/en/Gecko/HTML_parser_threading for the
details.)

 Issue 2: How to infer a non-HTML implied context element

 Options:
 a) By tagName alone. When multiple namespaces match, prefer HTML, and
 then either SVG or MathML (possibly on a per-tagName basis)

 b) Also inspect attributes for tagNames which may be in multiple namespaces

AFAICT, the case where this really matters (if my assumptions about
use cases are right) is <a>. (Fragment parsing makes scripts useless
anyway by setting their "already started" flag, authors probably
shouldn't be adding styles by parsing <style>, both HTML and SVG
<font> are considered harmful, and cross-browser support for Content
MathML is far off on the horizon.)

So I prefer a) possibly with <a>-specific elaborations if we can come
up with some. Generic solutions seem to involve more complexity. For
example, if we supported a generic attribute for forcing SVG
interpretation, would it put us on a slippery slope to support it when
it appears on tokens that aren't the first start tag token in a
contextless fragment parse?

 Issue 3: What form does the API take

 a) Document.innerHTML

 b) document.parse()

 c) document.createDocumentFragment()

I prefer b) because:
 * It doesn't involve creating the fragment as a separate step.
 * It doesn't need to be foolishly consistent with the HTML vs. XML
design errors of innerHTML.
 * It's shorter than document.createDocumentFragment().
 * Unlike innerHTML, it is a method, so we can add more arguments
later (or right away) to refine its behavior.

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-05-15 Thread Tab Atkins Jr.
On Tue, May 15, 2012 at 12:46 PM, Henri Sivonen hsivo...@iki.fi wrote:
 On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com wrote:
 Issue 1: How to handle tokens which precede the first start tag

 Options:
 a) Queue them, and then later run them through tree construction once
 the implied context element has been picked

 b) Create a new insertion mode like "waiting for context element", which
 probably ignores end tags and doctype and inserts character tokens and
 comments. Once the implied context element is picked, reset the
 insertion mode appropriately, and proceed normally.

 I prefer b).

 I'm assuming the use case for this stuff isn't that authors throw
 random stuff at the API and then insert the result somewhere. I expect
 authors to pass string literals or somewhat cooked string literals to
 the API knowing where they're going to insert the result but not
 telling the insertion point to the API as a matter of convenience.

Exactly correct.  That's the jQuery use-case exactly, which we're
trying to solve.

I'm totally fine with b) as well.


 Issue 2: How to infer a non-HTML implied context element

 Options:
 a) By tagName alone. When multiple namespaces match, prefer HTML, and
 then either SVG or MathML (possibly on a per-tagName basis)

 b) Also inspect attributes for tagNames which may be in multiple namespaces

 AFAICT, the case where this really matters (if my assumptions about
 use cases are right) is <a>. (Fragment parsing makes scripts useless
 anyway by setting their "already started" flag, authors probably
 shouldn't be adding styles by parsing <style>, both HTML and SVG
 <font> are considered harmful, and cross-browser support for Content
 MathML is far off on the horizon.)

Yup, your assumptions are correct as far as I know.

 So I prefer a) possibly with <a>-specific elaborations if we can come
 up with some. Generic solutions seem to involve more complexity. For
 example, if we supported a generic attribute for forcing SVG
 interpretation, would it put us on a slippery slope to support it when
 it appears on tokens that aren't the first start tag token in a
 contextless fragment parse?

That wouldn't make sense, though.  If you see
<p> foo <a svg><rect /></a>, you're already in an HTML context, and
the svg stuff has to be wrapped in an <svg> to form the airtight
namespace seal.

But still, @svg is kinda a hacky solution.  Shrug.


 Issue 3: What form does the API take

 a) Document.innerHTML

 b) document.parse()

 c) document.createDocumentFragment()

 I prefer b) because:
  * It doesn't involve creating the fragment as a separate step.
  * It doesn't need to be foolishly consistent with the HTML vs. XML
 design errors of innerHTML.
  * It's shorter than document.createDocumentFragment().
  * Unlike innerHTML, it is a method, so we can add more arguments
 later (or right away) to refine its behavior.

Possibly a second argument to force parsing in a particular way when
it's ambiguous would be useful.

~TJ



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-05-15 Thread Yehuda Katz
Yehuda Katz
(ph) 718.877.1325


On Tue, May 15, 2012 at 6:46 AM, Henri Sivonen hsivo...@iki.fi wrote:

 On Fri, May 11, 2012 at 10:04 PM, Rafael Weinstein rafa...@google.com
 wrote:
  Issue 1: How to handle tokens which precede the first start tag
 
  Options:
  a) Queue them, and then later run them through tree construction once
  the implied context element has been picked
 
  b) Create a new insertion mode like "waiting for context element", which
  probably ignores end tags and doctype and inserts character tokens and
  comments. Once the implied context element is picked, reset the
  insertion mode appropriately, and proceed normally.

 I prefer b).


I like b as well. I assume it means that the "waiting for context element"
insertion mode would keep scanning until the ambiguity was resolved, and
then enter the appropriate insertion mode. Am I misunderstanding?



 I'm assuming the use case for this stuff isn't that authors throw
 random stuff at the API and then insert the result somewhere. I expect
 authors to pass string literals or somewhat cooked string literals to
 the API knowing where they're going to insert the result but not
 telling the insertion point to the API as a matter of convenience.

 If you know you are planning to insert stuff as a child of <tbody>,
 don't start your string literal with stuff that would tokenize as
 characters!

 (Firefox currently does not have the capability to queue tokens.
 Speculative parsing in Firefox is not based on queuing tokens. See
 https://developer.mozilla.org/en/Gecko/HTML_parser_threading for the
 details.)

  Issue 2: How to infer a non-HTML implied context element
 
  Options:
  a) By tagName alone. When multiple namespaces match, prefer HTML, and
  then either SVG or MathML (possibly on a per-tagName basis)
 
  b) Also inspect attributes for tagNames which may be in multiple
 namespaces

 AFAICT, the case where this really matters (if my assumptions about
 use cases are right) is <a>. (Fragment parsing makes scripts useless
 anyway by setting their "already started" flag, authors probably
 shouldn't be adding styles by parsing <style>, both HTML and SVG
 <font> are considered harmful, and cross-browser support for Content
 MathML is far off on the horizon.)

 So I prefer a) possibly with <a>-specific elaborations if we can come
 up with some. Generic solutions seem to involve more complexity. For
 example, if we supported a generic attribute for forcing SVG
 interpretation, would it put us on a slippery slope to support it when
 it appears on tokens that aren't the first start tag token in a
 contextless fragment parse?

  Issue 3: What form does the API take
 
  a) Document.innerHTML
 
  b) document.parse()
 
  c) document.createDocumentFragment()

 I prefer b) because:
  * It doesn't involve creating the fragment as a separate step.
  * It doesn't need to be foolishly consistent with the HTML vs. XML
 design errors of innerHTML.
  * It's shorter than document.createDocumentFragment().
  * Unlike innerHTML, it is a method, so we can add more arguments
 later (or right away) to refine its behavior.

 --
 Henri Sivonen
 hsivo...@iki.fi
 http://hsivonen.iki.fi/



Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-05-11 Thread Rafael Weinstein
Ok,

So from the previous threads, there appear to be three issues to
resolve, and I'll list the options that I've noted.

I'll follow up with my perspective of pros/cons and ask others to do
the same. Please point out options or issues that I've missed.


Issue 1: How to handle tokens which precede the first start tag

Options:
a) Queue them, and then later run them through tree construction once
the implied context element has been picked

b) Create a new insertion mode like "waiting for context element", which
probably ignores end tags and doctype and inserts character tokens and
comments. Once the implied context element is picked, reset the
insertion mode appropriately, and proceed normally.
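
As a rough illustration of option (b), the sketch below models a "waiting
for context element" mode over a simplified token stream. The Token type,
the node labels and runWaitingForContextMode() are all hypothetical names
invented for this example; no real parser works on strings like this. It
only shows the proposed rule: character and comment tokens are inserted
right away, end tags and doctypes are dropped, and the first start tag
ends the mode.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class TokenType { Doctype, StartTag, EndTag, Character, Comment };

struct Token {
    TokenType type;
    std::string data;  // tag name, text, or comment body
};

struct PreContextResult {
    std::vector<std::string> insertedNodes;  // nodes appended to the fragment
    std::size_t firstStartTagIndex;          // tokens.size() if none was seen
};

// Consumes tokens until the first start tag. That tag is not consumed here;
// the caller would pick the implied context from it and reprocess it.
PreContextResult runWaitingForContextMode(const std::vector<Token>& tokens) {
    PreContextResult result{{}, tokens.size()};
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const Token& token = tokens[i];
        switch (token.type) {
        case TokenType::Character:
            result.insertedNodes.push_back("#text " + token.data);
            break;
        case TokenType::Comment:
            result.insertedNodes.push_back("#comment " + token.data);
            break;
        case TokenType::Doctype:
        case TokenType::EndTag:
            break;  // ignored in this mode
        case TokenType::StartTag:
            result.firstStartTagIndex = i;
            return result;
        }
    }
    return result;
}
```

Note that nothing is held back in this variant: text seen before the first
start tag is already in the fragment by the time the context is chosen.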


---
Issue 2: How to infer a non-HTML implied context element

Options:
a) By tagName alone. When multiple namespaces match, prefer HTML, and
then either SVG or MathML (possibly on a per-tagName basis)

b) Also inspect attributes for tagNames which may be in multiple namespaces

c) Allow for inline namespacing of elements which would normally
inherit the proper namespace from <svg> or <math>

d) Somewhat orthogonal, but later allow for <template> to have an
optional context attribute (e.g. <template context=svg>), which
explicitly picks a context element.

e) Some combination of the above.
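
To make option (a) concrete, here is a minimal sketch of namespace
inference by tag name alone, with HTML preferred when a name exists in
more than one namespace. The function name and the three name sets are
assumptions for illustration; the sets are small excerpts, not complete
element lists, and the SVG-before-MathML order is just one of the choices
the option leaves open.

```cpp
#include <string>
#include <unordered_set>

enum class MarkupNamespace { HTML, SVG, MathML };

// Picks a namespace from the tag name only, checking HTML first, then SVG,
// then MathML. Names such as "a", "script", "style", "title" and "font"
// exist in both HTML and SVG, so they resolve to HTML under this rule.
MarkupNamespace inferNamespaceByTagName(const std::string& tagName) {
    // Illustrative excerpts only.
    static const std::unordered_set<std::string> htmlNames = {
        "a", "script", "style", "title", "font",
        "div", "p", "span", "tr", "td"};
    static const std::unordered_set<std::string> svgNames = {
        "a", "script", "style", "title", "font",
        "svg", "g", "rect", "circle", "path"};
    static const std::unordered_set<std::string> mathmlNames = {
        "math", "mi", "mo", "mn", "mrow"};

    if (htmlNames.count(tagName) != 0)
        return MarkupNamespace::HTML;
    if (svgNames.count(tagName) != 0)
        return MarkupNamespace::SVG;
    if (mathmlNames.count(tagName) != 0)
        return MarkupNamespace::MathML;
    return MarkupNamespace::HTML;  // unknown names fall back to HTML
}
```

The ambiguous names are exactly where this rule cannot express an SVG
intent, which is the gap options (b) through (d) try to cover.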

---
Issue 3: What form does the API take

a) Document.innerHTML

b) document.parse()

c) document.createDocumentFragment()



Re: Implied Context Parsing (DocumentFragment.innerHTML, or similar) proposal details to be sorted out

2012-05-11 Thread Rafael Weinstein
Ok, so I have some preferences, but they are *mild* preferences and
any permutation of the options below is acceptable to me.

On Fri, May 11, 2012 at 12:04 PM, Rafael Weinstein rafa...@google.com wrote:
 Ok,

 So from the previous threads, there appear to be three issues to
 resolve, and I'll list the options that I've noted.

 I'll follow up with my perspective of pros/cons and ask others to do
 the same. Please point out options or issues that I've missed.

 
 Issue 1: How to handle tokens which precede the first start tag

 Options:
 a) Queue them, and then later run them through tree construction once
 the implied context element has been picked

I like option (a) because, for any input, you always get output
identical to what you would get by applying it via innerHTML to the
appropriate implied context element. E.g.

myHTMLElement.innerHTML = "foo<body>bar";
myDocumentFragment.innerHTML = "foo<body>bar";

myHTMLElement.innerHTML == myDocumentFragment.innerHTML; // true

Also, most browsers are already speculatively tokenizing ahead for
resource preloading purposes, so the implementation complexity isn't
especially daunting.
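
A minimal sketch of the queueing idea in option (a), under simplifying
assumptions: QueuedToken, queueThenReplay() and the pickContext callback
are hypothetical names, and "delivering" a token stands in for running it
through tree construction. Tokens seen before the first start tag are held
in a queue; once that tag fixes the context, the queue is replayed in
order, followed by the rest of the input.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <vector>

enum class QueuedKind { StartTag, EndTag, Character, Comment };

struct QueuedToken {
    QueuedKind kind;
    std::string data;
};

struct ReplayResult {
    std::string context;                 // "" if no start tag was ever seen
    std::vector<QueuedToken> delivered;  // order seen by tree construction
};

ReplayResult queueThenReplay(
    const std::vector<QueuedToken>& tokens,
    const std::function<std::string(const std::string&)>& pickContext) {
    std::deque<QueuedToken> queue;
    ReplayResult result;
    std::size_t i = 0;

    // Phase 1: hold tokens back until a start tag reveals the context.
    for (; i < tokens.size(); ++i) {
        if (tokens[i].kind == QueuedKind::StartTag) {
            result.context = pickContext(tokens[i].data);
            break;
        }
        queue.push_back(tokens[i]);
    }

    // Phase 2: replay the queue with the context known, then the remainder.
    while (!queue.empty()) {
        result.delivered.push_back(queue.front());
        queue.pop_front();
    }
    for (; i < tokens.size(); ++i)
        result.delivered.push_back(tokens[i]);
    return result;
}
```

Because every token reaches tree construction only after the context is
known, the early tokens are handled as they would be inside that context
element, which is the equivalence with innerHTML described above.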


 b) Create a new insertion mode like "waiting for context element", which
 probably ignores end tags and doctype and inserts character tokens and
 comments. Once the implied context element is picked, reset the
 insertion mode appropriately, and proceed normally.


 ---
 Issue 2: How to infer a non-HTML implied context element

 Options:
 a) By tagName alone. When multiple namespaces match, prefer HTML, and
 then either SVG or MathML (possibly on a per-tagName basis)

 b) Also inspect attributes for tagNames which may be in multiple namespaces

 c) Allow for inline namespacing of elements which would normally
 inherit the proper namespace from <svg> or <math>

 d) Somewhat orthogonal, but later allow for <template> to have an
 optional context attribute (e.g. <template context=svg>), which
 explicitly picks a context element.

 e) Some combination of the above.

I have a mild preference for not getting too fancy here. I'll mostly
stay out of this and let those who know more about SVG and MathML sort
it out, but I'll just note that the main concern cited for needing
this to work is with web components, and there I think Hixie's idea of
a declarative context element (i.e. option (d) above) is a nice
addition and might alleviate the need to get fancy.


 ---
 Issue 3: What form does the API take

 a) Document.innerHTML

 b) document.parse()

 c) document.createDocumentFragment()

I'm torn here between (a) & (b). I like the familiarity of (a), but
agree with Henri's point about the namespace of the owner document
being a downside. I don't like (c) for stylistic reasons.