Re: Proposal: Document.parse() [AKA: Implied Context Parsing]

2012-06-16 Thread Rafael Weinstein
E4H doesn't address all the use cases of Document.parse().

It doesn't solve the problem of existing templating libraries
constructing DOM fragments from processed templates.

E4H (or something similar) would be great, but I think it's a mistake
to make it mutually exclusive with Document.parse().

On Tue, Jun 5, 2012 at 11:24 PM, Ian Hickson i...@hixie.ch wrote:
 On Mon, 4 Jun 2012, Adam Barth wrote:
 
    http://www.hixie.ch/specs/e4h/strawman
 
  Who wants to be first to implement it?

 Doesn't e4h have the same security problems as e4x?

 As written it did, yes (specifically, if you can inject content into an
 XML file you can cause it to run JS under your control in your origin with
 content from the other origin). However, as Anne and you have said, it's
 easy to fix, either by using an XML-incompatible syntax or using CORS to
 disable it. Since we have to disable it in Workers anyway, I'd go with
 disabling it when there's no CORS. Strawman has been updated accordingly.


 On Tue, 5 Jun 2012, Anne van Kesteren wrote:

 A (bigger?) problem with E4H/H4E is that TC39 does not like it:
 http://lists.w3.org/Archives/Public/public-script-coord/2011OctDec/thread.html#msg33

 What matters is what implementors want to do.

The TC-39 spec process isn't the problem here. TC-39 is composed of
implementors, and they are clearly stating a preference for quasis.


 --
 Ian Hickson               U+1047E                )\._.,--,'``.    fL
 http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
 Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'



Re: Proposal: Document.parse() [AKA: Implied Context Parsing]

2012-06-05 Thread Adam Barth
On Tue, Jun 5, 2012 at 12:58 AM, Anne van Kesteren ann...@annevk.nl wrote:
 On Tue, Jun 5, 2012 at 2:10 AM, Adam Barth w...@adambarth.com wrote:
 Doesn't e4h have the same security problems as e4x?

 If you mean http://code.google.com/p/doctype-mirror/wiki/ArticleE4XSecurity
 I guess that would depend on how we define it.

By the way, it occurs to me that we can solve these security problems
if we restrict the syntax to only working when executing inline or via
<script crossorigin src="...">. If the script has appropriate CORS
headers, then it doesn't matter if we leak its contents because
they're already readable by the document executing the script.

Adam



Re: Proposal: Document.parse() [AKA: Implied Context Parsing]

2012-06-05 Thread Anne van Kesteren
On Tue, Jun 5, 2012 at 2:10 AM, Adam Barth w...@adambarth.com wrote:
 Doesn't e4h have the same security problems as e4x?

If you mean http://code.google.com/p/doctype-mirror/wiki/ArticleE4XSecurity
I guess that would depend on how we define it.

A (bigger?) problem with E4H/H4E is that TC39 does not like it:
http://lists.w3.org/Archives/Public/public-script-coord/2011OctDec/thread.html#msg33
I'm not as optimistic as them that quasis just solve this (not compile
time, nobody has actually written out the safehtml definition), but
TC39 being against it does not help.


-- 
Anne — Opera Software
http://annevankesteren.nl/
http://www.opera.com/



Re: Proposal: Document.parse() [AKA: Implied Context Parsing]

2012-06-05 Thread Anne van Kesteren
On Tue, Jun 5, 2012 at 11:02 AM, Adam Barth w...@adambarth.com wrote:
 On Tue, Jun 5, 2012 at 2:10 AM, Adam Barth w...@adambarth.com wrote:
 If you mean http://code.google.com/p/doctype-mirror/wiki/ArticleE4XSecurity
 I guess that would depend on how we define it.

 By the way, it occurs to me that we can solve these security problems
 if we restrict the syntax to only working when executing inline or via
 <script crossorigin src="...">. If the script has appropriate CORS
 headers, then it doesn't matter if we leak its contents because
 they're already readable by the document executing the script.

It would also have to be disabled for workers until we have DOM access there...


-- 
Anne — Opera Software
http://annevankesteren.nl/
http://www.opera.com/



Re: Proposal: Document.parse() [AKA: Implied Context Parsing]

2012-06-05 Thread Ian Hickson
On Mon, 4 Jun 2012, Adam Barth wrote:
 
    http://www.hixie.ch/specs/e4h/strawman
 
  Who wants to be first to implement it?
 
 Doesn't e4h have the same security problems as e4x?

As written it did, yes (specifically, if you can inject content into an 
XML file you can cause it to run JS under your control in your origin with 
content from the other origin). However, as Anne and you have said, it's 
easy to fix, either by using an XML-incompatible syntax or using CORS to 
disable it. Since we have to disable it in Workers anyway, I'd go with 
disabling it when there's no CORS. Strawman has been updated accordingly.


On Tue, 5 Jun 2012, Anne van Kesteren wrote:
 
 A (bigger?) problem with E4H/H4E is that TC39 does not like it:
 http://lists.w3.org/Archives/Public/public-script-coord/2011OctDec/thread.html#msg33

What matters is what implementors want to do.

-- 
Ian Hickson               U+1047E                )\._.,--,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: Proposal: Document.parse() [AKA: Implied Context Parsing]

2012-06-04 Thread Ian Hickson
On Fri, 25 May 2012, Rafael Weinstein wrote:

 Now's the time to raise objections to UA's adding support for this 
 feature.

For the record, I very much object to Document.parse(). I think it's a 
terrible API. We should IMHO resolve the use case of "generate a DOM tree 
from script" using a much more robust solution that has compile-time 
syntax checking and so forth, rather than relying on the super-hacky 
"concatenate a bunch of strings and then parse them" solution that authors 
are forced to use today.

innerHTML and document.write() are abominations unto computer science, and 
we are doing nobody any favours by continuing the platform down this road. 
They lead to programming styles that are rife with injection bugs (XSS), 
they are extremely difficult to debug and maintain, and they are terribly 
complicated to implement compared to more structured alternatives. The 
core reasons for these problems, IMHO, are two-fold:

 1. Lack of compile-time syntax checking, which leads to typos not being 
caught and thus programmer intent not being faithfully represented, 
and
 2. Putting markup syntax and data at the same level, instead of
separating them as with other features in JS.

For example, this kind of bug is easy to introduce and hard to spot or 
debug:

   var heading = '<h1>Hello</h1>';
   // ...
   div.innerHTML = '<h1>' + heading + '</h1>';

Even worse are things like typos:

   tr.innerHTML = '<td>' + c1 + '</td><td>' + c2 + '</td><dt>' + c3 + '</td>';

Compile-time syntax checking makes this a non-issue. Making data variables 
be qualitatively different than the syntax also solves problems, e.g.:

   var title = "I hate </p> tags.";
   // ...
   div.innerHTML = '<p>Today\'s topic is: ' + title + '</p>'; // oops, not escaped
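
To make the third bug concrete, here is a minimal sketch of what an author has to remember to do by hand when concatenating strings. `escapeHtml` is a hypothetical helper written for this illustration, not a platform API, and it only covers text placed in element content or quoted attribute values.

```javascript
// Hypothetical helper (not a platform API): escape the characters that
// the HTML parser would otherwise treat as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

var title = 'I hate </p> tags.';

// Unescaped: the </p> inside the data closes the paragraph early.
var unsafe = '<p>Today\'s topic is: ' + title + '</p>';

// Escaped: the data stays data, but only because the author remembered.
var safe = '<p>Today\'s topic is: ' + escapeHtml(title) + '</p>';
```

Nothing in the language forces the call to `escapeHtml`, which is the point being made about markup and data living at the same level.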


There have been several alternative proposals; my personal favourite is 
Anne's E4H solution, basically E4X but simplified just for HTML, which 
I've written a strawman spec for here:

   http://www.hixie.ch/specs/e4h/strawman

I'm happy to write a more serious spec for this if this is something 
anyone is interested in implementing. The above examples become much 
easier to debug. The first one results in very ugly markup visible in the 
output of the page rather than in the weird spacing:

   var heading = '<h1>Hello</h1>';
   // ...
   div.appendChild(<h1>{heading}</h1>);

The second results in a compile-time syntax error so would be caught even 
before the code is reviewed:

   tr.appendChild(<><td>{c1}</td><td>{c2}</td><dt>{c3}</td></>);

The third becomes a non-issue because you don't need to escape text to 
prevent it from being mistaken for markup [1]:

   var title = "I hate </p> tags.";
   // ...
   div.innerHTML = <p>Today's topic is: {title}</p>;


Other proposed solutions include Element.create(), which is less verbose 
than the DOM but still more verbose than innerHTML or E4H; and 
quasistrings, which still suffer from lack of compile-time checking and 
mix markup with data, but at least would be more structured than raw 
strings and could offer better injection protection.
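
For comparison, quasis later shipped in ECMAScript 2015 as template literals, so the trade-off described above can be sketched in plain JavaScript. The `safehtml` tag below is hypothetical (nobody in this thread has defined it) and only escapes interpolated values as element text; it is an illustration under those assumptions, not a proposed definition.

```javascript
// Hypothetical tag function: literal chunks are treated as trusted
// markup, interpolated values are escaped as text.
function safehtml(strings, ...values) {
  const escape = (v) => String(v)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  let out = strings[0];
  values.forEach((value, i) => {
    out += escape(value) + strings[i + 1];
  });
  return out;
}

const title = 'I hate </p> tags.';
const markup = safehtml`<p>Today's topic is: ${title}</p>`;

// A typo such as <dt> for <td> in the literal part is still not caught
// until the resulting string is parsed at run time.
```

The data is kept apart from the markup here, but the markup itself remains an unchecked string, which matches the objection about compile-time checking.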


[1] (This is not the same as auto-escaping strings in other contexts. For 
example, E4H doesn't propose to have CSS literals, so a string embedded in 
a style= attribute wouldn't be automagically safe.)

-- 
Ian Hickson               U+1047E                )\._.,--,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'



Re: Proposal: Document.parse() [AKA: Implied Context Parsing]

2012-06-04 Thread Rafael Weinstein
Just to be clear: what you are objecting to is the addition of formal
API for this.

You're generally supportive of adding a template element whose
contents would parse the way we're discussing here -- and given that,
a webdev could trivially polyfill Document.parse().
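
A minimal sketch of such a polyfill, assuming a template element that parses its markup without a fixed context and exposes the result as a fragment through a `content` property (an assumption about how template ends up being specified). `parseFragment` is a hypothetical name, and the document is passed in explicitly so the function can be exercised outside a browser. It makes no attempt to unmark scripts as "already started".

```javascript
// Sketch only: relies on a <template> element that parses its markup
// with an implied context and exposes the result as a DocumentFragment.
function parseFragment(doc, markup) {
  var template = doc.createElement('template');
  template.innerHTML = markup;
  return template.content;
}

// In a browser, something like:
//   tbody.appendChild(parseFragment(document, '<tr><td>1</td></tr>'));
```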

I.e. you're ok with the approach of the parser picking a context
element based on the contents of markup, but against giving webdevs
the impression that innerHTML is good practice, by adding more API in
that direction?

Put another way, though you're not happy with adding the API, you're
willing to set that aside and help spec the parser changes required
for both this and template element (assuming the remaining issues
with template can be agreed upon)?

FWIW, I agree with Hixie in principle, but disagree in practice. I
think innerHTML is generally to be avoided, but I feel that adding
Document.parse() improves the situation by making some current uses
(which aren't likely to go away) less hacky. Also, I'm not as worried
with webdevs taking the wrong message from us adding API. My feeling
is that they just do what works best for them and don't think much
about what we are or are not encouraging.

Also, I'm highly supportive of the goal of allowing HTML literals in
script. I fully agree that better load (compile) time feedback would
be beneficial to authors here.

On Mon, Jun 4, 2012 at 3:47 PM, Ian Hickson i...@hixie.ch wrote:
 On Fri, 25 May 2012, Rafael Weinstein wrote:

 Now's the time to raise objections to UA's adding support for this
 feature.

 For the record, I very much object to Document.parse(). I think it's a
 terrible API. We should IMHO resolve the use case of "generate a DOM tree
 from script" using a much more robust solution that has compile-time
 syntax checking and so forth, rather than relying on the super-hacky
 "concatenate a bunch of strings and then parse them" solution that authors
 are forced to use today.

 innerHTML and document.write() are abominations unto computer science, and
 we are doing nobody any favours by continuing the platform down this road.
 They lead to programming styles that are rife with injection bugs (XSS),
 they are extremely difficult to debug and maintain, and they are terribly
 complicated to implement compared to more structured alternatives. The
 core reasons for these problems, IMHO, are two-fold:

  1. Lack of compile-time syntax checking, which leads to typos not being
    caught and thus programmer intent not being faithfully represented,
    and
  2. Putting markup syntax and data at the same level, instead of
    separating them as with other features in JS.

 For example, this kind of bug is easy to introduce and hard to spot or
 debug:

   var heading = '<h1>Hello</h1>';
   // ...
   div.innerHTML = '<h1>' + heading + '</h1>';

 Even worse are things like typos:

   tr.innerHTML = '<td>' + c1 + '</td><td>' + c2 + '</td><dt>' + c3 + '</td>';

 Compile-time syntax checking makes this a non-issue. Making data variables
 be qualitatively different than the syntax also solves problems, e.g.:

   var title = "I hate </p> tags.";
   // ...
   div.innerHTML = '<p>Today\'s topic is: ' + title + '</p>'; // oops, not escaped


 There have been several alternative proposals; my personal favourite is
 Anne's E4H solution, basically E4X but simplified just for HTML, which
 I've written a strawman spec for here:

   http://www.hixie.ch/specs/e4h/strawman

 I'm happy to write a more serious spec for this if this is something
 anyone is interested in implementing. The above examples become much
 easier to debug. The first one results in very ugly markup visible in the
 output of the page rather than in the weird spacing:

   var heading = '<h1>Hello</h1>';
   // ...
   div.appendChild(<h1>{heading}</h1>);

 The second results in a compile-time syntax error so would be caught even
 before the code is reviewed:

   tr.appendChild(<><td>{c1}</td><td>{c2}</td><dt>{c3}</td></>);

 The third becomes a non-issue because you don't need to escape text to
 avoid it from being mistaken for markup [1]:

   var title = "I hate </p> tags.";
   // ...
   div.innerHTML = <p>Today's topic is: {title}</p>;


 Other proposed solutions include Element.create(), which is less verbose
 than the DOM but still more verbose than innerHTML or E4H; and
 quasistrings, which still suffer from lack of compile-time checking and
 mix markup with data, but at least would be more structured than raw
 strings and could offer better injection protection.


 [1] (This is not the same as auto-escaping strings in other contexts. For
 example, E4H doesn't propose to have CSS literals, so a string embedded in
 a style= attribute wouldn't be automagically safe.)

 --
 Ian Hickson               U+1047E                )\._.,--,'``.    fL
 http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
 Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'



Re: Proposal: Document.parse() [AKA: Implied Context Parsing]

2012-06-04 Thread Ian Hickson
On Mon, 4 Jun 2012, Rafael Weinstein wrote:

 Just to be clear: what you are objecting to is the addition of formal 
 API for this.
 
 You're generally supportive of adding a template element whose 
 contents would parse the way we're discussing here -- and given that, a 
 webdev could trivially polyfill Document.parse().

Sure.


 I.e. you're ok with the approach of the parser picking a context element 
 based on the contents of markup, but against giving webdevs the 
 impression that innerHTML is good practice, by adding more API in that 
 direction?

Right.


 Put another way, though you're not happy with adding the API, you're 
 willing to set that aside and help spec the parser changes required for 
 both this and template element (assuming the remaining issues with 
 template can be agreed upon)?

I think template is important. If implementing that happens to make it 
easier for a script to implement a bad practice, so be it.

(See my e-mail on the template thread for comments on that subject.)


 FWIW, I agree with Hixie in principle, but disagree in practice. I
 think innerHTML is generally to be avoided, but I feel that adding
 Document.parse() improves the situation by making some current uses
 (which aren't likely to go away) less hacky.

If we want to make things less hacky, let's actually make them less 
hacky, not introduce more APIs that suck.


 Also, I'm not as worried with webdevs taking the wrong message from us 
 adding API. My feeling is that they just do what works best for them and 
 don't think much about what we are or are not encouraging.

I strongly disagree on that. Whether consciously or not, we set the 
standard for what is good practice. I've definitely seen authors look at 
the standards community for leadership. Just look at how authors adopted 
XHTML's syntax, even in the absence of actually using XHTML. It was such a 
tidal wave that we ended up actually changing HTML's conformance criteria 
to ignore the extra characters rather than say they were invalid. Why? 
Because XHTML was what the W3C was working on, so it must have been good, 
even though objectively it really added no semantics (literally nothing, 
the language was defined by deferring to HTML4) and the syntax changes 
were a net negative.


 Also, I'm highly supportive of the goal of allowing HTML literals in 
 script. I fully agree that better load (compile) time feedback would 
 be beneficial to authors here.

Let's do it! As far as I can tell, the impact on a JS parser would be 
pretty minimal.

   http://www.hixie.ch/specs/e4h/strawman

Who wants to be first to implement it?

-- 
Ian Hickson               U+1047E                )\._.,--,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'



Re: Proposal: Document.parse() [AKA: Implied Context Parsing]

2012-06-04 Thread Adam Barth
On Mon, Jun 4, 2012 at 4:38 PM, Ian Hickson i...@hixie.ch wrote:
 On Mon, 4 Jun 2012, Rafael Weinstein wrote:

 Just to be clear: what you are objecting to is the addition of formal
 API for this.

 You're generally supportive of adding a template element whose
 contents would parse the way we're discussing here -- and given that, a
 webdev could trivially polyfill Document.parse().

 Sure.


 I.e. you're ok with the approach of the parser picking a context element
 based on the contents of markup, but against giving webdevs the
 impression that innerHTML is good practice, by adding more API in that
 direction?

 Right.


 Put another way, though you're not happy with adding the API, you're
 willing to set that aside and help spec the parser changes required for
 both this and template element (assuming the remaining issues with
 template can be agreed upon)?

 I think template is important. If implementing that happens to make it
 easier for a script to implement a bad practice, so be it.

 (See my e-mail on the template thread for comments on that subject.)


 FWIW, I agree with Hixie in principle, but disagree in practice. I
 think innerHTML is generally to be avoided, but I feel that adding
 Document.parse() improves the situation by making some current uses
 (which aren't likely to go away) less hacky.

 If we want to make things less hacky, let's actually make them less
 hacky, not introduce more APIs that suck.


 Also, I'm not as worried with webdevs taking the wrong message from us
 adding API. My feeling is that they just do what works best for them and
 don't think much about what we are or are not encouraging.

 I strongly disagree on that. Whether consciously or not, we set the
 standard for what is good practice. I've definitely seen authors look at
 the standards community for leadership. Just look at how authors adopted
 XHTML's syntax, even in the absence of actually using XHTML. It was such a
 tidal wave that we ended up actually changing HTML's conformance criteria
 to ignore the extra characters rather than say they were invalid. Why?
 Because XHTML was what the W3C was working on, so it must have been good,
 even though objectively it really added no semantics (literally nothing,
 the language was defined by deferring to HTML4) and the syntax changes
 were a net negative.


 Also, I'm highly supportive of the goal of allowing HTML literals in
 script. I fully agree that better load (compile) time feedback would
 be beneficial to authors here.

 Let's do it! As far as I can tell, the impact on a JS parser would be
 pretty minimal.

   http://www.hixie.ch/specs/e4h/strawman

 Who wants to be first to implement it?

Doesn't e4h have the same security problems as e4x?

Adam



Re: Proposal: Document.parse() [AKA: Implied Context Parsing]

2012-05-25 Thread Simon Pieters
On Fri, 25 May 2012 09:01:43 +0200, Rafael Weinstein rafa...@google.com  
wrote:


Ok, so from consensus on earlier threads, here's the full API   
semantics.


Now's the time to raise objections to UA's adding support for this  
feature.


-

1) The Document interface is extended to include a new method:

DocumentFragment parse (DOMString markup);

which:
-Invokes the fragment parsing algorithm with markup and an empty
context element,
-Unmarks all scripts in the returned fragment node as "already started"
-Returns the fragment node

2) The fragment parsing algorithm's context element is now optional.

Its behavior is similar to the case of a known context element, but
the tokenizer is simply set to the data state

3) Resetting the insertion mode appropriately now sets the mode to Implied
Context if parsing a fragment and no context element is set, and
aborts.

4) A new Implied Context insertion mode is defined which

-Ignores doctype, end tag tokens
-Handles comment and character tokens as if in body
-Handles the following start tags as if in body (which is as if in
head): style, script, link, meta
-Handles any other start tag by selecting a context element, resetting
the insertion mode appropriately and reprocessing the token.

5) A new "selecting a context element" algorithm is defined which
takes a start tag as input and outputs an element. The element's
identity is as follows:

-If start tag is tbody, thead, tfoot, caption or colgroup
  return table
-if start tag is tr,
  return tbody
-if start tag is col
  return colgroup
-if start tag is td or td
  return tr
-if start tag is head or body
  return html
-if start tag is rp or rt
  return ruby


I think ruby is better handled by always making rp and rt generate  
implied end tags in the fragment case (maybe even when parsing normally,  
too). Making the context element ruby still doesn't make rt parse  
right, because the spec currently looks for ruby on the *stack* (and the  
context element isn't on the stack).


Also, the ruby base is allowed to include markup, so this would fail:

ruby.appendChild(document.parse('<span>foo</span><rt>bar<rt>baz'));



-if start tag is a defined SVG localName (case insensitive)
  return svg


Except those that conflict with HTML?


-if start tag is a defined MathML localName (case insensitive)
  return math


(Making the context element svg or math doesn't do anything currently:  
https://www.w3.org/Bugs/Public/show_bug.cgi?id=16635 )



-otherwise, return body



--
Simon Pieters
Opera Software



Re: Proposal: Document.parse() [AKA: Implied Context Parsing]

2012-05-25 Thread Rafael Weinstein
On Fri, May 25, 2012 at 12:32 AM, Simon Pieters sim...@opera.com wrote:
 On Fri, 25 May 2012 09:01:43 +0200, Rafael Weinstein rafa...@google.com
 wrote:

 Ok, so from consensus on earlier threads, here's the full API  semantics.

 Now's the time to raise objections to UA's adding support for this
 feature.

 -

 1) The Document interface is extended to include a new method:

 DocumentFragment parse (DOMString markup);

 which:
 -Invokes the fragment parsing algorithm with markup and an empty
 context element,
 -Unmarks all scripts in the returned fragment node as "already started"
 -Returns the fragment node

 2) The fragment parsing algorithm's context element is now optional.

 Its behavior is similar to the case of a known context element, but
 the tokenizer is simply set to the data state

 3) Resetting the insertion mode appropriately now sets the mode to Implied
 Context if parsing a fragment and no context element is set, and
 aborts.

 4) A new Implied Context insertion mode is defined which

 -Ignores doctype, end tag tokens
 -Handles comment and character tokens as if in body
 -Handles the following start tags as if in body (which is as if in
 head): style, script, link, meta
 -Handles any other start tag by selecting a context element, resetting
 the insertion mode appropriately and reprocessing the token.

 5) A new "selecting a context element" algorithm is defined which
 takes a start tag as input and outputs an element. The element's
 identity is as follows:

 -If start tag is tbody, thead, tfoot, caption or colgroup
  return table
 -if start tag is tr,
  return tbody
 -if start tag is col
  return colgroup
 -if start tag is td or td
  return tr
 -if start tag is head or body
  return html
 -if start tag is rp or rt
  return ruby


 I think ruby is better handled by always making rp and rt generate
 implied end tags in the fragment case (maybe even when parsing normally,
 too). Making the context element ruby still doesn't make rt parse right,
 because the spec currently looks for ruby on the *stack* (and the context
 element isn't on the stack).

 Also, the ruby base is allowed to include markup, so this would fail:

 ruby.appendChild(document.parse('<span>foo</span><rt>bar<rt>baz'));



 -if start tag is a defined SVG localName (case insensitive)
  return svg


 Except those that conflict with HTML?

Yes. Thank you. Item 5 should be:

5) A new "selecting a context element" algorithm is defined which
takes a start tag as input and outputs an element. The element's
identity is as follows:

-If start tag is tbody, thead, tfoot, caption or colgroup
 return table
-if start tag is tr,
 return tbody
-if start tag is col
 return colgroup
-if start tag is td or td
 return tr
-if start tag is head or body
 return html
-if start tag is rp or rt
 return ruby

-if start tag is a defined HTML localName (case insensitive)
 return body

-if start tag is a defined SVG localName (case insensitive)
 return svg

-if start tag is a defined MathML localName (case insensitive)
 return math

-otherwise, return body
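
The revised item 5 can be sketched as a lookup function. This is an illustration under stated assumptions, not spec text: `selectContextElement` is a hypothetical name, it returns the context element's name rather than an element, the three name sets are small illustrative subsets of the real HTML, SVG and MathML vocabularies, and "td or td" is read as "td or th".

```javascript
// Illustrative subsets only; a real implementation would use the full
// lists of defined HTML, SVG and MathML local names (lowercased here).
const HTML_NAMES = new Set(['a', 'div', 'p', 'script', 'span', 'style', 'title']);
const SVG_NAMES = new Set(['a', 'circle', 'path', 'rect', 'script', 'style', 'title']);
const MATHML_NAMES = new Set(['mfrac', 'mi', 'mo', 'mrow']);

// Maps a start tag name to the name of the context element to use.
function selectContextElement(tagName) {
  const name = tagName.toLowerCase();
  switch (name) {
    case 'tbody': case 'thead': case 'tfoot': case 'caption': case 'colgroup':
      return 'table';
    case 'tr':
      return 'tbody';
    case 'col':
      return 'colgroup';
    case 'td': case 'th':
      return 'tr';
    case 'head': case 'body':
      return 'html';
    case 'rp': case 'rt':
      return 'ruby';
  }
  // HTML is checked first so names shared with SVG resolve to HTML.
  if (HTML_NAMES.has(name)) return 'body';
  if (SVG_NAMES.has(name)) return 'svg';
  if (MATHML_NAMES.has(name)) return 'math';
  return 'body';
}
```

Checking the HTML set before the SVG set is what resolves the conflict raised above: a name such as `a` or `title` that exists in both vocabularies gets a body context.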




 -if start tag is a defined MathML localName (case insensitive)
  return math


 (Making the context element svg or math doesn't do anything currently:
 https://www.w3.org/Bugs/Public/show_bug.cgi?id=16635 )

 -otherwise, return body



 --
 Simon Pieters
 Opera Software



Re: Proposal: Document.parse() [AKA: Implied Context Parsing]

2012-05-25 Thread Scott González
On Fri, May 25, 2012 at 3:01 AM, Rafael Weinstein rafa...@google.com wrote:

 -if start tag is td or td


typo: th