Re: [whatwg] input element's value should not be sanitized during parsing

2011-06-14 Thread Ian Hickson
On Fri, 11 Mar 2011, Jonas Sicking wrote:
 On Tue, Dec 28, 2010 at 11:46 PM, Ian Hickson i...@hixie.ch wrote:
  On Mon, 20 Sep 2010, Mounir Lamouri wrote:
 
  With the current specification, these two elements will not have the
  same value:
  <input value="foo&#13;bar" type='hidden'>
  <input type='hidden' value="foo&#13;bar">
 
  Yes they will. The attribute order has no effect. Elements are created 
  by the parser with their attributes already set:
 
  # When the steps below require the UA to create an element for a token in
  # a particular namespace, the UA must create a node implementing the interface
  # appropriate for the element type corresponding to the tag name of the
  # token in the given namespace (as given in the specification that defines
  # that element, e.g. for an a element in the HTML namespace, this
  # specification defines it to be the HTMLAnchorElement interface), with
  # the tag name being the name of that element, with the node being in the
  # given namespace, and with the attributes on the node being those given
  # in the given token.
   -- 
  http://www.whatwg.org/specs/web-apps/current-work/complete.html#create-an-element-for-the-token
 
 Except that I don't think this is how any implementation actually works. 
 Nor do I have any desire to write the implementation this way since it 
 means duplicating a lot of code. I'd have to add code which implemented 
 attribute behavior both in some special code path triggered during 
 element creation, as well as code to react to attribute changes 
 triggered by attribute changes in setAttribute/removeAttribute.
 
 So far this hasn't been needed and the parsing code basically just calls 
 setAttribute. Unless there are really good reasons to change this I'd 
 like to avoid it. So far I haven't heard of any such reasons.

The spec is defined such that attribute setting during element creation is 
order-agnostic. I believe this is consistent with what authors expect (in 
part based on the confusion I've seen when authors run into cases where 
that isn't the case). How you implement that is somewhat orthogonal to how 
it is specced; if there are specific things that are hard to implement, 
I'm happy to discuss them specifically if you like.


  On Tue, 21 Sep 2010, Boris Zbarsky wrote:
 
  Where does it say that it's atomic? I don't see that anywhere (and 
  in fact, the create an element code in the Gecko parser is most 
  decidedly non-atomic). Now maybe the spec intends this to be an 
  atomic operation; if so it needs to say that.
 
  The operation it describes is a single operation: create a node. It 
  describes various constraints on that operation, one of which is that 
  the node have the various tokenised attributes set. I don't understand 
  how creating a node could be anything other than atomic -- either it 
  exists or it does not.
 
 You're expecting several operations to happen at the same time. We could 
 certainly manually insert the attributes and their value into the 
 datastructure inside the element which stores the attribute name/value 
 pairs. However at some point we need to update all of the state that 
 these values drive. Things like sticking elements into id-hashes, 
 storing the calculated type of an input, calculating the effective URI 
 of an image, etc. This involves several separate pieces of state and so 
 can't happen all at the same time.

Sure. When those things happen is defined by the spec too.


  On Tue, 21 Sep 2010, Jonas Sicking wrote:
 
  Also, it would mean that the following two pieces of code behave 
  differently:
 
  inp = document.createElement("input");
  inp.setAttribute("value", "foo\nbar");
  inp.setAttribute("type", "hidden");
 
  and
 
  inp = document.createElement("input");
  inp.setAttribute("type", "hidden");
  inp.setAttribute("value", "foo\nbar");
 
  This does not seem desirable.
 
  I can't argue that it's desirable, but it's how the Web works, as I 
  understand it.
 
 Gecko doesn't exhibit this behavior and I don't know of any sites that 
 don't work in Gecko because of this.

On Wed, 30 Mar 2011, Mounir Lamouri wrote:
 
 FWIW, it does. The first inp.value is 'foobar' while the second is 'foo 
 bar'. See: 
 http://software.hixie.ch/utilities/js/live-dom-viewer/saved/900
 
 Though, I do not think this is related to the initial issue which is 
 about setting attributes while creating the element from the parser.

Right, the behaviour is different when the parser does it. This is per 
spec, and seems to match what Firefox does.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] input element's value should not be sanitized during parsing

2011-06-14 Thread Jonas Sicking
On Tue, Jun 14, 2011 at 2:00 PM, Ian Hickson i...@hixie.ch wrote:
 On Fri, 11 Mar 2011, Jonas Sicking wrote:
 On Tue, Dec 28, 2010 at 11:46 PM, Ian Hickson i...@hixie.ch wrote:
  On Mon, 20 Sep 2010, Mounir Lamouri wrote:
 
  With the current specification, these two elements will not have the
  same value:
  <input value="foo&#13;bar" type='hidden'>
  <input type='hidden' value="foo&#13;bar">
 
  Yes they will. The attribute order has no effect. Elements are created
  by the parser with their attributes already set:
 
  # When the steps below require the UA to create an element for a token in
  # a particular namespace, the UA must create a node implementing the interface
  # appropriate for the element type corresponding to the tag name of the
  # token in the given namespace (as given in the specification that defines
  # that element, e.g. for an a element in the HTML namespace, this
  # specification defines it to be the HTMLAnchorElement interface), with
  # the tag name being the name of that element, with the node being in the
  # given namespace, and with the attributes on the node being those given
  # in the given token.
   -- 
  http://www.whatwg.org/specs/web-apps/current-work/complete.html#create-an-element-for-the-token

 Except that I don't think this is how any implementation actually works.
 Nor do I have any desire to write the implementation this way since it
 means duplicating a lot of code. I'd have to add code which implemented
 attribute behavior both in some special code path triggered during
 element creation, as well as code to react to attribute changes
 triggered by attribute changes in setAttribute/removeAttribute.

 So far this hasn't been needed and the parsing code basically just calls
 setAttribute. Unless there are really good reasons to change this I'd
 like to avoid it. So far I haven't heard of any such reasons.

 The spec is defined such that attribute setting during element creation is
 order-agnostic. I believe this is consistent with what authors expect (in
 part based on the confusion I've seen when authors run into cases where
 that isn't the case). How you implement that is somewhat orthogonal to how
 it is specced; if there are specific things that are hard to implement,
 I'm happy to discuss them specifically if you like.

The problem, if I understand things correctly, is that setAttribute is
*not* order agnostic, while the parsing code is expected to be. This
means that we can't use the same code paths for setAttribute and
parsing.

This is not acceptable to us in Gecko. We're not willing to have two
code paths for setting attributes.

/ Jonas


Re: [whatwg] input element's value should not be sanitized during parsing

2011-06-14 Thread Ian Hickson
On Tue, 14 Jun 2011, Jonas Sicking wrote:
 
 The problem, if I understand things correctly, is that setAttribute is 
 *not* order agnostic, while the parsing code is expected to be. This 
 means that we can't use the same code paths for setAttribute and 
 parsing.

You can, you just have to have a special initialisation signal that the 
parser sends to an element after it's set its attributes.
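
The idea of one shared attribute path plus an initialisation signal can be sketched as below. This is a toy model that runs without a browser, not Gecko code and not spec text: the class ToyInput, its "deferred" flag, and the method names react() and doneCreating() are all hypothetical, and the Text state sanitizer is reduced to stripping line breaks.

```javascript
// Toy sanitizers: Text strips line breaks, Hidden leaves the value alone.
const sanitize = new Map([
  ["text", (v) => v.replace(/[\r\n]/g, "")],
  ["hidden", (v) => v],
]);

class ToyInput {
  constructor() {
    this.attrs = new Map();
    this.type = "text"; // a missing type attribute means the Text state
    this.value = "";
    this.deferred = false; // true only while the parser is still setting attributes
  }
  // Single code path for storing attributes, used by both script and parser.
  setAttribute(name, v) {
    this.attrs.set(name, v);
    if (!this.deferred) this.react(name);
  }
  // Update the state derived from one attribute.
  react(name) {
    const v = this.attrs.get(name);
    if (name === "type") this.type = sanitize.has(v) ? v : "text";
    if (name === "value") this.value = sanitize.get(this.type)(v);
  }
  // Hypothetical initialisation signal: resolve type first, then sanitize value.
  doneCreating() {
    this.deferred = false;
    if (this.attrs.has("type")) this.react("type");
    if (this.attrs.has("value")) this.react("value");
  }
}

// Hypothetical parser hook: set attributes in token order, then signal.
function parserCreate(tokenAttrs) {
  const el = new ToyInput();
  el.deferred = true;
  for (const [name, v] of tokenAttrs) el.setAttribute(name, v);
  el.doneCreating();
  return el;
}

const p1 = parserCreate([["value", "foo\rbar"], ["type", "hidden"]]);
const p2 = parserCreate([["type", "hidden"], ["value", "foo\rbar"]]);
console.log(p1.value === p2.value); // true
```

In this sketch the parser result no longer depends on attribute order, while script calls to setAttribute still react immediately and so remain order-dependent.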


 This is not acceptable to us in Gecko. We're not willing to have two 
 code paths for setting attributes.

You already _have_ two code paths. The example you gave shows that the 
parser is order agnostic but the equivalent DOM code is not.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] input element's value should not be sanitized during parsing

2011-03-30 Thread Mounir Lamouri
On 03/12/2011 12:56 AM, Jonas Sicking wrote:
 inp = document.createElement("input");
 inp.setAttribute("value", "foo\nbar");
 inp.setAttribute("type", "hidden");

 and

 inp = document.createElement("input");
 inp.setAttribute("type", "hidden");
 inp.setAttribute("value", "foo\nbar");

 This does not seem desirable.

 I can't argue that it's desirable, but it's how the Web works, as I
 understand it.
 
 Gecko doesn't exhibit this behavior and I don't know of any sites that
 don't work in Gecko because of this.

FWIW, it does. The first inp.value is 'foobar' while the second is 'foo
bar'.
See: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/900

Though, I do not think this is related to the initial issue which is
about setting attributes while creating the element from the parser.

--
Mounir


Re: [whatwg] input element's value should not be sanitized during parsing

2011-03-11 Thread Jonas Sicking
(Sorry to bring back an old thread. Trying to catch up on old to-do's
now that FF4 is almost out the door)

On Tue, Dec 28, 2010 at 11:46 PM, Ian Hickson i...@hixie.ch wrote:
 On Mon, 20 Sep 2010, Mounir Lamouri wrote:

 With the current specification, these two elements will not have the
 same value:
 <input value="foo&#13;bar" type='hidden'>
 <input type='hidden' value="foo&#13;bar">

 Yes they will. The attribute order has no effect. Elements are created
 by the parser with their attributes already set:

 # When the steps below require the UA to create an element for a token in
 # a particular namespace, the UA must create a node implementing the interface
 # appropriate for the element type corresponding to the tag name of the
 # token in the given namespace (as given in the specification that defines
 # that element, e.g. for an a element in the HTML namespace, this
 # specification defines it to be the HTMLAnchorElement interface), with
 # the tag name being the name of that element, with the node being in the
 # given namespace, and with the attributes on the node being those given
 # in the given token.
  -- 
 http://www.whatwg.org/specs/web-apps/current-work/complete.html#create-an-element-for-the-token

Except that I don't think this is how any implementation actually
works. Nor do I have any desire to write the implementation this way
since it means duplicating a lot of code. I'd have to add code which
implemented attribute behavior both in some special code path
triggered during element creation, as well as code to react to
attribute changes triggered by attribute changes in
setAttribute/removeAttribute.

So far this hasn't been needed and the parsing code basically just
calls setAttribute. Unless there are really good reasons to change
this I'd like to avoid it. So far I haven't heard of any such reasons.

 On Tue, 21 Sep 2010, Boris Zbarsky wrote:

 Where does it say that it's atomic?  I don't see that anywhere (and in
 fact, the create an element code in the Gecko parser is most decidedly
 non-atomic).  Now maybe the spec intends this to be an atomic operation;
 if so it needs to say that.

 The operation it describes is a single operation: create a node. It
 describes various constraints on that operation, one of which is that the
 node have the various tokenised attributes set. I don't understand how
 creating a node could be anything other than atomic -- either it exists or
 it does not.

You're expecting several operations to happen at the same time. We
could certainly manually insert the attributes and their value into
the datastructure inside the element which stores the attribute
name/value pairs. However at some point we need to update all of the
state that these values drive. Things like sticking elements into
id-hashes, storing the calculated type of an input, calculating the
effective URI of an image, etc. This involves several separate pieces
of state and so can't happen all at the same time.
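
A rough illustration of the point about derived state, assuming a toy element and a hypothetical document-level id index (the names idIndex and createWithAttributes are made up for this sketch and do not correspond to Gecko internals): the raw name/value pairs can be stored in one go, but each piece of state they drive is still updated in its own step.

```javascript
// Hypothetical id index, standing in for a document's id hash.
const idIndex = new Map();

function createWithAttributes(tag, attrs) {
  // Step 1: store all raw attribute name/value pairs at once.
  const el = { tag, attrs: new Map(attrs), derived: {} };
  const log = [];

  // Step 2: update each piece of derived state separately, one after another.
  if (el.attrs.has("id")) {
    idIndex.set(el.attrs.get("id"), el);
    log.push("id indexed");
  }
  if (tag === "input") {
    // Toy rule: only "hidden" is recognised, everything else is Text.
    el.derived.type = el.attrs.get("type") === "hidden" ? "hidden" : "text";
    log.push("type computed");
  }
  return { el, log };
}

const { el, log } = createWithAttributes("input", [
  ["id", "x"],
  ["type", "hidden"],
]);
console.log(log); // [ 'id indexed', 'type computed' ]
```

The log makes the sequencing visible: between the two steps the element is already in the id index while its computed type is not yet set.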

 On Tue, 21 Sep 2010, Boris Zbarsky wrote:

 That doesn't work if your parser and DOM aren't very very _very_ tightly
 coupled, since there are no DOM APIs to atomically set a bunch of
 attributes.

 The HTML spec in general assumes that the implementation of the parser is
 the implementation of the DOM and that you wouldn't use the DOM Core API
 to implement the DOM or the parser.

I wouldn't build a parser on the raw DOM API either. But mostly for
performance reasons since we have to do a lot more checks on data that
comes from untrusted script (things like prevent ancestor cycles etc).
But I'd also strongly want to share most of the code path between the
API that the DOM uses and that the parser uses. Not doing that is
going to lead to a lot more bloat and a lot more bugs.

 On Tue, 21 Sep 2010, Jonas Sicking wrote:

 Also, it would mean that the following two pieces of code behave 
 differently:

 inp = document.createElement("input");
 inp.setAttribute("value", "foo\nbar");
 inp.setAttribute("type", "hidden");

 and

 inp = document.createElement("input");
 inp.setAttribute("type", "hidden");
 inp.setAttribute("value", "foo\nbar");

 This does not seem desirable.

 I can't argue that it's desirable, but it's how the Web works, as I
 understand it.

Gecko doesn't exhibit this behavior and I don't know of any sites that
don't work in Gecko because of this.

/ Jonas


Re: [whatwg] input element's value should not be sanitized during parsing

2010-12-28 Thread Ian Hickson
On Mon, 20 Sep 2010, Mounir Lamouri wrote:
 
 With the current specification, these two elements will not have the
 same value:
 <input value="foo&#13;bar" type='hidden'>
 <input type='hidden' value="foo&#13;bar">

Yes they will. The attribute order has no effect. Elements are created 
by the parser with their attributes already set:

# When the steps below require the UA to create an element for a token in 
# a particular namespace, the UA must create a node implementing the interface 
# appropriate for the element type corresponding to the tag name of the 
# token in the given namespace (as given in the specification that defines 
# that element, e.g. for an a element in the HTML namespace, this 
# specification defines it to be the HTMLAnchorElement interface), with 
# the tag name being the name of that element, with the node being in the 
# given namespace, and with the attributes on the node being those given 
# in the given token.
 -- 
http://www.whatwg.org/specs/web-apps/current-work/complete.html#create-an-element-for-the-token


 Depending on how the attributes are read, value will be set before or
 after type, thus, changing the value sanitization algorithm.

No, the value sanitization algorithm is invoked separately after the 
element is first created:

# When an input element is first created, the element's rendering and 
# behavior must be set to the rendering and behavior defined for the type 
# attribute's state, and the value sanitization algorithm, if one is 
# defined for the type attribute's state, must be invoked.
 -- 
http://www.whatwg.org/specs/web-apps/current-work/complete.html#the-input-element


 The following change would fix that bug:
 - The specification should add that the value sanitization algorithm
 should not be used during parsing/as long as the element hasn't been
 created.

I don't understand how it could be run before the element has been 
created. It runs on the element! :-)


 OR
 - The specification should add in the set value content attribute
 paragraph that the value sanitization algorithm should not be run during
 parsing/if the element hasn't been created.

The set value content attribute paragraph doesn't apply until after the 
element has been created, with the attribute already set.


 The specifications already require that the value sanitization algorithm
 should be run when the element is first created.
 So, with this change, the element's value will be un-sanitized during
 parsing and as soon as the parsing will be done, the element's value
 will be sanitized.

I don't really understand what that means.


 By the way, "first created" could probably be changed to a concept from 
 the specifications. We can guess what that means but there is no strong 
 notion behind these words AFAIK.

At some point the element is created. How is this ambiguous?


On Tue, 21 Sep 2010, James Graham wrote:
 
 The concept of Creating an Element already exists [1] and is atomic, 
 that is the element is created with all its attributes in a single 
 operation. Therefore it is not clear to me how attribute order can make 
 a difference per spec. Am I missing your point?
 
 [1] 
 http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#creating-and-inserting-elements

Indeed.


On Tue, 21 Sep 2010, Boris Zbarsky wrote:
 
 Where does it say that it's atomic?  I don't see that anywhere (and in 
 fact, the create an element code in the Gecko parser is most decidedly 
 non-atomic).  Now maybe the spec intends this to be an atomic operation; 
 if so it needs to say that.

The operation it describes is a single operation: create a node. It 
describes various constraints on that operation, one of which is that the 
node have the various tokenised attributes set. I don't understand how 
creating a node could be anything other than atomic -- either it exists or 
it does not.


On Tue, 21 Sep 2010, Boris Zbarsky wrote:
 
 That doesn't work if your parser and DOM aren't very very _very_ tightly 
 coupled, since there are no DOM APIs to atomically set a bunch of 
 attributes.

The HTML spec in general assumes that the implementation of the parser is 
the implementation of the DOM and that you wouldn't use the DOM Core API 
to implement the DOM or the parser.


 So yes, if the spec implies that this is what's supposed to happen here 
 then it needs to be _very_ explicit about that.

It's not clear to me how I can be more explicit. Could you elaborate on 
what you would like it to say?


On Tue, 21 Sep 2010, Jonas Sicking wrote:
 
 Also, it would mean that the following two pieces of code behave differently:
 
 inp = document.createElement("input");
 inp.setAttribute("value", "foo\nbar");
 inp.setAttribute("type", "hidden");
 
 and
 
 inp = document.createElement("input");
 inp.setAttribute("type", "hidden");
 inp.setAttribute("value", "foo\nbar");

 This does not seem desirable.

I can't argue that it's desirable, but it's how the Web works, as I 
understand it.

-- 
Ian Hickson   

[whatwg] input element's value should not be sanitized during parsing

2010-09-21 Thread Mounir Lamouri
Hi,

For a few days, Firefox's nightly had a bug related to value sanitizing
which happens to be a specification bug.
With the current specification, these two elements will not have the
same value:
<input value="foo&#13;bar" type='hidden'>
<input type='hidden' value="foo&#13;bar">
Depending on how the attributes are read, value will be set before or
after type, thus, changing the value sanitization algorithm. So, the
value sanitization algorithm of <input type='text'> will be used for one
of these elements and the value will be foobar.
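
The order dependence described above can be reproduced with a small model that runs outside a browser. It is only a sketch under stated assumptions: ToyInput is a made-up class, not a real DOM element; it sanitizes at the moment the value attribute is set, using whichever type state is current; and the Text state sanitizer is reduced to stripping line breaks.

```javascript
// Toy sanitizers: Text strips line breaks, Hidden has no sanitization.
const sanitizers = new Map([
  ["text", (v) => v.replace(/[\r\n]/g, "")],
  ["hidden", (v) => v],
]);

class ToyInput {
  constructor() {
    this.type = "text"; // no type attribute yet, so the Text state applies
    this.value = "";
  }
  setAttribute(name, v) {
    if (name === "type") {
      // Unknown types fall back to Text in this model.
      this.type = sanitizers.has(v) ? v : "text";
    } else if (name === "value") {
      // Sanitize eagerly with the sanitizer of the current type.
      this.value = sanitizers.get(this.type)(v);
    }
  }
}

// value first, then type: sanitized while still in the Text state.
const a = new ToyInput();
a.setAttribute("value", "foo\nbar");
a.setAttribute("type", "hidden");

// type first, then value: the Hidden state leaves the value untouched.
const b = new ToyInput();
b.setAttribute("type", "hidden");
b.setAttribute("value", "foo\nbar");

console.log(JSON.stringify(a.value)); // "foobar"
console.log(JSON.stringify(b.value)); // "foo\nbar"
```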

The following change would fix that bug:
- The specification should add that the value sanitization algorithm
should not be used during parsing/as long as the element hasn't been
created.
OR
- The specification should add in the set value content attribute
paragraph that the value sanitization algorithm should not be run during
parsing/if the element hasn't been created.

From a specification point of view, both changes would have the same result.

The specifications already require that the value sanitization algorithm
 should be run when the element is first created.
So, with this change, the element's value will be un-sanitized during
parsing and as soon as the parsing will be done, the element's value
will be sanitized.

By the way, "first created" could probably be changed to a concept from
the specifications. We can guess what that means but there is no strong
notion behind these words AFAIK.

Thanks,
--
Mounir


Re: [whatwg] input element's value should not be sanitized during parsing

2010-09-21 Thread James Graham

On Mon, 20 Sep 2010, Mounir Lamouri wrote:


Hi,

For a few days, Firefox's nightly had a bug related to value sanitizing
which happens to be a specification bug.
With the current specification, these two elements will not have the
same value:
<input value="foo&#13;bar" type='hidden'>
<input type='hidden' value="foo&#13;bar">
Depending on how the attributes are read, value will be set before or
after type, thus, changing the value sanitization algorithm. So, the
value sanitization algorithm of <input type='text'> will be used for one
of these elements and the value will be foobar.

The following change would fix that bug:
- The specification should add that the value sanitization algorithm
should not be used during parsing/as long as the element hasn't been
created.
OR
- The specification should add in the set value content attribute
paragraph that the value sanitization algorithm should not be run during
parsing/if the element hasn't been created.

From a specification point of view, both changes would have the same result.

The specifications already require that the value sanitization algorithm
should be run when the element is first created.
So, with this change, the element's value will be un-sanitized during
parsing and as soon as the parsing will be done, the element's value
will be sanitized.


The concept of Creating an Element already exists [1] and is atomic, 
that is the element is created with all its attributes in a single 
operation. Therefore it is not clear to me how attribute order can make a 
difference per spec. Am I missing your point?


[1] 
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#creating-and-inserting-elements


Re: [whatwg] input element's value should not be sanitized during parsing

2010-09-21 Thread Boris Zbarsky

On 9/21/10 4:06 AM, James Graham wrote:

The concept of Creating an Element already exists [1] and is atomic,


Where does it say that it's atomic?  I don't see that anywhere (and in 
fact, the create an element code in the Gecko parser is most decidedly 
non-atomic).  Now maybe the spec intends this to be an atomic operation; 
if so it needs to say that.


-Boris


Re: [whatwg] input element's value should not be sanitized during parsing

2010-09-21 Thread James Graham

On 09/21/2010 10:12 AM, Boris Zbarsky wrote:

On 9/21/10 4:06 AM, James Graham wrote:

The concept of Creating an Element already exists [1] and is atomic,


Where does it say that it's atomic? I don't see that anywhere (and in
fact, the create an element code in the Gecko parser is most decidedly
non-atomic). Now maybe the spec intends this to be an atomic operation;
if so it needs to say that.


It is described as a single step in the spec, which I take to imply that 
it should behave as a single operation from the point of view of the 
rest of the spec. Of course I am not against this being made clearer.


Re: [whatwg] input element's value should not be sanitized during parsing

2010-09-21 Thread Boris Zbarsky

On 9/21/10 5:09 AM, James Graham wrote:

It is described as a single step in the spec, which I take to imply that
it should behave as a single operation from the point of view of the
rest of the spec.


That doesn't work if your parser and DOM aren't very very _very_ tightly 
coupled, since there are no DOM APIs to atomically set a bunch of 
attributes.


So yes, if the spec implies that this is what's supposed to happen here 
then it needs to be _very_ explicit about that.


-Boris


Re: [whatwg] input element's value should not be sanitized during parsing

2010-09-21 Thread Jonas Sicking
On Tue, Sep 21, 2010 at 9:13 AM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 9/21/10 5:09 AM, James Graham wrote:

 It is described as a single step in the spec, which I take to imply that
 it should behave as a single operation from the point of view of the
 rest of the spec.

 That doesn't work if your parser and DOM aren't very very _very_ tightly
 coupled, since there are no DOM APIs to atomically set a bunch of
 attributes.

 So yes, if the spec implies that this is what's supposed to happen here then
 it needs to be _very_ explicit about that.

Also, it would mean that the following two pieces of code behave differently:

inp = document.createElement("input");
inp.setAttribute("value", "foo\nbar");
inp.setAttribute("type", "hidden");

and

inp = document.createElement("input");
inp.setAttribute("type", "hidden");
inp.setAttribute("value", "foo\nbar");

This does not seem desirable.

/ Jonas


Re: [whatwg] input element's value should not be sanitized during parsing

2010-09-21 Thread Mounir Lamouri
On 09/21/2010 10:18 PM, Jonas Sicking wrote:
 On Tue, Sep 21, 2010 at 9:13 AM, Boris Zbarsky bzbar...@mit.edu wrote:
 Also, it would mean that the following two pieces of code behave differently:
 
 inp = document.createElement("input");
 inp.setAttribute("value", "foo\nbar");
 inp.setAttribute("type", "hidden");
 
 and
 
 inp = document.createElement("input");
 inp.setAttribute("type", "hidden");
 inp.setAttribute("value", "foo\nbar");
 
 This does not seem desirable.

They do. And I don't see how this can be different.

--
Mounir