Re: [whatwg] input element's value should not be sanitized during parsing
On Fri, 11 Mar 2011, Jonas Sicking wrote:
> On Tue, Dec 28, 2010 at 11:46 PM, Ian Hickson i...@hixie.ch wrote:
> > On Mon, 20 Sep 2010, Mounir Lamouri wrote:
> > > With the current specification, these two elements will not have the same value:
> > > <input value="foo&#13;bar" type='hidden'>
> > > <input type='hidden' value="foo&#13;bar">
> >
> > Yes they will. The attribute order has no effect. Elements are created by the parser with their attributes already set:
> >
> > # When the steps below require the UA to create an element for a token in
> > # a particular namespace, the UA must create a node implementing the interface
> > # appropriate for the element type corresponding to the tag name of the
> > # token in the given namespace (as given in the specification that defines
> > # that element, e.g. for an a element in the HTML namespace, this
> > # specification defines it to be the HTMLAnchorElement interface), with
> > # the tag name being the name of that element, with the node being in the
> > # given namespace, and with the attributes on the node being those given
> > # in the given token.
> > -- http://www.whatwg.org/specs/web-apps/current-work/complete.html#create-an-element-for-the-token
>
> Except that I don't think this is how any implementation actually works. Nor do I have any desire to write the implementation this way since it means duplicating a lot of code. I'd have to add code which implemented attribute behavior both in some special code path triggered during element creation, as well as code to react to attribute changes triggered by attribute changes in setAttribute/removeAttribute. So far this hasn't been needed and the parsing code basically just calls setAttribute. Unless there are really good reasons to change this I'd like to avoid it. So far I haven't heard of any such reasons.

The spec is defined such that attribute setting during element creation is order-agnostic. I believe this is consistent with what authors expect (in part based on the confusion I've seen when authors run into cases where that isn't the case). How you implement that is somewhat orthogonal to how it is specced; if there are specific things that are hard to implement, I'm happy to discuss them specifically if you like.

> > On Tue, 21 Sep 2010, Boris Zbarsky wrote:
> > > Where does it say that it's atomic? I don't see that anywhere (and in fact, the "create an element" code in the Gecko parser is most decidedly non-atomic). Now maybe the spec intends this to be an atomic operation; if so it needs to say that.
> >
> > The operation it describes is a single operation: create a node. It describes various constraints on that operation, one of which is that the node have the various tokenised attributes set. I don't understand how creating a node could be anything other than atomic -- either it exists or it does not.
>
> You're expecting several operations to happen at the same time. We could certainly manually insert the attributes and their values into the data structure inside the element which stores the attribute name/value pairs. However at some point we need to update all of the state that these values drive. Things like sticking elements into id-hashes, storing the calculated type of an input, calculating the effective URI of an image, etc. This involves several separate pieces of state and so can't happen all at the same time.

Sure. When those things happen is defined by the spec too.

> > On Tue, 21 Sep 2010, Jonas Sicking wrote:
> > > Also, it would mean that the following two pieces of code behave differently:
> > >
> > > inp = document.createElement("input");
> > > inp.setAttribute("value", "foo\nbar");
> > > inp.setAttribute("type", "hidden");
> > >
> > > and
> > >
> > > inp = document.createElement("input");
> > > inp.setAttribute("type", "hidden");
> > > inp.setAttribute("value", "foo\nbar");
> > >
> > > This does not seem desirable.
> >
> > I can't argue that it's desirable, but it's how the Web works, as I understand it.
>
> Gecko doesn't exhibit this behavior and I don't know of any sites that don't work in Gecko because of this.

On Wed, 30 Mar 2011, Mounir Lamouri wrote:
> FWIW, it does. The first inp.value is 'foobar' while the second is 'foo bar'. See: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/900
>
> Though, I do not think this is related to the initial issue, which is about setting attributes while creating the element from the parser.

Right, the behaviour is different when the parser does it. This is per spec, and seems to match what Firefox does.

-- 
Ian Hickson U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] input element's value should not be sanitized during parsing
On Tue, Jun 14, 2011 at 2:00 PM, Ian Hickson i...@hixie.ch wrote:
> On Fri, 11 Mar 2011, Jonas Sicking wrote:
> > On Tue, Dec 28, 2010 at 11:46 PM, Ian Hickson i...@hixie.ch wrote:
> > > On Mon, 20 Sep 2010, Mounir Lamouri wrote:
> > > > With the current specification, these two elements will not have the same value:
> > > > <input value="foo&#13;bar" type='hidden'>
> > > > <input type='hidden' value="foo&#13;bar">
> > >
> > > Yes they will. The attribute order has no effect. Elements are created by the parser with their attributes already set:
> > >
> > > # When the steps below require the UA to create an element for a token in
> > > # a particular namespace, the UA must create a node implementing the interface
> > > # appropriate for the element type corresponding to the tag name of the
> > > # token in the given namespace (as given in the specification that defines
> > > # that element, e.g. for an a element in the HTML namespace, this
> > > # specification defines it to be the HTMLAnchorElement interface), with
> > > # the tag name being the name of that element, with the node being in the
> > > # given namespace, and with the attributes on the node being those given
> > > # in the given token.
> > > -- http://www.whatwg.org/specs/web-apps/current-work/complete.html#create-an-element-for-the-token
> >
> > Except that I don't think this is how any implementation actually works. Nor do I have any desire to write the implementation this way since it means duplicating a lot of code. I'd have to add code which implemented attribute behavior both in some special code path triggered during element creation, as well as code to react to attribute changes triggered by attribute changes in setAttribute/removeAttribute. So far this hasn't been needed and the parsing code basically just calls setAttribute. Unless there are really good reasons to change this I'd like to avoid it. So far I haven't heard of any such reasons.
>
> The spec is defined such that attribute setting during element creation is order-agnostic. I believe this is consistent with what authors expect (in part based on the confusion I've seen when authors run into cases where that isn't the case). How you implement that is somewhat orthogonal to how it is specced; if there are specific things that are hard to implement, I'm happy to discuss them specifically if you like.

The problem, if I understand things correctly, is that setAttribute is *not* order agnostic, while the parsing code is expected to be. This means that we can't use the same code paths for setAttribute and parsing.

This is not acceptable to us in Gecko. We're not willing to have two code paths for setting attributes.

/ Jonas
Re: [whatwg] input element's value should not be sanitized during parsing
On Tue, 14 Jun 2011, Jonas Sicking wrote:
> The problem, if I understand things correctly, is that setAttribute is *not* order agnostic, while the parsing code is expected to be. This means that we can't use the same code paths for setAttribute and parsing.

You can, you just have to have a special initialisation signal that the parser sends to an element after it has set its attributes.

> This is not acceptable to us in Gecko. We're not willing to have two code paths for setting attributes.

You already _have_ two code paths. The example you gave shows that the parser is order agnostic but the equivalent DOM code is not.

-- 
Ian Hickson
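Ian's suggestion (one attribute-setting path, plus a signal the parser sends once all attributes are set) might look something like the toy sketch below. It is not Gecko's or any other engine's code: the names `Input`, `parserCreating`, `doneCreatingElement`, `parserCreate` and `scriptCreate` are hypothetical, only the text and hidden states are modelled, the text state's sanitization is assumed to strip CR and LF, and the dirty value flag is ignored. Script and parser share `setAttribute`; a parser-created element only postpones its attribute reactions until the signal arrives.

```javascript
// Hypothetical sketch: script and parser share one setAttribute path; the
// parser additionally sends a "done creating" signal after the last attribute.

// Assumed rules: the text state strips CR and LF; hidden has no sanitization.
function sanitize(type, value) {
  return type === "hidden" ? value : value.replace(/[\r\n]/g, "");
}

class Input {
  constructor(parserCreating = false) {
    this.attributes = new Map();
    this.parserCreating = parserCreating; // true only while the parser builds it
    this.type = "text";
    this.value = "";
  }

  // The single attribute-setting code path.
  setAttribute(name, v) {
    this.attributes.set(name, v);
    if (!this.parserCreating) this.attributeChanged(name);
  }

  // Reactions to one attribute change (the script path, order-dependent).
  attributeChanged(name) {
    if (name === "type") {
      this.type = this.attributes.get("type");
      this.value = sanitize(this.type, this.value);
    } else if (name === "value") {
      this.value = sanitize(this.type, this.attributes.get("value"));
    }
  }

  // The initialisation signal: react once, using the final attribute set.
  doneCreatingElement() {
    this.parserCreating = false;
    this.type = this.attributes.get("type") ?? "text";
    this.value = sanitize(this.type, this.attributes.get("value") ?? "");
  }
}

function parserCreate(attrs) {
  const el = new Input(true);
  for (const [name, v] of attrs) el.setAttribute(name, v);
  el.doneCreatingElement();
  return el;
}

function scriptCreate(attrs) {
  const el = new Input();
  for (const [name, v] of attrs) el.setAttribute(name, v);
  return el;
}

const valueFirst = [["value", "foo\rbar"], ["type", "hidden"]];
const typeFirst = [["type", "hidden"], ["value", "foo\rbar"]];
console.log(JSON.stringify(parserCreate(valueFirst).value)); // "foo\rbar"
console.log(JSON.stringify(parserCreate(typeFirst).value));  // "foo\rbar"
console.log(JSON.stringify(scriptCreate(valueFirst).value)); // "foobar"
console.log(JSON.stringify(scriptCreate(typeFirst).value));  // "foo\rbar"
```

Attribute storage is shared in this sketch; only the timing of the reactions differs. A real engine would also have to defer the other attribute-driven state Jonas lists in the thread (id-hashes, the calculated type, effective URIs).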
Re: [whatwg] input element's value should not be sanitized during parsing
On 03/12/2011 12:56 AM, Jonas Sicking wrote:
> > > inp = document.createElement("input");
> > > inp.setAttribute("value", "foo\nbar");
> > > inp.setAttribute("type", "hidden");
> > >
> > > and
> > >
> > > inp = document.createElement("input");
> > > inp.setAttribute("type", "hidden");
> > > inp.setAttribute("value", "foo\nbar");
> > >
> > > This does not seem desirable.
> >
> > I can't argue that it's desirable, but it's how the Web works, as I understand it.
>
> Gecko doesn't exhibit this behavior and I don't know of any sites that don't work in Gecko because of this.

FWIW, it does. The first inp.value is 'foobar' while the second is 'foo bar'. See: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/900

Though, I do not think this is related to the initial issue, which is about setting attributes while creating the element from the parser.

--
Mounir
Re: [whatwg] input element's value should not be sanitized during parsing
(Sorry to bring back an old thread. Trying to catch up on old to-do's now that FF4 is almost out the door)

On Tue, Dec 28, 2010 at 11:46 PM, Ian Hickson i...@hixie.ch wrote:
> On Mon, 20 Sep 2010, Mounir Lamouri wrote:
> > With the current specification, these two elements will not have the same value:
> > <input value="foo&#13;bar" type='hidden'>
> > <input type='hidden' value="foo&#13;bar">
>
> Yes they will. The attribute order has no effect. Elements are created by the parser with their attributes already set:
>
> # When the steps below require the UA to create an element for a token in
> # a particular namespace, the UA must create a node implementing the interface
> # appropriate for the element type corresponding to the tag name of the
> # token in the given namespace (as given in the specification that defines
> # that element, e.g. for an a element in the HTML namespace, this
> # specification defines it to be the HTMLAnchorElement interface), with
> # the tag name being the name of that element, with the node being in the
> # given namespace, and with the attributes on the node being those given
> # in the given token.
> -- http://www.whatwg.org/specs/web-apps/current-work/complete.html#create-an-element-for-the-token

Except that I don't think this is how any implementation actually works. Nor do I have any desire to write the implementation this way since it means duplicating a lot of code. I'd have to add code which implemented attribute behavior both in some special code path triggered during element creation, as well as code to react to attribute changes triggered by attribute changes in setAttribute/removeAttribute. So far this hasn't been needed and the parsing code basically just calls setAttribute. Unless there are really good reasons to change this I'd like to avoid it. So far I haven't heard of any such reasons.

> On Tue, 21 Sep 2010, Boris Zbarsky wrote:
> > Where does it say that it's atomic? I don't see that anywhere (and in fact, the "create an element" code in the Gecko parser is most decidedly non-atomic). Now maybe the spec intends this to be an atomic operation; if so it needs to say that.
>
> The operation it describes is a single operation: create a node. It describes various constraints on that operation, one of which is that the node have the various tokenised attributes set. I don't understand how creating a node could be anything other than atomic -- either it exists or it does not.

You're expecting several operations to happen at the same time. We could certainly manually insert the attributes and their values into the data structure inside the element which stores the attribute name/value pairs. However at some point we need to update all of the state that these values drive. Things like sticking elements into id-hashes, storing the calculated type of an input, calculating the effective URI of an image, etc. This involves several separate pieces of state and so can't happen all at the same time.

> On Tue, 21 Sep 2010, Boris Zbarsky wrote:
> > That doesn't work if your parser and DOM aren't very very _very_ tightly coupled, since there are no DOM APIs to atomically set a bunch of attributes.
>
> The HTML spec in general assumes that the implementation of the parser is the implementation of the DOM and that you wouldn't use the DOM Core API to implement the DOM or the parser.

I wouldn't build a parser on the raw DOM API either. But mostly for performance reasons since we have to do a lot more checks on data that comes from untrusted script (things like preventing ancestor cycles, etc.). But I'd also strongly want to share most of the code path between the API that the DOM uses and that the parser uses. Not doing that is going to lead to a lot more bloat and a lot more bugs.

> On Tue, 21 Sep 2010, Jonas Sicking wrote:
> > Also, it would mean that the following two pieces of code behave differently:
> >
> > inp = document.createElement("input");
> > inp.setAttribute("value", "foo\nbar");
> > inp.setAttribute("type", "hidden");
> >
> > and
> >
> > inp = document.createElement("input");
> > inp.setAttribute("type", "hidden");
> > inp.setAttribute("value", "foo\nbar");
> >
> > This does not seem desirable.
>
> I can't argue that it's desirable, but it's how the Web works, as I understand it.

Gecko doesn't exhibit this behavior and I don't know of any sites that don't work in Gecko because of this.

/ Jonas
Re: [whatwg] input element's value should not be sanitized during parsing
On Mon, 20 Sep 2010, Mounir Lamouri wrote:
> With the current specification, these two elements will not have the same value:
> <input value="foo&#13;bar" type='hidden'>
> <input type='hidden' value="foo&#13;bar">

Yes they will. The attribute order has no effect. Elements are created by the parser with their attributes already set:

# When the steps below require the UA to create an element for a token in
# a particular namespace, the UA must create a node implementing the interface
# appropriate for the element type corresponding to the tag name of the
# token in the given namespace (as given in the specification that defines
# that element, e.g. for an a element in the HTML namespace, this
# specification defines it to be the HTMLAnchorElement interface), with
# the tag name being the name of that element, with the node being in the
# given namespace, and with the attributes on the node being those given
# in the given token.
-- http://www.whatwg.org/specs/web-apps/current-work/complete.html#create-an-element-for-the-token

> Depending on how the attributes are read, value will be set before or after type, thus changing the value sanitization algorithm.

No, the value sanitization algorithm is invoked separately after the element is first created:

# When an input element is first created, the element's rendering and
# behavior must be set to the rendering and behavior defined for the type
# attribute's state, and the value sanitization algorithm, if one is
# defined for the type attribute's state, must be invoked.
-- http://www.whatwg.org/specs/web-apps/current-work/complete.html#the-input-element

> The following change would fix that bug:
> - The specification should add that the value sanitization algorithm should not be used during parsing/as long as the element hasn't been created.

I don't understand how it could be run before the element has been created. It runs on the element! :-)

> OR
> - The specification should add in the "set value content attribute" paragraph that the value sanitization algorithm should not be run during parsing/if the element hasn't been created.

The "set value content attribute" paragraph doesn't apply until after the element has been created, with the attribute already set.

> The specifications already require that the value sanitization algorithm should be run when the element is first created. So, with this change, the element's value will be un-sanitized during parsing and as soon as parsing is done, the element's value will be sanitized.

I don't really understand what that means.

> By the way, "first created" could probably be changed to a concept from the specifications. We can guess what that means but there is no strong notion behind these words AFAIK.

At some point the element is created. How is this ambiguous?

On Tue, 21 Sep 2010, James Graham wrote:
> The concept of "Creating an Element" already exists [1] and is atomic, that is the element is created with all its attributes in a single operation. Therefore it is not clear to me how attribute order can make a difference per spec. Am I missing your point?
>
> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#creating-and-inserting-elements

Indeed.

On Tue, 21 Sep 2010, Boris Zbarsky wrote:
> Where does it say that it's atomic? I don't see that anywhere (and in fact, the "create an element" code in the Gecko parser is most decidedly non-atomic). Now maybe the spec intends this to be an atomic operation; if so it needs to say that.

The operation it describes is a single operation: create a node. It describes various constraints on that operation, one of which is that the node have the various tokenised attributes set. I don't understand how creating a node could be anything other than atomic -- either it exists or it does not.

On Tue, 21 Sep 2010, Boris Zbarsky wrote:
> That doesn't work if your parser and DOM aren't very very _very_ tightly coupled, since there are no DOM APIs to atomically set a bunch of attributes.

The HTML spec in general assumes that the implementation of the parser is the implementation of the DOM and that you wouldn't use the DOM Core API to implement the DOM or the parser.

> So yes, if the spec implies that this is what's supposed to happen here then it needs to be _very_ explicit about that.

It's not clear to me how I can be more explicit. Could you elaborate on what you would like it to say?

On Tue, 21 Sep 2010, Jonas Sicking wrote:
> Also, it would mean that the following two pieces of code behave differently:
>
> inp = document.createElement("input");
> inp.setAttribute("value", "foo\nbar");
> inp.setAttribute("type", "hidden");
>
> and
>
> inp = document.createElement("input");
> inp.setAttribute("type", "hidden");
> inp.setAttribute("value", "foo\nbar");
>
> This does not seem desirable.

I can't argue that it's desirable, but it's how the Web works, as I understand it.

-- 
Ian Hickson
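The two spec passages Ian quotes in the message above (the element is created with the token's attributes already set, and the value sanitization algorithm is invoked when the input element is first created) can be sketched as a toy model. This is not browser code: `SpecInput` and `sanitize` are hypothetical names, only the text and hidden states are modelled, the text state's sanitization is assumed to strip CR and LF, and invalid type values and the dirty value flag are ignored. Because sanitization runs once, after every attribute exists, only the final type matters.

```javascript
// Toy model of creating an element for a token: all attributes are present
// before anything reacts to them, then the "first created" steps run once.

// Assumed rules: the text state strips CR and LF; hidden has no sanitization.
function sanitize(type, value) {
  return type === "hidden" ? value : value.replace(/[\r\n]/g, "");
}

class SpecInput {
  constructor(tokenAttrs) {
    // The node comes into existence with the token's attributes already set.
    this.attributes = new Map(tokenAttrs);
    // "When an input element is first created": determine the type state,
    // then invoke that state's value sanitization algorithm, if any.
    this.type = (this.attributes.get("type") ?? "text").toLowerCase();
    this.value = sanitize(this.type, this.attributes.get("value") ?? "");
  }
}

// "&#13;" in the markup becomes a CR character in the attribute value.
const c = new SpecInput([["value", "foo\rbar"], ["type", "hidden"]]);
const d = new SpecInput([["type", "hidden"], ["value", "foo\rbar"]]);
console.log(c.value === d.value); // true: attribute order has no effect
```

The point of the model is that nothing observes the attributes between their insertion and the "first created" step, which is the sense in which James and Ian describe element creation as a single operation.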
[whatwg] input element's value should not be sanitized during parsing
Hi,

For a few days, Firefox's nightly had a bug related to value sanitizing which happens to be a specification bug.

With the current specification, these two elements will not have the same value:
<input value="foo&#13;bar" type='hidden'>
<input type='hidden' value="foo&#13;bar">

Depending on how the attributes are read, value will be set before or after type, thus changing the value sanitization algorithm. So, the value sanitization algorithm of input type='text' will be used for one of these elements and the value will be foobar.

The following change would fix that bug:
- The specification should add that the value sanitization algorithm should not be used during parsing/as long as the element hasn't been created.
OR
- The specification should add in the "set value content attribute" paragraph that the value sanitization algorithm should not be run during parsing/if the element hasn't been created.

From a specification point of view, both changes would have the same result. The specifications already require that the value sanitization algorithm should be run when the element is first created. So, with this change, the element's value will be un-sanitized during parsing and as soon as parsing is done, the element's value will be sanitized.

By the way, "first created" could probably be changed to a concept from the specifications. We can guess what that means but there is no strong notion behind these words AFAIK.

Thanks,
--
Mounir
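A toy model can make the ordering problem Mounir describes concrete. It is not a DOM implementation: `NaiveInput`, `sanitize` and `createWithAttributes` are hypothetical names, only the text and hidden states are modelled, and the text state's sanitization is assumed to strip CR and LF while the hidden state defines none. If the value is sanitized each time the value attribute is set, using whatever type happens to be current, the result depends on attribute order.

```javascript
// Toy model only (not a real DOM): the value is sanitized each time the
// "value" attribute is set, using whichever type is current at that moment.

// Assumed rules: the text state strips CR and LF; hidden has no sanitization.
function sanitize(type, value) {
  return type === "hidden" ? value : value.replace(/[\r\n]/g, "");
}

class NaiveInput {
  constructor() {
    this.type = "text"; // no type attribute yet, so the text state applies
    this.value = "";
  }

  setAttribute(name, v) {
    if (name === "type") {
      this.type = v;
      // Re-sanitizing here cannot bring back characters already stripped.
      this.value = sanitize(this.type, this.value);
    } else if (name === "value") {
      this.value = sanitize(this.type, v);
    }
  }
}

// Applies the attributes one by one, in source order, as a parser that
// simply calls setAttribute would.
function createWithAttributes(attrs) {
  const el = new NaiveInput();
  for (const [name, v] of attrs) el.setAttribute(name, v);
  return el;
}

// "&#13;" in the markup becomes a CR character in the attribute value.
const a = createWithAttributes([["value", "foo\rbar"], ["type", "hidden"]]);
const b = createWithAttributes([["type", "hidden"], ["value", "foo\rbar"]]);
console.log(JSON.stringify(a.value)); // "foobar"
console.log(JSON.stringify(b.value)); // "foo\rbar"
```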
Re: [whatwg] input element's value should not be sanitized during parsing
On Mon, 20 Sep 2010, Mounir Lamouri wrote:
> Hi,
>
> For a few days, Firefox's nightly had a bug related to value sanitizing which happens to be a specification bug.
>
> With the current specification, these two elements will not have the same value:
> <input value="foo&#13;bar" type='hidden'>
> <input type='hidden' value="foo&#13;bar">
>
> Depending on how the attributes are read, value will be set before or after type, thus changing the value sanitization algorithm. So, the value sanitization algorithm of input type='text' will be used for one of these elements and the value will be foobar.
>
> The following change would fix that bug:
> - The specification should add that the value sanitization algorithm should not be used during parsing/as long as the element hasn't been created.
> OR
> - The specification should add in the "set value content attribute" paragraph that the value sanitization algorithm should not be run during parsing/if the element hasn't been created.
>
> From a specification point of view, both changes would have the same result. The specifications already require that the value sanitization algorithm should be run when the element is first created. So, with this change, the element's value will be un-sanitized during parsing and as soon as parsing is done, the element's value will be sanitized.

The concept of "Creating an Element" already exists [1] and is atomic, that is the element is created with all its attributes in a single operation. Therefore it is not clear to me how attribute order can make a difference per spec. Am I missing your point?

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#creating-and-inserting-elements
Re: [whatwg] input element's value should not be sanitized during parsing
On 9/21/10 4:06 AM, James Graham wrote:
> The concept of "Creating an Element" already exists [1] and is atomic,

Where does it say that it's atomic? I don't see that anywhere (and in fact, the "create an element" code in the Gecko parser is most decidedly non-atomic). Now maybe the spec intends this to be an atomic operation; if so it needs to say that.

-Boris
Re: [whatwg] input element's value should not be sanitized during parsing
On 09/21/2010 10:12 AM, Boris Zbarsky wrote:
> On 9/21/10 4:06 AM, James Graham wrote:
> > The concept of "Creating an Element" already exists [1] and is atomic,
>
> Where does it say that it's atomic? I don't see that anywhere (and in fact, the "create an element" code in the Gecko parser is most decidedly non-atomic). Now maybe the spec intends this to be an atomic operation; if so it needs to say that.

It is described as a single step in the spec, which I take to imply that it should behave as a single operation from the point of view of the rest of the spec. Of course I am not against this being made clearer.
Re: [whatwg] input element's value should not be sanitized during parsing
On 9/21/10 5:09 AM, James Graham wrote:
> It is described as a single step in the spec, which I take to imply that it should behave as a single operation from the point of view of the rest of the spec.

That doesn't work if your parser and DOM aren't very very _very_ tightly coupled, since there are no DOM APIs to atomically set a bunch of attributes.

So yes, if the spec implies that this is what's supposed to happen here then it needs to be _very_ explicit about that.

-Boris
Re: [whatwg] input element's value should not be sanitized during parsing
On Tue, Sep 21, 2010 at 9:13 AM, Boris Zbarsky bzbar...@mit.edu wrote:
> On 9/21/10 5:09 AM, James Graham wrote:
> > It is described as a single step in the spec, which I take to imply that it should behave as a single operation from the point of view of the rest of the spec.
>
> That doesn't work if your parser and DOM aren't very very _very_ tightly coupled, since there are no DOM APIs to atomically set a bunch of attributes.
>
> So yes, if the spec implies that this is what's supposed to happen here then it needs to be _very_ explicit about that.

Also, it would mean that the following two pieces of code behave differently:

inp = document.createElement("input");
inp.setAttribute("value", "foo\nbar");
inp.setAttribute("type", "hidden");

and

inp = document.createElement("input");
inp.setAttribute("type", "hidden");
inp.setAttribute("value", "foo\nbar");

This does not seem desirable.

/ Jonas
Re: [whatwg] input element's value should not be sanitized during parsing
On 09/21/2010 10:18 PM, Jonas Sicking wrote:
> On Tue, Sep 21, 2010 at 9:13 AM, Boris Zbarsky bzbar...@mit.edu wrote:
> Also, it would mean that the following two pieces of code behave differently:
>
> inp = document.createElement("input");
> inp.setAttribute("value", "foo\nbar");
> inp.setAttribute("type", "hidden");
>
> and
>
> inp = document.createElement("input");
> inp.setAttribute("type", "hidden");
> inp.setAttribute("value", "foo\nbar");
>
> This does not seem desirable.

They do. And I don't see how this can be different.

--
Mounir