Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)
On Fri, 22 Jun 2007 04:19:53 +0200, Ian Hickson <[EMAIL PROTECTED]> wrote:

>> Safari, Opera and Firefox drop the attribute. IE has an attribute with the name being the empty string and the value being ="". The HTML5 parsing spec says that there should be an attribute with the name = and the value the empty string. The "Before attribute name state" part of the parsing spec might have to be revisited.
>
> I don't see any harm in leaving the spec as-is here, given the lack of interoperability and the fact that there's no real reason to be using attributes with this name anyway. Whatever's simplest to implement is probably best here.

Since it doesn't match any browser, and probably is an authoring mistake (that would silently pass conformance checking in the case of ), could it be a parse error? (Also update the wording in the syntax section if so.)

--
Simon Pieters
Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)
On Wed, 13 Jun 2007, Simon Pieters wrote:

> Since attribute names that use characters outside ASCII aren't parse errors, and any attribute is allowed on the embed element, the definition of "Attribute names" in #writing is incorrect.

Fixed.

> Although that isn't quite right either. The parsing section allows attributes to begin with =. Given the following markup:
>
> Safari, Opera and Firefox drop the attribute. IE has an attribute with the name being the empty string and the value being ="". The HTML5 parsing spec says that there should be an attribute with the name = and the value the empty string. The "Before attribute name state" part of the parsing spec might have to be revisited.

I don't see any harm in leaving the spec as-is here, given the lack of interoperability and the fact that there's no real reason to be using attributes with this name anyway. Whatever's simplest to implement is probably best here.

--
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
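For readers following along, here is a rough sketch of the three outcomes reported above. The markup from the original message did not survive in this archive, so the input used here (an attribute written as ==""), the function names, and the simplified parsing rules are all assumptions chosen to reproduce the behaviors as described; none of this is taken from any browser or from the spec text itself.

```python
def strip_quotes(raw):
    """Remove one pair of matching surrounding quotes, if present."""
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "\"'":
        return raw[1:-1]
    return raw


def draft_spec_like(attr):
    """Hypothetical model of the draft behavior described in the thread:
    the first character always starts the name, even if it is "="."""
    name, rest = attr[0], attr[1:]
    cut = rest.find("=")
    if cut == -1:
        return (name + rest, "")
    return (name + rest[:cut], strip_quotes(rest[cut + 1:]))


def ie_like(attr):
    """Hypothetical model of the reported IE behavior: the name is whatever
    precedes the first "=", even when that is the empty string."""
    name, _, value = attr.partition("=")
    return (name, strip_quotes(value))


def drop_like(attr):
    """Hypothetical model of the reported Safari/Opera/Firefox behavior:
    an attribute that begins with "=" is dropped (returned as None)."""
    if attr.startswith("="):
        return None
    name, _, value = attr.partition("=")
    return (name, strip_quotes(value))


if __name__ == "__main__":
    assumed_input = '==""'  # assumption: not the markup from the original mail
    for model in (draft_spec_like, ie_like, drop_like):
        print(model.__name__, model(assumed_input))
```

All three models agree on an ordinary attribute such as src="foo"; they only diverge when the attribute text starts with "=".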
Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)
On Jun 13, 2007, at 10:27, Simon Pieters wrote:

>> I'd rather change the #tokenisation section to generate more parse errors.

Or the DOM-level conformance for embed could make non-ASCII attribute names non-conforming.

> Why?

When you put non-ASCII in element or attribute names (or variable and function names), you aren't really making your format (or software) international. You are more likely to *nationalize* the document format (or software) by creating a barrier for developers from outside your locale.

When you start doing a lot of stuff along the lines of smörgåsbord="" in markup, you create a barrier of inconvenience for everyone else but Swedes and Finns. That might be OK for you and me, but it won't be OK for us when people start using something that our input methods and cognitive background don't cover. Compare with Chinese in markup in UOF, a nationalized fork of ODF. (See http://blogs.msdn.com/dmahugh/archive/2007/05/22/uof-translator-project.aspx )

To keep markup internationally tractable, identifiers should use ASCII only with English-based mnemonics.

> What if you want to pass a parameter to a plugin with non-ASCII characters using ?

People who want that should readjust their wishes, in my opinion.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)
On Wed, 13 Jun 2007 10:26:48 +0200, Thomas Broyer <[EMAIL PROTECTED]> wrote:

> 2007/6/13, Simon Pieters:
>> On Wed, 13 Jun 2007 09:11:31 +0200, Thomas Broyer wrote:
>>> I'd rather change the #tokenisation section to generate more parse errors.
>>
>> Why? What if you want to pass a parameter to a plugin with non-ASCII characters using ?
>
> What would you do if you had to recode the document into 7bit ASCII? Would you recode the attribute name with a "pseudo-entity" (would the plugin then correctly interpret the parameter name?) Would you drop the non-ASCII character? Would you rather drop the attribute?

You'd throw an error.

> Btw, we'd have a similar problem if you use non-ASCII characters in CDATA elements... Should they be changed to RCDATA to accept entities? or should the recoder assume that \u escapes will be understood by the
Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)
2007/6/13, Simon Pieters:

> On Wed, 13 Jun 2007 09:11:31 +0200, Thomas Broyer wrote:
>> I'd rather change the #tokenisation section to generate more parse errors.
>
> Why? What if you want to pass a parameter to a plugin with non-ASCII characters using ?

What would you do if you had to recode the document into 7bit ASCII? Would you recode the attribute name with a "pseudo-entity" (would the plugin then correctly interpret the parameter name?) Would you drop the non-ASCII character? Would you rather drop the attribute?

Btw, we'd have a similar problem if you use non-ASCII characters in CDATA elements... Should they be changed to RCDATA to accept entities? or should the recoder assume that \u escapes will be understood by the
Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)
On Wed, 13 Jun 2007 09:11:31 +0200, Thomas Broyer <[EMAIL PROTECTED]> wrote:

> Why? Inconsistent maybe, but not incorrect.

Conformance checkers have to follow the parsing section. smörgåsbord="" src="foo"> is thus conforming. The #writing section is, strictly speaking, not necessary; it is merely a reverse-engineered version of the parsing section, taking the rest of the specification into account. In this case, it seems it didn't take the "any attribute" rule for into account.

> I'd rather change the #tokenisation section to generate more parse errors.

Why? What if you want to pass a parameter to a plugin with non-ASCII characters using ?

> Or maybe change the #creating section to drop such attributes, if we choose to follow the Safari/Opera/Firefox path.

Yeah, indeed.

--
Simon Pieters
Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)
2007/6/13, Simon Pieters:

> Since attribute names that use characters outside ASCII aren't parse errors, and any attribute is allowed on the embed element, the definition of "Attribute names" in #writing is incorrect.

Why? Inconsistent maybe, but not incorrect.

> I would suggest changing the definition in #writing to say that attribute names can consist of any characters except whitespace, =, >, / and <.

I'd rather change the #tokenisation section to generate more parse errors.

> Although that isn't quite right either. The parsing section allows attributes to begin with =. Given the following markup:
>
> Safari, Opera and Firefox drop the attribute. IE has an attribute with the name being the empty string and the value being ="". The HTML5 parsing spec says that there should be an attribute with the name = and the value the empty string. The "Before attribute name state" part of the parsing spec might have to be revisited.

Or maybe change the #creating section to drop such attributes, if we choose to follow the Safari/Opera/Firefox path.

--
Thomas Broyer
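As an illustration only, the definition suggested for #writing (any characters except whitespace, =, >, / and <) could be checked along these lines. The function name, the exact set of characters treated as whitespace, and the rejection of the empty string are assumptions made for this sketch, not wording from the spec.

```python
# Assumption: "whitespace" means tab, line feed, form feed, carriage return
# and space; the thread does not spell the set out.
WHITESPACE = set("\t\n\f\r ")
FORBIDDEN = WHITESPACE | set("=></")


def is_allowed_attribute_name(name: str) -> bool:
    """Return True if `name` contains none of the excluded characters.

    Rejecting the empty string is an extra assumption of this sketch.
    """
    if not name:
        return False
    return not any(ch in FORBIDDEN for ch in name)


if __name__ == "__main__":
    for candidate in ("src", "smörgåsbord", "=", "a b", "data/x"):
        print(repr(candidate), is_allowed_attribute_name(candidate))
```

Under this reading a non-ASCII name such as smörgåsbord is accepted, while a name consisting of = is rejected, which is the point where the suggested wording and the parsing section disagree.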