Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)

2007-07-06 Thread Simon Pieters

On Fri, 22 Jun 2007 04:19:53 +0200, Ian Hickson <[EMAIL PROTECTED]> wrote:


  

>> Safari, Opera and Firefox drop the attribute. IE has an attribute with
>> the name being the empty string and the value being ="". The HTML5
>> parsing spec says that there should be an attribute with the name = and
>> the value the empty string. The "Before attribute name state" part of
>> the parsing spec might have to be revisited.
>
> I don't see any harm in leaving the spec as-is here, given the lack of
> interoperability and the fact that there's no real reason to be using
> attributes with this name anyway. Whatever's simplest to implement is
> probably best here.


Since it doesn't match any browser, and probably is an authoring mistake  
(that would silently pass conformance checking in the case of ),  
could it be a parse error? (Also update the wording in the syntax section  
if so.)
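
The behaviour under discussion can be sketched roughly as follows. This is a simplified illustration in Python, not the spec's tokeniser: the function name parse_attributes is hypothetical, only double-quoted and unquoted values are handled, names are not lowercased, and the input is assumed to be just the attribute portion of a start tag. A leading = is kept as the first character of the attribute name, as the parsing section describes, and is additionally recorded as a parse error, as suggested above.

```python
WHITESPACE = " \t\n\f\r"


def parse_attributes(src):
    """Parse the attribute part of a start tag, e.g. 'a="1" b=2'.

    Returns (attributes, errors). Simplified sketch: no character
    references, no single-quoted values, no duplicate-name handling.
    """
    attrs, errors = [], []
    i, n = 0, len(src)
    while i < n:
        # "Before attribute name": skip whitespace and "/".
        if src[i] in WHITESPACE or src[i] == "/":
            i += 1
            continue
        start = i
        if src[i] == "=":
            # A leading "=" becomes the first character of the name.
            # Recording an error here is the change proposed in the thread.
            errors.append(f"unexpected '=' before attribute name at offset {i}")
            i += 1
        # "Attribute name": read up to whitespace, "/" or "=".
        while i < n and src[i] not in WHITESPACE and src[i] not in "/=":
            i += 1
        name = src[start:i]
        # "After attribute name": optional whitespace, then optional "=".
        while i < n and src[i] in WHITESPACE:
            i += 1
        value = ""
        if i < n and src[i] == "=":
            i += 1
            while i < n and src[i] in WHITESPACE:
                i += 1
            if i < n and src[i] == '"':
                # Double-quoted value: read up to the closing quote.
                end = src.find('"', i + 1)
                end = n if end == -1 else end
                value = src[i + 1:end]
                i = end + 1
            else:
                # Unquoted value: read up to the next whitespace.
                value_start = i
                while i < n and src[i] not in WHITESPACE:
                    i += 1
                value = src[value_start:i]
        attrs.append((name, value))
    return attrs, errors
```

With this sketch, parse_attributes('=="x" src="foo"') yields an attribute named = with the value x, followed by src, plus one recorded error for the leading =.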


--
Simon Pieters


Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)

2007-06-21 Thread Ian Hickson
On Wed, 13 Jun 2007, Simon Pieters wrote:
> 
> Since attribute names that use characters outside ASCII aren't parse 
> errors, and any attribute is allowed on the embed element, the 
> definition of "Attribute names" in #writing is incorrect.

Fixed.


> Although that isn't quite right either. The parsing section allows 
> attributes to begin with =. Given the following markup:
> 
>   
> 
> Safari, Opera and Firefox drop the attribute. IE has an attribute with 
> the name being the empty string and the value being ="". The HTML5 
> parsing spec says that there should be an attribute with the name = and 
> the value the empty string. The "Before attribute name state" part of 
> the parsing spec might have to be revisited.

I don't see any harm in leaving the spec as-is here, given the lack of 
interoperability and the fact that there's no real reason to be using 
attributes with this name anyway. Whatever's simplest to implement is 
probably best here.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)

2007-06-13 Thread Henri Sivonen

On Jun 13, 2007, at 10:27, Simon Pieters wrote:

>> I'd rather change the #tokenisation section to generate more parse
>> errors.


Or the DOM-level conformance for embed could make non-ASCII attribute  
names non-conforming.



> Why?


When you put non-ASCII in element or attribute names (or variable and  
function names), you aren't really making your format (or software)  
international. You are more likely to *nationalize* the document  
format (or software) by creating a barrier for developers from  
outside your locale.


When you start doing a lot of stuff along the lines of smörgåsbord=""  
in markup, you create a barrier of inconvenience for everyone else  
but Swedes and Finns. That might be OK for you and me, but it won't  
be OK for us when people start using something that our input methods  
and cognitive background don't cover.


Compare with Chinese in markup in UOF--a nationalized fork of ODF.
(See http://blogs.msdn.com/dmahugh/archive/2007/05/22/uof-translator-project.aspx )


To keep markup internationally tractable, identifiers should use  
ASCII only with English-based mnemonics.


> What if you want to pass a parameter to a plugin with non-ASCII
> characters using ?


People who want that should readjust their wishes, in my opinion.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)

2007-06-13 Thread Anne van Kesteren
On Wed, 13 Jun 2007 10:26:48 +0200, Thomas Broyer <[EMAIL PROTECTED]>  
wrote:

> 2007/6/13, Simon Pieters:
>> On Wed, 13 Jun 2007 09:11:31 +0200, Thomas Broyer wrote:
>>> I'd rather change the #tokenisation section to generate more parse
>>> errors.
>>
>> Why? What if you want to pass a parameter to a plugin with non-ASCII
>> characters using ?
>
> What would you do if you had to recode the document into 7bit ASCII?
> Would you recode the attribute name with a "pseudo-entity" (would the
> plugin then correctly interpret the parameter name?) Would you drop
> the non-ASCII character? Would you rather drop the attribute?


You'd throw an error.
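
As a rough illustration of the "throw an error" option, here is a minimal Python sketch of a recoder that writes a single attribute as 7-bit ASCII. The function name serialize_attribute_ascii is hypothetical, and the sketch assumes that attribute values may fall back to numeric character references while attribute names have no such escape mechanism.

```python
def serialize_attribute_ascii(name, value):
    """Serialise one attribute as ASCII-only markup.

    Non-ASCII characters in the value become numeric character
    references; a non-ASCII name cannot be escaped, so it is an error.
    """
    try:
        name.encode("ascii")
    except UnicodeEncodeError as exc:
        raise ValueError(
            f"attribute name {name!r} cannot be represented in ASCII"
        ) from exc
    # Escape the characters that are significant in a double-quoted value.
    escaped = value.replace("&", "&amp;").replace('"', "&quot;")
    # Replace any remaining non-ASCII characters with &#NNN; references.
    escaped = escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
    return f'{name}="{escaped}"'
```

Under these assumptions, a value such as smörgåsbord survives recoding as character references, whereas an attribute named smörgåsbord makes the recoder raise ValueError instead of silently dropping or mangling it.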



> Btw, we'd have a similar problem if you use non-ASCII characters in
> CDATA elements... Should they be changed to RCDATA to accept entities?
> or should the recoder assume that \u escapes will be understood by
> the 

Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)

2007-06-13 Thread Thomas Broyer

2007/6/13, Simon Pieters:

> On Wed, 13 Jun 2007 09:11:31 +0200, Thomas Broyer wrote:
>
>> I'd rather change the #tokenisation section to generate more parse
>> errors.
>
> Why? What if you want to pass a parameter to a plugin with non-ASCII
> characters using ?


What would you do if you had to recode the document into 7bit ASCII?
Would you recode the attribute name with a "pseudo-entity" (would the
plugin then correctly interpret the parameter name?) Would you drop
the non-ASCII character? Would you rather drop the attribute?



Btw, we'd have a similar problem if you use non-ASCII characters in
CDATA elements... Should they be changed to RCDATA to accept entities?
or should the recoder assume that \u escapes will be understood by
the 

Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)

2007-06-13 Thread Simon Pieters
On Wed, 13 Jun 2007 09:11:31 +0200, Thomas Broyer <[EMAIL PROTECTED]>  
wrote:



> Why?
> Inconsistent maybe, but not incorrect.


Conformance checkers have to follow the parsing section. <embed smörgåsbord="" src="foo"> is thus conforming. The #writing section is  
strictly speaking not necessary, it is merely a reverse engineered version  
of the parsing section taking the rest of the specification into account.  
In this case, it seems it didn't take the "any attribute" rule for   
into account.


> I'd rather change the #tokenisation section to generate more parse
> errors.


Why? What if you want to pass a parameter to a plugin with non-ASCII  
characters using ?



> Or maybe change the #creating section to drop such attributes, if we
> choose to follow the Safari/Opera/Firefox path.


Yeah, indeed.

--
Simon Pieters


Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)

2007-06-13 Thread Thomas Broyer

2007/6/13, Simon Pieters:


> Since attribute names that use characters outside ASCII aren't parse
> errors, and any attribute is allowed on the embed element, the definition
> of "Attribute names" in #writing is incorrect.


Why?
Inconsistent maybe, but not incorrect.


> I would suggest to change the definition in #writing to say that attribute
> names can consist of any characters except whitespace, =, >, / and <.


I'd rather change the #tokenisation section to generate more parse errors.
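
For reference, the definition suggested for #writing (any characters except whitespace, =, >, / and <) can be written as a small check. This is only a sketch in Python: the name is_valid_attribute_name is hypothetical, the choice of space, tab, LF, FF and CR as the whitespace characters is an assumption, and rejecting the empty name is an extra assumption not stated in the thread.

```python
# Characters excluded by the suggested definition. The exact whitespace
# set is an assumption: space, tab, line feed, form feed, carriage return.
FORBIDDEN = set(" \t\n\f\r") | set("=>/<")


def is_valid_attribute_name(name):
    """Return True if name fits the suggested #writing definition."""
    # The empty name is rejected as well (assumption, see lead-in).
    return bool(name) and not any(ch in FORBIDDEN for ch in name)
```

Under this check smörgåsbord is accepted, while a name consisting of or containing = is rejected, which is stricter than what the parsing section produces.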


> Although that isn't quite right either. The parsing section allows
> attributes to begin with =. Given the following markup:
>
>
>
> Safari, Opera and Firefox drop the attribute. IE has an attribute with the
> name being the empty string and the value being ="". The HTML5 parsing
> spec says that there should be an attribute with the name = and the value
> the empty string. The "Before attribute name state" part of the parsing
> spec might have to be revisited.


Or maybe change the #creating section to drop such attributes, if we
choose to follow the Safari/Opera/Firefox path.

--
Thomas Broyer