Re: [whatwg] More comments and questions on Web Apps 1.0

2007-06-01 Thread Ian Hickson
On Fri, 1 Jun 2007, Henri Sivonen wrote:
  
  I have no idea which section that was, nor which RFC that is (the URI 
  is now dead). Is there an updated link?
 
 The section is now 3.17.1.1. Script languages. (The section numbering in 
 the email you quoted is from the 2006-02-24 revision of the spec.)
 
 The linked draft has become http://www.ietf.org/rfc/rfc4329

Ah, indeed, that would be a good place to reference that. Noted.


   2.20.1. When I read this, I had trouble organizing (in my mind) what 
   I was reading because I had no prior understanding of where the spec 
   was going. Up to this point, I had had prior hypotheses that were 
   confirmed or disconfirmed by the spec. This section would be a lot 
   easier to read if it had an introductory paragraph stating the 
   relationship of rendering, the DOM, the data model object and data 
   submission. (Is the DOM being rendered or is a replaced widget 
   element being rendered? Is it stylable? Is the data model reflected 
   back to the DOM? What's the expected way of serializing the data 
   model and sending it back to the server?)
  
  I don't know which section this is talking about.
 
 It was about datagrid.
 
  Is it better now?
 
 I think the non-normative intro section still doesn't sufficiently cover 
 the relationship to the DOM and the CSS frame tree.

The relationship to CSS will all be in the rendering section.

I guess I don't really know what you think is needed in the intro section, 
I'm probably too close to it. Could you write some questions that you 
think an intro section should answer?


 It wasn't clear to me why the spec specified datagrid as part of 
 required UA functionality instead of e.g. Google shipping an Open Source 
 JavaScript library that implements the whole thing using existing stuff 
 available in browsers. Is this about particular native widgets? About 
 performance?

Both of those, but also simply semantics (spelt accessibility for 
political correctness reasons).


 I thought there might be a requirement that the content made sense as a 
 data model.

I think that would be excessive. It might be a good idea, though.


  Do you think it should be further restricted?
 
 Not necessary, I guess.

Ok.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Return values of on* event handlers

2007-06-01 Thread Ian Hickson
On Thu, 31 May 2007, Bill Mason wrote:
  
  Could you elaborate?
 
 The current release of iCab (3.03) treats 'return 0' the same as 'return
 false'.
 
 On the other hand, none of these browsers do so in my testing:
 
 IE 3, 4, 5.0, 5.5, 6, 7 (Windows)
 IE 5.2 (Mac)
 Netscape 4, 8 (Windows)
 Netscape 6, 7 (Mac)
 Mozilla 1.7, Firefox 1.5, Firefox 2 (Mac)
 Opera 3, 4, 5 (Windows)
 Opera 6, 7, 8, 9 (Mac)
 Safari 2.0.4 (Mac)

That seems like a pretty strong vote against it. Thanks for the 
testing! I guess it's a bug in iCab.
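The results above amount to a strict comparison against false. As a minimal sketch of that behaviour (the helper name is hypothetical, not a DOM API, and any per-event special cases are ignored), only a return value that is exactly false cancels the default action:

```javascript
// Hypothetical helper modelling the behaviour observed in the tested
// browsers: only a return value that is strictly === false cancels.
// Per-event special cases are deliberately ignored in this sketch.
function returnValueCancels(returnValue) {
  return returnValue === false;
}

const samples = [false, 0, "", null, undefined, true];
for (const value of samples) {
  // Only the first sample, false itself, reports true.
  console.log(`${JSON.stringify(value)} -> ${returnValueCancels(value)}`);
}
```

Under this model iCab's treatment of `return 0` is the outlier: 0 is falsy, but it is not false.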



Re: [whatwg] HTMLDocument.title and SVGDocument

2007-06-01 Thread Ian Hickson
On Sat, 10 Feb 2007, Anne van Kesteren wrote:

 If HTMLDocument really is going to apply to every Document object... 
 then at least HTMLDocument.title needs to somehow not clash with 
 SVGDocument.title or do both or something.

I basically see two options: HTMLDocument.title always wins, and you can 
get the other one using getFeature(), or, they both get redefined to check 
the root element and dispatch to the other one if appropriate. 
Suggestions?



Re: [whatwg] live image maps

2007-06-01 Thread Ian Hickson
On Sun, 11 Feb 2007, Alexey Feldgendler wrote:
 On Sat, 10 Feb 2007 12:29:46 +0100, Anne van Kesteren [EMAIL PROTECTED]
 wrote:
 
  I think the specification should be clearer about what happens when an 
  area element is added or removed. When an img element is added 
  with a usemap="". When the shape="" attribute is altered, et cetera. 
  Either by handling each case specifically or stating in general that 
  the specified algorithms always apply or something.

The spec says: Image maps are live; if the DOM is mutated, then the user 
agent must act as if it had rerun the algorithms for image maps.


 Isn't it implied that any modification to the DOM tree should be 
 equivalent to replacing the document with another one which would parse 
 into the resulting DOM? If not, maybe it's worth specifying that this 
 holds true unless explicitly stated otherwise.

I don't think this is necessarily true. Replacing a document with another 
one has all sorts of implications that are quite complex.



Re: [whatwg] DOMTokenList versus DOMStringList

2007-06-01 Thread Ian Hickson
On Sun, 18 Feb 2007, Anne van Kesteren wrote:

 FWIW: DOM Level 3 Core seems to define a 
 http://www.w3.org/TR/DOM-Level-3-Core/core.html#DOMStringList 
 DOMStringList, but it seems far less useful than the proposed 
 DOMTokenList. On the other hand, I suppose you could let DOMTokenList 
 inherit from DOMStringList or something...

I don't really see the relationship... Why would we want to use 
DOMStringList for this?



Re: [whatwg] HTMLMediaElement.volume

2007-06-01 Thread Ian Hickson
On Fri, 23 Mar 2007, Anne van Kesteren wrote:

 Wouldn't it be better if no INDEX_SIZE_ERR was raised but instead the 
 previous value was retained? For consistency with 
 CanvasRenderingContext2D.globalAlpha for instance. It's not really 
 important, but I think that some consistency between the various APIs 
 would be nice.

In general, actually, raising INDEX_SIZE_ERR is what the APIs do. So for 
consistency, volume is correct.

globalAlpha, though, is not. What do people think? Should we change the 
canvas globalAlpha attribute to raise an exception for out-of-range 
values? Any browser vendors have an opinion?
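The two conventions under discussion can be compared side by side. This is a sketch on plain classes, not real media elements or canvas contexts; the class names are hypothetical, and a plain Error subclass named IndexSizeError stands in for the DOM INDEX_SIZE_ERR exception:

```javascript
// Hypothetical stand-in for the DOM INDEX_SIZE_ERR exception.
class IndexSizeError extends Error {
  constructor(message) {
    super(message);
    this.name = "IndexSizeError";
  }
}

// Style used by HTMLMediaElement.volume: out-of-range values throw.
class ThrowingVolume {
  #volume = 1.0;
  get volume() { return this.#volume; }
  set volume(value) {
    // The negated range check also rejects NaN.
    if (!(value >= 0 && value <= 1)) {
      throw new IndexSizeError(`volume ${value} is outside 0.0..1.0`);
    }
    this.#volume = value;
  }
}

// Style used by globalAlpha: out-of-range values are ignored and the
// previous value is retained.
class IgnoringAlpha {
  #alpha = 1.0;
  get globalAlpha() { return this.#alpha; }
  set globalAlpha(value) {
    if (value >= 0 && value <= 1) {
      this.#alpha = value;
    }
  }
}

const media = new ThrowingVolume();
try {
  media.volume = 1.5;
} catch (e) {
  console.log(e.name); // IndexSizeError
}

const ctx = new IgnoringAlpha();
ctx.globalAlpha = 0.5;
ctx.globalAlpha = 2;          // silently ignored
console.log(ctx.globalAlpha); // 0.5
```

Either way the stored value stays in range; the difference is only whether the caller is told about the bad assignment.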



Re: [whatwg] datetime - dateTime

2007-06-01 Thread Ian Hickson
On Sat, 24 Mar 2007, Anne van Kesteren wrote:

 The dateTime DOM attribute is spelled with an uppercase T:
 
  http://www.w3.org/TR/DOM-Level-2-HTML/html.html#ID-79359609

Fixed. Thanks.


On Sun, 25 Mar 2007, Nicholas Shanks wrote:

 On 24 Mar 2007, at 16:57, Anne van Kesteren wrote:
 
  The dateTime DOM attribute is spelled with an uppercase T:
  http://www.w3.org/TR/DOM-Level-2-HTML/html.html#ID-79359609
 
 I just encountered that while implementing longdesc support. The IMG 
 attribute is lower-case; the DOM attribute is longDesc. At least they 
 are consistently inconsistent :-)

Indeed!
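Both cases are instances of reflection: a lowercase content attribute in the markup, exposed through a camelCase DOM attribute. A minimal sketch, assuming a hypothetical element class backed by a Map of content attributes and treating both attributes as plain string reflection (a simplification):

```javascript
// Hypothetical element backed by a Map of content attributes.
class SketchElement {
  constructor() {
    this.attributes = new Map();
  }
  getAttribute(name) {
    const key = name.toLowerCase();
    return this.attributes.has(key) ? this.attributes.get(key) : null;
  }
  setAttribute(name, value) {
    // Content attribute names are stored lowercased, as in HTML.
    this.attributes.set(name.toLowerCase(), String(value));
  }
}

// Install a string-reflecting DOM attribute: idlName is the camelCase
// property, contentName the lowercase markup attribute it reads/writes.
function reflectString(cls, idlName, contentName) {
  Object.defineProperty(cls.prototype, idlName, {
    get() { return this.getAttribute(contentName) ?? ""; },
    set(value) { this.setAttribute(contentName, value); },
  });
}

class SketchModElement extends SketchElement {}      // ins/del
reflectString(SketchModElement, "dateTime", "datetime");

class SketchImageElement extends SketchElement {}    // img
reflectString(SketchImageElement, "longDesc", "longdesc");

const ins = new SketchModElement();
ins.setAttribute("datetime", "2007-03-24");
console.log(ins.dateTime);                  // 2007-03-24
ins.dateTime = "2007-03-25";
console.log(ins.getAttribute("datetime"));  // 2007-03-25
```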



Re: [whatwg] Apply script.defer to internal scripts

2007-06-04 Thread Ian Hickson
On Tue, 27 Mar 2007, Kristof Zelechovski wrote:

 I understand that the async attribute must depend on the src attribute 
 because it is needed and meaningful only when the script element is 
 loaded from an external source; however, the advantage of using the 
 defer attribute is not limited to that case.
 
 Consider the following example:
 
 <script type="text/javascript" defer>
 function ha8validate(p5event) { return true }
 document.forms[0].onsubmit = ha8validate
 </script>
 
 The script embedded here is so short and specific that it makes no sense 
 moving it to an external location; however, if the script is not 
 deferred, the script fails with an exception at run time because the 
 document body is not constructed yet.
 
 Therefore, the defer attribute can be meaningful without the src 
 attribute and the dependency should be removed.

I have removed the dependency. You can now specify defer even without 
the src attribute.

I've also removed the restriction for async, because you might want to 
run a set of scripts in a particular order, with one of them being 
external and async, and another being internal. The only way to guarantee 
the internal one runs immediately after the external one is to make the 
internal one async too.
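A simplified model of the ordering that defer is meant to provide for inline scripts like the one in Kristof's example. It assumes deferred scripts are queued and run in document order once parsing has finished, and ignores async, src and document.write entirely; the token format is hypothetical:

```javascript
// Hypothetical token shapes:
//   { kind: "element", name }            an element the parser constructs
//   { kind: "script", defer, run }       run(doc) returns a log string
function parseAndRun(tokens) {
  const log = [];
  const parsed = [];    // element names constructed so far
  const deferred = [];  // scripts waiting for the end of parsing

  for (const token of tokens) {
    if (token.kind === "element") {
      parsed.push(token.name);
    } else if (token.defer) {
      deferred.push(token);                 // keep document order
    } else {
      log.push(token.run(parsed.slice()));  // sees the partial document
    }
  }
  // Parsing has finished: run deferred scripts in document order.
  for (const script of deferred) {
    log.push(script.run(parsed.slice()));
  }
  return log;
}

// A script in the head that needs a form further down the page.
const needsForm = (label) => (doc) =>
  `${label}: ${doc.includes("form") ? "form found" : "no form yet"}`;

const results = parseAndRun([
  { kind: "element", name: "head" },
  { kind: "script", defer: false, run: needsForm("inline") },
  { kind: "script", defer: true, run: needsForm("deferred") },
  { kind: "element", name: "body" },
  { kind: "element", name: "form" },
]);
console.log(results.join("\n"));
// inline: no form yet
// deferred: form found
```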


On Thu, 29 Mar 2007, Gareth Hay wrote:

 Does it not follow that to be more consistent, logical, better style, 
 whatever, you should wrap your code in a function that is called 
 onload?
 
 Isn't that what onload is for: being triggered after the page has 
 loaded?

This doesn't preclude us allowing the other.


On Thu, 29 Mar 2007, Alexey Feldgendler wrote:

 How is this better than putting the script immediately before 
 </body>, which already works today?

It might not be better, but that's not a reason to disallow it.


On Tue, 3 Apr 2007, Hallvord R M Steen wrote:
 
 There is no real advantage to the defer attribute since in HTML4 it is 
 only advisory, the UA is not required to actually defer the script 
 execution, and some implementations only defer it until seeing the next 
 SCRIPT element in the source. Relying on it the way your use case does 
 will work in very few browsers, and specifying this in HTML5 would 
 increase backwards incompatibility for a very minimal gain.

HTML5 defines it exactly.


On Tue, 3 Apr 2007, Stewart Brodie wrote:
 
 My implementation will execute the script immediately if it was inline, 
 and execute it as soon as the whole script is available (obtained from 
 filesystem/network) otherwise.  As far as I understood the 
 specification, the DEFER simply indicates to the HTML parser that it can 
 continue parsing the HTML without waiting to see if the script is going 
 to insert additional content - i.e. the script will not use 
 document.write (and friends).

HTML5 defines this more exactly than HTML4.


Cheers,


Re: [whatwg] Apply script.defer to internal scripts

2007-06-04 Thread Ian Hickson
On Thu, 29 Mar 2007, Matthias Bauer wrote:
 
 What about the DOMContentLoaded event? It is supported by Mozilla and, 
 apparently, Opera 9. Dean Edwards has a technique to make it work on IE, 
 and jQuery supports it on Safari [1].
 
 Is there any chance DOMContentLoaded will be part of HTML5?

On Thu, 29 Mar 2007, Dean Edwards wrote:

 Seems to have been forgotten:
 
 http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2005-April/003709.html

It wasn't forgotten. The spec defines it now.



Re: [whatwg] A few editing suggestions for the HTML5 spec

2007-06-04 Thread Ian Hickson

On Sun, 15 Apr 2007, Geoffrey Garen wrote:
 
 Some of the algorithms in this specification, for historical  
 reasons, require the user agent to pause until some condition has  
 been met. While a user agent is paused, it must ensure that no  
 scripts execute (e.g. no event handlers, no timers, etc). User agents  
 should remain responsive to user input while paused, however.
 
 How should a user agent respond to user input that would cause an  
 event handler to fire, like clicking on a button?

On Sun, 15 Apr 2007, Kristof Zelechovski wrote:

 Pressing a button when the user agent is in paused state should cause the
 button to remain pressed until the user agent wakes up and execution of the
 associated event handlers should be deferred.

On Sun, 15 Apr 2007, Geoffrey Garen wrote:
 
  Pressing a button when the user agent is in paused state should cause the
  button to remain pressed until the user agent wakes up and execution of the
  associated event handlers should be deferred.
 
 So, if I had N buttons in a page, does that mean that all N could potentially
 end up in a pressed state?

On Sun, 15 Apr 2007, Kristof Zelechovski wrote:

 Methinks, if several buttons are pressed, the events should be placed in the
 queue and executed in the order of appearance.  If the page is reloaded as a
 result of an event handler, all remaining events should be discarded as
 usual.

 I am not quite sure how to handle this situation because the user could end
 up pressing all of the buttons out of impatience in search of a button that
 works.  This does not matter if the first button pressed causes a reload;
 otherwise, the result may not meet the user's expectation.  The user agent
 should indicate its busy state with an hourglass pointer, a status message,
 an animated icon, whatever in order to prevent this misunderstanding.

On Sun, 15 Apr 2007, Kristof Zelechovski wrote:

 Yes, they could, just like storey buttons in the lift.

This seems like exactly the kind of thing that we should leave up to the 
user agents, so I haven't specified anything. I don't really see that, 
from an interoperability perspective, it really matters. If there's a case 
in which it matters, though, do let me know so we can specify it.
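Since the spec leaves this to user agents, the following is only one possible policy, sketching Kristof's suggestion: input that arrives while paused is queued, replayed in arrival order on resume, and anything still queued is discarded once a replayed handler navigates away. The class, its methods and the "navigate" return convention are all hypothetical:

```javascript
class PausableDispatcher {
  constructor() {
    this.paused = false;
    this.queue = [];
    this.navigated = false;
  }
  // Called by a (hypothetical) UI layer for each user action.
  userInput(handler) {
    if (this.paused) {
      this.queue.push(handler);  // no script may run while paused
    } else {
      this.run(handler);
    }
  }
  run(handler) {
    // A handler returns "navigate" to model a reload or navigation.
    if (handler() === "navigate") {
      this.navigated = true;
    }
  }
  pause() {
    this.paused = true;
  }
  resume() {
    this.paused = false;
    while (this.queue.length > 0 && !this.navigated) {
      this.run(this.queue.shift());
    }
    this.queue = [];  // drop whatever is left after a navigation
  }
}

const ran = [];
const d = new PausableDispatcher();
d.pause();
d.userInput(() => { ran.push("button 1"); });
d.userInput(() => { ran.push("button 2"); return "navigate"; });
d.userInput(() => { ran.push("button 3"); });
d.resume();
console.log(ran.join(", ")); // button 1, button 2
```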



Re: [whatwg] activeElement

2007-06-04 Thread Ian Hickson
On Thu, 17 May 2007, Hallvord R M Steen wrote:

 If WHATWG is defining document.activeElement, perhaps the WHAT spec 
 should match IE's behaviour more closely on some points. I refer to: 
 http://www.whatwg.org/specs/web-apps/current-work/#activeelement
 
 * when the document is loaded, before any interaction activeElement is 
 the body element (!) (probably not important, I doubt any site would 
 rely on this)

I've made the default the body element instead of the root element, 
which does indeed seem to more accurately reflect IE's behaviour.


 * activeElement is set after mousedown. (important, maybe implied by 
 other stuff about focus handling? I didn't test keydown for e.g. tabbing 
 but pretty sure the same applies.)

This flows from the fact that that's when focus is set. It would vary on a 
platform with different focusing semantics.


 * it is set to the event's target if it is focusable (A, INPUT, BUTTON 
 etc.), otherwise it is set to the event's target's .offsetParent 
 (important, and the offsetParent stuff isn't covered in the current 
 spec)

I couldn't reproduce this. In my testing, only positioned div and span 
elements were magical in this way.

For example, click the uuu text on this test case and you'll see the 
offsetParent (written to the log) is the B element, but the activeElement 
is the DIV element.

   
http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E%0D%0A%3Cbody%20onclick%3D%22w%28event.srcElement.offsetParent.tagName%29%22%3E%0D%0A%3Cpre%3E%28...%29%3C/pre%3E%0D%0A%3Cdiv%20style%3D%22position%3Aabsolute%22%3Eddd%3Ci%3Eiii%3Cb%20style%3D%22position%3Aabsolute%22%3Ebbb%3Cu%3Euuu%3Cinput%3E%3C/u%3E%3C/b%3E%3C/i%3E%3C/div%3E%0D%0A%3Cscript%3E%0D%0A%20setInterval%28function%20%28%29%20%7B%0D%0A%20%20%20var%20pre%20%3D%20document.getElementsByTagName%28%27pre%27%29%5B0%5D%3B%0D%0A%20%20%20pre.firstChild.data%20%3D%20document.activeElement.tagName%3B%0D%0A%20%7D%2C%20100%29%3B%0D%0A%3C/script%3E

I haven't made the spec let positioned div and span elements get 
focused in this way, because whether an element is positioned or not 
should have no bearing on the semantics of the document.


 * it keeps pointing to the same element until another interaction with 
 the document sets it again (important)

That's already in the spec.



Re: [whatwg] Scripting Tweaks

2007-06-04 Thread Ian Hickson
On Sat, 19 May 2007, Maciej Stachowiak wrote:
 
 May I suggest reproposing [DOMContentLoaded] for DOM 3 Events, then, 
 since your former objection to it is withdrawn?

I can if you want, but I don't really see it as a feature that would be 
expected in DOM3 Events. DOM Events defines the event infrastructure; it 
doesn't define when and how each event is actually fired. The firing of 
the events in HTML is very closely tied to the rest of the HTML processing 
model.



Re: [whatwg] setting .src of a SCRIPT element

2007-06-04 Thread Ian Hickson
On Wed, 30 May 2007, Jonas Sicking wrote:

 The reason I designed it this way was that it felt like the least 
 illogical behavior. In general a document behaves according to its 
 current DOM. I.e. it doesn't matter what the DOM looked like before, or 
 how it got to be in the current state, it only matters what's in the DOM 
 now. [...]

 For scripts, things are a lot worse. If the contents of a script 
 element is changed it is impossible to 'drop' the script that was there 
 before. Once the contents of a script has executed, it can never be 
 unexecuted. And since we can't undo what the script has already done, 
 it feels weird to redo the new thing that you're asking it to do.
 
 Another thing that would be weird would be inline scripts. How would the
 following behave:
 s = document.createElement('script');
 document.head.appendChild(s);
 for (i = 0; i < 10; i++) {
   s.textContent += "a" + i + " += 5;";
 }

 Would you reexecute the entire script every time data was appended to 
 the script? Would you try to just execute the new parts? Would you do 
 nothing? IE gets around this problem by not supporting dynamically 
 created inline scripts at all, which I think is a really bad solution.
 
 So I opted for 'killing' script elements once they have executed; they 
 become, in effect, dead elements. This felt simple and consistent.
 
 I'm not sure what you mean when you say you need to keep track of them, 
 and remove them from the document again. All you need to do every time 
 you want to execute a script is to insert a new DOM element in the head 
 of your page. It's not going to be a problem with having too many 
 script elements in the document unless you start executing millions of 
 scripts, at which point you'll have bigger performance issues.

On Thu, 31 May 2007, Jonas Sicking wrote:
  
   I don't see that being able to reuse elements adds any value. Could 
   you give an example where it does?
  
  The global eval equivalent is an example. It's not much of an 
  improvement over the cloneNode example but I'd like the performance to 
  be as close to a plain eval as possible. Ability to switch type, 
  charset, language attributes in chosen user agents may be useful for 
  things like testing E4X support or ES4 support, or correct broken 
  encodings. Ability to execute an external resource again may be 
  useful. All of these are already possible however, so I don't think 
  they are strong use cases.
 
 If there aren't any strong use cases I think we should go with what's 
 simple.

I agree with Jonas here (and I apologise for not seeming to have the other 
side of this conversation; I assume I put it into another folder and will 
get to it in due course).

I haven't changed the spec, since the spec describes what Jonas says.

Please let me know if you disagree with this, especially if you find pages 
that break because of it.
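The behaviour Jonas describes can be modelled with a per-element flag that is set the first time the element's script runs; after that, changing its text or re-inserting it does nothing. This is a minimal sketch with a hypothetical class and an injected run callback, not how any engine implements it, and it deliberately ignores what happens when an empty script element is inserted:

```javascript
class SketchScriptElement {
  constructor(runScript) {
    this.runScript = runScript;  // hypothetical callback that "executes" source
    this.alreadyExecuted = false;
    this.text = "";
  }
  // Called when the element is inserted into the document.
  insertedIntoDocument() {
    this.maybeExecute();
  }
  maybeExecute() {
    if (this.alreadyExecuted) {
      return;                    // a "dead" element never runs again
    }
    this.alreadyExecuted = true;
    this.runScript(this.text);
  }
  appendText(more) {
    this.text += more;           // mutation alone never triggers execution
  }
}

const executed = [];
const s = new SketchScriptElement((src) => executed.push(src));
s.appendText("a0 += 5;");
s.insertedIntoDocument();
s.appendText(" a1 += 5;");
s.insertedIntoDocument();        // e.g. removed and re-inserted
console.log(executed.length);    // 1
```

To run more code, the caller creates and inserts a fresh script element, which is the pattern Jonas recommends.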



Re: [whatwg] typos in HTMLElement IDL

2007-06-04 Thread Ian Hickson
On Sat, 2 Jun 2007, Anne van Kesteren wrote:

 * tabindex -> tabIndex

Fixed.

 * contenteditable -> contentEditable

Fixed.

 * The irrelevant DOM attribute currently doesn't link because
   there's no dfn around its definition.

Fixed.



Re: [whatwg] HTMLDocument.title and SVGDocument

2007-06-04 Thread Ian Hickson
On Fri, 1 Jun 2007, Maciej Stachowiak wrote:
 
  I basically see two options: HTMLDocument.title always wins, and you 
  can get the other one using getFeature(), or, they both get redefined 
  to check the root element and dispatch to the other one if 
  appropriate. Suggestions?
 
 I like the check the root element option. That way, one could use the 
 exact same Document object implementation for SVG and XHTML, while 
 remaining compatible with expected behavior for existing content in both 
 languages. Compound documents where both the HTML and the SVG have a 
 title are possible, but that seems obscure enough that a special DOM API 
 to get both titles is probably unnecessary.

On Sat, 2 Jun 2007, Anne van Kesteren wrote:
 
 Even in that case only one title can be the document title.

On Sat, 2 Jun 2007, Maciej Stachowiak wrote:
 
 If we define which is the document title in such cases, then both 
 HTMLDocument and SVGDocument returning that seems better than separate 
 results.

Done.
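A minimal sketch of the check-the-root-element approach, using a hypothetical plain-object node shape ({ namespace, localName, children, text }) rather than a real DOM. It assumes, as a simplification, that an SVG root takes its title from its first SVG title child, that any other document uses the first HTML title element in tree order, and that whitespace is trimmed and collapsed:

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";

// Depth-first search in tree order for the first matching node.
function findFirst(node, predicate) {
  if (predicate(node)) return node;
  for (const child of node.children ?? []) {
    const found = findFirst(child, predicate);
    if (found) return found;
  }
  return null;
}

// Both HTMLDocument.title and SVGDocument.title would dispatch here.
function documentTitle(root) {
  let titleElement;
  if (root.namespace === SVG_NS && root.localName === "svg") {
    titleElement = (root.children ?? []).find(
      (c) => c.namespace === SVG_NS && c.localName === "title");
  } else {
    titleElement = findFirst(root,
      (n) => n.namespace === HTML_NS && n.localName === "title");
  }
  // Simplification: trim and collapse runs of whitespace.
  return titleElement ? titleElement.text.trim().replace(/\s+/g, " ") : "";
}

const svgDoc = {
  namespace: SVG_NS, localName: "svg",
  children: [{ namespace: SVG_NS, localName: "title", text: " Chart " }],
};
console.log(documentTitle(svgDoc)); // Chart
```

In a compound document rooted at html, an SVG title nested in the body is never consulted, so both interfaces return the same single document title.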



Re: [whatwg] Scripting Tweaks

2007-06-05 Thread Ian Hickson
On Mon, 4 Jun 2007, Maciej Stachowiak wrote:
  
  I can if you want, but I don't really see it as a feature that would 
  be expected in DOM3 Events. DOM Events defines the event 
  infrastructure; it doesn't define when and how each event is actually 
  fired. The firing of the events in HTML is very closely tied to the 
  rest of the HTML processing model.
 
 It also defines the names of events

Just referring to the event defines the name, so that's a non-issue.


 their associated IDL interfaces, 
 whether they bubble, whether they are cancellable, and so forth. 

That is independent of the name, and determined when you fire the event. 
There are some events that sometimes bubble and sometimes don't, for 
instance. In fact I would say that it's up to the spec that fires the 
event to define whether they bubble, have default actions, etc. 
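The point that bubbling and cancelability belong to a particular firing, not to an event name, is visible in the Event constructor available in modern browsers and, as a global, in Node.js 15 and later. That constructor was not part of the 2007 APIs; it is used here only as a convenient illustration:

```javascript
// The same event name fired twice with different flags, as two
// different specs (or call sites) might choose to do.
const target = new EventTarget();
const seen = [];
target.addEventListener("load", (event) => {
  seen.push({ bubbles: event.bubbles, cancelable: event.cancelable });
});

target.dispatchEvent(new Event("load", { bubbles: false, cancelable: false }));
target.dispatchEvent(new Event("load", { bubbles: true, cancelable: true }));

for (const flags of seen) {
  console.log(`bubbles=${flags.bubbles} cancelable=${flags.cancelable}`);
}
// bubbles=false cancelable=false
// bubbles=true cancelable=true
```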


 DOM 3 Events defines this for the load event for instance.

I would argue that it shouldn't.


 It seems to me that load and DOMContentLoaded can both be defined in 
 ways that are independent of the specific markup language, and are 
 equally deserving of being in DOM 3 Events itself. Certainly in a user 
 agent that supports multiple markup languages, you'd want 
 DOMContentLoaded to be dispatched for all of them under the same 
 conditions.

I guess it makes sense to have a non-normative (because there's nothing 
to be normative about) repository somewhere that lists event names and 
which specs are using them, so that specs can remain consistent on the 
matter. But I don't think DOM3 Events need be it; it's a lot of work. For 
example, all the HTMLMediaElement events would need to be added to the 
list.


 Finally, the reason it was left out in the first place was largely due 
 to presumed lack of use cases, not because it was believed to require 
 language-specific processing rules. And that argument was accepted 
 largely due to your support for it. So it would be good to at least 
 present the new info.

I really don't think that my input had much influence on the matter. This 
is the sum total of what I said about it:

| All in all I'm in agreement with the sentiment on this thread that 
| DOMContentLoaded's use cases are unconvincing.

...and that still stands. However, a lot of people have indicated that 
there _are_ use cases, and that they really want support for this. Since 
browsers support it and aren't going to _remove_ support for it, we have 
to specify it, and hence it gets added to the spec.



Re: [whatwg] Arbitrary HTML in option-elements

2007-06-05 Thread Ian Hickson
On Tue, 30 Nov 2004, Olav Junker Kjær wrote:

 Generally, I think it's a very good thing that the spec tries to define 
 how to handle invalid HTML. Undefined and optional behavior in 
 interpreting HTML is a bad thing IMHO.
 
 Maybe the rules for parsing invalid HTML (in HTML5) could be 
 generalized, something like: [...]

Since we now have very specific parsing rules, this probably no longer 
really applies. Please let me know if you disagree with what the spec 
says today about handling invalid HTML (and option elements in 
particular).


Re: [whatwg] Lowercase attribute values

2007-06-05 Thread Ian Hickson
On Sun, 28 Aug 2005, Henri Sivonen wrote:

 In XHTML there are attributes whose value must be in lowercase, although 
 in HTML the value is case-insensitive. The most common example is the 
 method attribute of the form element. But should rev and rel be 
 lowercased?
 
 Consider a piece of software that maps from the HTML flavor of HTML5 to 
 the XHTML flavor and needs to decide which attribute values to lowercase. 
 How should that decision be made? Based solely on the attribute name? (In 
 which case 'type' would be interesting.) Based on both the element name 
 and the attribute name? What is the recommended method for the author of 
 such a piece of software for extracting the list of special cases from 
 the spec?

As far as I know, there are no longer any differences between XHTML and 
HTML in this respect. Please let me know if I missed one.


 How should the lowercasing be performed? Using the locale-insensitive 
 Unicode case data or for ASCII only treating non-ASCII as an error?

So long as you don't do it in the Turkish locale, it should be fine. I 
haven't really made the spec very clear on this yet, but there's a red box 
about it; it'll be dealt with in due course.



Re: [whatwg] Lowercase attribute values

2007-06-05 Thread Ian Hickson
On Mon, 29 Aug 2005, Henri Sivonen wrote:

 On Aug 28, 2005, at 09:56, Henri Sivonen wrote:
 
  How should the lowercasing be performed? Using the locale-insensitive
  Unicode case data or for ASCII only treating non-ASCII as an error?
 
 On further reflection, it seems to me the latter has to be chosen at 
 least for a parser that is intended to be used as a part of a 
 conformance checker. Otherwise <input type=RADİO ...> would pass.

Good point. Ok, ASCII-only it is.
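To see why ASCII-only matters for a conformance checker, compare it with full Unicode lowercasing. Whether Henri's RADİO example slips through depends on whether simple or full case mappings are used, so the sketch instead uses U+212A KELVIN SIGN, which lowercases to an ASCII k either way. The function names and the short list of input types are illustrative:

```javascript
// ASCII-only lowercasing: map A-Z to a-z and leave everything else alone.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x20));
}

const KNOWN_INPUT_TYPES = ["text", "checkbox", "radio"]; // illustrative subset

function isKnownInputType(value) {
  return KNOWN_INPUT_TYPES.includes(asciiLowercase(value));
}

// U+212A KELVIN SIGN lowercases to "k" under Unicode case data,
// so toLowerCase() would wrongly accept this value.
const sneaky = "chec\u212Abox";
console.log(sneaky.toLowerCase() === "checkbox"); // true
console.log(isKnownInputType(sneaky));            // false
console.log(isKnownInputType("CheckBox"));        // true
console.log(isKnownInputType("RAD\u0130O"));      // false
```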


Re: [whatwg] how to handle minimised attributes in HTML5

2007-06-05 Thread Ian Hickson
On Wed, 27 Apr 2005, Henri Sivonen wrote:
 On Apr 27, 2005, at 13:09, Ian Hickson wrote:
  On Tue, 26 Apr 2005, Henri Sivonen wrote:
   
   What do you suggest the parser layer of a text/html conformance 
   checker say about <input checkbox ...>?
   
   1. Silently treat as <input type=checkbox ...>?
   2. Treat as <input type=checkbox ...> but warn?
   3. Treat as <input checkbox=checkbox ...>, causing an error to be
   reported on a higher layer?
   4. Treat as a fatal error in the parser?
  
  5. Treat as <input checkbox="">
 
 Why? XHTML requires boolean attributes to be represented as foo='foo'. 
 If <input checked ...> was treated as <input checked='' ...>, one could 
 not reuse XHTML schemas on top of a minimal text/html flavor parser.

XHTML no longer requires this. foo="" and foo="foo" are now defined 
equivalently.
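Under this rule a boolean attribute such as checked is on whenever it is present, and both the empty value and the attribute's own name are conforming. A minimal conformance-check sketch; the helper names are hypothetical, and accepting the name in any ASCII case is an assumption carried over from the lowercasing discussion:

```javascript
// Hypothetical checker: is this value conforming for a boolean attribute?
// Assumption: the attribute's name matches ASCII case-insensitively.
function isConformingBooleanValue(attrName, value) {
  const lowered = value.replace(/[A-Z]/g, (c) => c.toLowerCase());
  return value === "" || lowered === attrName;
}

// Presence alone decides the state; the value is never consulted.
function booleanAttributeState(attributes, attrName) {
  return Object.prototype.hasOwnProperty.call(attributes, attrName);
}

for (const value of ["", "checked", "CHECKED", "true", "false"]) {
  console.log(JSON.stringify(value), isConformingBooleanValue("checked", value));
}

// checked="false" is non-conforming, yet the checkbox is still checked.
console.log(booleanAttributeState({ checked: "false" }, "checked")); // true
```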


  The only exception, I believe, would be for <table border>, which 
  would instead be treated as <table border=1>.
 
 Do you mean <table border> should pass a conformance check?

No. The border="" attribute is not valid however it is written.



Re: [whatwg] Character References in HTML

2007-06-05 Thread Ian Hickson
On Thu, 13 Oct 2005, Lachlan Hunt wrote:
 
 In HTML4, according to SGML rules, numeric character references in the 
 range from &#128; to &#159; are defined as UNUSED, which makes them 
 non-SGML characters.  Strictly speaking, it's not an error to refer to 
 these characters with character references (even the validator only 
 issues a warning: reference to a non-SGML character); but, AIUI, neither 
 SGML nor HTML4 assigns any meaning to them. 
 http://lachy.id.au/log/2005/10/char-refs
 
 Technically, these character references should really refer to the 
 Unicode control characters, but reality dictates otherwise for 
 text/html, thanks to IE and countless (poorly written) books and 
 tutorials.  I, therefore, think the spec should say something along 
 these lines:
 
   In HTML, numeric and hexadecimal character references referring to
   code positions in the range from 128 to 159 (0x80 to 0x9F) should be
   re-mapped to code positions in the Unicode character repertoire
   according to the CP1252 to Unicode table [CP1252].  This does not
   apply to XHTML.

Done. (With a must, and with an explicit table, since CP1252 doesn't 
define all those characters.)
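A sketch of the remapping, with the 27 CP1252 assignments written out explicitly. The five code points CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90 and 0x9D) are assumed here to pass through unchanged, and values beyond U+10FFFF are assumed to become U+FFFD; the function names are hypothetical:

```javascript
// CP1252 byte -> Unicode code point for 0x80-0x9F.
// Bytes CP1252 leaves undefined are deliberately absent from the table.
const CP1252_C1 = new Map([
  [0x80, 0x20ac], [0x82, 0x201a], [0x83, 0x0192], [0x84, 0x201e],
  [0x85, 0x2026], [0x86, 0x2020], [0x87, 0x2021], [0x88, 0x02c6],
  [0x89, 0x2030], [0x8a, 0x0160], [0x8b, 0x2039], [0x8c, 0x0152],
  [0x8e, 0x017d], [0x91, 0x2018], [0x92, 0x2019], [0x93, 0x201c],
  [0x94, 0x201d], [0x95, 0x2022], [0x96, 0x2013], [0x97, 0x2014],
  [0x98, 0x02dc], [0x99, 0x2122], [0x9a, 0x0161], [0x9b, 0x203a],
  [0x9c, 0x0153], [0x9e, 0x017e], [0x9f, 0x0178],
]);

// Assumption: unmapped values map to themselves; out-of-range values
// become U+FFFD instead of throwing.
function resolveNumericReference(codePoint) {
  if (codePoint > 0x10ffff) {
    return "\ufffd";
  }
  return String.fromCodePoint(CP1252_C1.get(codePoint) ?? codePoint);
}

// Expand decimal (&#150;) and hexadecimal (&#x96;) references in text/html.
function expandNumericReferences(text) {
  return text.replace(/&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));/g, (match, hex, dec) =>
    resolveNumericReference(hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10)));
}

console.log(expandNumericReferences("It&#146;s &#150; 5&#128;")); // It’s – 5€
```

Per the thread, this applies to HTML only; an XHTML parser would keep the C1 control characters.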


   HTML documents must not use numeric or hexadecimal character
   references in this range, although browsers should support them for
   backwards compatibility.  Authors should instead refer to the correct
   Unicode code position for these characters.

Done.


 Also, I think this would also be a nice conformance requirement to see for
 authoring tools:
 
   HTML Authoring tools should automatically convert these character
   references to either the equivalent Unicode code position or, if the
   file's encoding supports it, the character itself, according to the
   CP1252 to Unicode table [CP1252].

Not done, but it's redundant anyway since simply implementing the spec 
will do this automatically (the spec doesn't round-trip the out-of-range 
entities through the DOM).


 None of that should apply to XHTML, since XML explicitly allows this 
 range in the production for Char and, as far as I'm aware, no XHTML UA 
 implements this buggy behaviour.

Indeed.



Re: [whatwg] Test suite: Embedded content

2007-06-05 Thread Ian Hickson
On Tue, 29 Nov 2005, Lachlan Hunt wrote:
  
  At least in Gecko, we parse the contents of noembed, noscript, 
  noframes, and iframe as CDATA when we're not going to be using 
  their contents because in the past, we've had lots of problems with 
  authors treating these tags like C's preprocessor directives, handling 
  cases like: 
  <head><noscript><body>...</noscript><script>...</script><body> is 
  extremely difficult (and then preserving round-tripping for editor 
  gets to be a problem, and the list of problems goes on).
 
 Ok, but how is equivalent markup handled in XHTML, where parsing 
 obviously can't switch to CDATA?

Badly.

noembed is non-conforming and does nothing in XHTML.
noscript is non-conforming in XHTML and does nothing there.
noframes might be non-conforming. I haven't done anything with it yet.
iframe contents are non-conforming in XHTML. They would be hidden but 
are in the DOM and active.



Re: [whatwg] Serialize comments that contain --

2007-06-06 Thread Ian Hickson
On Mon, 30 Jan 2006, Simon Pieters wrote:
 
 What should happen with comments that contain -- when you serialize 
 the DOM into HTML? When serializing into XML it should result in a fatal 
 error according to DOM3Core[1], but I guess that is not really desired 
 for HTML. Should the comment be dropped? Should the serializer insert a 
 space in between (- -)?

The spec says:

# If the element's contents are not conformant, it is possible that
# the roundtripping through innerHTML will not work. For instance, if
# the element is a textarea element to which a Comment node has been
# appended, then assigning innerHTML to itself will result in the
# comment being displayed in the text field. Similarly, if, as a
# result of DOM manipulation, the element contains a comment that
# contains the literal string --, then when the result of
# serialising the element is parsed, the comment will be truncated at
# that point and the rest of the comment will be interpreted as
# markup. Another example would be making a script element contain a
# text node with the text string </script>.

It currently says this for innerHTML. Should I say it in other places too? 
Or would something else be better?
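A serializer could avoid the truncation instead of just warning about it. One of the options Simon mentions, inserting a space between the hyphens, could be sketched like this. It is a hypothetical policy that the spec text above does not require, and it changes the comment's data rather than preserving it.

```javascript
// Hypothetical serializer policy: break up every "--" in comment data
// with a space so the serialized comment cannot end early.
function breakDoubleHyphens(data) {
  let out = data;
  // Loop because "---" becomes "- --" after one global pass.
  while (out.includes("--")) {
    out = out.replace(/--/g, "- -");
  }
  return out;
}

function serializeComment(data) {
  let body = breakDoubleHyphens(data);
  // A trailing "-" would merge with the closing "-->" into "--".
  if (body.endsWith("-")) body += " ";
  // A leading ">" or "->" would close the comment immediately
  // ("<!-->" or "<!--->") in an HTML5-style tokenizer.
  if (/^-?>/.test(body)) body = " " + body;
  return "<!--" + body + "-->";
}
```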



Re: [whatwg] The problem of duplicate ID as a security issue

2007-06-06 Thread Ian Hickson
On Fri, 10 Mar 2006, Alexey Feldgendler wrote:

 Does the current version of the spec define what happens to elements 
 with duplicate ID values?

No. It's something we should consider for fixes to DOM3 Core, though.


 The problem of duplicate ID isn't just another issue where it's nice to 
 have some well-defined error recovery just for uniformity. There are 
 cases when duplicate IDs should be viewed as a security concern.
 
 Consider a script which augments the HTML page after it has been parsed 
 by attaching event listeners to elements in the DOM tree, inserting new 
 nodes into the tree etc. This is common practice, for example, for many 
 web-based WYSIWYG editors. In this scenario, any method the script uses 
 for identification of the DOM nodes subject to augmentation is 
 vulnerable to possible spoofing by user-supplied content present on the 
 same page.
 
 For example, imagine a script which finds a button by ID and attaches an 
 event listener to it. A possible markup looks like this:
 
 <div>
 ...blog entry body...
 </div>
 <button id="addtomemories">Add this entry to memories</button>
 <script>
 document.getElementById('addtomemories').addEventListener('click',
 doSomeNiceAJAX);
 </script>
 
 So, a malicious blog author can make the following entry:
 
 I have found a <a href="#" id="addtomemories">cool website</a>.
 
 Depending on how the browser handles duplicate IDs, any of the following 
 unwanted effects may occur, or both:
 1. Clicking the link in the blog entry adds the entry to memories list 
 of the reader.
 2. Clicking the real "Add this entry to memories" button does nothing.
 
 One can think of other examples, possibly more dangerous. Other methods 
 of identification (by tag name, by class, by CSS selector as proposed 
 recently) are also vulnerable.
 
 This kind of attack is hard to circumvent through use of HTML cleaners 
 because id="addtomemories" looks like an innocent attribute, like an 
 anchor for navigation.

It's not that hard to avoid. You can whitelist what attributes are allowed 
(e.g. only attribute values consisting of "comment" followed by the comment 
number followed by 1 to 10 characters in the range a-z).
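The whitelist rule in the example can be expressed as one regular expression per comment. The sketch below is hypothetical, including the name `isAllowedCommentId`. It assumes the comment number is a non-negative integer known at cleaning time.

```javascript
// Hypothetical whitelist check for an id supplied inside comment number n:
// "comment", then n, then 1 to 10 characters in the range a-z.
function isAllowedCommentId(value, commentNumber) {
  if (!Number.isInteger(commentNumber) || commentNumber < 0) {
    throw new RangeError("commentNumber must be a non-negative integer");
  }
  // commentNumber is a plain integer, so it needs no regex escaping.
  const pattern = new RegExp("^comment" + commentNumber + "[a-z]{1,10}$");
  return pattern.test(value);
}
```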


 Preventing such attacks by a HTML cleaner would require either making a 
 full list of all forbidden IDs, class names etc, or imposing Draconian 
 rules upon user-supplied content, completely disallowing such useful 
 attributes like id and class.

I'm not really convinced there's that much use in user-supplied IDs and 
classes, but the rules needn't be that draconian. The server could 
automatically prepend the commentN string to IDs and classes.
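Here is a hypothetical sketch of the prefixing approach. The "-" separator is an addition not in the original suggestion. Without it, comment 1 with id "2x" and comment 12 with id "x" would both become "comment12x".

```javascript
// Hypothetical server-side helpers: namespace user-supplied ids and class
// names by comment number so they cannot collide with the page's own.
function prefixId(id, commentNumber) {
  return "comment" + commentNumber + "-" + id;
}

function prefixClassList(value, commentNumber) {
  return value
    .split(/[ \t\n\f\r]+/) // whitespace between class tokens
    .filter((token) => token !== "")
    .map((token) => prefixId(token, commentNumber))
    .join(" ");
}
```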

To be safe, a server's cleaning code must whitelist everything -- 
elements, attribute names, attribute values, element contents, etc. It's 
not trivial, but that's no excuse for not doing it.


 Necessary but not sufficient. Duplicate IDs aren't caught by a 
 validating parser, so custom code is needed to enforce many of the 
 requirements. For example, if one was trying to ensure that all IDs are 
 unique, then the ID values within the user-supplied code would have to 
 be checked for duplicates among them, too.

This is already the case, yes.
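Once the IDs have been collected, checking user-supplied IDs is straightforward. The check covers duplicates within the content and collisions with IDs the page itself relies on. Collection (for example, from a parsed fragment) is assumed to happen elsewhere, and `findIdProblems` is a hypothetical name.

```javascript
// Hypothetical check run on the ids found in one piece of user content:
// reports ids repeated inside the content and ids that collide with ids
// the surrounding page already uses (e.g. "addtomemories").
function findIdProblems(userIds, pageIds) {
  const reserved = new Set(pageIds);
  const seen = new Set();
  const problems = [];
  for (const id of userIds) {
    if (reserved.has(id)) {
      problems.push({ id, reason: "collides with page id" });
    } else if (seen.has(id)) {
      problems.push({ id, reason: "duplicate within user content" });
    }
    seen.add(id);
  }
  return problems;
}
```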



Re: [whatwg] Parsing Numeric Character References

2007-06-06 Thread Ian Hickson
On Sun, 12 Mar 2006, Lachlan Hunt wrote:
 
 [The spec] does not cover [entities for] the characters in the range 
 from #x80 to #x9F, which have historically been treated as code points 
 from the Windows-1252 repertoire, rather than the control characters 
 from Unicode.  AFAIK, this is already interoperably implemented in all 
 browsers.

Fixed.

 Characters in the range from #x01 to #x19 (except for whitespace 
 characters) are not treated interoperably across platforms.  On Windows, 
 Firefox, IE and Opera all displayed characters from some repertoire I 
 couldn't identify.  But on Mac: all the browsers displayed either 
 nothing or a box (a place holder character).  I think these should all 
 return U+FFFD.

They return the appropriate control characters from Unicode. The reason 
they render on some platforms is that the fonts on some platforms (Windows 
in particular) have glyphs in those positions.


 The use of characters in either of these ranges should be an easy parse 
 error.

I've made the first set a parse error, since those actually don't 
roundtrip as one might expect. But the x01-x19 entities roundtrip fine, 
they just render funkily. We could define something special about these 
characters in the rendering section, but I don't think they should be 
parse errors. Do you agree?
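The resulting rule could be sketched as follows: only references in the 0x80 to 0x9F range are parse errors, and the low control characters are not. `parseNumericReference` is hypothetical and only handles a complete reference string that has already been isolated.

```javascript
// Hypothetical tokenizer fragment: parses a complete numeric character
// reference such as "&#147;" or "&#x93;" and reports whether this rule
// makes it a parse error.
function parseNumericReference(text) {
  const m = /^&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));$/.exec(text);
  if (m === null) return null; // not a numeric character reference
  const codePoint =
    m[1] !== undefined ? parseInt(m[1], 16) : parseInt(m[2], 10);
  return {
    codePoint,
    // 0x80-0x9F is a parse error (remapped via CP1252, so it does not
    // round-trip); 0x01-0x19 is not flagged by this rule.
    parseError: codePoint >= 0x80 && codePoint <= 0x9f,
  };
}
```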



Re: [whatwg] why, e.g., input/@checked=checked ?

2007-06-06 Thread Ian Hickson
On Wed, 6 Jun 2007, Henri Sivonen wrote:
 
 Requiring lower case for the boolean attribute's canonical name (as 
 value) certainly makes things friendly for clean and portable RELAX NG 
 schemata and, thus, easier for me. It also makes things politically 
 correct as far as XHTML5 goes. I can imagine, however, that someone else 
 might see the case restriction as excessive.

If you get feedback along these lines from your users, please let me know. 
We can review this in light of experience.
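A hypothetical sketch of the check under the reading Henri describes. It assumes the value must be either the empty string or exactly the attribute's own name in lower case. The function name is an assumption.

```javascript
// Hypothetical conformance check for a boolean attribute value:
// empty, or exactly the attribute's name in lower case, so
// checked="Checked" is reported while checked="checked" is fine.
function isConformingBooleanValue(attributeName, value) {
  return value === "" || value === attributeName.toLowerCase();
}
```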



Re: [whatwg] The problem of duplicate ID as a security issue

2007-06-06 Thread Ian Hickson
On Thu, 7 Jun 2007, Alexey Feldgendler wrote:
 On Thu, 07 Jun 2007 00:20:18 +0200, Ian Hickson [EMAIL PROTECTED] wrote:
 
   Preventing such attacks by a HTML cleaner would require either 
   making a full list of all forbidden IDs, class names etc, or 
   imposing Draconian rules upon user-supplied content, completely 
   disallowing such useful attributes like id and class.
 
  I'm not really convinced there's that much use in user-supplied IDs 
  and classes, but the rules needn't be that draconian. The server could 
  automatically prepend the commentN string to IDs and classes.
 
 IDs in user-supplied content are only useful as fragment identifiers for 
 URLs, and mangling them like that defeats this use case because you 
 don't know N before you post the comment, and therefore can't make 
 internal links within the body (and it's also unobvious when you try to 
 make links to parts of your article afterwards).

True. I don't have a good solution to this that doesn't involve code on 
the server-side, though.
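One piece of that server-side code could be sketched as follows, assuming IDs have already been prefixed with a hypothetical "comment<N>-<id>" scheme. The code rewrites same-document links inside the comment so that internal links keep working. It does not address links made from elsewhere afterwards, which still need to know the prefixed ID. Fragments are compared literally, and percent-decoding is ignored.

```javascript
// Hypothetical rewrite for an href inside comment N: "#foo" becomes
// "#comment<N>-foo" only if "foo" is an id declared in that comment.
function rewriteFragmentHref(href, commentNumber, commentIds) {
  if (!href.startsWith("#")) return href;   // not a same-document link
  const target = href.slice(1);
  if (!commentIds.has(target)) return href; // points elsewhere on the page
  return "#comment" + commentNumber + "-" + target;
}
```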



Re: [whatwg] On validation

2007-06-06 Thread Ian Hickson
On Thu, 16 Mar 2006, Henri Sivonen wrote:
 From the spec:
 The term validation specifically refers to a subset of conformance
  checking that only verifies that a document complies with the requirements
  given by an SGML or XML DTD. Conformance checkers that only perform
  validation are non-conforming, as there are many conformance requirements
  described in this specification that cannot be checked by SGML or XML DTDs.
  
 To put it another way, there are three types of conformance criteria:
  
1. Criteria that can be expressed in a DTD.
2. Criteria that cannot be expressed by a DTD, but can still be
  checked by a machine.
3. Criteria that can only be checked by a human.
  
 A conformance checker must check for the first two. A simple DTD-based
  validator only checks for the first class of errors and is therefore not a
  conforming conformance checker according to this specification.
 
 There are three things I don't like about this note:
 First, it perpetuates the Validation means only DTD validation mantra.
 Second, it mentions SGML and XML DTDs casually together.
 Third, it can be read to imply that using a DTD as part of a conformance
 checker is a good idea.

Fixed. Let me know if it's still a problem.


 Suggested replacement text:
 
 Note: XML DTDs cannot express all the conformance requirements of this 
 specification. Therefore, a validating XML processor and a DTD 
 cannot constitute a conformance checker. Also, since the two authoring 
 formats defined in this specification are applications of SGML, a 
 validating SGML system cannot constitute a conformance checker.

I used basically this text.


 Since a large part of HTML5 involves aligning the spec with the real 
 world, perhaps the term HTML5 validation should be defined to mean the 
 same as HTML5 conformance checking. :-)

I have added a paragraph to this effect.



Re: [whatwg] [html5] tags, elements and generated DOM

2007-06-06 Thread Ian Hickson
On Thu, 16 Mar 2006, Henri Sivonen wrote:
 
 At the end of section 1.8 it says:
 These XML documents may contain a DOCTYPE if desired, but this is not
 required to conform to this specification.
 
 I'd like to see a note here. Something like this: Note: According to 
 [XML], XML processors are not guaranteed to process the external DTD 
 subset referenced in the DOCTYPE. This means, for example, that using 
 entities for characters is unsafe (except for &lt;, &gt;, &amp;, &quot; 
 and &apos;). For interoperability, authors are advised to avoid optional 
 features of XML.

Added.



Re: [whatwg] Forbidden characters in text/html

2007-06-06 Thread Ian Hickson
On Sun, 19 Mar 2006, Henri Sivonen wrote:
 
 Since U+ has no legitimate reason to be there just to get dropped, 
 is any encounter of U+ a parse error?

Yes. Fixed.


 The way the spec is written, U+000D does not occur in the character 
 stream immediately before tokenization, but (as in XML!) it *can* appear 
 in the tree construction stage, because an NCR can expand into U+000D. 
 (I'm not suggesting any changes here--just noting how it is.)

Indeed.


 Since U+000D can occur in the tree construction stage, I think the point 
 under "8.2.2.3.7. How to handle tokens in the main phase" that says "A 
 character token that is one of one of U+0009 CHARACTER TABULATION, 
 U+000A LINE FEED (LF), U+000B LINE TABULATION, U+000C FORM FEED (FF), or 
 U+0020 SPACE" should include U+000D as well.

Good point. Fixed.
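For reference, the resulting set of space characters for the tokenizer and tree construction stages can be written as a lookup. It reflects the list as discussed in this message: the quoted five plus U+000D. The names are hypothetical, and later drafts may differ.

```javascript
// Space characters as discussed above; U+000D is included because an
// NCR such as "&#13;" can still produce it after input normalisation.
const PARSER_SPACE_CHARACTERS = new Set([
  "\u0009", // CHARACTER TABULATION
  "\u000A", // LINE FEED (LF)
  "\u000B", // LINE TABULATION
  "\u000C", // FORM FEED (FF)
  "\u000D", // CARRIAGE RETURN (CR)
  "\u0020", // SPACE
]);

function isParserSpaceCharacter(ch) {
  return PARSER_SPACE_CHARACTERS.has(ch);
}
```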


 On the other hand, I am wondering why the list of characters that 
 implements the concept of whitespace in the tokenization and tree 
 construction stages includes U+000B LINE TABULATION and U+000C FORM FEED 
 (FF). Are they required for backwards-compatibility? I would guess they 
 do not actually show up on the Web that often. According to the W3C 
 Validator, those characters do not need to be allowed for formal 
 backwards compatibility with HTML4--on the contrary. 
 http://validator.w3.org/check?uri=http%3A%2F%2Fhsivonen.iki.fi%2Ftest%2Fform-feed-in-tag.html
  
 http://validator.w3.org/check?uri=http%3A%2F%2Fhsivonen.iki.fi%2Ftest%2Fline-tabulation-in-tag.html

I don't have an opinion about U+000B. What would you want changed?

U+000C is allowed because converting text files to HTML can easily end up 
inserting FF characters. (e.g. RFCs have FF characters, conversion to HTML 
often leaves them.) I see no harm in allowing them really.


 In order to make all conforming HTML5 documents serializable as XHTML5, 
 it would be necessary to have a catch-all restriction stating that a 
 document is non-conforming if parsing it causes a non-XML character ( 
 http://www.w3.org/TR/REC-xml/#NT-Char ) to appear in the DOM. For 
 clarity, it would be nice to have the same restriction on the pre-parse 
 character stream, but such a restriction is not strictly necessary for 
 XHTML-serializability.

I don't really think we can guarantee that all conforming HTML5 documents 
be serializable as XHTML5 anyway. I'm reluctant to add catch-all clauses, 
because they tend to have unexpected consequences.
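A checker could warn about the XML-serializability problem Henri describes without making it a conformance rule. To do so, it would scan text for code points outside the XML 1.0 Char production. `findNonXmlChars` is a hypothetical name. The check flags U+000C, which Ian argues above should stay allowed in text/html. That illustrates why a blanket rule would have side effects.

```javascript
// Returns the code points in `text` that the XML 1.0 Char production
// does not allow: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
// | [#x10000-#x10FFFF]. Lone surrogates are reported too.
function findNonXmlChars(text) {
  const bad = [];
  for (const ch of text) { // iterates by code point
    const cp = ch.codePointAt(0);
    const ok =
      cp === 0x9 || cp === 0xa || cp === 0xd ||
      (cp >= 0x20 && cp <= 0xd7ff) ||
      (cp >= 0xe000 && cp <= 0xfffd) ||
      (cp >= 0x10000 && cp <= 0x10ffff);
    if (!ok) bad.push(cp);
  }
  return bad;
}
```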



Re: [whatwg] basefont

2007-06-06 Thread Ian Hickson
On Mon, 20 Mar 2006, Lachlan Hunt wrote:

 I'm just wondering how you're intending to deal with basefont? AFAIK, 
 the only browser that supports it these days is IE, but it does so by 
 breaking the DOM (I could be mistaken, but I think NN4 supported it 
 too).
 
 Considering that no other modern browser supports it and that IE's DOM 
 looks like this when base font is used:
 
 <!DOCTYPE html>
 <title>test</title>
 <p><basefont face="Arial" size="3">test</p>
 
 #comment: CTYPE ht
 HTML
   HEAD
 TITLE
 BASEFONT face=Arial
   BODY
 P
   #text: test
   BODY (shown as error)
 
 I think it should be made officially obsolete.  It should be inserted 
 into the DOM as an empty element, but UAs should ignore it.  UAs may 
 choose to support it at their own risk, but must not do so by breaking 
 the DOM like IE does.

Agreed. Done.



Re: [whatwg] The problem of duplicate ID as a security issue

2007-06-08 Thread Ian Hickson
On Thu, 7 Jun 2007, Alexey Feldgendler wrote:

 On Thu, 07 Jun 2007 00:42:31 +0200, Ian Hickson [EMAIL PROTECTED] wrote:
 
   IDs in user-supplied content are only useful as fragment identifiers for
   URLs, and mangling them like that defeats this use case because you
   don't know N before you post the comment, and therefore can't make
   internal links within the body (and it's also unobvious when you try to
   make links to parts of your article afterwards).
 
  True. I don't have a good solution to this that doesn't involve code on
  the server-side, though.
 
 Some form of sandboxing would be one.

If sandboxing would solve it then I'll treat this issue as closed and deal 
with the sandboxing problems separately.



Re: [whatwg] Still more comments and questions on Web Apps 1.0

2007-06-08 Thread Ian Hickson
 elements
 It is strange and potentially confusing that the notion of top and 
 bottom is reversed compared to the conventional use of those terms in 
 connection to stacks.

I agree. I am, however, reluctant to change it at this point, lest I make 
a mistake.



 8.2.2.3.7.
 In the after head phase even white space implies the start of body. Is that
 intentional?

This no longer appears to be the case.


 8.2.2.3.7.
 The algorithms to be run on opening li, dt and dd do not say anything
 about parse errors when elements whose end tag is not optional get popped.
 Those should, in my opinion, count as parse errors.

Done.


 8.2.2.3.7.
 The insertion modes pertaining to tables specify the handling of comment 
 tokens as parse errors and the comments are inserted on the foster 
 parent. Is that intentional? It looks like an oversight.

This seems fixed now.

Cheers,


Re: [whatwg] id and xml:id

2007-06-08 Thread Ian Hickson
On Sun, 2 Apr 2006, Henri Sivonen wrote:

 Since UAs handle whitespace in the id attribute inconsistently (see 
 below)

Note that there is interoperability (in that, we have two browsers that do 
the same thing, and one of those is IE, even).


 old specs imply or require whitespace trimming

Old specs imply or require a lot of things. ;-)


 and ids with whitespace are unreferencable from whitespace-separated 
 lists of ids,

True.


 I suggest adding the following language concerning document conformance:
 
 The value of the id attribute must be a string that consists of one or 
 more characters matching the following production: 
 [#x21-#xD7FF]|[#xE000-#xFFFD]|[#x10000-#x10FFFF] (any XML 1.0 character 
 excluding whitespace).

I've made it non-conforming for an ID to contain a whitespace character.
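A hypothetical sketch of a checker test for that rule. Two details are assumptions: treating the empty string as non-conforming (following Henri's "one or more characters"), and the exact set of space characters (tab, LF, VT, FF, CR, space).

```javascript
// Hypothetical conformance check for an id value: non-empty and free of
// space characters. The space-character set is an assumption.
function isConformingIdValue(value) {
  return value.length > 0 && !/[\t\n\u000B\f\r ]/.test(value);
}
```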


 Also, I suggest requiring that elements must not have both id and xml:id 
 and requiring that xml:id must not occur in the HTML serialization. 
 (Again, from the document conformance point of view--not disputing 
 requirements on browsers.)

I don't really want to mention xml:id. If someone wants to write a spec 
that affects our spec, that's their business. I don't think it makes sense 
for us to go ahead and then ban their spec. That's not to say that xml:id 
is good or bad, it just doesn't seem relevant to mention it in our spec.


 If an element had both an id attribute and an xml:id attribute with different
 values, the document would not be HTML-serializable, which would be bad.

That applies to any document that has nodes from other namespaces. xml:id 
isn't special in that sense.


 If an element was allowed to have an id attribute and an xml:id attribute with
 the same value, the following constraint from xml:id spec would be violated
 even for conforming docs:
 An xml:id processor should assure that the following constraint holds:
* The values of all attributes of type “ID” (which includes all xml:id
 attributes) within a document are unique.
 ( http://www.w3.org/TR/xml-id/ )

I don't really understand what you mean there.


 Finally, as the ultimate ID nitpicking, the spec should state that it is 
 naughty of authors to turn attributes other than id and xml:id into IDs 
 via the DTD. (Well, using a DTD at all is naughty. :-)

Again, if they want to do that, that's their business. I don't see that as 
a big problem.


 Test case: http://hsivonen.iki.fi/test/wa10/adhoc/id.html
 The script tries every id with a whitespaceless value to see if whitespace is
 trimmed before ID assignment.

 Safari and IE 6:
 
 id='a' PASS
 id='2' PASS
 id='&lt;' PASS
 id=',' PASS
 id='&auml;' PASS
 id=' c ' FAIL
 id='\nd\n' FAIL
 id='\t\te\t\t' FAIL
 id='&#13;f&#13;' FAIL

That's what the spec requires today.


Re: [whatwg] apos; in text/html

2007-06-11 Thread Ian Hickson
On Tue, 25 Apr 2006, Henri Sivonen wrote:
   
   Should &apos; be a valid character reference in text/html? If not, 
   what would be correct error handling?
  
  I went with making it valid, since it's valid in XML.
 
 That's problematic, because allowing it as a conforming entity reference 
 does not add any expressiveness to the language but makes conformance 
 checking less useful as an authoring aid (because &apos; fails in IE and 
 such failure could be trivially avoided).

True, but I think having the predefined XML entities be a subset of the 
HTML ones on the long term is better, even if it does cause short-term 
minor pain.


 I think I'm going to emit a warning even if &apos; is conforming.

That seems reasonable.
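Such a warning pass might be sketched as below. The function name, the message wording, and the suggestion of `&#39;` as a replacement are assumptions.

```javascript
// Hypothetical checker pass: warn on every "&apos;" in text/html source
// even though it is conforming, because IE does not recognise it.
function apostropheWarnings(source) {
  const warnings = [];
  let index = source.indexOf("&apos;");
  while (index !== -1) {
    warnings.push({
      offset: index,
      message: "&apos; is not supported by Internet Explorer; consider &#39;",
    });
    index = source.indexOf("&apos;", index + 1);
  }
  return warnings;
}
```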


 I am uncomfortable with &LT;, &GT;, &AMP;, &QUOT; and &COPY; on 
 aesthetic grounds, but at least they work interoperably.

Yeah.



Re: [whatwg] Common number formats

2007-06-11 Thread Ian Hickson
On Tue, 25 Apr 2006, Henri Sivonen wrote:
 
 I assume number formats in attributes consistently do not allow 
 whitespace before and after. Am I right?

The spec says (for unsigned integers):

# A string is a valid non-negative integer if it consists of one or more 
# characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9).

Since space characters aren't in the range U+0030 to U+0039, they are not 
allowed, whether at the start, middle, or end.
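The quoted definition translates directly into a check. This sketch (the function name is hypothetical) uses an explicit `[0-9]` class so that only ASCII digits match.

```javascript
// One or more ASCII digits and nothing else: no sign, no whitespace.
function isValidNonNegativeInteger(s) {
  return /^[0-9]+$/.test(s);
}
```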


 I assume that an explicit + sign is always forbidden. Correct?

The spec says:

# A string is a valid integer if it consists of one or more characters in 
# the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), optionally 
# prefixed with a U+002D HYPHEN-MINUS (-) character.


 Is the - sign forbidden in front of zero? (Would be logical considering 
 that the explicit plus is forbidden.)

The - is always allowed, per the text above.
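Here is a corresponding sketch for the quoted definition of a valid integer; the function name is hypothetical. As written, the definition rejects "+1" and accepts "-0".

```javascript
// Optional leading "-", then one or more ASCII digits; "+" is never allowed.
function isValidInteger(s) {
  return /^-?[0-9]+$/.test(s);
}
```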


 Most definitions of integers allow leading zeros but the width and 
 height for canvas don't. Is this intentional?

This has been fixed.


 I guess floats allow leading zeros as well. Correct?

# A string is a valid floating point number if it consists of one or more 
# characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), 
# optionally with a single U+002E FULL STOP (.) character somewhere 
# (either before these numbers, in between two numbers, or after the 
# numbers), all optionally prefixed with a U+002D HYPHEN-MINUS (-) 
# character.
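One way to encode the quoted floating-point definition is as a regular expression; the function name below is hypothetical. The first alternative covers a digit run with an optional "." inside or after it. The second covers a "." followed by digits.

```javascript
// Optional "-", then digits with at most one "." anywhere among them;
// at least one digit is required, so "." alone is rejected.
function isValidFloat(s) {
  return /^-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)$/.test(s);
}
```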


 Do percentages allow leading zeros? (Leading zeros are harmless in 
 Firefox, Opera and Safari, at least.)

I haven't defined percentages yet, as we may not need them. But I imagine 
I would define them as being just a kind of integer followed by a % 
character.



Re: [whatwg] Doctype conformance requirements

2007-06-11 Thread Ian Hickson
On Sun, 7 May 2006, Simon Pieters wrote:
 
 The conformance requirements section[1] states that:
 
  HTML documents that use the new features described in this 
  specification and that are served over the wire (e.g. by HTTP) must be 
  sent as text/html and must start with the following DOCTYPE: <!DOCTYPE 
  html>.
 
 So, if I read this correctly, HTML documents that aren't served over the 
 wire need not have a doctype?

Fixed.



Re: [whatwg] script type= and style type= parsing

2007-06-11 Thread Ian Hickson
On Sun, 21 May 2006, Anne van Kesteren wrote:

 Based on http://testsuite.org/html/elements/script/001.htm and 
 http://testsuite.org/html/elements/style/001.htm and the results in 
 Internet Explorer, Firefox and Opera it seems parsing can be made pretty 
 strict. The only real problem is <script> with the type="" attribute set 
 to the empty string. It seems all three browsers treat that as if it was 
 an ECMAScript/JavaScript type, while in fact it is not. Internet 
 Explorer handles the same situation with style correctly...
 
 Ian, perhaps you have statistics that show we don't have to worry about 
 <script type=""> and can make the specification say that browsers 
 must ignore the content in that case?
 
 By the way, I was planning on filing bugs on Mozilla for both the 
 testcases, but couldn't find out what the right component would be. 
 Anyone with ideas?

On Mon, 22 May 2006, Joost 'AlthA' de Valk wrote:

 I've tested those two tests in WebKit nightly, and see that webkit fails 
 a few more than firefox does. I would be very interested in some 
 statistics as well :).

The statistics are depressing.

Anyway, I've defined (to some extent) the processing of type= and 
language=. I'm not really sure exactly what more to say. If we get into 
much more detail, we'll start having to define the difference between 
JS1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.5 with E4X, 1.6, 1.6 with E4X, 1.7, 1.7 
with E4X, JScript, JScript.Encode, and VBScript. At least. I'm not sure we 
want to go there (mostly because I have no idea what we should say).



Re: [whatwg] Steps for finding one or two numbers in a string

2007-06-12 Thread Ian Hickson
On Sat, 9 Jun 2007, Kristof Zelechovski wrote:
  On Fri, 14 Apr 2006, Henri Sivonen wrote:
   I think i18n political correctness has no place in attributes. I 
   think they should be ASCII only with the XML notion of whitespace.
 
  I agree and believe the spec already requires this.

 That statement was not precise enough.  It applies to attribute names, not
 to attributes as such.

I don't understand, could you elaborate?



Re: [whatwg] Steps for finding one or two numbers in a string

2007-06-12 Thread Ian Hickson
On Tue, 12 Jun 2007, Kristof Zelechovski wrote:

 Attribute names are limited to ASCII, attribute values are not.

Neither are limited to ASCII. I don't understand. The discussion was 
concerning the numeric search algorithms for progress bars, not attribute 
names. What exactly are you requesting should change?



Re: [whatwg] Steps for finding one or two numbers in a string

2007-06-12 Thread Ian Hickson
On Tue, 12 Jun 2007, Kristof Zelechovski wrote:

 Attribute names are not and cannot be localized because they are for the 
 software and not for the human reader.  That means they are limited to 
 ASCII whether the standard is specific about that or not.

Ok... So the spec doesn't have to change?



Re: [whatwg] script type= and style type= parsing

2007-06-12 Thread Ian Hickson
On Tue, 12 Jun 2007, Anne van Kesteren wrote:
 
 Hmm. I hope this will be defined by someone at some point. Well, at 
 least the version switches that are important for interoperability.

Me too. If you have an editor who has the time to work this out, the 
WebAPI WG is probably the best place for it.



Re: [whatwg] Steps for finding one or two numbers in a string

2007-06-12 Thread Ian Hickson
On Tue, 12 Jun 2007, Kristof Zelechovski wrote:

 The specification enumerates all accepted element attributes.  Neither of
 them transgresses ASCII boundaries.  Since it can be directly inferred from
 the text, the explicit statement about that
 http://www.whatwg.org/specs/web-apps/current-work/#attributes0 technically
 is not needed, although it does no harm either.

Ok. Since it does no harm and might help some readers, I'll leave it.



Re: [whatwg] [wa1] Status of tree construction section

2007-06-13 Thread Ian Hickson
 that 
   require additional behaviour (like IMG, LINK, META etc.).
  
  In Web browsers it's simply not an option. Having to fire mutation 
  events for every mutation according to the complete DOM3 Events model 
  is prohibitively expensive.
 
 To be honest, I've not found it a burden even on the sorts of low-end 
 devices that our software runs on (typically 300MHz CPUs, 8MB RAM, that 
 sort of thing). Then again, I have a highly optimised event dispatcher 
 that takes steps to minimise the work, particularly when there are no 
 DOM listeners for the event being raised, which will almost always be 
 the case for the events concerned (DOMNodeInserted and 
 DOMNodeInsertedIntoDocument and the Removed counterparts).  The internal 
 default event handlers have similar filtering to eliminate any 
 unnecessary processing quickly.

Even minimal work is more than no work, and when you're dealing with 
thousands of elements, that's a big difference (in the order of 
milliseconds).
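The optimisation described in the quoted message might look roughly like this. The dispatcher keeps listeners per event type and bails out before building an event when there are none. This is a hypothetical, heavily simplified sketch (single target, no capture or bubbling). As Ian notes, even the fast path is not free when it runs once per inserted node.

```javascript
// Hypothetical dispatcher: mutation events with no listeners are skipped
// before any event object is built.
class MutationEventDispatcher {
  constructor() {
    this.listeners = new Map(); // event type -> array of callbacks
  }

  addEventListener(type, callback) {
    if (!this.listeners.has(type)) this.listeners.set(type, []);
    this.listeners.get(type).push(callback);
  }

  hasListeners(type) {
    const list = this.listeners.get(type);
    return list !== undefined && list.length > 0;
  }

  // makeEvent is only called when someone is listening.
  dispatch(type, makeEvent) {
    if (!this.hasListeners(type)) return false; // fast path: no work
    const event = makeEvent();
    // Copy so listeners added during dispatch are not called this time.
    for (const callback of this.listeners.get(type).slice()) callback(event);
    return true;
  }
}
```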


 In the in body section, WBR doesn't really belong with a,b,big,em... 
 because it never had content.  It probably ought to go in with 
 area,basefont,bgsound... a bit further down, or in its own section.  
 There's no real point bothering with putting it in the list of active 
 formatting elements so it's coming off the stack again straight away.

Fixed.

Thanks,


Re: [whatwg] [WebApps] Entity consumption

2007-06-13 Thread Ian Hickson
On Fri, 14 Jul 2006, J. King wrote:
 On Fri, 14 Jul 2006 18:53:31 -0400, Ian Hickson [EMAIL PROTECTED] wrote:
  On Fri, 14 Jul 2006, J. King wrote:
   
   There are two paragraphs at the end of section 8.2.1.1:
   
   # When an end tag token is emitted, the content model
   # flag must be switched to the PCDATA state.
   #
   # When an end tag token is emitted with attributes,
   # that is a parse error.
   
   They don't seem to make sense in context; are they editing 
   artefacts?
  
  No, they're intentional... why don't they make sense? They're 
  additional requirements on the tokenisation step.
 
 I was confused because I thought they belonged to section 8.2.1.1.  I 
 see now that they actually belong to 8.2.1, but it's kind of difficult 
 to see that at a glance---8.2.1.1 is so long that the indentation gets 
 pretty lost.  Perhaps the 8.2.1.1 section should be after those two 
 paragraphs---or perhaps the two paragraphs should be moved to before the 
 list of parsing states.  The paragraphs seem to fit in better with the 
 more general nature of the beginning of the tokenization section, I 
 think.

Ok, moved them up.
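Read together, the two paragraphs act as a post-condition on every emitted end tag. The sketch below is minimal and assumes a hypothetical token shape `{ type, name, attributes }` and a hypothetical tokenizer object that records errors.

```javascript
// Minimal tokenizer state holder; names are hypothetical.
function createTokenizer() {
  return { contentModel: "PCDATA", tokens: [], errors: [] };
}

// Emit a token, applying the two general end tag rules.
function emit(tokenizer, token) {
  if (token.type === "endTag") {
    // Rule 1: emitting any end tag switches the content model flag to PCDATA.
    tokenizer.contentModel = "PCDATA";
    // Rule 2: an end tag that carries attributes is a parse error.
    if (token.attributes && token.attributes.length > 0) {
      tokenizer.errors.push("end tag with attributes: </" + token.name + ">");
    }
  }
  tokenizer.tokens.push(token);
}
```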



Re: [whatwg] [WebApps] Parsing: close tag open state

2007-06-13 Thread Ian Hickson
On Sun, 16 Jul 2006, J. King wrote:

 When the content model flag is set to RCDATA or CDATA in the close tag 
 open state, the state machine is supposed to examine the next few 
 characters, and if they match the last start tag, also examine the next 
 character to see if it matches whitespace.  If these conditions are not 
 true, then it is supposed to Emit a U+003C LESS-THAN SIGN character 
 token, a U+002F SOLIDUS character token, and reconsume the current input 
 character in the data state.
 
 However, there is no current input character; all the state machine has 
 done so far is look ahead.  Shouldn't it simply emit the two character 
 tokens and switch to the data state?

Fixed.
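The corrected step can be sketched as follows. The sketch assumes `pos` points just past `</`. It also assumes the characters allowed after the tag name are whitespace, `>`, `/` or end of input; the quote only mentions whitespace, so the rest of that set is an assumption. On a mismatch nothing has been consumed, so the state machine just emits the two characters and switches state. The function name is hypothetical.

```javascript
const SPACE = new Set([" ", "\t", "\n", "\f"]);

// Close tag open state while the content model flag is RCDATA or CDATA.
// Only looks ahead; `pos` is returned unchanged in both branches.
function closeTagOpenInRawText(input, pos, lastStartTagName) {
  const name = lastStartTagName.toLowerCase();
  const candidate = input.slice(pos, pos + name.length).toLowerCase();
  const next = input.charAt(pos + name.length); // "" at end of input
  const terminated = next === "" || next === ">" || next === "/" || SPACE.has(next);
  if (candidate === name && terminated) {
    // A genuine end tag for the open element: tokenize it normally.
    return { emit: [], state: "endTagName", pos };
  }
  // Not our end tag: emit "<" and "/" as character tokens and switch to the
  // data state. There is no current input character to reconsume.
  return { emit: ["<", "/"], state: "data", pos };
}
```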



Re: [whatwg] About adopting quirks mode parsing

2007-06-13 Thread Ian Hickson
On Wed, 19 Jul 2006, Michel Fortin wrote:
 On 18 Jul 2006, at 21:43, Ian Hickson wrote:
   
   It might be desirable also that a valid HTML4 document gets a 
   conforming HTML4 DOM. If it is, then <p>s shouldn't contain <table>.
  
  I agree.
 
 Is this goal compatible with blockquote, pre, ol, ul, and dl 
 being structured inline-level elements? Let's take this valid snippet of 
 HTML 4:
 
   <p>Some text <ul><li>List item</li></ul>
 
 According to HTML 4 parsers, I believe the DOM will be:
 
   P
     #text: Some text
   UL
     LI
       #text: List item

Right. And for compatibility with legacy content, that's what HTML5 does 
too.


 But in HTML 5, where the list can be part of a paragraph, shouldn't the 
 list be put inside the paragraph? Giving this DOM:
 
   P
     #text: Some text
     UL
       LI
         #text: List item
 
 Or should the list be put inside the paragraph only when you have an 
 explicit closing p tag following the list (so that it becomes invalid 
 HTML 4):
 
   <p>Some text <ul><li>List item</li></ul></p>
 
 ?

Neither. As it says in 8.1.2.5, "Restrictions on content models" [1]:

# A p element must not contain blockquote, dl, menu, ol, pre, table, or ul  
# elements, even though these elements are technically allowed inside p 
# elements according to the content models described in this 
# specification. (In fact, if one of those elements is put inside a p 
# element in the markup, it will instead imply a p element end tag before 
# it.)

The new content models only apply to the DOM and the XML serialisations; 
they can't be expressed in the HTML serialisation.

[1] http://www.whatwg.org/specs/web-apps/current-work/#restrictions
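Here is a minimal sketch of that implied end tag, producing the first DOM in Michel's message with the list as a sibling of the paragraph. It assumes a toy token format and a simplified check that only looks at the current node rather than the full scope rules. All names are hypothetical.

```javascript
// Elements whose start tag implies a </p> end tag, per the quoted restriction.
const CLOSES_P = new Set(["blockquote", "dl", "menu", "ol", "pre", "table", "ul"]);

// Build a tree from simplified start tag, end tag and text tokens.
function buildTree(tokens) {
  const root = { name: "#root", children: [] };
  const stack = [root];
  const current = () => stack[stack.length - 1];
  for (const tok of tokens) {
    if (tok.type === "text") {
      current().children.push({ name: "#text", data: tok.data });
    } else if (tok.type === "start") {
      // Simplification: only a <p> that is the current node gets closed.
      if (CLOSES_P.has(tok.name) && current().name === "p") stack.pop();
      const el = { name: tok.name, children: [] };
      current().children.push(el);
      stack.push(el);
    } else if (tok.type === "end") {
      // Pop up to and including the matching open element; stray end tags
      // are ignored here.
      const idx = stack.map((e) => e.name).lastIndexOf(tok.name);
      if (idx > 0) stack.length = idx;
    }
  }
  return root;
}
```

The real algorithm has its own rules for a stray `</p>`, so this sketch should not be used to predict the DOM for markup that ends with one.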


Re: [whatwg] About adopting quirks mode parsing

2007-06-13 Thread Ian Hickson
On Wed, 19 Jul 2006, Simon Pieters wrote:
 From: Ian Hickson [EMAIL PROTECTED]
  On Mon, 17 Jul 2006, Simon Pieters wrote:
   As for an algorithm for how to do that, I think that an extra flag would
   be sufficient. If the parser hits <!-- while in RCDATA or CDATA, the
   flag is set to true. Then, if the parser hits --> the flag is set to
   false. Initially the flag is false. While the flag is true the element
   can't be closed.
  
  It's slightly more complicated than that due to the whole problem with 
  things like !---, but yes.
 
 You're right. I forgot about that. I've added more test cases (008-014, 
 and 003-004 in rcdata)[1].
 
 Opera never treats <!-- as a standalone pseudo-comment.
 
 Firefox treats <!-- as a standalone pseudo-comment for script, but not 
 for title and textarea.
 
 IE always treats <!-- as a standalone pseudo-comment.
 
 Safari treats <!-- as a standalone pseudo-comment for style and script, 
 but not for noscript, noembed and noframes.
 
 Now, I think that <!-- should always be treated as a standalone 
 pseudo-comment if <!-- will be treated as a standalone real comment (in 
 PCDATA), otherwise never. (If pseudo-comments really are needed, that 
 is.)

I've made the spec do what IE does (not counting conditional comments).

I haven't looked at the parsing of comments in PCDATA mode yet, but I'm 
guessing we'll have to support <!-- there too.
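The sketch below implements the flag-based approach from Simon's message and applies it to every RCDATA/CDATA element, which is what "always", as in IE, would mean. It is minimal and the function name is hypothetical. It does not try to resolve the harder cases Ian mentions. In this version the dashes of `<!--` are never reused as part of `-->`; that is one possible choice, not necessarily IE's.

```javascript
// Find where a CDATA/RCDATA element's content ends. "<!--" sets the flag,
// "-->" clears it, and while the flag is set an end tag cannot close the
// element. Returns the index of the closing end tag, or -1 if there is none.
function findRawTextEnd(text, tagName) {
  const lower = text.toLowerCase();
  const endTag = "</" + tagName.toLowerCase();
  let inPseudoComment = false;
  for (let i = 0; i < text.length; i++) {
    if (!inPseudoComment && lower.startsWith("<!--", i)) {
      inPseudoComment = true;
      i += 3; // skip the rest of "<!--" so its dashes can't start a "-->"
      continue;
    }
    if (inPseudoComment && lower.startsWith("-->", i)) {
      inPseudoComment = false;
      i += 2;
      continue;
    }
    if (!inPseudoComment && lower.startsWith(endTag, i)) {
      const next = text.charAt(i + endTag.length);
      if (next === "" || next === ">" || /\s/.test(next)) return i;
    }
  }
  return -1;
}
```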



Re: [whatwg] parsing: bogus comments - PIs

2007-06-13 Thread Ian Hickson
On Wed, 26 Jul 2006, Shadow2531 wrote:
  
   So, ?xml-stylesheet type=text/css href=? is a bogus comment.
  
   I *was* 100% sure that the PI should be parsed into:
  
   !--?xml-stylesheet type=text/css href=?--
  
  Correct.
 
 Thanks Ian. Can you comment on innerHTML for this situation?
 
 If ?xml-stylesheet type=text/css href=? is parsed into
 !--?xml-stylesheet type=text/css href=?-- , what should
 innerHTML show?

Assuming you mean the .innerHTML of a parent element, it would show the 
comment as you've written it above. See the innerHTML definition in the 
spec:

   http://www.whatwg.org/specs/web-apps/current-work/#innerhtml0
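Here is a minimal sketch of both halves, using a hypothetical stylesheet URL (`a.css`) because the original href was lost. The bogus comment state keeps everything from the `?` up to the first `>` as comment data. innerHTML then writes that data back between `<!--` and `-->`. The function names are hypothetical.

```javascript
// Bogus comment state: consume everything up to the first ">" (or the end of
// input). The data starts with the character that triggered the state (the
// "?" after "<") and excludes the ">".
function parseBogusComment(input, start) {
  const close = input.indexOf(">", start);
  const end = close === -1 ? input.length : close;
  return { data: input.slice(start, end), next: close === -1 ? end : close + 1 };
}

// How a comment node's data is written back out when serializing.
function serializeComment(data) {
  return "<!--" + data + "-->";
}
```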



Re: [whatwg] Events for added nodes while page is loading

2007-06-13 Thread Ian Hickson
On Tue, 1 Aug 2006, Robert Græsdal wrote:

 It'd be nice to have an event that'd tell my script when a new DOM node 
 has been added to the DOM tree /while it is loading/. Some documents 
 just take quite a while to load, so it'd be nice to be able to modify 
 nodes as they were added to the DOM tree.

Browser vendors have told me that they don't want to do this due to the 
performance impact of such a feature. Otherwise, we already have this 
feature (DOM mutation events).


Re: [whatwg] XHTML and document.write()

2007-06-14 Thread Ian Hickson
On Mon, 14 Aug 2006, Anne van Kesteren wrote:

 Just an FYI. You have to deal with the edge case that the root element 
 might be html:script. Non-conforming, obviously, but what's supposed to 
 happen should still be defined. I guess you would ignore calls to 
 document.write() in such cases or perhaps copy the element and put it 
 inside a html:html element and try again... Ouch!
 
 Not sure if nested html:script element would make things harder 
 here...

document.write() in XHTML is defined to raise an exception. There were 
simply too many edge cases that make no sense whatsoever for me to work 
out how it could work.



Re: [whatwg] innerHTML and QNames

2007-06-14 Thread Ian Hickson
On Tue, 3 Oct 2006, Simon Pieters wrote:
 
 On getting .innerHTML, the spec says that the tag name is used to 
 serialize tags. However, Opera and Firefox use the local name. Also, it 
 isn't certain that element names and attribute names will be all 
 lower-case.

Fixed, as per our discussion on IRC. I took the opportunity to clean up 
the use of the term tag name in a few other places where it was 
ambiguously used.



Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)

2007-06-14 Thread Ian Hickson
On Mon, 9 Oct 2006, Robert wrote:
 
  In browsers today, the following:
   <a href="test" xmlns=""> ... </a>
  ...is just a link. If we start supporting xmlns="" as it works in XML, but
  in HTML, then literally millions of pages are going to suddenly have their
  links stop working, because an <a> in the "" namespace (as opposed to the XHTML
  namespace) is not an HTML <a>, and thus isn't a link.
 
 How about defining a standard namespace _prefix_ for such additions to 
 HTML? As far as I've seen, all browsers interpret the namespace prefix 
 as part of the tag/attribute, such that for MathML in HTML, you'd use 
 math:add. It'd require the author use the prefix for all relevant 
 tags, but it should work without changing anything fundamental in UAs 
 that might break other sites. As far as I'm aware, since namespaces 
 don't exist in HTML, there's nothing particularly evil about this.

On Mon, 9 Oct 2006, Anne van Kesteren wrote:
 
 This seems much more annoying to author than the proposed alternative.
 
 It's not like we'll have millions of elements to be used in HTML one 
 day. (I hope not, at least!) The language should remain relatively 
 simple. I'm not even sure why people suggest SVG should be included as 
 well as that's a presentational language. It makes much more sense to 
 bind SVG to elements using XBL.

I tend to agree with Anne. It's not clear to me what the advantage of the 
proposed solution would be. It's not really clear to me what the problem 
is, even.

Cheers,


Re: [whatwg] Map lang to xml:lang at the parser level

2007-06-14 Thread Ian Hickson
On Sun, 15 Oct 2006, Simon Pieters wrote:
 
 When parsing HTML and serializing as XML you normally want to change the 
 lang attribute to xml:lang. But why not put it in the XML namespace at 
 the parser level? Then when you serialize the DOM as XML it becomes 
 xml:lang automatically.
 
 The .lang DOM attribute would reflect xml:lang. This would make it 
 simpler to set/get the language with script in XHTML (no need to use 
 namespace-aware methods).
 
 I don't know if this is too expensive on the parser or if there are 
 other flaws but it's just an idea.

It's an interesting idea but it isn't really compatible with what legacy 
UAs do, since they would expose the attribute as 'lang' but this would 
require the attribute to be fetched using getAttributeNS instead of 
getAttribute to get the same effect.

There are enough other subtleties in the differences between HTML5 and 
XHTML5 that I think you'd have to have special code to convert between the 
two anyway. So I'm not sure this would gain you much.
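The sketch below illustrates the kind of conversion code Ian means. It renames a no-namespace `lang` attribute to `xml:lang` when an element is prepared for XML serialization. The attribute object shape and the function name are hypothetical, and a real converter would have many other cases to handle.

```javascript
const XML_NS = "http://www.w3.org/XML/1998/namespace";

// Map attributes of an element parsed from text/html to the form an XML
// serializer needs: a no-namespace "lang" becomes xml:lang.
function toXmlAttributes(attributes) {
  return attributes.map((attr) => {
    if (attr.namespace === null && attr.localName === "lang") {
      return { namespace: XML_NS, prefix: "xml", localName: "lang", value: attr.value };
    }
    return attr; // everything else passes through unchanged
  });
}
```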



Re: [whatwg] innerHTML in XML

2007-06-14 Thread Ian Hickson
On Fri, 27 Oct 2006, Anne van Kesteren wrote:

  <foo>
   <bar/>
   <bar/>
  </foo>
 
 How can foo.innerHTML be well-formed here?

On Sat, 28 Oct 2006, Lachlan Hunt wrote:

 Anne van Kesteren wrote:
   <foo>
    <bar/>
    <bar/>
   </foo>
  
  How can foo.innerHTML be well-formed here?
 
 It could be if it were treated as an external parsed entity.

I've made the spec explicitly require that innerHTML return an XML 
namespace-well-formed internal general parsed entity representation.



Re: [whatwg] XHTML5 DOM building and IDness

2007-06-14 Thread Ian Hickson
On Thu, 2 Nov 2006, Henri Sivonen wrote:
 The spec says:
  The rules for parsing XML documents (and thus XHTML documents) into DOM
  trees are covered by the XML and Namespaces in XML specifications, and are
  out of scope of this specification.
 
 However, the spec says the following about the id attribute:

  If the value is not the empty string, user agents must associate the element
  with the given value (exactly) for the purposes of ID matching (e.g. for
  selectors in CSS or for the getElementById()  method in the DOM).

 [...] there is a piece of code somewhere between the XML processor and 
 the resulting DOM tree that is analogous to an xml:id processor and that 
 assigns IDness to attributes that are not in a namespace, have the local 
 name id and belong to elements in the XHTML namespace.

Right, that piece of code is the XHTML UA. Is that a problem? Why would 
the rules resulting from HTML element semantics have to be dealt with by 
the lower level layers?
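Here is a minimal sketch of that UA-level step. It assumes a plain-object node shape and the getElementById convention that the first matching element in tree order wins. The node shape and the function name are hypothetical.

```javascript
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Walk the tree produced by the XML layer and assign IDness to non-empty,
// no-namespace "id" attributes on elements in the XHTML namespace.
function buildIdMap(root) {
  const ids = new Map();
  const visit = (node) => {
    if (node.namespace === XHTML_NS) {
      const idAttr = (node.attributes || []).find(
        (a) => a.namespace === null && a.localName === "id");
      // Empty values are skipped; for duplicates the first in tree order wins.
      if (idAttr && idAttr.value !== "" && !ids.has(idAttr.value)) {
        ids.set(idAttr.value, node);
      }
    }
    for (const child of node.children || []) visit(child);
  };
  visit(root);
  return ids;
}
```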


 The second quote implies that the first quote is not the full story and 
 building a DOM tree from an XHTML document byte stream is not entirely 
 covered by the XML and Namespaces in XML specifications [...]

"Not entirely" is a polite way of putting it. There's a huge gaping hole 
between the XML spec and the DOM spec, with no actual definition anywhere 
that says how you get from one to the other -- there's no equivalent of 
the HTML parser spec for XML/DOM. It's only because for most things 
there's an obvious mapping that the implementations are interoperable, 
IMHO. This is one reason why I've punted on defining document.write() for 
XML -- without a strict parser spec that defines at which stage the DOM is 
updated, there's no clear definition of how you insert things into the 
parser's input stream, for example.



Re: [whatwg] 9.2.2: replacement characters. How many?

2007-06-14 Thread Ian Hickson
On Fri, 3 Nov 2006, Elliotte Harold wrote:

 Section 9.2.2 of the current Web Apps 1.0 draft states:
 
 Bytes or sequences of bytes in the original byte stream that could not 
 be converted to Unicode characters must be converted to U+FFFD 
 REPLACEMENT CHARACTER code points.
 
 I'm concerned about the "or". For example, suppose there are six upper 
 halves of a Unicode surrogate pair in a row and no lower halves. Does 
 that turn into six replacement characters or one? Both interpretations 
 seem possible.
 
 I suppose I prefer six rather than one, but I don't care a great deal as 
 long as this is locked down one way or the other.

I don't really know how to define this. I'd like to say that it's up to 
the encoding specifications to define it. Any suggestions?
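For comparison, here is a minimal sketch of the "six" reading at the level of UTF-16 code units. Each unpaired surrogate becomes its own U+FFFD. The function name is hypothetical.

```javascript
// Decode an array of UTF-16 code units, replacing every unpaired surrogate
// with a separate U+FFFD.
function decodeUtf16Units(units) {
  let out = "";
  for (let i = 0; i < units.length; i++) {
    const u = units[i];
    const isHigh = u >= 0xD800 && u <= 0xDBFF;
    const isLow = u >= 0xDC00 && u <= 0xDFFF;
    const nextIsLow = i + 1 < units.length && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF;
    if (isHigh && nextIsLow) {
      out += String.fromCharCode(u, units[i + 1]); // a valid surrogate pair
      i++;
    } else if (isHigh || isLow) {
      out += "\uFFFD"; // unpaired surrogate
    } else {
      out += String.fromCharCode(u);
    }
  }
  return out;
}
```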



Re: [whatwg] Typo in 9.2.3

2007-06-14 Thread Ian Hickson
On Sun, 5 Nov 2006, Elliotte Harold wrote:

 Otherwise if the next seven chacacters are a case-insensitive match for the
 word DOCTYPE, then consume those characters and switch to the DOCTYPE state.
 
 chacacters -- characters

Fixed.



Re: [whatwg] Entity parsing

2007-06-14 Thread Ian Hickson
On Sun, 5 Nov 2006, Øistein E. Andersen wrote:

 From section 9.2.3.1. Tokenising entities:
   For some entities, UAs require a semicolon, for others they don't.
 
 This applies to IE.
 
 FWIW, the entities not requiring a semicolon are the ones encoding 
 Latin-1 characters, the other HTML 3.2 entities (amp, gt and lt), as 
 well as quot and the uppercase variants (AMP, COPY, GT, LT, QUOT 
 and REG). [...]

I've defined the parsing and conformance requirements in a way that 
matches IE. As a side-effect, this has made things like na&iumlve 
actually conforming. I don't know if we want this. On the one hand, it's 
pragmatic (after all, why require the semicolon?), and is equivalent to 
not requiring quotes around attribute values. On the other, people don't 
want us to make the quotes optional either.
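Here is a minimal sketch of the IE-compatible rule, using a tiny hypothetical subset of the entity table. A name followed by `;` is always decoded. A name without `;` is decoded only if it is in the legacy no-semicolon set. The longest matching name is tried first, and the uppercase variants are left out of the subset.

```javascript
const ENTITIES = { amp: "&", lt: "<", gt: ">", quot: "\"", copy: "\u00A9", iuml: "\u00EF", hellip: "\u2026" };
// Legacy names that decode even without a trailing semicolon (subset).
const NO_SEMICOLON_OK = new Set(["amp", "lt", "gt", "quot", "copy", "iuml"]);
const NAMES = Object.keys(ENTITIES).sort((a, b) => b.length - a.length);

function decodeEntities(text) {
  let out = "";
  let i = 0;
  while (i < text.length) {
    if (text[i] !== "&") { out += text[i++]; continue; }
    const name = NAMES.find((n) => text.startsWith(n, i + 1));
    if (name === undefined) { out += text[i++]; continue; }
    const end = i + 1 + name.length;
    if (text[end] === ";") {
      out += ENTITIES[name]; // terminated reference: always decoded
      i = end + 1;
    } else if (NO_SEMICOLON_OK.has(name)) {
      out += ENTITIES[name]; // legacy name: semicolon optional
      i = end;
    } else {
      out += text[i++]; // leave the "&" as literal text
    }
  }
  return out;
}
```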


Re: [whatwg] Space characters

2007-06-14 Thread Ian Hickson
On Mon, 6 Nov 2006, Henri Sivonen wrote:
 On Nov 6, 2006, at 07:34, Ian Hickson wrote:
  On Sun, 5 Nov 2006, Henri Sivonen wrote:
   
   Is there a reason why the definition of space characters does not 
   match the XML 1.0 and RELAX NG definition of white space (space, 
   tab, CR, LF) but also includes (line tabulation and form feed)? Is 
   the deviation from XML 1.0 needed for backwards compatibility with 
   text/html UAs?
  
  I made the parser consider VT and FF as being whitespace based on, as 
  I recall, a complete examination of every Unicode character's 
  behaviour in the parsers I was testing. The definition of space 
  characters matches the parser's behaviour for consistency.
  
  The definition of space characters doesn't affect the XML parser 
  stage as far as I can recall, only attribute parsing and DOM 
  conformance.
 
 The potential problem with it affecting DOM conformance is that it may 
 have ripple effects to running XML tooling inside a browser engine. 
 Gecko has an XPath implementation. Disruptive Innovations has created a 
 RELAX NG implementation for Gecko. Running the schemas from 
 syntax.whattf.org on a DOM inside Gecko would be interesting, since it 
 would allow checking DOM snapshots modified by scripts. There may be 
 other reasons to run XML machinery on an HTML DOM in a browser. Both 
 XPath and RELAX NG assume that white space-separated tokens follow the 
 XML notion of white space. Not being able to use the native XPath and 
 RELAX NG notions of splitting on white space would be seriously uncool. 
 Of course, a browser engine might get away with tampering with the XPath 
 or RELAX NG notions of white space since the additional characters don't 
 occur in XML. But does it make sense to inflict the cost of such 
 tweaking on the XML parts of browser engines?
 
 Would there be serious compatibility problems if the HTML5 parsing 
 algorithm required VT and FF to be mapped to space (after expanding 
 NCRs) and the higher-level parts of the spec defined white space as 
 space, tab, CR and LF?

Well, I don't much care about VT, but I really think we should round-trip 
form feed. Consider, for instance, RFCs, which have form feeds. I don't 
like the idea of dropping them on the floor when you convert RFCs to HTML 
and back to text again.
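Here is a minimal sketch of how the two definitions differ when a token list such as a class attribute is split. A form feed separates tokens under the HTML5 definition but not under XML's. The helper name is hypothetical.

```javascript
// HTML5 "space characters" add U+000B and U+000C to XML's four.
const HTML5_SPACE = /[ \t\n\v\f\r]+/;
const XML_SPACE = /[ \t\n\r]+/;

// Split a whitespace-separated token list, dropping empty leading/trailing tokens.
function splitTokens(value, separator) {
  return value.split(separator).filter((t) => t !== "");
}
```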



Re: [whatwg] Handling of illegal byte-sequences (typically in UTF-8)

2007-06-14 Thread Ian Hickson
On Fri, 24 Nov 2006, Øistein E. Andersen wrote:

 Section 8.1.4:
  Bytes that are not valid UTF-8 sequences must be interpreted as [...] U+FFFD
 
 Section 9.2.2:
  Bytes or sequences of bytes [...] that could not be converted to Unicode 
  characters
  must be converted to U+FFFD
 
 If I read this correctly, section 8.1.4 requires that an illegal UTF-8 
 sequence like F2 BF BF (the three first bytes of a four-byte sequence, 
 obviously not followed by a continuation byte) be converted into exactly 
 three U+FFFD characters (one for each byte), whereas section 9.2.2 also 
 allows one single replacement character (and possibly even two) in this 
 case (and permits an arbitrary number n of repetitions of the three-byte 
 sequence to be replaced by any number of U+FFFD characters between 1 and 
 3n).
 
 I realise that the underspecification in section 9.2.2 may well be 
 intentional, given that this section is not limited to UTF-8, but (quite 
 possibly depending on the handling chosen) this can (more or less 
 easily) be expressed in such a way that it applies to any encoding.
 
 Alternatively, a reference to an authoritative source would of course 
 fulfil the purpose in the particular case of UTF-8 (if such a document 
 can be found).
 
 [Currently, an alert reader might infer that the treatment indicated in 
 section 8.1.4 would be preferable also in section 9.2.2, but such 
 inference for consistency can hardly be expected.]

On Fri, 24 Nov 2006, Henri Sivonen wrote:
 
 I'm inclined to think that interop in error situations doesn't need to 
 go as deep as defining how many replacement characters (in the range 
 1...number of bytes in a faulty sequence) a character decoder has to 
 emit. Apps may want to delegate character decoding to an outside library 
 whose authors don't care about the details of HTML5. (For example, it 
 appears that Safari is leaving this stuff to ICU.) Chances are that 
 there's more value in being able to use a library than in getting a 
 specific number of replacement characters on error.

On Sat, 25 Nov 2006, Øistein E. Andersen wrote:
 
 I agree. The current slight inconsistency should probably be amended by 
 making section 8.1.4 more liberal rather than the other way round.

Done.
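Here is a minimal sketch of the two readings for UTF-8. It assumes a decoder that tracks lower and upper bounds for the next continuation byte; the names are hypothetical. By default each maximal ill-formed subsequence becomes one U+FFFD. With `perByte` set, every byte of it becomes its own U+FFFD, which is the stricter reading of 8.1.4.

```javascript
function decodeUtf8(bytes, perByte = false) {
  let out = "";
  let cp = 0, needed = 0, seen = 0, lower = 0x80, upper = 0xBF;
  // Emit replacement(s) for the partial sequence consumed so far.
  const fail = () => { out += "\uFFFD".repeat(perByte ? seen + 1 : 1); };
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (needed === 0) {
      if (b <= 0x7F) out += String.fromCharCode(b);
      else if (b >= 0xC2 && b <= 0xDF) { needed = 1; cp = b & 0x1F; }
      else if (b >= 0xE0 && b <= 0xEF) {
        if (b === 0xE0) lower = 0xA0; // reject overlong forms
        if (b === 0xED) upper = 0x9F; // reject encoded surrogates
        needed = 2; cp = b & 0x0F;
      } else if (b >= 0xF0 && b <= 0xF4) {
        if (b === 0xF0) lower = 0x90;
        if (b === 0xF4) upper = 0x8F; // stay at or below U+10FFFF
        needed = 3; cp = b & 0x07;
      } else fail(); // byte cannot start a sequence
      continue;
    }
    if (b < lower || b > upper) {
      fail();
      cp = needed = seen = 0; lower = 0x80; upper = 0xBF;
      i--; // reprocess this byte as the start of something new
      continue;
    }
    lower = 0x80; upper = 0xBF;
    cp = (cp << 6) | (b & 0x3F);
    if (++seen === needed) {
      out += String.fromCodePoint(cp);
      cp = needed = seen = 0;
    }
  }
  if (needed !== 0) fail(); // truncated sequence at end of input
  return out;
}
```

Under the default policy, n repetitions of F2 BF BF give n replacement characters; with `perByte` they give 3n, the two ends of the range Øistein describes.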


Re: [whatwg] HTML syntax: space characters between attributes

2007-06-14 Thread Ian Hickson
On Tue, 28 Nov 2006, Simon Pieters wrote:
 
 The HTML syntax requires space characters between attributes, but the 
 lack of space characters between attributes does not cause a parse error 
 according to the parsing section.
 
   Attributes must be separated from each other and from the
   tag name by one or more space characters.
 
 I'd suggest either making it a parse error or change the syntax to make 
 it optional. (But obviously it can't be optional when the preceding 
 attribute is minimized or unquoted.)

This was changed some time back to make the whitespace optional in most 
cases (except where it would otherwise be ambiguous).



Re: [whatwg] Parsing (and syntax): < in unquoted attribute values

2007-06-14 Thread Ian Hickson
On Wed, 29 Nov 2006, Simon Pieters wrote:
 
 The parsing section says that < in unquoted attribute values is a parse 
 error and that it causes the tag token to be emitted. As far as I can 
 tell, < does not emit the tag token in at least Firefox, IE6 or Safari. 
 Is it intentional to emit the tag token here? (If it is, why?)
 
 If not, should it still be a parse error (and be disallowed in the 
 syntax section)?

I've removed special processing of <.

Note that the following cases no longer close start tags, despite them 
working interoperably in Safari and Firefox:

   <div<p>
   <div title <p>
   <div title=<p>

And the following two no longer close tags either (only worked in 
Firefox):

   <div title<p>
   </div<p>

All of these were allowed in SGML, as I understand it.
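Here is a minimal sketch of the unquoted attribute value state after the change. It assumes space, tab, LF and FF are the value terminators and ignores character references; the function name is hypothetical. With `<` no longer special, `<div title=<p>` yields a single div whose title is `<p`.

```javascript
// Read an unquoted attribute value starting at `pos`. "<" is ordinary data;
// whitespace ends the value, and ">" ends both the value and the tag.
function readUnquotedValue(input, pos) {
  let value = "";
  while (pos < input.length) {
    const c = input[pos];
    if (c === ">" || c === " " || c === "\t" || c === "\n" || c === "\f") break;
    value += c;
    pos++;
  }
  return { value, pos };
}
```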



Re: [whatwg] file URL is overspecified

2007-06-16 Thread Ian Hickson
On Fri, 15 Jun 2007, Kristof Zelechovski wrote:

 I understand that this is fixed by HTML 5 [...]

Please don't bring up issues we've already fixed. :-)



Re: [whatwg] Minor linking addition to parsing section

2007-06-18 Thread Ian Hickson
On Tue, 5 Dec 2006, James Graham wrote:
 
 It would be useful if in section 9.2.4.4, "The trailing end phase", 
 phrases such as "Switch back to the main phase and reprocess the token." 
 linked to the part of section 9.2.4.3.6, "The insertion mode", that 
 defines which insertion mode the main phase should be in when making this 
 switch.

I'm not sure I really follow what you mean. Do you mean link to the sentence 
that reads "If the tree construction stage is switched from the main phase 
to the trailing end phase and back again, the various pieces of state are 
not reset; the UA must act as if the state was maintained."?



Re: [whatwg] Windows-1252 entities

2007-06-18 Thread Ian Hickson
On Wed, 6 Dec 2006, Anne van Kesteren wrote:

 The section on handling entities should contain the following mapping:
 [...]
 ... mostly for legacy reasons.

Let me know if the table in that section is what you wanted.

Cheers,


Re: [whatwg] Windows-1252 entities

2007-06-18 Thread Ian Hickson
On Wed, 6 Dec 2006, Sam Ruby wrote:
 
 +1, though I would suggest a one change:
 
   159: 376 // &Yuml;

The spec does indeed say this.
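Here is a minimal sketch of the remapping for numeric character references in the 128-159 range, assuming the standard Windows-1252 assignments. The five positions with no Windows-1252 character are left unchanged here, which may not match the spec's table. The names are hypothetical.

```javascript
// Windows-1252 characters for code points 0x80-0x9F that have one.
const WINDOWS_1252 = new Map([
  [0x80, 0x20AC], [0x82, 0x201A], [0x83, 0x0192], [0x84, 0x201E],
  [0x85, 0x2026], [0x86, 0x2020], [0x87, 0x2021], [0x88, 0x02C6],
  [0x89, 0x2030], [0x8A, 0x0160], [0x8B, 0x2039], [0x8C, 0x0152],
  [0x8E, 0x017D], [0x91, 0x2018], [0x92, 0x2019], [0x93, 0x201C],
  [0x94, 0x201D], [0x95, 0x2022], [0x96, 0x2013], [0x97, 0x2014],
  [0x98, 0x02DC], [0x99, 0x2122], [0x9A, 0x0161], [0x9B, 0x203A],
  [0x9C, 0x0153], [0x9E, 0x017E], [0x9F, 0x0178],
]);

// Resolve the number from a reference like &#159; to a character.
function resolveNumericReference(n) {
  return String.fromCodePoint(WINDOWS_1252.has(n) ? WINDOWS_1252.get(n) : n);
}
```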



Re: [whatwg] Parsing: first bit of Close tag open state

2007-06-18 Thread Ian Hickson
On Wed, 6 Dec 2006, Anne van Kesteren wrote:

 In the top part of the Close tag open state there's no mentioning of 
 consuming the next input character and this is correct. However, then it 
 goes on saying that you should reconsume the current input character in 
 the data state. I think it makes more sense to say that you just 
 have to switch to the data state there.

This got fixed last week I believe.



Re: [whatwg] Content Model Restrictions on <table><tr> in HTML

2007-06-18 Thread Ian Hickson
On Wed, 6 Dec 2006, Simon Pieters wrote:
 
 From: Ian Hickson [EMAIL PROTECTED]
   Hm. Actually an <optgroup> start tag has to imply an </optgroup> end tag
  for compatibility with browsers... spec fixed.
 
 Then nested optgroups as allowed in WF2 are just another thing that only 
 works in XHTML5? How many sites would break if </optgroup> wasn't 
 implied here?

All those that used more than one optgroup without mentioning 
</optgroup>. I don't think that's uncommon.



Re: [whatwg] Content Model Restrictions on <table><tr> in HTML

2007-06-18 Thread Ian Hickson
On Wed, 6 Dec 2006, Bjoern Hoehrmann wrote:

 * Ian Hickson wrote:
 No conformance criteria are broken if the user agent is assumed to have 
 converted the document to a serialisable form by adding an appropriate 
 tbody element and then serialised that.
 
 If the user agent has not, e.g. it shows a tree of what it thinks it 
 serialised, and that tree doesn't have a tbody between the table and 
 the tr, then the browser has violated "A table element must not contain 
 tr elements", under 9.1.2.5, "Restrictions on content models".
 
 You are now arguing way outside the context and the draft. For example, 
 the draft does not define what is a serialisable form, when we are to 
 assume a user agent had performed such a conversion, or what it means 
 when something thinks it serialised something; and we were talking 
 about authoring tools, not arbitrary user agents and browsers. My tool 
 did not serialize, by my definition of that word, any 'tbody' element, 
 it would be incorrect to claim otherwise.

I disagree; section 8.1, "Writing HTML documents", is exactly that (a 
definition of what is a serialisable form), and there is also the XML 
version (it is, presumably, well understood what it means to serialise a 
DOM to an XML instance).


 What you probably mean is when the authoring tool makes claims about the 
 contents of the generated file. If it claims that the file contains a 
 table element with tr child elements then it would be misbehaving, but 
 not because table elements must not have tr child elements, but because 
 there are no such elements in the generated file. But never mind, I 
 certainly do not want to stop you from torturing people who wish to 
 learn about HTML syntax.

If you could provide constructive and less hostile feedback, I would be 
able to fix the spec. Unfortunately as it stands I actually don't know 
what you would like me to change in the document. Could I ask you to 
elaborate? I certainly would like to change the document to be more to 
your liking.



Re: [whatwg] Content Model Restrictions on <table><tr> in HTML

2007-06-18 Thread Ian Hickson
On Wed, 6 Dec 2006, Henri Sivonen wrote:
 
 Side note:
 I considered the inference of a tbody harmless enough that the validation mode
 I describe as the text/html-compatible subset of XHTML5 allows tr as child of
 table when applied to XML.
 
 Is this a bad idea? I thought it wasn't particularly useful to flag 
 trees with tr as child of table as something that would break if 
 serialized as HTML5 and sent as text/html.

It might break some scripts. Other than that, I don't think it's a big 
deal, no.
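Here is a minimal sketch of the inference, assuming a simplified "in table" step on a plain-object tree; the names are hypothetical. A script written against the XML tree, where the tr is a direct child of the table, would find an extra tbody level here.

```javascript
// A <tr> start tag whose current node is the <table> itself is handled as
// if a <tbody> start tag had been seen first.
function insertStartTag(stack, name) {
  let current = stack[stack.length - 1];
  if (name === "tr" && current.name === "table") {
    const tbody = { name: "tbody", children: [] };
    current.children.push(tbody);
    stack.push(tbody);
    current = tbody;
  }
  const el = { name, children: [] };
  current.children.push(el);
  stack.push(el);
  return el;
}
```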



Re: [whatwg] in caption insertion mode

2007-06-18 Thread Ian Hickson
On Sun, 10 Dec 2006, Anne van Kesteren wrote:

 The "Anything else" case should probably trigger a parse error before 
 reprocessing the current token.

Why? Could you show a sample of markup that would go through this path and 
should trigger an error that isn't flagged?



Re: [whatwg] innerHTML for HTML and plaintext

2007-06-18 Thread Ian Hickson
On Sat, 9 Dec 2006, Anne van Kesteren wrote:
 On Fri, 08 Dec 2006 22:57:07 +0100, Ian Hickson [EMAIL PROTECTED] wrote:
  
   The section "If the child node is a Text or CDATASection node" 
   should include the plaintext element.
  
  plaintext in general isn't supported by the innerHTML spec -- for 
  example, it would always introduce a new /plaintext element. Is that 
  a problem?
 
 Yeah, the way the contents of a plaintext element node are returned 
 has given Opera at least one interop issue.

What did you settle on for the implementation? I don't really know how I 
can fix this, given that it is trivial for the plaintext element not to 
be the last element in the DOM.
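Here is a minimal sketch of the rule under discussion: text children of certain elements are serialized without escaping. The element list is an assumption for illustration, with plaintext added as suggested. The function names are hypothetical.

```javascript
// Elements whose text children are written out verbatim (assumed list).
const RAW_TEXT_PARENTS = new Set(["style", "script", "xmp", "iframe", "noembed", "noframes", "noscript", "plaintext"]);

function escapeText(text) {
  // "&" first so the entities introduced below are not escaped again.
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function serializeTextChild(parentName, text) {
  return RAW_TEXT_PARENTS.has(parentName) ? text : escapeText(text);
}
```

As Ian points out, this alone does not make the output round-trip. On reparsing, everything after a `<plaintext>` start tag becomes text, so any elements serialized after it come back as plain characters.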



Re: [whatwg] several messages about XML syntax and HTML5

2007-06-18 Thread Ian Hickson
On Sun, 10 Dec 2006, Thomas Broyer wrote:
 
 However, text/xml-script would result in a parse-error in HTML5 (if I 
 understand section 9.2 correctly).

I've removed the parse error.



Re: [whatwg] nobr is not an active formatting element

2007-06-18 Thread Ian Hickson
On Mon, 11 Dec 2006, Anne van Kesteren wrote:

 In current browsers nobr is treated differently from 
 other elements.
 
  <nobr>1<nobr>2</nobr>3
 
 gives:
 
  E: nobr
    T: 1
  E: nobr
    T: 2
  T: 3
 
  <nobr>1<div>2</nobr>3</div>
 
 gives:
 
  E: nobr
    T: 1
    E: div
      T: 2
      T: 3
 
 This is quite different from b, strong, etc. and it probably has to 
 be this way too because of site compat.

I've tried to make nobr more compatible with IE (basically it implies a 
</nobr> before itself). I'd be extremely interested in implementation 
experience for this.



Re: [whatwg] HTML syntax: comments before doctype and doctype sniffing

2007-06-18 Thread Ian Hickson
On Mon, 18 Jun 2007, Philip Taylor wrote:
 
 In Firefox 2:
 
 javascript:s='<?>';for(i=0;i<1006;++i)s+=' ';window.location='data:text/html,'+s+'<!doctype html><script>document.write(document.compatMode)</script>'
 
 javascript:s='<?>';for(i=0;i<1007;++i)s+=' ';window.location='data:text/html,'+s+'<!doctype html><script>document.write(document.compatMode)</script>'
 
 The first produces CSS1Compat, the second BackCompat. As far as I can 
 tell, Firefox requires the doctype to be found when parsing [using 
 standards-mode rules] the first 1024 characters (not bytes) from the 
 first non-whitespace character, and then it reparses the whole document 
 in quirks mode if necessary.

Hm, indeed, how odd. (It doesn't happen if you have purely spaces there; 
you need some sort of content there first.) Still, I don't think we should 
try to duplicate this unless we have evidence that it really is needed, as 
I described in my previous e-mail.
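The Firefox 2 behaviour Philip describes can be modelled roughly as follows. This is a hypothetical sketch of the observed heuristic, not anything the spec requires. The function name, the default 1024-character window and the simplified doctype regex are assumptions drawn from the test above.

```javascript
// Hypothetical model of the Firefox 2 behaviour observed above; NOT spec behaviour.
// The doctype must be complete within `windowSize` characters, counted from the
// first non-whitespace character. Otherwise the page ends up in quirks mode.
function firefox2CompatMode(source, windowSize = 1024) {
  const start = source.search(/\S/); // index of the first non-whitespace character
  if (start === -1) return "BackCompat"; // no content at all
  const head = source.slice(start, start + windowSize);
  // Simplification: only a bare <!doctype html> is recognised (case-insensitive).
  return /<!doctype\s+html\s*>/i.test(head) ? "CSS1Compat" : "BackCompat";
}
```

With "<?>" (3 characters) followed by 1006 spaces, the 15-character "<!doctype html>" ends exactly at character 1024, so the first test URL stays in standards mode. One more space pushes the closing ">" outside the window. With purely leading spaces, the first non-whitespace character is the doctype itself, which matches the observation that spaces alone don't trigger the switch.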



Re: [whatwg] in cell should handle comments

2007-06-18 Thread Ian Hickson
On Tue, 12 Dec 2006, Anne van Kesteren wrote:

 I don't see why comments have to be processed as if they were in body. 
 They should just be appended to the current node.

Isn't that what happens if they're processed as if they were in body?



Re: [whatwg] Bug in Before DOCTYPE name state?

2007-06-18 Thread Ian Hickson
On Fri, 22 Dec 2006, Thomas Broyer wrote:
 2006/12/22, Ian Hickson:
  On Thu, 21 Dec 2006, Thomas Broyer wrote:
  
   Why is the DOCTYPE marked in error in the former case?
 
  Because otherwise this document:
 
 !DOCTYPEH
 
  ...would emit a DOCTYPE that is not in error (since the token would be 
  emitted before the bit at the end of the DOCTYPE name state).
 
 Doh! right.

This changed recently, by the way. If someone could check that the spec 
still causes the right errors to be flagged, that would be great. (I think 
it does, though some errors moved from the tokeniser to the tree 
construction phase.)


   In other words, why would <!DOCTYPE html> be in error while 
   <!DOCTYPE Html> wouldn't?
 
  Both would be not in error, because of the sentence at the end of the 
  DOCTYPE name state.
 
 OK, now understood (thank you Simon for having enlightened me)

Note that this is now handled quite differently.


  On Thu, 21 Dec 2006, Thomas Broyer wrote:
  
   But it also has this note, which is quite confusing: Because 
   lowercase letters in the name are uppercased by the algorithm above, 
   the HTML letters are actually case-insensitive relative to the 
   markup.
 
  How is it confusing? I would clarify it, but I don't know what is 
  confusing.
 
 Maybe there's no need to clarify it, it might just have been me…

Ok.


   It remains that the tokenization stage is a bit confusing…
 
  Yes. The tree construction stage is even worse. Just implement it 
  exactly as written with no interpretation and you should be fine. ;-)
 
 My problem is that I'm not implementing an emitting parser (à la 
 SAX) but a pulling parser, so I'm stopping as soon as I've found a 
 token and return true to say hey, I've changed the TokenType, Name, 
 Value, etc. properties to reflect a new token. ...so I'm interpreting 
 ;-)
 
 Re tree construction, I'm about to implement it in two parts: in the 
 pull parser when possible (handling omitted tags and misnested 
 formatting elements) and in a tree fixer otherwise (move the meta 
 and link into head, etc.)

How has that worked for you? Is the spec ok for that approach?
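For readers unfamiliar with the distinction Thomas draws, a pull-style tokenizer might be sketched as below. Each call to next() scans exactly one token, updates the tokenType/name/value properties and returns true. At the end of input it returns false. This is a toy under stated assumptions: it handles only simple tags and text, with no attributes, comments, entities or error handling, and it is not the spec's tokeniser state machine. The class and property names are hypothetical, loosely following Thomas's description.

```javascript
// Hypothetical toy pull tokenizer: the caller drives it by calling next().
class PullTokenizer {
  constructor(input) {
    this.input = input;
    this.pos = 0;          // current read position in the input
    this.tokenType = null; // "StartTag", "EndTag" or "Character"
    this.name = null;      // tag name for tag tokens
    this.value = null;     // text for character tokens
  }

  next() {
    if (this.pos >= this.input.length) return false; // end of input
    const rest = this.input.slice(this.pos);
    // A tag is "<" or "</", a letter-initial name, anything up to ">".
    const tag = /^<(\/?)([A-Za-z][A-Za-z0-9]*)[^>]*>/.exec(rest);
    if (tag) {
      this.tokenType = tag[1] ? "EndTag" : "StartTag";
      this.name = tag[2].toLowerCase();
      this.value = null;
      this.pos += tag[0].length;
    } else {
      // Everything up to the next "<" (at least one character) is text.
      let end = this.input.indexOf("<", this.pos + 1);
      if (end === -1) end = this.input.length;
      this.tokenType = "Character";
      this.name = null;
      this.value = this.input.slice(this.pos, end);
      this.pos = end;
    }
    return true; // "I've changed tokenType/name/value to reflect a new token"
  }
}
```

A consumer loops with `while (t.next()) { ... }` and reads the properties after each call. Tree-construction fix-ups can then run either inside that loop or in a later pass, as Thomas proposes.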


Re: [whatwg] Ampersands not followed by ASCII letters or #

2007-06-18 Thread Ian Hickson
On Wed, 27 Dec 2006, Henri Sivonen wrote:

 I noticed that the Web Apps spec itself contains script samples with 
 unescaped JavaScript && operators in <pre> blocks.
 
 Considering that this is not an error in HTML 4.01 as SGML and 
 considering that it is harmless in browsers, I think the top-level 
 Anything else case under 8.2.3.1. Tokenising entities should be 
 split in two so that there is also an error-free case for the ASCII 
 characters that aren't '#', aren't ASCII letters and that weren't in 
 error in SGML-based HTML. I don't have The Handbook at my disposal right 
 now, but the error-free case should cover at least '<', '&' and space 
 characters.

I've allowed:

   U+0009 CHARACTER TABULATION
   U+000A LINE FEED (LF)
   U+000B LINE TABULATION
   U+000C FORM FEED (FF)
   U+0020 SPACE
   U+003C LESS-THAN SIGN
   U+0026 AMPERSAND
   EOF

Let me know if you want any more added to the list.
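Below is a minimal sketch of the "Anything else" branch, treating the characters listed above as error-free. The constant and function names are hypothetical. EOF is modelled as reading past the end of the string. The sketch assumes the caller has already ruled out ASCII letters and "#", which start a character reference and are handled elsewhere.

```javascript
// Characters that may follow a bare "&" without a parse error, per the list above.
const ERROR_FREE_AFTER_AMPERSAND = new Set([
  "\t", // U+0009 CHARACTER TABULATION
  "\n", // U+000A LINE FEED (LF)
  "\v", // U+000B LINE TABULATION
  "\f", // U+000C FORM FEED (FF)
  " ",  // U+0020 SPACE
  "<",  // U+003C LESS-THAN SIGN
  "&",  // U+0026 AMPERSAND
]);

// Returns true if the bare "&" at `index` in `text` is not a parse error.
function bareAmpersandIsErrorFree(text, index) {
  const next = text[index + 1];
  if (next === undefined) return true; // EOF
  return ERROR_FREE_AFTER_AMPERSAND.has(next);
}
```

Under this rule, Henri's `if (a && b)` inside a pre block no longer produces an error for either ampersand. Something like "&;" still does.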



Re: [whatwg] adoption agency parse errors

2007-06-18 Thread Ian Hickson
On Sun, 14 Jan 2007, Anne van Kesteren wrote:

 Is it correct that:
 
  <!doctype html><i><p><b></i>
 
 throws exactly two parse errors because of step 1, paragraph 3 of the 
 adoption agency algorithm. (Since you need to iterate two times through 
 it the /i will hit there twice.) Some of the submitted testcases from 
 Google to html5lib assume a single error.

My parser explicitly avoided reporting that particular error more than 
once per invocation of the AAA. Exactly how many parse errors are reported 
is up to the implementation, so long as it meets this criterion:

# Conformance checkers must report at least one parse error condition to 
# the user if one or more parse error conditions exist in the document and 
# must not report parse error conditions if none exist in the document. 
# Conformance checkers may report more than one parse error condition if 
# more than one parse error conditions exist in the document. Conformance 
# checkers are not required to recover from parse errors.

Generally speaking, from a UI point of view, you want to only report one 
error per correction that the user has to make. For example, if the user 
omitted a quote mark in a string in a script and that caused the string's 
contents to be treated as code, you'd ideally just want the compiler to 
say missing quote mark, not start listing all the contents of the string 
and say how each part in turn isn't valid script. Sadly, this is often 
quite difficult to achieve!
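The policy described above deduplicates errors per invocation of the adoption agency algorithm. Because the first occurrence is always recorded, at least one error is still reported whenever one occurs. A minimal sketch follows; the class, its methods and the error code are hypothetical.

```javascript
// Hypothetical reporter: suppresses repeats of the same error code within a
// single invocation of the adoption agency algorithm.
class ParseErrorReporter {
  constructor() {
    this.errors = [];                   // errors shown to the user
    this.seenThisInvocation = new Set(); // codes already reported in this run
  }

  // Call at the start of each invocation of the adoption agency algorithm.
  beginInvocation() {
    this.seenThisInvocation.clear();
  }

  report(code) {
    if (this.seenThisInvocation.has(code)) return; // already reported this run
    this.seenThisInvocation.add(code);
    this.errors.push(code);
  }
}
```

In Anne's example, the algorithm loops twice for a single end tag, so the second report is suppressed. A later, separate misnested end tag would still be reported.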



Re: [whatwg] isindex prompt

2007-06-18 Thread Ian Hickson
On Tue, 20 Feb 2007, Anne van Kesteren wrote:

 I think the parsing algorithm should take the prompt= attribute of 
 isindex into account. It replaces the string of characters placed before 
 the input element with its contents. (In that case there will be no 
 characters after the isindex element.)

Done.


On Tue, 20 Feb 2007, Anne van Kesteren wrote:
 
 Also, the prompt= attribute will not be on the input element 
 afterwards.

Done.


On Tue, 20 Feb 2007, Alexey Feldgendler wrote:

 Are there any real-world uses of isindex remaining? Is this element 
 worth the trouble?

Sadly, yes.


On Tue, 20 Feb 2007, Martijn wrote:
 
 Also, there is an action attribute, so I think it would be wise to
 include that one too.

Done.



Re: [whatwg] several messages about discouraged things

2007-06-18 Thread Ian Hickson
On Sat, 24 Feb 2007, Keryx Web wrote:
 
 Speaking from __my__ experience, and the experience of those (too few) 
 colleagues that I've met in Sweden who teach standards based web 
 development, it is hard to make the student understand that something 
 is wrong if he/she gets away with it.

Agreed.


 I would like the spec to clearly state what is allowed for backwards 
 compatibility only and what is the preferred way of marking up content.

The spec doesn't allow anything purely for backwards compatibility.


 I would like a spec that clearly says that some ways of marking up 
 content is detrimental to accessibility and perhaps also usability. E.g. 
 frames, including the iframe, or tables used for layout. You would not 
 believe how many colleagues of mine who actually teach that frames are a 
 good thing. My nephew, who studies in a nearby city, even had frames as a 
 required feature of his work!

Frames are out (except iframe, which I don't really see as being a 
problem, though let me know if I'm wrong on this). Tables for layout are 
non-conforming, though I hope to make this clearer in due course. I've 
added a note to myself in the spec to remind myself of this.


On Sun, 25 Feb 2007, Keryx Web wrote:
 
 A few examples that I think are bad practice (99.9 % of the time it's used):
 
 - Inline styles

The media-specific evils of style=, if it is allowed at all, will indeed 
be called out explicitly.


 - Empty p-elements, or p elements containing only &nbsp;

The former will be allowed, as there are valid use cases (usually 
involving script). I'd like to ban the latter, but I'm not sure how to do 
it. Any suggestions?


 - A table within a table cell (Has this ever been used for anything but 
 layout?)

There are valid uses of that, though they are rare. But layout tables in 
general will be discouraged (and are non-conforming).


 - Iframes

Why are they bad?



Re: [whatwg] Using the HTML5 DOCTYPE as a new quirksmode switch

2007-06-18 Thread Ian Hickson
On Wed, 7 Mar 2007, Asbjørn Ulsberg wrote:

 (I sent this to the list already, but I think it didn't appear because I 
 sent it with the wrong e-mail address.)
 
 I'm not sure if it has been discussed earlier, but after seeing Chris 
 Wilson's talk on «Browser Wars Episode II: Attack of the DOMs»[1] I 
 think it's pretty obvious that Internet Explorer needs a new switch of 
 some sort, to be allowed to implement and fix the DOM, JavaScript, 
 CSS1-3 etc. without breaking backward compatibility. At least that's 
 what Chris Wilson says.

As I explained on public-html, though, such a switch to introduce yet 
another rendering mode would, on the long term (with this practice 
repeated with each browser version, as Microsoft have indicated is their 
intention), prevent competition in the browser space. That's the worst 
possible outcome from a standards perspective.

   http://lists.w3.org/Archives/Public/public-html/2007Apr/0319.html

(The word processor document space is an example of how bad things can get 
if we go down this road. It is basically impossible to compete with Word 
today because of the myriad of undocumented formats it supports.)


 And I agree. Internet Explorer needs a new switch. So I thought, what 
 about using:
 
   <!DOCTYPE html>
 
 as the new switch?

That would be, IMHO, disastrous. But, there's nothing we can do to stop 
Microsoft from inventing yet more rendering modes, nor anything we can do 
to stop them using <!DOCTYPE html>.

We can, however, make it a violation of the specs, and indeed that has now 
been done (quirks mode and DOCTYPE sniffing is part of the spec).


Re: [whatwg] Using the HTML5 DOCTYPE as a new quirksmode switch

2007-06-18 Thread Ian Hickson
On Thu, 8 Mar 2007, Alexey Feldgendler wrote:
 
 Other browsers can also use <!DOCTYPE html> as an indication to stop 
 applying certain hacks which make them diverge from standards in favor 
 of interoperability with IE.

I've specified DOCTYPE sniffing in the spec now, and that indeed covers 
this suggestion.



Re: [whatwg] Using the HTML5 DOCTYPE as a new quirksmode switch

2007-06-18 Thread Ian Hickson
On Sat, 10 Mar 2007, Jorgen Horstink wrote:
 
 As far as I understand it, the new DOCTYPE switch is meant to 'tell' the 
 browser that the document follows the HTML5 specification.

No, the new DOCTYPE is merely meant to trigger standards mode (as opposed 
to quirks mode).


 HTML5 is set up to be backwards compatible with HTML4 documents. The 
 opposite does not hold. There must be at least one new DOCTYPE to 'tell' 
 the browser HTML5 is being served.

No, the opposite does in fact hold. It shouldn't matter whether the 
document is HTML5 or HTML4 or earlier; they all have the same processing 
rules, given by the HTML5 spec.



Re: [whatwg] Using the HTML5 DOCTYPE as a new quirksmode switch

2007-06-18 Thread Ian Hickson
On Sat, 10 Mar 2007, Elliotte Harold wrote:
 
 What are those of us who wish to use XML tools on our documents supposed 
 to use? We will need a real DTD at some point, to declare the entities 
 if nothing else. We will not be able to use <!DOCTYPE html>.

XML allows you to use, e.g.:

   <!DOCTYPE html SYSTEM "my.dtd">

This is an XML feature, though, unrelated to XHTML5.


 I know some browser-centric folks here just hate DTDs and schemas of any 
 kind; but we will need them, even if the browsers don't. We will create 
 and use them, even if there's no normative DTD in the spec.

That's entirely allowed by XML, indeed.


 One thing that's struck me in working with the spec over the last few 
 days is just how hard it is to follow the various content models, and 
 how much simpler most of them would be to read if they were described in 
 a RELAX NG schema or a DTD.

Having spoken to people who have actually used RELAX NG to describe them, 
I'm not sure that it really would be easier.



Re: [whatwg] Using the HTML5 DOCTYPE as a new quirksmode switch

2007-06-18 Thread Ian Hickson
On Sat, 10 Mar 2007, Robert Brodrecht wrote:
 On Mar 10, 2007, at 4:37 PM, Matthew Ratzloff wrote:
  
  They seem to serve the purpose.  If there are two HTML 5 
  specifications, browser makers can come together to decide which one 
  to support by default when no DOCTYPE is present.  Developers who 
  would prefer the alternate standard could use the appropriate DOCTYPE.
 
 Browsers render in quirksmode by default.  That's been established.  At 
 this point WHATWG has already rejected DTDs in DOCTYPE and seems pretty 
 set on not including it.  I myself would rather have some type of 
 versioning (DTD or otherwise) in the DOCTYPE.  All I've heard from 
 WHATWG is that they don't really even like the DOCTYPE.  If browsers 
 didn't use DOCTYPE as the standards mode switch, DOCTYPE probably 
 wouldn't even be in WHATWG's HTML 5.
 
 I'm sure most people have heard the saying Choose your battles.  
 Fighting for DTDs or some other type of versioning in the DOCTYPE in 
 WHATWG's spec is not a fight that can be won as far as I can tell.  
 Having some method to tell people what spec an author is using can be 
 won.

It's not that it's a fight that can't be won, it's just that the arguments 
I've heard from people about why they think we shouldn't have versioning 
information are more convincing than the arguments from those who think we 
should have versioning information.

(The arguments against versioning appeal to evidence that having 
versioning actively harms the Web and threatens the ability for 
competing browsers to exist; the arguments in favour tend to be more 
about solving theoretical problems. It's an easy choice, really.)


 If there is no versioning system, there is no way to specify an 
 alternate standard.

Whatever happens, there'll only be one successful HTML standard on the 
Web. We don't need a technical means to choose between them; the market 
will do that for us.

In any case, we just have the one standard right now (since the W3C and 
WHATWG are working on the same document).



Re: [whatwg] Using the HTML5 DOCTYPE as a new quirksmode switch

2007-06-18 Thread Ian Hickson
On Sat, 10 Mar 2007, [EMAIL PROTECTED] wrote:
 On Mar 10, 2007, at 8:38 AM, Mihai Sucan wrote:
 
  There's no way to advertise the document as HTML 5, and it's certainly 
  not the purpose of the specification to do so.
 
 This is a problem.  It is especially a problem now that the W3C is 
 working on their version of HTML 5.  When I asked Ian Hickson how WHATWG 
 would handle divergence in the W3C spec [1], he said he intended to 
 make every effort to keep the two in sync. [2] While I appreciate his 
 effort and I fully believe that he will do his best, we are dealing with 
 a body (i.e. the W3C) who have a history of stubbornness and 
 unwillingness to work with important members of the community. [3] The 
 future is still undecided, but I don't think it is a good idea to 
 operate under the assumption that the W3C will copy and paste the entire 
 WHATWG HTML 5 spec.

 [1] http://blog.whatwg.org/w3c-restarts-html-effort#comment-2020
 [2] http://blog.whatwg.org/w3c-restarts-html-effort#comment-2022
 [3] http://meyerweb.com/eric/thoughts/2006/08/14/angry-indeed/

Right now the two groups are using exactly the same spec, byte-for-byte, 
just with a different header.


 Even if DTDs are non-normative and antiquated in the HTML 5 spec, it at 
 least provides some method for authors to indicate their intentions.  
 If my intention was to write a document conforming to HTML 3.2, I can 
 use the HTML 3.2 DTD to tell anyone in the future that I was using a 
 certain set of elements.

Wouldn't simply the act of using those elements be enough to say which 
elements you used?


 If browsers pay no attention to DTDs, as WHATWG has said time and again, 
 browsers must be rendering the latest and greatest markup.  If in 50 
 years, the i element has been out of use for 40 years, and browsers 
 stop rendering that element and validators throw errors on that element, 
 the document still conforms to the DTD.  It's not the author's fault 
 that the document doesn't perform the way it intended.  Ideally, the 
 browser should care about DTDs.

If you're arguing that browsers in 50 years should have two modes, one 
with the obsolete i element not supported, and one with the obsolete i 
element supported (to support old content), why wouldn't it be better to 
simply have one rendering mode, which supported i?


 The WHATWG HTML 5 spec provides no way to specify what version / fork of 
 HTML the author intended to use.  Even if browsers don't pay attention, 
 I think it is a shame that there is no way to specify (if for nothing 
 else, to future-proof documents).  I blogged about this in more detail. 
 
 http://robertdot.org/2007/03/08/html-5-whatwg-versus-w3c.html

I don't understand why it matters what version you're using. Or, for that 
matter, how most authors are supposed to work out which version they're 
using.


 It seems the WHATWG is staunchly against DTDs, even if it has an 
 appropriate use (e.g. emails in this thread talking about XML entities).

It's perfectly possible to use DTDs and entities when using XML with 
XHTML5. Nothing in the spec prevents that. (Browsers somewhat prevent it, 
since they don't fetch DTDs, though.)


 I've mulled over this awhile.  Since DTDs aren't normative in browsers, 
 perhaps a link element with rel="specification" and 
 href="http://www.whatwg.org/specs/web-apps/current-work/" (for example) 
 would be a new way to say "this is the specification I used to create 
 this document."  It is easier to remember than the DOCTYPE DTDs of 
 previous versions of HTML, and it is much more human-readable than DTDs.  
 It addresses my concerns, and doesn't use DTDs.

Why does it matter which spec you used?



Re: [whatwg] Getting .innerHTML and <pre>\n

2007-06-18 Thread Ian Hickson
On Mon, 19 Mar 2007, Simon Pieters wrote:

 The parsing section says that a linefeed character following a pre start 
 tag token is dropped, and the syntax section says that when serializing 
 a linefeed must be included if the pre starts with a linefeed. So far so 
 good.
 
 However, getting .innerHTML doesn't add the newline. Thus, if you parse 
 and serialize with .innerHTML several times you keep eating linefeeds 
 from pre. I think this is a problem.
 
 Step 2 in the algorithm for getting .innerHTML, If the child node is an 
 Element, should include something along the following lines (somewhere after 
 the Append a U+003E GREATER-THAN SIGN (>) character. paragraph):
 
   If the child node is an Element with a tag name pre then append a U+000A
   LINE FEED (LF) character.
 
 This will always add the linefeed even when it's not needed, but I guess 
 that's fine.

Fixed.
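Below is a minimal model of the round-trip problem Simon describes and of the proposed fix. The helper names are hypothetical. The model ignores everything except the leading-newline rule for pre: escaping, attributes and other elements are not handled.

```javascript
// Parser rule: a single LF immediately after the <pre> start tag is dropped.
// Assumes `source` is exactly "<pre>" + content + "</pre>".
function parsePreSource(source) {
  const inner = source.slice("<pre>".length, -"</pre>".length);
  return inner.startsWith("\n") ? inner.slice(1) : inner;
}

// Serializer that ignores the rule (the behaviour reported as a bug).
function serializePreNaive(text) {
  return "<pre>" + text + "</pre>";
}

// Serializer with the proposed fix: always emit an LF after the start tag.
function serializePreFixed(text) {
  return "<pre>\n" + text + "</pre>";
}
```

With the naive serializer, content that starts with a linefeed loses it on every parse/serialize cycle. With the fix, the extra LF emitted by the serializer is exactly the one the parser drops, so the content survives unchanged. The fix always emits the LF even when it isn't needed, as Simon notes.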



Re: [whatwg] innerHTML in HTML documents with multiple namespaces

2007-06-19 Thread Ian Hickson
On Tue, 27 Mar 2007, Thomas Broyer wrote:
 
 I'm actually wondering what the tag name is supposed to be for an element 
 which is not in the HTML namespace (e.g. created with 
 document.createElementNS). Is it the localName or the tagName (qualified 
 name, i.e. with prefix)?

The tag name is fully qualified (as in tagName).

 In other words, what should document.body.innerHTML end with after this
 script:
 var svg_svg = document.createElementNS("http://www.w3.org/2000/svg",
 "svg:svg");
 document.body.appendChild(svg_svg);
 
 Should it end with <svg></svg> or <svg:svg></svg:svg>? (Firefox
 would have <svg></svg>)

The spec requires the latter at the moment.


 Also, should the tag name be lowercased before inclusion in the output 
 or the algorithm is just assuming the tag name of HTML elements have 
 already been lowercased elsewhere? (Firefox keeps uppercase letters; 
 elements created with document.createElement are HTMLElements and have 
 their names lowercased at creation time; as described in the spec)
 
 Same questions with attribute names ;-)

I think the spec has been clarified regarding this, let me know if it is 
not clear still.

Cheers,


Re: [whatwg] Parsing: comment tokenization

2007-06-19 Thread Ian Hickson
On Sat, 7 Apr 2007, Anne van Kesteren wrote:

 The tokenization section should also handle:
 
  !--
  !---
 
 as correct comments for compat with the web. This means that
 
  !
 
 shows -- and that
 
  !-
 
 shows --.

These comments are now handled (though not conformant).
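The following is a simplified sketch of what handling these comments means for a tokenizer. The function name and the string-scanning approach are assumptions; the real tokenizer is a character-by-character state machine. The sketch treats "<!-->" and "<!--->" as complete empty comments. This is also why an input such as "<!-->5?-->" changes meaning under the rule.

```javascript
// Hypothetical helper: scans a comment starting at index i (which must point
// at "<!--") and returns its data and the index just past its end.
function scanComment(input, i) {
  const j = i + 4; // first character after "<!--"
  // Compat rule: "<!-->" and "<!--->" are complete, empty comments.
  if (input.startsWith(">", j)) return { data: "", end: j + 1 };
  if (input.startsWith("->", j)) return { data: "", end: j + 2 };
  const close = input.indexOf("-->", j);
  if (close === -1) return { data: input.slice(j), end: input.length }; // unterminated: runs to EOF
  return { data: input.slice(j, close), end: close + 3 };
}
```

For example, scanComment("<!-->5?-->", 0) ends the comment after the first ">", leaving "5?-->" to be parsed as text rather than as the comment data ">5?".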


On Sat, 7 Apr 2007, Nicholas Shanks wrote:
 
 Why on earth is this a good idea?

IE7 does it. The assumption is that content therefore depends on it.

 AFAIK browsers and other HTML clients don't currently treat these as 
 comments

This seems to disagree with my research.


 [...] compelling them to do so will cause several problems:
 
 1) Web developers currently expect things like <!-->5?--> to result in 
 the comment "greater than five?". Changing such expectations on a whim 
 is harmful.

It is not clear to me that this is indeed true.


 2) A double HYPHEN-MINUS delimits comments within tags, this provides 
 compatibility with XML and SGML and changing this needlessly in HTML5 
 will just complicate conversion.

This, unfortunately, is impractical. (I say this despite having personally 
pushed for this for years.)


 3) You claim compat with the web but don't provide any evidence to 
 support that. Are there huge numbers of sites expecting <!--> to 
 represent a comment without content? Can such sites not be fixed instead 
 of polluting HTML with additional rules? I'd rather have a handful of 
 broken sites that their authors will fix than saying to the other 99% of 
 authors hey, you can now do this and ending up with millions of broken 
 sites. (I say broken, because they will not be backwards compatible with 
 current or previous UAs)

It seems that they will in fact be compatible; but I agree, we shouldn't 
encourage it. The spec makes them non-conforming.


On Sat, 7 Apr 2007, Nicholas Shanks wrote:

 Even you must (begrudgingly?) admit that comments formatted as in your 
 original post are not backwards compatible, even if they do reflect the 
 state of modern UAs as you say.

How can both those statements be true?


 I don't believe I am 'pretending' anything. Just stating that diverging 
 further from SGML for No Good Reason is pointless. (And yes, supporting 
 a few odd websites that do this already counts as not a Good Reason, 
 websites can always be fixed!)

Sadly, Web sites can't always be fixed. Many sites have been long 
abandoned and are no longer updated.



Re: [whatwg] Parsing: </li> should be ignored

2007-06-19 Thread Ian Hickson
On Sat, 14 Apr 2007, Simon Pieters wrote:

 For compatibility with IE the parsing algorithm should probably ignore 
 </li> tags.
 
 Test case for the above proposal:
 
   <!doctype html>
   <style>
   * { margin:0; padding:0; }
   ul { background:red; }
   li { background:lime; }
   </style>
   <ul><li></li>This line should be green.</ul>

I've thought this over and as much as I'd like to be compatible with IE on 
this, there are a number of issues with it.

There's the fact that no other browser does this, which makes it a 
very risky change. It also means there may not be an immediate need to do 
this, since browsers only tend to disagree with IE when doing so doesn't 
break much content.

There's the problem that it makes it difficult to know how to handle 
things like:

   <ul><li>test</li><!--test--></ul>

It would also make future expansion difficult. This would have to be 
applied to dt and dd, and would make constructions like:

   <x> <dt> xx </dt> xx <dd> xx <li> xx </li> xx </dd> xx </x>

...have very different results than it appears.

So, unless there's a strong reason to, I suggest we don't change this.



Re: [whatwg] web-apps/current-work/#datetime-parser

2007-06-19 Thread Ian Hickson
On Tue, 17 Apr 2007, Sam Ruby wrote:

 Step 25
 
 If sign is negative, then shouldn't timezoneminutes also be negated?

Fixed.


 Step 27
 
 Shouldn't that be SUBTRACTING timezonehours hours and timezoneminutes
 minutes?
 
 My current time is 2007-04-17T05:28:33-04:00  The timezone is -4 hours from
 UTC.  To convert to UTC I need to add 4 hours.

Fixed.
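Both corrections can be checked against Sam's example with a small sketch. It assumes only the form YYYY-MM-DDTHH:MM:SS followed by "Z" or a +HH:MM/-HH:MM offset, which is narrower than the syntax the spec's parser accepts. The function name is hypothetical. The step numbers refer to the draft Sam was reading.

```javascript
// Minimal sketch (hypothetical helper, restricted input format).
function toUTCString(str) {
  const m = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:Z|([+-])(\d{2}):(\d{2}))$/.exec(str);
  if (!m) throw new Error("unsupported date-time format: " + str);
  const [, year, month, day, hour, minute, second, sign, tzh, tzm] = m;

  let timezoneHours = 0;
  let timezoneMinutes = 0;
  if (sign !== undefined) {
    timezoneHours = Number(tzh);
    timezoneMinutes = Number(tzm);
    // Step 25 fix: a negative sign applies to the minutes as well as the hours.
    if (sign === "-") {
      timezoneHours = -timezoneHours;
      timezoneMinutes = -timezoneMinutes;
    }
  }

  // Step 27 fix: convert to UTC by SUBTRACTING the offset.
  // Date.UTC normalises out-of-range hour/minute values (rolling the day over).
  const ms = Date.UTC(
    Number(year), Number(month) - 1, Number(day),
    Number(hour) - timezoneHours, Number(minute) - timezoneMinutes, Number(second)
  );
  return new Date(ms).toISOString();
}
```

For 2007-04-17T05:28:33-04:00 this yields 09:28:33 UTC, four hours later, as Sam expects. An offset such as -04:30 shows why the minutes must also be negated: the result is 09:58:33 UTC, not 08:58:33 UTC.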



Re: [whatwg] void elements vs. content model = empty

2007-06-19 Thread Ian Hickson
On Wed, 18 Apr 2007, ryan wrote:

 So, I was just trying to check my blog for HTML5 conformance [1] and ran 
 into a conformance problem that I had trouble sorting out.
 
 The conformance checker said:
 
1. Fatal Error: End tag param seen even though the element is an empty
   element.
   Line 121, column 73 in resource http://theryanking.com/blog/
 
 so, I went to http://www.whatwg.org/specs/web-apps/current-work/#param 
 to see what the restrictions on param are. In that section it says:
 
  Content model:
 Empty.
 
 which brought up the question what's 'empty' mean?. In my mind, it 
 could either be no content allowed or must be a void element (ie, no 
 end tag).

The content model only describes what conformance means at the DOM level, 
it doesn't affect the syntax.

To tell if you're allowed to have a closing tag, you have to see the 
syntax section, where it says:

# Void elements only have a start tag; end tags must not be specified for 
# void elements.

...and:

# Void elements
#base, link, meta, hr, br, img, embed, param, area, col, input

 -- http://www.whatwg.org/specs/web-apps/current-work/#elements0
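The syntax rule quoted above reduces to a lookup in the void-element list. The sketch below copies that list verbatim from the quote; later drafts have changed it. The constant and function names are hypothetical.

```javascript
// Void elements as quoted above; the list is copied verbatim from the draft.
const VOID_ELEMENTS = new Set([
  "base", "link", "meta", "hr", "br", "img",
  "embed", "param", "area", "col", "input",
]);

// In the text/html syntax, void elements have a start tag only;
// an end tag for one of them must not be specified.
function endTagAllowed(tagName) {
  return !VOID_ELEMENTS.has(tagName.toLowerCase());
}
```

This corresponds to the error Ryan saw: "</param>" is rejected because param is void. That check is independent of param's "Empty" content model, which only constrains the DOM.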


 It'd be nice if we could make this clearer in the spec; even though the 
 language and html serialization are two different things, for the sake 
 of authors it'd be nice to have pointers between the two.

Yeah... I'm not really sure how to do that yet. I think on the long term I 
may add an informative block to the element definitions (the green boxes) 
that says something like:

   Syntax in text/html:
  Start tag may be omitted
  End tag must be omitted

...or whatever.


 Also, if there's a difference between content=empty and 'void elements' 
 it deserves an explanation.

One is just about the content model, the other is just about the syntax. 
They're not really related, though it happens to be the case that all 
elements that have an empty content model are void elements in HTML.


Re: [whatwg] Parsing: should <foo><dd></foo> close the DD?

2007-06-19 Thread Ian Hickson
On Fri, 20 Apr 2007, Simon Pieters wrote:

 I sent a bug report to Opera saying that given the markup 
 <foo><dd></foo>X, X should be a sibling to FOO instead of a child of 
 DD. According to Anne the bug report was invalid per the current spec:
 
 On Fri, 20 Apr 2007 09:03:29 +0200, [EMAIL PROTECTED] wrote:
 
  I think this bug report is invalid. When you hit </foo>, dd is the 
  bottommost node of the stack. dd is in neither the formatting nor 
  phrasing category (it's in special) and therefore the /foo end tag 
  is ignored.
 
 However, in IE, Firefox and Safari, the DD does get closed at </foo>, so 
 perhaps this is a bug in the spec?

I could only get </foo> to close the dd in Firefox.

In IE, the foo is treated as a void element.

Opera and Safari seem to follow the spec.

Without further evidence that this breaks things, I'd rather just leave 
the spec as is.



Re: [whatwg] Incorrect character codes

2007-06-19 Thread Ian Hickson
On Fri, 20 Apr 2007, Philip Taylor wrote:

 Section 8.2.3.1:
 U+0061 LATIN SMALL LETTER A through to U+0078 LATIN SMALL LETTER F,
 and U+0041 LATIN CAPITAL LETTER A, through to U+0058 LATIN CAPITAL
 LETTER F
 Should say:
 U+0061 LATIN SMALL LETTER A through to U+0066 LATIN SMALL LETTER F,
 and U+0041 LATIN CAPITAL LETTER A through to U+0046 LATIN CAPITAL
 LETTER F

It seems this is fixed now.
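The corrected ranges amount to the following check, shown as a minimal JavaScript sketch (the function name `isAsciiHexDigit` is hypothetical). A full hex-digit test also accepts the digits 0-9. With the old upper bounds, everything from U+0067 'g' to U+0078 'x' (and likewise 'G' to 'X') would have been wrongly accepted.

```javascript
// True if the code point is an ASCII hex digit:
// U+0030..U+0039 (0-9), U+0061..U+0066 (a-f), U+0041..U+0046 (A-F).
function isAsciiHexDigit(cp) {
  return (cp >= 0x30 && cp <= 0x39) ||
         (cp >= 0x61 && cp <= 0x66) ||
         (cp >= 0x41 && cp <= 0x46);
}
```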



Re: [whatwg] Parsing: < in unquoted attribute values

2007-06-19 Thread Ian Hickson
On Wed, 25 Apr 2007, Simon Pieters wrote:

 The parsing section says that < in an unquoted attribute value 
 terminates the tag. However, according to my testing[1], IE7, Gecko, 
 Opera and WebKit don't do this -- they append the < to the attribute 
 value. So I think the parsing section is wrong here.

This was fixed recently.


 Additionally, the syntax section says that authors are not allowed to 
 use < in unquoted attribute values, which should probably be changed if 
 the parsing section is changed.

Oops, forgot to fix that last time. Fixed now.
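A minimal JavaScript sketch of the behaviour reported above, in which `<` inside an unquoted attribute value is appended to the value instead of ending the tag. The function name `readUnquotedValue` is hypothetical, and character references and end-of-file handling are deliberately left out.

```javascript
// Consume an unquoted attribute value starting at index `start`.
// The value ends at whitespace or '>'; a '<' is simply appended to the
// value rather than terminating the tag. Character references ('&') and
// EOF parse errors are not modelled in this sketch.
function readUnquotedValue(input, start) {
  let value = '';
  let i = start;
  while (i < input.length) {
    const c = input[i];
    if (c === '\t' || c === '\n' || c === '\f' || c === ' ' || c === '>') break;
    value += c;
    i++;
  }
  return { value, end: i };
}
```

So for `title=a<b>` the value read after `=` is `a<b`, and the `>` still closes the tag.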


On Wed, 25 Apr 2007, Anne van Kesteren wrote:
 
 IE also lets < be an attribute. It can also be part of an attribute or 
 element name. This means that:
 
  p/ptest
 
 will become a 'p' element with a 'p' attribute which has 'test' as 
 textContent. This basically means fewer exceptions in the tokenizer for 
 the '<' character, which would be fine with me.

HTML5 requires this now.


On Wed, 25 Apr 2007, Anne van Kesteren wrote:
 
 As I just mentioned on IRC, this essentially means removing the SHORTTAG 
 TAGC OMISSION feature of SGML, which appears not to be supported by Internet 
 Explorer, Opera and maybe Safari.

Indeed.


On Wed, 25 Apr 2007, Jonas Sicking wrote:
 
p/ptest
  
  will become a 'p' element with a 'p' attribute which has 'test' as 
  textContent. This basically means fewer exceptions in the tokenizer for 
  the '<' character, which would be fine with me.
 
 We no longer support this in Mozilla (if we ever did). A reason we 
 now explicitly forbid this is that we don't want it ever to be possible to 
 create elements with 'illegal' names. The same goes for attribute 
 names. This is partially for security reasons, since some elements and 
 attributes carry very important security information.

On Thu, 26 Apr 2007, Anne van Kesteren wrote:
 
 Could you elaborate on the security issues? Could you also give a definition
 of illegal names as it's not really clear to me what that means for HTML.

On Fri, 27 Apr 2007, Jonas Sicking wrote:
 
 Basically, for input type=file value=/etc/passwd, if part of the 
 code thinks that it is an input element, whereas other parts 
 think that it is an input element, you might end up in a situation 
 where the browser sends the /etc/passwd file to the server without user 
 interaction.

That seems a bit specious given that for type=file you'd have to ignore 
value= anyway. Furthermore, making the < be _not_ part of the tag name 
is what causes the security issue, as it's only when you _don't_ put it in 
the tag name that you end up with an input element.

Anyway, that's the advantage of having a single, well-defined tokeniser, 
you don't have to worry about differences in opinion. :-)


 It also seems like a bad idea to allow a document to be parsed such that 
 there is no way to serialize it without creating an invalid HTML5 
 serialization.

We are well past that point. Example:

   p bogus=

...can be parsed but can't be serialised legally.


 As far as element names go, I don't really see a reason to allow more, 
 or less, characters than the XML spec lets you use.

The main reason is that you have to define what happens to the characters 
you don't allow. We don't have the option of fatal failure.
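As an illustration of "defining what happens to every character", here is a minimal JavaScript sketch of a tag-name reader that never fails. The name `readTagName` is hypothetical, and NUL and end-of-file handling are not modelled. Instead of rejecting characters outside XML's Name production, it gives each one a defined outcome: whitespace, `/` and `>` end the name, ASCII uppercase is lowercased, and anything else, including `<`, is appended.

```javascript
// Read a tag name starting at index `start`. There is no error path:
// every character either ends the name or becomes part of it.
function readTagName(input, start) {
  let name = '';
  let i = start;
  while (i < input.length) {
    const c = input[i];
    if (c === '\t' || c === '\n' || c === '\f' || c === ' ' || c === '/' || c === '>') break;
    // Lowercase ASCII uppercase letters; keep everything else verbatim.
    name += (c >= 'A' && c <= 'Z') ? c.toLowerCase() : c;
    i++;
  }
  return { name, end: i };
}
```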



Re: [whatwg] Script, style and backwards compatibility

2007-06-19 Thread Ian Hickson

(Thanks for forwarding forum feedback to the list. Feel free to forward my 
reply back to the forums, and please do continue to forward feedback from 
the forums, or blogs, or anywhere else, to the list!)

On Mon, 30 Apr 2007, Simon Pieters wrote:
  
  From http://forums.whatwg.org/viewtopic.php?t=38
 
 Make noscript allowed in XHTML5

Unfortunately the way noscript works makes it impractical in XHTML.

You can have similar effects, however, by just using script to remove the 
section:

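   <div class="noscript">...</div>
   <script>
     // getElementsByClassName() returns a live collection that shrinks as
     // elements are removed, so keep removing the first match until none remain.
     var n = document.getElementsByClassName('noscript');
     while (n.length > 0)
       n[0].parentNode.removeChild(n[0]);
   </script>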

...or some such. (Untested.)


 and generally remove differences between HTML5 and XHTML5 where 
 possible.

Indeed, removing unnecessary differences is a goal (though it is not the 
most important goal, and so can be trumped; for example backwards 
compatibility would override it, as it does with noscript).


 This could thus also imply:
 
 * Don't disallow lang= in XHTML5

Having both xml:lang= and lang= would actually cause more 
round-tripping problems (if they were both allowed), since xml:lang can't 
be used in HTML. We can't drop xml:lang, though, since XML defines it.


 * Don't disallow base href in XHTML5.

This is mostly disallowed because generic XML processing wouldn't know 
about it, and so URIs in unrelated languages like SVG would change meaning 
based on whether the UA knew XHTML or not.
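To see why this matters, consider a relative reference inside an SVG fragment. The sketch below uses the standard `URL` constructor (available in browsers and Node.js). All URLs are hypothetical. The same reference resolves to two different absolute URLs depending on whether the processor honours the XHTML base element.

```javascript
// Hypothetical URLs for illustration only.
const documentUrl = 'http://example.com/docs/page.xhtml';
const baseHref = 'http://cdn.example.net/assets/';
const ref = 'icons/star.svg';

// A generic XML processor that knows nothing about XHTML resolves
// against the document's own URL...
const withoutBase = new URL(ref, documentUrl).href;
// ...while an XHTML-aware UA would resolve against the base element's href.
const withBase = new URL(ref, baseHref).href;
```

Here `withoutBase` is `http://example.com/docs/icons/star.svg` and `withBase` is `http://cdn.example.net/assets/icons/star.svg`.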


 * Don't disallow meta charset in XHTML5 (it doesn't do any good, but 
 doesn't harm either).

We could allow it if we required that there be an XML declaration that had 
the same encoding specified, but then that wouldn't be the same as HTML5, 
so we wouldn't have won anything.


On Mon, 30 Apr 2007, Simon Pieters wrote:
 Anne wrote:
  
  xml:lang should be treated the same as xml:id imo (except that for now 
  I suppose they have different handling if both the xml: and normal 
  attributes are specified).
 
 Agreed.

We can't treat xml:lang like xml:id. An element can have multiple IDs, but 
it can't have multiple languages.
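The single-language constraint can be sketched as a lookup that always yields one answer. This JavaScript sketch walks a tree of plain objects (`{ attrs, parent }`, a hypothetical stand-in for DOM nodes) and assumes that `xml:lang` wins over `lang` on the same element. That precedence is what later HTML specifications adopted, but it is an assumption here, not something stated in this thread.

```javascript
// Resolve the language of `node` by walking up through its ancestors.
// Assumption: on a single element, xml:lang takes precedence over lang.
// However many attributes are present, the result is a single language.
function languageOf(node) {
  for (let n = node; n; n = n.parent) {
    if ('xml:lang' in n.attrs) return n.attrs['xml:lang'];
    if ('lang' in n.attrs) return n.attrs.lang;
  }
  return ''; // no language information found
}
```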


In conclusion, while I agree with the principle of keeping XHTML and HTML 
as close to each other as possible, I don't think they're further apart 
than is actually necessary.



Re: [whatwg] Parsing: don't move meta and link to head

2007-06-19 Thread Ian Hickson
On Mon, 21 May 2007, Anne van Kesteren wrote:

 Internet Explorer 7 and Opera 9 don't move meta and link to the 
 head element during parsing (much like they don't do that for 
 style). I think that's a good enough reason to change the parsing 
 specification to match that behavior. Besides, it is more sensible, 
 as it keeps the DOM and the original input stream closer to each 
 other.

Done.



Re: [whatwg] Parsing: ignore </head>?

2007-06-19 Thread Ian Hickson
On Mon, 21 May 2007, Anne van Kesteren wrote:

 If we simply ignore </head> there's no longer a need to append elements 
 to the head element pointer. In fact, we can remove it. I'm not sure how 
 much this would complicate conformance checking, but it would certainly 
 be very nice not to have such strange appending rules for the limited 
 set of elements that have that now (link, meta, style, base).

This would screw up the placement of comments between head and body.


