Re: [whatwg] parsing: bogus comments - PIs
On 6/13/07, Ian Hickson [EMAIL PROTECTED] wrote:
> On Wed, 26 Jul 2006, Shadow2531 wrote:
> > So, <?xml-stylesheet type="text/css" href=""?> is a bogus comment. I *was* 100% sure that the PI should be parsed into: <!--?xml-stylesheet type="text/css" href=""?-->
>
> Correct.

Thanks Ian.

> > Can you comment on innerHTML for this situation? If <?xml-stylesheet type="text/css" href=""?> is parsed into <!--?xml-stylesheet type="text/css" href=""?-->, what should innerHTML show?
>
> Assuming you mean the .innerHTML of a parent element, it would show the comment as you've written it above. See the innerHTML definition in the spec: http://www.whatwg.org/specs/web-apps/current-work/#innerhtml0

Thanks. That clears it up now.

My notes for reference:

Given HTML5 markup:

<div id="test"><?xml-stylesheet type="text/css" href=""?></div>

Since PIs in markup are parsed as bogus comments, the above is parsed as:

<div id="test"><!--?xml-stylesheet type="text/css" href=""?--></div>

and the comment is parsed into the DOM as a comment node.

document.getElementById("test").innerHTML should then return the string:

<!--?xml-stylesheet type="text/css" href=""?-->

because that's what "<!--" + document.getElementById("test").firstChild.data + "-->" should equal.

Required changes, since Firefox, IE, Opera and Safari do not conform to this:

Firefox and Safari will have to stop ignoring PIs in markup and treat them as comments.
Opera and IE will have to start treating PIs in markup as comments.

-- Michael
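The serialisation rule in the notes above can be sketched in a few lines. This is only a toy model in Python, not a browser DOM: the `Comment` and `Element` classes and the `inner_html` function are hypothetical names invented for the illustration, and attribute serialisation and text escaping are left out.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Comment:
    # Text between "<!--" and "-->". For a bogus comment made from a PI,
    # this is everything after the "<" up to, but not including, the ">".
    data: str


@dataclass
class Element:
    name: str
    children: List[Union["Element", Comment, str]] = field(default_factory=list)


def inner_html(element: Element) -> str:
    """Serialise the children of `element` (attributes and escaping omitted)."""
    out = []
    for child in element.children:
        if isinstance(child, Comment):
            out.append("<!--" + child.data + "-->")
        elif isinstance(child, Element):
            out.append("<" + child.name + ">" + inner_html(child) + "</" + child.name + ">")
        else:
            out.append(child)
    return "".join(out)


# The div from the notes, with the PI already turned into a comment node.
test_div = Element("div", [Comment('?xml-stylesheet type="text/css" href=""?')])
print(inner_html(test_div))
```

The point of the sketch is only that the comment's data is wrapped back in "<!--" and "-->" on serialisation, so the PI never reappears as a PI.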
[whatwg] charset sniffing algorithm and space characters around the charset name
As written, the charset sniffing algorithm doesn't trim space characters from around the tentative encoding name. The html5lib test cases expect the space characters to be trimmed. I suggest trimming space characters (or anything <= 0x20, depending on which approach is right for compat). -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
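The two trimming options mentioned above could look roughly like this. It is a sketch under assumptions: the function name `trim_encoding_name` and its flag are made up for the example, and the set of space characters (tab, LF, VT, FF, CR, space) is taken from the "Space characters" thread rather than from the sniffing algorithm itself.

```python
# Assumed set of "space characters": tab, LF, VT, FF, CR and space.
SPACE_CHARACTERS = "\t\n\x0b\x0c\r "


def trim_encoding_name(raw: str, trim_all_controls: bool = False) -> str:
    """Trim the tentative encoding name before looking it up.

    With trim_all_controls=False only the space characters are removed.
    With trim_all_controls=True anything <= 0x20 is removed from both ends,
    which is the alternative floated for compatibility.
    """
    if not trim_all_controls:
        return raw.strip(SPACE_CHARACTERS)
    start, end = 0, len(raw)
    while start < end and ord(raw[start]) <= 0x20:
        start += 1
    while end > start and ord(raw[end - 1]) <= 0x20:
        end -= 1
    return raw[start:end]


print(repr(trim_encoding_name("  utf-8 \n")))
print(repr(trim_encoding_name("\x00utf-8\x01", trim_all_controls=True)))
```

The two variants only differ for control characters that are not space characters, such as U+0000 or U+0001.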
[whatwg] Parsing: comments (was: Re: About adopting quirks mode parsing)
On Thu, 14 Jun 2007 03:17:16 +0200, Ian Hickson [EMAIL PROTECTED] wrote: I haven't looked at the parsing of comments in PCDATA mode yet but I'm guessing we'll have to support <!--> there too. Yeah, and <!---> too iirc. There's some other e-mail dealing with that. And maybe also <!-- --!> given the amount of people that rely on that working versus people that rely on <!-- --!> --> working... We're encountering some difficulties with the current algorithm. -- Anne van Kesteren http://annevankesteren.nl/ http://www.opera.com/
[whatwg] server-sent events and rfcs 2068 and 2616
Hi, I was wondering how you mitigate the persistent connection limitations described in RFCs 2068 and 2616 vs server-sent events. It seems the former limits the latter's usability. Thanks, Reed
Re: [whatwg] XHTML and document.write()
On Mon, 14 Aug 2006, Anne van Kesteren wrote:
> Just a FYI. You have to deal with the edge case that the root element might be html:script. Non-conforming obviously, but what's supposed to happen should still be defined. I guess you would ignore calls to document.write() in such cases or perhaps copy the element and put it inside a html:html element and try again... Ouch! Not sure if nested html:script element would make things harder here...

document.write() in XHTML is defined to raise an exception. There were simply too many edge cases that make no sense whatsoever for me to work out how it could work.

-- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] innerHTML and QNames
On Tue, 3 Oct 2006, Simon Pieters wrote:
> On getting .innerHTML the spec says that the tag name is used to serialize tags. However, Opera and Firefox use the local name. Also, it isn't certain that element names and attribute names will be all lower-case.

Fixed, as per our discussion on IRC. I took the opportunity to clean up the use of the term "tag name" in a few other places where it was ambiguously used.

-- Ian Hickson
Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)
On Mon, 9 Oct 2006, Robert wrote:
> > In browsers today, the following: <a href="test" xmlns=""> ... </a> ...is just a link. If we start supporting xmlns="" as it works in XML, but in HTML, then literally millions of pages are going to suddenly have their links stop working, because <a> in the "" namespace (as opposed to the XHTML namespace), is not an HTML <a>, and thus isn't a link.
>
> How about defining a standard namespace _prefix_ for such additions to HTML? As far as I've seen, all browsers interpret the namespace prefix as part of the tag/attribute, such that for MathML in HTML, you'd use math:add. It'd require the author use the prefix for all relevant tags, but it should work without changing anything fundamental in UAs that might break other sites. As far as I'm aware, since namespaces don't exist in HTML there's nothing particularly evil about this.

On Mon, 9 Oct 2006, Anne van Kesteren wrote:
> This seems much more annoying to author than the proposed alternative. It's not like we'll have millions of elements to be used in HTML one day. (I hope not, at least!) The language should remain relatively simple. I'm not even sure why people suggest SVG should be included as well as that's a presentational language. It makes much more sense to bind SVG to elements using XBL.

I tend to agree with Anne. It's not clear to me what the advantage of the proposed solution would be. It's not really clear to me what the problem is, even.

Cheers,
-- Ian Hickson
Re: [whatwg] Map lang to xml:lang at the parser level
On Sun, 15 Oct 2006, Simon Pieters wrote:
> When parsing HTML and serializing as XML you normally want to change the lang attribute to xml:lang. But why not put it in the XML namespace at the parser level? Then when you serialize the DOM as XML it becomes xml:lang automatically. The .lang DOM attribute would reflect xml:lang. This would make it simpler to set/get the language with script in XHTML (no need to use namespace-aware methods). I don't know if this is too expensive on the parser or if there are other flaws but it's just an idea.

It's an interesting idea but it isn't really compatible with what legacy UAs do, since they would expose the attribute as 'lang' but this would require the attribute to be fetched using getAttributeNS instead of getAttribute to get the same effect.

There are enough other subtleties in the differences between HTML5 and XHTML5 that I think you'd have to have special code to convert between the two anyway. So I'm not sure this would gain you much.

-- Ian Hickson
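The compatibility concern above, that a namespaced attribute is not visible to a plain-name lookup, can be illustrated outside a browser. The sketch below uses Python's `xml.etree.ElementTree` as a stand-in for the DOM, so `element.get("lang")` plays the role of getAttribute and the `{namespace}lang` lookup plays the role of getAttributeNS; the helper `get_language` is a hypothetical name, not something from the spec.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def get_language(element):
    """Prefer xml:lang, then fall back to the no-namespace lang attribute."""
    value = element.get("{%s}lang" % XML_NS)
    if value is None:
        value = element.get("lang")
    return value


namespaced = ET.fromstring('<p xml:lang="en">hello</p>')
plain = ET.fromstring('<p lang="fr">bonjour</p>')

# A plain-name lookup does not see the attribute in the XML namespace.
print(namespaced.get("lang"))
print(get_language(namespaced), get_language(plain))
```

Code that has to work on both kinds of tree ends up with a helper like `get_language`, which is the "special code to convert between the two" mentioned in the reply.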
Re: [whatwg] innerHTML in XML
On Fri, 27 Oct 2006, Anne van Kesteren wrote:
> <foo> <bar/> <bar/> </foo>
>
> How can foo.innerHTML be well-formed here?

On Sat, 28 Oct 2006, Lachlan Hunt wrote:
> Anne van Kesteren wrote:
> > <foo> <bar/> <bar/> </foo>
> >
> > How can foo.innerHTML be well-formed here?
>
> It could be if it were treated as an external parsed entity.

I've made the spec explicitly require that innerHTML return an XML namespace-well-formed internal general parsed entity representation.

-- Ian Hickson
Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)
Your hypothetical author is unable to insert an embed element because embed is all English to him. Being able to use a Mandarin attribute name will not help him much because he cannot produce the element to use it with. Considering Arabic script and the like, the time is probably near when we will have to learn it anyway. But we still have some time left, so let's just use the opportunities. The day is full of troubles even without your fantasizing.

Cheers,
Chris

-Original Message-
From: Charles McCathieNevile [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 14, 2007 4:40 AM
To: Kristof Zelechovski; 'Simon Pieters'; 'Thomas Broyer'; [EMAIL PROTECTED]
Subject: Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)

On Wed, 13 Jun 2007 11:18:28 +0200, Kristof Zelechovski [EMAIL PROTECTED] wrote:
> Why should I want to use a localized attribute name for the embed element?

Because the only languages you speak are Mandarin, Cantonese and Han, and you are using an IDE to develop your system that only requires you to deal with localised stuff for the rest of it.

Actually, that isn't using a localised attribute name, just one that actually has a little bit of obvious semantics. Would it make sense to require English speakers to use Arabic characters? While English is a very widely spoken language, most people still don't speak a Latin language.

cheers

Chaals

-- Charles McCathieNevile, Opera Software: Standards Group hablo espanol - je parle français - jeg larer norsk [EMAIL PROTECTED] Catch up: Speed Dial http://opera.com
Re: [whatwg] XHTML5 DOM building and IDness
On Thu, 2 Nov 2006, Henri Sivonen wrote:
> The spec says:
> > The rules for parsing XML documents (and thus XHTML documents) into DOM trees are covered by the XML and Namespaces in XML specifications, and are out of scope of this specification.
>
> However, the spec says the following about the id attribute:
> > If the value is not the empty string, user agents must associate the element with the given value (exactly) for the purposes of ID matching (e.g. for selectors in CSS or for the getElementById() method in the DOM).
>
> [...] there is a piece of code somewhere between the XML processor and the resulting DOM tree that is analogous to an xml:id processor and that assigns IDness to attributes that are not in a namespace, have the local name id and belong to elements in the XHTML namespace.

Right, that piece of code is the XHTML UA. Is that a problem? Why would the rules resulting from HTML element semantics have to be dealt with by the lower level layers?

> The second quote implies that the first quote is not the full story and building a DOM tree from an XHTML document byte stream is not entirely covered by the XML and Namespaces in XML specifications [...]

"Not entirely" is a polite way of putting it. There's a huge gaping hole between the XML spec and the DOM spec, with no actual definition anywhere that says how you get from one to the other -- there's no equivalent of the HTML parser spec for XML/DOM. It's only because for most things there's an obvious mapping that the implementations are interoperable, IMHO.

This is one reason why I've punted on defining document.write() for XML -- without a strict parser spec that defines at which stage the DOM is updated, there's no clear definition of how you insert things into the parser's input stream, for example.

-- Ian Hickson
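The "piece of code" that assigns IDness can be sketched as a pass over an already-parsed tree. This is an illustration only, written against Python's `xml.etree.ElementTree` rather than a real DOM; the function name `build_id_map` is hypothetical, and keeping the first element for a duplicated id is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_id_map(root):
    """Map id values to elements, for XHTML-namespace elements only."""
    ids = {}
    prefix = "{%s}" % XHTML_NS
    for element in root.iter():  # document order
        if not isinstance(element.tag, str) or not element.tag.startswith(prefix):
            continue  # not an element in the XHTML namespace
        value = element.get("id")  # the attribute that is in no namespace
        if value:  # the empty string does not create an ID
            ids.setdefault(value, element)  # assumption: first one wins
    return ids


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml" xmlns:o="http://example.org/other">'
    '<body><p id="a">x</p><o:thing id="b"/><p id="">y</p></body></html>'
)
id_map = build_id_map(doc)
print(sorted(id_map))
```

Nothing here touches the XML parser itself, which matches the reply's point that the rule can live in the layer that knows about XHTML semantics.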
Re: [whatwg] 9.2.2: replacement characters. How many?
On Fri, 3 Nov 2006, Elliotte Harold wrote:
> Section 9.2.2 of the current Web Apps 1.0 draft states:
> > Bytes or sequences of bytes in the original byte stream that could not be converted to Unicode characters must be converted to U+FFFD REPLACEMENT CHARACTER code points.
>
> I'm concerned about the "or". For example, suppose there are six upper halves of a Unicode surrogate pair in a row and no lower halves. Does that turn into six replacement characters or one? Both interpretations seem possible. I suppose I prefer six rather than one, but I don't care a great deal as long as this is locked down one way or the other.

I don't really know how to define this. I'd like to say that it's up to the encoding specifications to define it. Any suggestions?

-- Ian Hickson
Re: [whatwg] Typo in 9.2.3
On Sun, 5 Nov 2006, Elliotte Harold wrote:
> > Otherwise if the next seven chacacters are a case-insensitive match for the word DOCTYPE, then consume those characters and switch to the DOCTYPE state.
>
> chacacters --> characters

Fixed.

-- Ian Hickson
[whatwg] Canvas shadow rendering
I've looked at how Safari renders shadows - the spec should probably define something similar, since it works and it's not insane or anything.

Just before a shape/image is drawn, a shadow image is created (based on the original shape/image's alpha values (ignoring the RGB entirely) and the shadow colour/offset/blur). That shadow image is then drawn as normal (affected by globalAlpha and globalCompositeOperation), and then the original shape/image is drawn on top as normal.

The shadow image copies the original alpha values, then gets Gaussian-blurred (http://en.wikipedia.org/wiki/Gaussian_blur etc). The σ parameter in the Gaussian function is derived from shadowBlur: as far as I can tell, the best approximation to Safari's behaviour is with σ = (if shadowBlur < 8 then shadowBlur/2 else sqrt(2*shadowBlur)). http://canvex.lazyilluminati.com/misc/shadow/shadow1.html (in Safari) shows its shadow rendering compared to that Gaussian function. There's not a perfect correspondence, but there's at least one place where Safari is simply buggy (it cuts off the left edge by one pixel when shadowBlur = 6) so it's never going to be a perfect correspondence, and it looks close enough to me. But if anyone has a better idea of the exact equation, that would be good to know, since I got fed up with trying to guess :-)

After that, it's just multiplied by the shadow colour and then drawn. http://canvex.lazyilluminati.com/misc/shadow/shadow2.html shows (in the middle column) that it works the same as Safari's shadows when manually drawing the shadow image (using lots of temporary bitmaps for the blurring) then compositing that and then compositing the original image on top.

The shadowOffset and shadowBlur are unaffected by transformations, as in http://canvex.lazyilluminati.com/misc/shadow/shadow3.html.

I think the definition would be like:

3.14.11.1.6. Shadows

All drawing operations are affected by the four global shadow attributes.

The shadowColor attribute sets the color of the shadow. When the context is created, the shadowColor attribute initially must be fully-transparent black.

The shadowOffsetX and shadowOffsetY attributes specify the distance that the shadow will be offset in the positive horizontal and positive vertical distance respectively. Their values are in coordinate space units, and are not affected by the transformation matrix. When the context is created, the shadow offset attributes initially have the value 0.

The shadowBlur attribute specifies the number of coordinate space units that the blurring is to cover, and is not affected by the transformation matrix. On setting, negative numbers must be ignored, leaving the attribute unmodified. When the context is created, the shadowBlur attribute must initially have the value 0.

Support for shadows is optional. When they are supported, then, when shadows are drawn, they must be rendered using the specified color, offset, and blur radius as described below. When they are not supported, shadows must be rendered as if the shadow color was transparent black.

[...]

3.14.11.1.11. Drawing model

When a shape or image is painted, user agents must follow these steps, in the order given (or act as if they do):

* If the current transformation matrix is infinite, then do nothing. Abort these steps.
* The coordinates are transformed by the current transformation matrix.
* The shape or image is rendered, creating image A, as described in the previous sections. For shapes, the current fill, stroke, and line styles must be honoured.
* The shadow image is rendered, as a Gaussian-blurred version of the alpha channel from image A:
  * Create a shadow bitmap, filled with transparent black.
  * For every pixel in image A, with position (x, y):
    * For every pixel in the shadow image, with position (x', y'):
      * Let u = x' - (x + shadowOffsetX). Let v = y' - (y + shadowOffsetY).
      * If shadowBlur is zero, then:
        * If u = v = 0 then let G = 1. Otherwise, let G = 0.
      * Otherwise, shadowBlur is nonzero:
        * If shadowBlur < 8, let σ = shadowBlur/2. Otherwise, let σ = sqrt(2*shadowBlur).
        * Let G = 1/(2 π σ^2) * e^(-(u^2 + v^2)/(2 σ^2)).
      * Let (r, g, b, a) be the components of shadowColor. Let a' be the alpha component of the pixel in image A at (x, y). Add the value (r, g, b, a * a' * G) onto the shadow image at (x', y'), using the Porter-Duff 'plus' operator.
* The shadow image has its alpha adjusted by globalAlpha.
* Within the clip region (as affected by the current transformation matrix), the shadow image is composited over the current canvas bitmap using the current composition operator.
* The previous two steps are repeated, using image A instead of the shadow image.

(I haven't tried actually implementing it in the way detailed above, so the description may be buggy, but I can't see anything wrong myself so I guess it's probably alright.)

It is assumed that all the
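The per-pixel weight G from the proposed drawing model can be written down directly. The sketch below is in Python and only covers the σ and G steps, not the compositing; the function names are made up for the example, and the comparison is read as "shadowBlur < 8" (the operator is missing in the archived text, but both branches give σ = 4 at shadowBlur = 8, so the choice between < and <= does not change the result).

```python
import math


def shadow_sigma(shadow_blur: float) -> float:
    """Standard deviation of the blur, as approximated from Safari."""
    if shadow_blur < 8:
        return shadow_blur / 2
    return math.sqrt(2 * shadow_blur)


def shadow_weight(u: float, v: float, shadow_blur: float) -> float:
    """Weight G contributed by a source pixel at offset (u, v)."""
    if shadow_blur == 0:
        # No blur: the shadow is an exact, offset copy of the alpha channel.
        return 1.0 if u == 0 and v == 0 else 0.0
    sigma = shadow_sigma(shadow_blur)
    return math.exp(-(u * u + v * v) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)


for blur in (0, 4, 8, 18):
    print(blur, shadow_weight(0, 0, blur), shadow_weight(3, 0, blur))
```

A real implementation would presumably truncate the kernel at a few σ instead of visiting every pair of pixels as the step list does.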
Re: [whatwg] Entity parsing
On Sun, 5 Nov 2006, Øistein E. Andersen wrote:
> From section 9.2.3.1. Tokenising entities:
> > For some entities, UAs require a semicolon, for others they don't.
>
> This applies to IE. FWIW, the entities not requiring a semicolon are the ones encoding Latin-1 characters, the other HTML 3.2 entities (amp, gt and lt), as well as quot and the uppercase variants (AMP, COPY, GT, LT, QUOT and REG). [...]

I've defined the parsing and conformance requirements in a way that matches IE. As a side-effect, this has made things like "na&iumlve" actually conforming. I don't know if we want this. On the one hand, it's pragmatic (after all, why require the semicolon?), and is equivalent to not requiring quotes around attribute values. On the other, people don't want us to make the quotes optional either.

-- Ian Hickson
Re: [whatwg] Space characters
On Mon, 6 Nov 2006, Henri Sivonen wrote:
> On Nov 6, 2006, at 07:34, Ian Hickson wrote:
> > On Sun, 5 Nov 2006, Henri Sivonen wrote:
> > > Is there a reason why the definition of space characters does not match the XML 1.0 and RELAX NG definition of white space (space, tab, CR, LF) but also includes line tabulation and form feed? Is the deviation from XML 1.0 needed for backwards compatibility with text/html UAs?
> >
> > I made the parser consider VT and FF as being whitespace based on, as I recall, a complete examination of every Unicode character's behaviour in the parsers I was testing. The definition of space characters matches the parser's behaviour for consistency. The definition of space characters doesn't affect the XML parser stage as far as I can recall, only attribute parsing and DOM conformance.
>
> The potential problem with it affecting DOM conformance is that it may have ripple effects to running XML tooling inside a browser engine. Gecko has an XPath implementation. Disruptive Innovations has created a RELAX NG implementation for Gecko. Running the schemas from syntax.whattf.org on a DOM inside Gecko would be interesting, since it would allow checking DOM snapshots modified by scripts. There may be other reasons to run XML machinery on an HTML DOM in a browser.
>
> Both XPath and RELAX NG assume that white space-separated tokens follow the XML notion of white space. Not being able to use the native XPath and RELAX NG notions of splitting on white space would be seriously uncool.
>
> Of course, a browser engine might get away with tampering with the XPath or RELAX NG notions of white space since the additional characters don't occur in XML. But does it make sense to inflict the cost of such tweaking on the XML parts of browser engines?
>
> Would there be serious compatibility problems if the HTML5 parsing algorithm required VT and FF to be mapped to space (after expanding NCRs) and the higher-level parts of the spec defined white space as space, tab, CR and LF?

Well, I don't much care about VT, but I really think we should round-trip form feed. Consider, for instance, RFCs, which have form feeds. I don't like the idea of dropping them on the floor when you convert RFCs to HTML and back to text again.

-- Ian Hickson
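The mismatch discussed in this thread shows up as soon as a token list is split. Here is a small Python sketch, with hypothetical names, that splits the same attribute value once with the XML notion of white space and once with the wider set that also includes VT and FF.

```python
XML_WHITE_SPACE = " \t\r\n"
HTML5_SPACE_CHARACTERS = XML_WHITE_SPACE + "\x0b\x0c"  # adds VT and FF


def split_tokens(value: str, separators: str) -> list:
    """Split `value` on runs of any character in `separators`."""
    tokens, current = [], []
    for ch in value:
        if ch in separators:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


value = "a\x0cb c"  # a form feed between "a" and "b"
print(split_tokens(value, XML_WHITE_SPACE))
print(split_tokens(value, HTML5_SPACE_CHARACTERS))
```

With the XML definition the form feed stays inside the first token, so an XPath or RELAX NG tool and an HTML5-aware tool would disagree about how many tokens the value contains.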
Re: [whatwg] Handling of illegal byte-sequences (typically in UTF-8)
On Fri, 24 Nov 2006, Øistein E. Andersen wrote:
> Section 8.1.4:
> > Bytes that are not valid UTF-8 sequences must be interpreted as [...] U+FFFD
>
> Section 9.2.2:
> > Bytes or sequences of bytes [...] that could not be converted to Unicode characters must be converted to U+FFFD
>
> If I read this correctly, section 8.1.4 requires that an illegal UTF-8 sequence like F2 BF BF (the three first bytes of a four-byte sequence, obviously not followed by a continuation byte) be converted into exactly three U+FFFD characters (one for each byte), whereas section 9.2.2 also allows one single replacement character (and possibly even two) in this case (and permits an arbitrary number n of repetitions of the three-byte sequence to be replaced by any number of U+FFFD characters between 1 and 3n).
>
> I realise that the underspecification in section 9.2.2 may well be intentional, given that this section is not limited to UTF-8, but (quite possibly depending on the handling chosen) this can (more or less easily) be expressed in such a way that it applies to any encoding. Alternatively, a reference to an authoritative source would of course fulfil the purpose in the particular case of UTF-8 (if such a document can be found).
>
> [Currently, an alert reader might infer that the treatment indicated in section 8.1.4 would be preferable also in section 9.2.2, but such inference for consistency can hardly be expected.]

On Fri, 24 Nov 2006, Henri Sivonen wrote:
> I'm inclined to think that interop in error situations doesn't need to go as deep as defining how many replacement characters (in the range 1...number of bytes in a faulty sequence) a character decoder has to emit. Apps may want to delegate character decoding to an outside library whose authors don't care about the details of HTML5. (For example, it appears that Safari is leaving this stuff to ICU.) Chances are that there's more value in being able to use a library than in getting a specific number of replacement characters on error.

On Sat, 25 Nov 2006, Øistein E. Andersen wrote:
> I agree. The current slight inconsistency should probably be amended by making section 8.1.4 more liberal rather than the other way round.

Done.

-- Ian Hickson
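The difference between the two readings can be tried out on the F2 BF BF example from the thread. The sketch below uses Python's standard `codecs` machinery purely as an example of an outside decoding library; the error-handler name "replace-per-byte" and the function `_one_per_byte` are invented for the illustration. It deliberately does not claim how many replacement characters the library's own "replace" handler emits, only that the count falls in the range the thread talks about.

```python
import codecs


def _one_per_byte(error):
    """Error handler: emit one U+FFFD for every byte in the faulty range."""
    if not isinstance(error, UnicodeDecodeError):
        raise error
    return ("\ufffd" * (error.end - error.start), error.end)


codecs.register_error("replace-per-byte", _one_per_byte)

# First three bytes of a four-byte UTF-8 sequence, with nothing after them.
truncated = b"\xf2\xbf\xbf"

per_byte = truncated.decode("utf-8", errors="replace-per-byte")
library_default = truncated.decode("utf-8", errors="replace")

print(len(per_byte), len(library_default))
```

The strict one-per-byte reading needs a custom handler, while the liberal reading is satisfied by whatever the library already does, which is the trade-off Henri describes.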
Re: [whatwg] HTML syntax: space characters between attributes
On Tue, 28 Nov 2006, Simon Pieters wrote:
> The HTML syntax requires space characters between attributes, but the lack of space characters between attributes does not cause a parse error according to the parsing section.
> > Attributes must be separated from each other and from the tag name by one or more space characters.
>
> I'd suggest either making it a parse error or changing the syntax to make it optional. (But obviously it can't be optional when the preceding attribute is minimized or unquoted.)

This was changed some time back to make the whitespace optional in most cases (except where it would otherwise be ambiguous).

-- Ian Hickson
Re: [whatwg] Parsing (and syntax): < in unquoted attribute values
On Wed, 29 Nov 2006, Simon Pieters wrote:
> The parsing section says that < in unquoted attribute values is a parse error and that it causes the tag token to be emitted. As far as I can tell < does not emit the tag token in at least Firefox, IE6 or Safari. Is it intentional to emit the tag token here? (If it is, why?) If not, should it still be a parse error (and be disallowed in the syntax section)?

I've removed special processing of <. Note that the following cases no longer close start tags, despite them working interoperably in Safari and Firefox:

<div<p>
<div title <p>
<div title=<p>

And the following two no longer close tags either (only worked in Firefox):

<div title<p>
</div<p>

All of these were allowed in SGML, as I understand it.

-- Ian Hickson
Re: [whatwg] Entity parsing
On 2007-06-14 at 21:05, Ian Hickson wrote:
> I've defined the parsing and conformance requirements in a way that matches IE. As a side-effect, this has made things like "na&iumlve" actually conforming. I don't know if we want this.

I'd make it non-conforming for the sake of readability.

> On the one hand, it's pragmatic (after all, why require the semicolon?), and is equivalent to not requiring quotes around attribute values. On the other, people don't want us to make the quotes optional either.

I'm perfectly fine with quotes being optional; I think unquoted attribute values are generally as easy to read as their quoted counterparts, if not sometimes easier since you don't have the noise of the quotes.

On the other hand, it took me about a minute to figure out the word in your example -- "na&iumlve" -- simply because I couldn't find where to put the delimitation between the end of the entity name and the last few characters in the word. In other words, is this the entity iu, ium, iuml, iumlv or iumlve? Without a list of entities at hand, it takes a lot of guesswork to find the length it consumes and the name of the entity. And not everyone can remember all those entity names.

Michel Fortin [EMAIL PROTECTED] http://www.michelf.com/
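The guesswork described above is what a parser settles mechanically by trying the longest candidate name first. The following Python sketch shows that idea; it relies on the `html5` table in the standard `html.entities` module, where names stored without a trailing ";" are the ones that may be written without a semicolon. That table reflects today's HTML rather than the 2007 draft, and `NO_SEMICOLON_NAMES` and `match_entity` are names invented for the example.

```python
from html.entities import html5

# Entity names that the table lists without a trailing semicolon.
NO_SEMICOLON_NAMES = {name for name in html5 if not name.endswith(";")}


def match_entity(text_after_ampersand: str):
    """Return (name, character, rest) for the longest matching name, or None."""
    for length in range(len(text_after_ampersand), 0, -1):
        candidate = text_after_ampersand[:length]
        if candidate in NO_SEMICOLON_NAMES:
            return candidate, html5[candidate], text_after_ampersand[length:]
    return None


# The text following the "&" in the example word.
print(match_entity("iumlve"))
```

Of the five candidates in the message, only iuml is in the table, so the word is read as "na" + "ï" + "ve"; a human reader without the table has no such shortcut, which is the readability argument being made.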