Re: [whatwg] parsing: bogus comments - PIs

2007-06-14 Thread Michael A. Puls II

On 6/13/07, Ian Hickson [EMAIL PROTECTED] wrote:

On Wed, 26 Jul 2006, Shadow2531 wrote:
  
   So, <?xml-stylesheet type="text/css" href=""?> is a bogus comment.
  
   I *was* 100% sure that the PI should be parsed into:
  
   <!--?xml-stylesheet type="text/css" href=""?-->
 
  Correct.

 Thanks Ian. Can you comment on innerHTML for this situation?

 If <?xml-stylesheet type="text/css" href=""?> is parsed into
 <!--?xml-stylesheet type="text/css" href=""?-->, what should
 innerHTML show?

Assuming you mean the .innerHTML of a parent element, it would show the
comment as you've written it above. See the innerHTML definition in the
spec:

   http://www.whatwg.org/specs/web-apps/current-work/#innerhtml0


Thanks. That clears it up now.

My notes for reference:

Given HTML5 markup:
<div id="test"><?xml-stylesheet type="text/css" href=""?></div>

Since PIs in markup are parsed as bogus comments, the above is parsed as:
<div id="test"><!--?xml-stylesheet type="text/css" href=""?--></div>
and the comment is parsed into the DOM as a comment node.

document.getElementById("test").innerHTML should then return the string:
<!--?xml-stylesheet type="text/css" href=""?-->
because that's what
"<!--" + document.getElementById("test").firstChild.data + "-->"
should equal.
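
The notes above can be sketched in JavaScript roughly as follows. This is
only an illustration, not spec text: parseFragment and serialize are
hypothetical names, the node objects are plain stand-ins for DOM nodes, and
only text and PIs are handled (elements and escaping are ignored).

```javascript
// Hypothetical sketch: a "<?" in text/html markup starts a bogus comment
// whose data runs from just after "<" up to the first ">".
function parseFragment(markup) {
  const nodes = [];
  let pos = 0;
  while (pos < markup.length) {
    const start = markup.indexOf("<?", pos);
    if (start === -1) {
      nodes.push({ type: "text", data: markup.slice(pos) });
      break;
    }
    if (start > pos) {
      nodes.push({ type: "text", data: markup.slice(pos, start) });
    }
    const close = markup.indexOf(">", start);
    const end = close === -1 ? markup.length : close;
    // The leading "?" and any trailing "?" stay in the comment data.
    nodes.push({ type: "comment", data: markup.slice(start + 1, end) });
    pos = end + 1;
  }
  return nodes;
}

// Comment nodes serialize as "<!--" + data + "-->"; text is emitted as-is.
function serialize(nodes) {
  return nodes
    .map((node) => (node.type === "comment" ? "<!--" + node.data + "-->" : node.data))
    .join("");
}
```

With this sketch, serialize(parseFragment("<?x y?>")) gives "<!--?x y?-->",
which is the shape of string that innerHTML is expected to return here.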

Required changes:
Since Firefox, IE, Opera and Safari do not conform to this:

Firefox and Safari will have to stop ignoring PIs in markup and treat
them as comments.
Opera and IE will have to start treating PIs in markup as comments.

--
Michael


[whatwg] charset sniffing algorithm and space characters around the charset name

2007-06-14 Thread Henri Sivonen
As written, the charset sniffing algorithm doesn't trim space
characters from around the tentative encoding name. The html5lib test
cases expect the space characters to be trimmed.


I suggest trimming space characters (or anything <= 0x20, depending on
which approach is the right one for compat).
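
A minimal sketch of the two trimming options, in JavaScript. The function
name trimEncodingName and its flag are hypothetical, and reading the second
option as "every code unit at or below 0x20" is an assumption about what is
being suggested. The set of space characters used is space, tab, LF, VT, FF
and CR.

```javascript
// Space characters as discussed on this list: tab, LF, VT, FF, CR, space.
const SPACE_CHARACTERS = [0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x20];

// Hypothetical helper: trims the tentative encoding name on both sides.
// trimAllControls = true trims every code unit <= 0x20 instead.
function trimEncodingName(name, trimAllControls) {
  const isTrimmed = (code) =>
    trimAllControls ? code <= 0x20 : SPACE_CHARACTERS.includes(code);
  let start = 0;
  let end = name.length;
  while (start < end && isTrimmed(name.charCodeAt(start))) start += 1;
  while (end > start && isTrimmed(name.charCodeAt(end - 1))) end -= 1;
  return name.slice(start, end);
}
```

The two options only differ for control characters such as U+0000 that are
not space characters, which is where the compat question would be decided.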


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




[whatwg] Parsing: comments (was: Re: About adopting quirks mode parsing)

2007-06-14 Thread Anne van Kesteren

On Thu, 14 Jun 2007 03:17:16 +0200, Ian Hickson [EMAIL PROTECTED] wrote:

I haven't looked at the parsing of comments in PCDATA mode yet but I'm
guessing we'll have to support <!--> there too.


Yeah, and <!---> too iirc. There's some other e-mail dealing with that.
And maybe also <!-- --!> given the amount of people that rely on that
working versus people that rely on <!-- --!> --> working... We're
encountering some difficulties with the current algorithm.



--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/


[whatwg] server-sent events and rfcs 2068 and 2616

2007-06-14 Thread reed

Hi,

I was wondering how you mitigate the persistent connection
limitations described in RFCs 2068 and 2616 vs server-sent events. It
seems the former limits the latter's usability.


Thanks,
Reed



Re: [whatwg] XHTML and document.write()

2007-06-14 Thread Ian Hickson
On Mon, 14 Aug 2006, Anne van Kesteren wrote:

 Just a FYI. You have to deal with the edge case that the root element 
 might be html:script. Non conforming obviously, but what's supposed to 
 happen should still be defined. I guess you would ignore calls to 
 document.write() in such cases or perhaps copy the element and put it 
 inside a html:html element and try again... Ouch!
 
 Not sure if nested html:script element would make things harder 
 here...

document.write() in XHTML is defined to raise an exception. There were 
simply too many edge cases that make no sense whatsoever for me to work 
out how it could work.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] innerHTML and QNames

2007-06-14 Thread Ian Hickson
On Tue, 3 Oct 2006, Simon Pieters wrote:
 
 On getting .innerHTML the spec says that the tag name is used to 
 serialize tags. However, Opera and Firefox use the local name. Also, it 
 isn't certain that element names and attribute names will be all 
 lower-case.

Fixed, as per our discussion on IRC. I took the opportunity to clean up 
the use of the term "tag name" in a few other places where it was 
ambiguously used.



Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)

2007-06-14 Thread Ian Hickson
On Mon, 9 Oct 2006, Robert wrote:
 
  In browsers today, the following:
 <a href="test" xmlns=""> ... </a>
  ...is just a link. If we start supporting xmlns="" as it works in XML, but
  in HTML, then literally millions of pages are going to suddenly have their
  links stop working, because <a> in the "" namespace (as opposed to the XHTML
  namespace), is not an HTML <a>, and thus isn't a link.
 
 How about defining a standard namespace _prefix_ for such additions to 
 HTML? As far as I've seen, all browsers interpret the namespace prefix 
 as part of the tag/attribute, such that for MATHML in HTML, you'd use 
 math:add. It'd require the author use the prefix for all relevant 
 tags, but it should work without changing anything fundamental in UAs 
 that might break other sites. As far as I'm aware, since namespaces 
 don't exist in HTML there's nothing particularly evil about this.

On Mon, 9 Oct 2006, Anne van Kesteren wrote:
 
 This seems much more annoying to author than the proposed alternative.
 
 It's not like we'll have millions of elements to be used in HTML one 
 day. (I hope not, at least!) The language should remain relatively 
 simple. I'm not even sure why people suggest SVG should be included as 
 well as that's a presentational language. It makes much more sense to 
 bind SVG to elements using XBL.

I tend to agree with Anne. It's not clear to me what the advantage of the 
proposed solution would be. It's not really clear to me what the problem 
is, even.

Cheers,


Re: [whatwg] Map lang to xml:lang at the parser level

2007-06-14 Thread Ian Hickson
On Sun, 15 Oct 2006, Simon Pieters wrote:
 
 When parsing HTML and serializing as XML you normally want to change the 
 lang attribute to xml:lang. But why not put it in the XML namespace at 
 the parser level? Then when you serialize the DOM as XML it becomes 
 xml:lang automatically.
 
 The .lang DOM attribute would reflect xml:lang. This would make it 
 simpler to set/get the language with script in XHTML (no need to use 
 namespace-aware methods).
 
 I don't know if this is too expensive on the parser or if there are 
 other flaws but it's just an idea.

It's an interesting idea but it isn't really compatible with what legacy 
UAs do, since they would expose the attribute as 'lang' but this would 
require the attribute to be fetched using getAttributeNS instead of 
getAttribute to get the same effect.

There are enough other subtleties in the differences between HTML5 and 
XHTML5 that I think you'd have to have special code to convert between the 
two anyway. So I'm not sure this would gain you much.



Re: [whatwg] innerHTML in XML

2007-06-14 Thread Ian Hickson
On Fri, 27 Oct 2006, Anne van Kesteren wrote:

  <foo>
   <bar/>
   <bar/>
  </foo>
 
 How can foo.innerHTML be well-formed here?

On Sat, 28 Oct 2006, Lachlan Hunt wrote:

 Anne van Kesteren wrote:
<foo>
 <bar/>
 <bar/>
</foo>
  
  How can foo.innerHTML be well-formed here?
 
 It could be if it were treated as an external parsed entity.

I've made the spec explicitly require that innerHTML return an XML 
namespace-well-formed internal general parsed entity representation.



Re: [whatwg] Allowed characters in attribute names (was: Re: Steps for finding one or two numbers in a string)

2007-06-14 Thread Křištof Želechovski


Your hypothetical author is unable to insert an embed element because embed
is all English to him.  Being able to use a Mandarin attribute name will not
help him much because he cannot produce the element to use it with.
Considering Arabic script and the like, the time is probably near when we
will have to learn it anyway.  But we still have some time left, so let's
just use the opportunities.  The day is full of troubles even without your
fantasizing.
Cheers,
Chris

-Original Message-
From: Charles McCathieNevile [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 14, 2007 4:40 AM
To: Kristof Zelechovski; 'Simon Pieters'; 'Thomas Broyer'; [EMAIL PROTECTED]
Subject: Re: [whatwg] Allowed characters in attribute names (was: Re:
Steps for finding one or two numbers in a string)

On Wed, 13 Jun 2007 11:18:28 +0200, Kristof Zelechovski  
[EMAIL PROTECTED] wrote:

 Why should I want to use a localized attribute name for the embed  
 element?

Because the only languages you speak are mandarin, cantonese and han, and  
you are using an IDE to develop your system that only requires you to deal  
with localised stuff for the rest of it.

Actually, that isn't using a localised attribute name, just one that  
actually has a little bit of obvious semantics. Would it make sense to  
require english speakers to use arabic characters?

While english is a very widely spoken language, most people still don't  
speak a latin language.

cheers

Chaals

-- 
   Charles McCathieNevile, Opera Software: Standards Group
   hablo espanol  -  je parle français  -  jeg larer norsk
[EMAIL PROTECTED]  Catch up: Speed Dial  http://opera.com



Re: [whatwg] XHTML5 DOM building and IDness

2007-06-14 Thread Ian Hickson
On Thu, 2 Nov 2006, Henri Sivonen wrote:
 The spec says:
  The rules for parsing XML documents (and thus XHTML documents) into DOM
  trees are covered by the XML and Namespaces in XML specifications, and are
  out of scope of this specification.
 
 However, the spec says the following about the id attribute:

  If the value is not the empty string, user agents must associate the element
  with the given value (exactly) for the purposes of ID matching (e.g. for
  selectors in CSS or for the getElementById()  method in the DOM).

 [...] there is a piece of code somewhere between the XML processor and 
 the resulting DOM tree that is analogous to an xml:id processor and that 
 assigns IDness to attributes that are not in a namespace, have the local 
 name id and belong to elements in the XHTML namespace.

Right, that piece of code is the XHTML UA. Is that a problem? Why would 
the rules resulting from HTML element semantics have to be dealt with by 
the lower level layers?
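
As a rough sketch, the rule quoted above could be expressed as a predicate
like the one below. The name assignsIdness is hypothetical, and element and
attr are plain objects standing in for DOM nodes; this is an illustration of
the condition, not how any UA implements it.

```javascript
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical predicate: true when the attribute should give its element
// an ID. The attribute must be in no namespace, be named "id", have a
// non-empty value, and sit on an element in the XHTML namespace.
function assignsIdness(element, attr) {
  return (
    element.namespaceURI === XHTML_NS &&
    attr.namespaceURI === null &&
    attr.localName === "id" &&
    attr.value !== ""
  );
}
```

In this reading the check runs in the UA after the XML processor has
produced namespaced names, so the XML layer itself needs no knowledge of it.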


 The second quote implies that the first quote is not the full story and 
 building a DOM tree from an XHTML document byte stream is not entirely 
 covered by the XML and Namespaces in XML specifications [...]

"Not entirely" is a polite way of putting it. There's a huge gaping hole 
between the XML spec and the DOM spec, with no actual definition anywhere 
that says how you get from one to the other -- there's no equivalent of 
the HTML parser spec for XML/DOM. It's only because for most things 
there's an obvious mapping that the implementations are interoperable, 
IMHO. This is one reason why I've punted on defining document.write() for 
XML -- without a strict parser spec that defines at which stage the DOM is 
updated, there's no clear definition of how you insert things into the 
parser's input stream, for example.



Re: [whatwg] 9.2.2: replacement characters. How many?

2007-06-14 Thread Ian Hickson
On Fri, 3 Nov 2006, Elliotte Harold wrote:

 Section 9.2.2 of the current Web Apps 1.0 draft states:
 
 Bytes or sequences of bytes in the original byte stream that could not 
 be converted to Unicode characters must be converted to U+FFFD 
 REPLACEMENT CHARACTER code points.
 
 I'm concerned about the "or". For example, suppose there are six upper 
 halves of a Unicode surrogate pair in a row and no lower halves. Does 
 that turn into six replacement characters or one? Both interpretations 
 seem possible.
 
 I suppose I prefer six rather than one, but I don't care a great deal as 
 long as this is locked down one way or the other.

I don't really know how to define this. I'd like to say that it's up to 
the encoding specifications to define it. Any suggestions?



Re: [whatwg] Typo in 9.2.3

2007-06-14 Thread Ian Hickson
On Sun, 5 Nov 2006, Elliotte Harold wrote:

 Otherwise if the next seven chacacters are a case-insensitive match for the
 word DOCTYPE, then consume those characters and switch to the DOCTYPE state.
 
 chacacters -- characters

Fixed.



[whatwg] Canvas shadow rendering

2007-06-14 Thread Philip Taylor

I've looked at how Safari renders shadows - the spec should probably
define something similar, since it works and it's not insane or
anything.

Just before a shape/image is drawn, a shadow image is created (based
on the original shape/image's alpha values (ignoring the RGB entirely)
and the shadow colour/offset/blur). That shadow image is then drawn as
normal (affected by globalAlpha and globalCompositeOperation), and
then the original shape/image is drawn on top as normal.

The shadow image copies the original alpha values, then gets
Gaussian-blurred (http://en.wikipedia.org/wiki/Gaussian_blur etc). The
σ parameter in the Gaussian function is derived from shadowBlur: as
far as I can tell, the best approximation to Safari's behaviour is
with σ = (if shadowBlur < 8 then shadowBlur/2 else
sqrt(2*shadowBlur)).
http://canvex.lazyilluminati.com/misc/shadow/shadow1.html (in
Safari) shows its shadow rendering compared to that Gaussian function.
There's not a perfect correspondence, but there's at least one place
where Safari is simply buggy (it cuts off the left edge by one pixel
when shadowBlur = 6) so it's never going to be a perfect
correspondence, and it looks close enough to me. But if anyone has a
better idea of the exact equation, that would be good to know, since I
got fed up with trying to guess :-)

After that, it's just multiplied by the shadow colour and then drawn.
http://canvex.lazyilluminati.com/misc/shadow/shadow2.html shows (in
the middle column) that it works the same as Safari's shadows when
manually drawing the shadow image (using lots of temporary bitmaps for
the blurring) then compositing that and then compositing the original
image on top.

The shadowOffset and shadowBlur are unaffected by transformations, as
in http://canvex.lazyilluminati.com/misc/shadow/shadow3.html.


I think the definition would be like:



3.14.11.1.6. Shadows

All drawing operations are affected by the four global shadow attributes.

The shadowColor attribute sets the color of the shadow.

When the context is created, the shadowColor attribute initially must
be fully-transparent black.

The shadowOffsetX and shadowOffsetY attributes specify the distance
that the shadow will be offset in the positive horizontal and positive
vertical distance respectively. Their values are in coordinate space
units, and are not affected by the transformation matrix.

When the context is created, the shadow offset attributes initially
have the value 0.

The shadowBlur attribute specifies the number of coordinate space
units that the blurring is to cover, and is not affected by the
transformation matrix. On setting, negative numbers must be ignored,
leaving the attribute unmodified.

When the context is created, the shadowBlur attribute must initially
have the value 0.

Support for shadows is optional. When they are supported, then, when
shadows are drawn, they must be rendered using the specified color,
offset, and blur radius as described below. When they are not
supported, shadows must be rendered as if the shadow color was
transparent black.

[...]

3.14.11.1.11. Drawing model

When a shape or image is painted, user agents must follow these steps,
in the order given (or act as if they do):
* If the current transformation matrix is infinite, then do nothing.
Abort these steps.
* The coordinates are transformed by the current transformation matrix.
* The shape or image is rendered, creating image A, as described in
the previous sections. For shapes, the current fill, stroke, and line
styles must be honoured.
* The shadow image is rendered, as a Gaussian-blurred version of the
alpha channel from image A:
 * Create a shadow bitmap, filled with transparent black.
 * For every pixel in image A, with position (x, y):
   * For every pixel in the shadow image, with position (x', y'):
     * Let u = x' - (x + shadowOffsetX). Let v = y' - (y + shadowOffsetY).
     * If shadowBlur is zero, then:
       * If u = v = 0 then let G = 1. Otherwise, let G = 0.
       Otherwise, shadowBlur is nonzero:
       * If shadowBlur < 8, let σ = shadowBlur/2. Otherwise, let
         σ = sqrt(2*shadowBlur).
       * Let G = (1/(2 π σ^2)) e^(-(u^2 + v^2)/(2 σ^2)).
     * Let (r, g, b, a) be the components of shadowColor. Let a' be
       the alpha component of the pixel in image A at (x, y). Add the value
       (r, g, b, a * a' * G) onto the shadow image at (x', y'), using the
       Porter-Duff 'plus' operator.
* The shadow image has its alpha adjusted by globalAlpha.
* Within the clip region (as affected by the current transformation
matrix), the shadow image is composited over the current canvas bitmap
using the current composition operator.
* The previous two steps are repeated, using image A instead of the
shadow image.
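
The shadow-image step above can be sketched in JavaScript as follows. This
is a direct, unoptimised reading of the proposed text, not Safari's actual
code: the function names and the shadow parameter object are hypothetical,
only the alpha channel is computed, the shadow bitmap is assumed to be the
same size as image A, and offsets are assumed to be whole pixels.

```javascript
// σ as proposed above: shadowBlur/2 below 8, sqrt(2*shadowBlur) otherwise.
function shadowSigma(shadowBlur) {
  return shadowBlur < 8 ? shadowBlur / 2 : Math.sqrt(2 * shadowBlur);
}

// Weight G for an offset (u, v) between a source pixel and a shadow pixel.
function gaussianWeight(u, v, shadowBlur) {
  if (shadowBlur === 0) return u === 0 && v === 0 ? 1 : 0;
  const sigma = shadowSigma(shadowBlur);
  return (
    Math.exp(-(u * u + v * v) / (2 * sigma * sigma)) /
    (2 * Math.PI * sigma * sigma)
  );
}

// alphaA: row-major alpha values (0..1) of image A.
// shadow: { alpha, offsetX, offsetY, blur } (hypothetical parameter object).
function shadowAlpha(alphaA, width, height, shadow) {
  const out = new Array(width * height).fill(0);
  for (let y = 0; y < height; y += 1) {
    for (let x = 0; x < width; x += 1) {
      const aPrime = alphaA[y * width + x];
      if (aPrime === 0) continue; // contributes nothing
      for (let y2 = 0; y2 < height; y2 += 1) {
        for (let x2 = 0; x2 < width; x2 += 1) {
          const u = x2 - (x + shadow.offsetX);
          const v = y2 - (y + shadow.offsetY);
          const g = gaussianWeight(u, v, shadow.blur);
          const k = y2 * width + x2;
          // 'plus' adds the contributions, clamped to full opacity.
          out[k] = Math.min(1, out[k] + shadow.alpha * aPrime * g);
        }
      }
    }
  }
  return out;
}
```

The quadruple loop is quadratic in the pixel count; a real implementation
would presumably use a separable blur, but the result should be the same up
to rounding.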



(I haven't tried actually implementing it in the way detailed above,
so the description may be buggy, but I can't see anything wrong myself
so I guess it's probably alright.)

It is assumed that all the 

Re: [whatwg] Entity parsing

2007-06-14 Thread Ian Hickson
On Sun, 5 Nov 2006, Øistein E. Andersen wrote:

 From section 9.2.3.1. Tokenising entities:
   For some entities, UAs require a semicolon, for others they don't.
 
 This applies to IE.
 
 FWIW, the entities not requiring a semicolon are the ones encoding 
 Latin-1 characters, the other HTML 3.2 entities (amp, gt and lt), as 
 well as quot and the uppercase variants (AMP, COPY, GT, LT, QUOT 
 and REG). [...]

I've defined the parsing and conformance requirements in a way that 
matches IE. As a side-effect, this has made things like na&iumlve 
actually conforming. I don't know if we want this. On the one hand, it's 
pragmatic (after all, why require the semicolon?), and is equivalent to 
not requiring quotes around attribute values. On the other, people don't 
want us to make the quotes optional either.
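
To make the behaviour concrete, here is a small JavaScript sketch of
semicolon-less entity matching. It is an assumption about the general shape
of the rule (longest known name wins), not the spec algorithm: the function
name is hypothetical and the table holds only a handful of the entity names
mentioned in this thread.

```javascript
// Tiny, hypothetical subset of the entity names that need no semicolon.
const LEGACY_ENTITIES = new Map([
  ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"'],
  ["copy", "\u00A9"], ["reg", "\u00AE"], ["iuml", "\u00EF"],
]);

function decodeLegacyEntities(text) {
  return text.replace(/&([A-Za-z]+);?/g, (match, name) => {
    // Try the longest candidate name first, then shorter prefixes.
    for (let len = name.length; len > 0; len -= 1) {
      const value = LEGACY_ENTITIES.get(name.slice(0, len));
      if (value === undefined) continue;
      // Whole name matched: swallow the optional semicolon as well.
      if (len === name.length) return value;
      // Prefix matched: keep the remaining characters as ordinary text.
      return value + match.slice(1 + len);
    }
    return match; // no known entity name: leave the text untouched
  });
}
```

A reader has to do the same prefix search by eye, which is part of why the
semicolon-less form is harder to read than the terminated one.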


Re: [whatwg] Space characters

2007-06-14 Thread Ian Hickson
On Mon, 6 Nov 2006, Henri Sivonen wrote:
 On Nov 6, 2006, at 07:34, Ian Hickson wrote:
  On Sun, 5 Nov 2006, Henri Sivonen wrote:
   
   Is there a reason why the definition of space characters does not 
   match the XML 1.0 and RELAX NG definition of white space (space, 
   tab, CR, LF) but also includes (line tabulation and form feed)? Is 
   the deviation from XML 1.0 needed for backwards compatibility with 
   text/html UAs?
  
  I made the parser consider VT and FF as being whitespace based on, as 
  I recall, a complete examination of every Unicode character's 
  behaviour in the parsers I was testing. The definition of space 
  characters matches the parser's behaviour for consistency.
  
  The definition of space characters doesn't affect the XML parser 
  stage as far as I can recall, only attribute parsing and DOM 
  conformance.
 
 The potential problem with it affecting DOM conformance is that it may 
 have ripple effects to running XML tooling inside a browser engine. 
 Gecko has an XPath implementation. Disruptive Innovations has created a 
 RELAX NG implementation for Gecko. Running the schemas from 
 syntax.whattf.org on a DOM inside Gecko would be interesting, since it 
 would allow checking DOM snapshots modified by scripts. There may be 
 other reasons to run XML machinery on an HTML DOM in a browser. Both 
 XPath and RELAX NG assume that white space-separated tokens follow the 
 XML notion of white space. Not being able to use the native XPath and 
 RELAX NG notions of splitting on white space would be seriously uncool. 
 Of course, a browser engine might get away with tampering with the XPath 
 or RELAX NG notions of white space since the additional characters don't 
 occur in XML. But does it make sense to inflict the cost of such 
 tweaking on the XML parts of browser engines?
 
 Would there be serious compatibility problems if the HTML5 parsing 
 algorithm required VT and FF to be mapped to space (after expanding 
 NCRs) and the higher-level parts of the spec defined white space as 
 space, tab, CR and LF?

Well, I don't much care about VT, but I really think we should round-trip 
form feed. Consider, for instance, RFCs, which have form feeds. I don't 
like the idea of dropping them on the floor when you convert RFCs to HTML 
and back to text again.
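
The practical difference between the two definitions of white space
discussed in this thread can be seen by splitting a token list both ways.
The sketch below is illustrative only; splitTokens and the two constants
are hypothetical names.

```javascript
// HTML5 draft "space characters": space, tab, LF, VT, FF, CR.
const HTML5_SPACE = /[ \t\n\u000B\f\r]+/;
// XML 1.0 / RELAX NG white space: space, tab, LF, CR.
const XML_SPACE = /[ \t\n\r]+/;

// Split an attribute value into tokens, dropping empty leading or
// trailing pieces.
function splitTokens(value, separators) {
  return value.split(separators).filter((token) => token !== "");
}
```

For a value such as "a\fb c", the HTML5 set yields three tokens while the
XML set yields two, because the form feed stays inside the first token.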



Re: [whatwg] Handling of illegal byte-sequences (typically in UTF-8)

2007-06-14 Thread Ian Hickson
On Fri, 24 Nov 2006, Øistein E. Andersen wrote:

 Section 8.1.4:
  Bytes that are not valid UTF-8 sequences must be interpreted as [...] U+FFFD
 
 Section 9.2.2:
  Bytes or sequences of bytes [...] that could not be converted to Unicode 
  characters
  must be converted to U+FFFD
 
 If I read this correctly, section 8.1.4 requires that an illegal UTF-8 
 sequence like F2 BF BF (the three first bytes of a four-byte sequence, 
 obviously not followed by a continuation byte) be converted into exactly 
 three U+FFFD characters (one for each byte), whereas section 9.2.2 also 
 allows one single replacement character (and possibly even two) in this 
 case (and permits an arbitrary number n of repetitions of the three-byte 
 sequence to be replaced by any number of U+FFFD characters between 1 and 
 3n).
 
 I realise that the underspecification in section 9.2.2 may well be 
 intentional, given that this section is not limited to UTF-8, but (quite 
 possibly depending on the handling chosen) this can (more or less 
 easily) be expressed in such a way that it applies to any encoding.
 
 Alternatively, a reference to an authoritative source would of course 
 fulfil the purpose in the particular case of UTF-8 (if such a document 
 can be found).
 
 [Currently, an alert reader might infer that the treatment indicated in 
 section 8.1.4 would be preferable also in section 9.2.2, but such 
 inference for consistency can hardly be expected.]

On Fri, 24 Nov 2006, Henri Sivonen wrote:
 
 I'm inclined to think that interop in error situations doesn't need to 
 go as deep as defining how many replacement characters (in the range 
 1...number of bytes in a faulty sequence) a character decoder has to 
 emit. Apps may want to delegate character decoding to an outside library 
 whose authors don't care about the details of HTML5. (For example, it 
 appears that Safari is leaving this stuff to ICU.) Chances are that 
 there's more value in being able to use a library than in getting a 
 specific number of replacement characters on error.

On Sat, 25 Nov 2006, Øistein E. Andersen wrote:
 
 I agree. The current slight inconsistency should probably be amended by 
 making section 8.1.4 more liberal rather than the other way round.

Done.
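
For reference, the two readings discussed in this thread can be compared
with a simplified UTF-8 decoder. This JavaScript sketch is not taken from
any browser or library: decodeUtf8 and its onePerByte flag are hypothetical,
and the decoder is deliberately minimal.

```javascript
const REPLACEMENT = "\uFFFD";
// Smallest code point allowed for a sequence with N continuation bytes.
const MIN_CODE_POINT = [0, 0x80, 0x800, 0x10000];

// onePerByte = true: one U+FFFD per byte of a faulty sequence.
// onePerByte = false: one U+FFFD for the whole faulty sequence.
function decodeUtf8(bytes, onePerByte) {
  let out = "";
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    if (lead < 0x80) {
      out += String.fromCharCode(lead);
      i += 1;
      continue;
    }
    let need = 0;
    let cp = 0;
    if (lead >= 0xc2 && lead <= 0xdf) { need = 1; cp = lead & 0x1f; }
    else if (lead >= 0xe0 && lead <= 0xef) { need = 2; cp = lead & 0x0f; }
    else if (lead >= 0xf0 && lead <= 0xf4) { need = 3; cp = lead & 0x07; }
    else { out += REPLACEMENT; i += 1; continue; } // stray or invalid byte

    // Consume up to `need` continuation bytes (10xxxxxx).
    let j = i + 1;
    while (j < bytes.length && j - i <= need && (bytes[j] & 0xc0) === 0x80) {
      cp = (cp << 6) | (bytes[j] & 0x3f);
      j += 1;
    }
    const valid =
      j - i - 1 === need &&
      cp >= MIN_CODE_POINT[need] &&
      cp <= 0x10ffff &&
      !(cp >= 0xd800 && cp <= 0xdfff);
    if (valid) out += String.fromCodePoint(cp);
    else out += onePerByte ? REPLACEMENT.repeat(j - i) : REPLACEMENT;
    i = j;
  }
  return out;
}
```

For the truncated sequence F2 BF BF, the first policy produces three
replacement characters and the second produces one; both fall within the
more liberal wording, which leaves the exact count to the decoder.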


Re: [whatwg] HTML syntax: space characters between attributes

2007-06-14 Thread Ian Hickson
On Tue, 28 Nov 2006, Simon Pieters wrote:
 
 The HTML syntax requires space characters between attributes, but the 
 lack of space characters between attributes does not cause a parse error 
 according to the parsing section.
 
   Attributes must be separated from each other and from the
   tag name by one or more space characters.
 
 I'd suggest either making it a parse error or change the syntax to make 
 it optional. (But obviously it can't be optional when the preceding 
 attribute is minimized or unquoted.)

This was changed some time back to make the whitespace optional in most 
cases (except where it would otherwise be ambiguous).



Re: [whatwg] Parsing (and syntax): < in unquoted attribute values

2007-06-14 Thread Ian Hickson
On Wed, 29 Nov 2006, Simon Pieters wrote:
 
 The parsing section says that < in unquoted attribute values is a parse 
 error and that it causes the tag token to be emitted. As far as I can 
 tell < does not emit the tag token in at least Firefox, IE6 or Safari. 
 Is it intentional to emit the tag token here? (If it is, why?)
 
 If not, should it still be a parse error (and be disallowed in the 
 syntax section)?

I've removed special processing of <.

Note that the following cases no longer close start tags, despite them 
working interoperably in Safari and Firefox:

   <div<p>
   <div title <p>
   <div title=<p>

And the following two no longer close tags either (only worked in 
Firefox):

   <div title<p>
   </div<p>

All of these were allowed in SGML, as I understand it.



Re: [whatwg] Entity parsing

2007-06-14 Thread Michel Fortin

On 2007-06-14, at 21:05, Ian Hickson wrote:

I've defined the parsing and conformance requirements in a way that
matches IE. As a side-effect, this has made things like na&iumlve
actually conforming. I don't know if we want this.


I'd make it non-conforming for the sake of readability.

On the one hand, it's pragmatic (after all, why require the  
semicolon?), and is equivalent to not requiring quotes around  
attribute values. On the other, people don't want us to make the  
quotes optional either.


I'm perfectly fine with quotes being optional; I think unquoted
attribute values are generally as easy to read as their quoted
counterparts, if not sometimes easier since you don't have the noise
of the quotes.


On the other hand, it took me about a minute to figure out the word
in your example -- na&iumlve -- simply because I couldn't find
where to put the delimitation between the end of the entity name and
the last few characters in the word. In other words, is this the
entity iu, ium, iuml, iumlv or iumlve? Without a list of
entities at hand, it takes a lot of guesswork to find the name of the
entity and how many characters it consumes. And not everyone can
remember all those entity names.



Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/