Hi,

From: Ian Hickson <[EMAIL PROTECTED]>
On Sun, 18 Jun 2006, Simon Pieters wrote:
>
> The spec asks whether quirks mode parsing should be adopted[1]. I think
> it would be good if parsing worked more or less the same in quirks and
> standards mode. If we want to adopt quirks mode parsing, then here are
> some remarks:
>
> > Comment parsing is different.
>
> I think the current parsing algorithm for comments should remain. I
> don't think we should adopt IE's "overlapping" comments (<!--> being one
> comment), because that isn't logical and isn't how comments work in XML
> or in other languages (for example, /*/ in CSS isn't one comment).

I agree. However, in quirks mode this is a requirement. So if we make the
parsing quirks-compatible (as in, if we remove DOCTYPE-switching for
parsing), we have no choice.

Ok. I could live with that.
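To make the two readings of "<!-->" concrete, here is a small Python sketch. It is only an illustration: the function name is hypothetical, and modelling IE's "overlapping" comments as starting the search for "-->" inside the opening "<!--" is an assumption, not something taken from IE or the spec.

```python
from typing import Optional


def comment_end(markup: str, start: int, overlapping: bool) -> Optional[int]:
    """Return the index just past the comment opened by "<!--" at `start`.

    overlapping=False: "-->" must come after the whole "<!--", so "<!-->"
    does not close the comment (the current parsing algorithm).
    overlapping=True (assumed model of IE quirks): the search starts inside
    "<!--", so its "--" can be reused as the start of "-->" and "<!-->"
    is one complete comment. Returns None if no "-->" is found.
    """
    if not markup.startswith("<!--", start):
        raise ValueError("no comment opener at start")
    search_from = start + 2 if overlapping else start + 4
    end = markup.find("-->", search_from)
    return None if end == -1 else end + 3


doc = "<!-->x-->"
print(doc[:comment_end(doc, 0, overlapping=False)])  # <!-->x-->
print(doc[:comment_end(doc, 0, overlapping=True)])   # <!-->
```

The only difference between the two modes is where the search for "-->" starts, which is the same distinction as "/*/" not closing a CSS comment.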

> > The following is considered one script block (!):
> >
> >      <script><!-- document.write('</script>'); --></script>
>
> This one is common, I think, and applies to IE6, Safari and Opera even
> in Standards Mode. Script parsing seems to work like this in Mozilla in
> Quirks Mode:
>
> 1. If the parser hits the string "<!--" then set a flag to ignore </script>
> tags.
> 2. If the parser then hits the string "-->" then reset the flag.
> 3. The flag can only be set once.
> 4. If the parser hits EOF, then reset the flag (if it is set) and reparse the
> script.
>
> Opera seems to do the same as Mozilla.

Anything that depends on EOF is a bad idea for security reasons, so I
would be reluctant to do that...

> We would have to drop reparsing though.

...which you seem to agree with. :-)
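The Mozilla quirks-mode steps quoted above can be sketched in Python as follows. This is a hedged model, not Mozilla's code: "reparse" in step 4 is assumed to mean rescanning with pseudo-comments ignored (so the first "</script" closes the element), and any "</script" prefix is treated as an end tag, whereas a real tokenizer also checks the character after the tag name.

```python
from typing import Optional

END_TAG = "</script"


def find_script_end(content: str, reparse_on_eof: bool = True) -> Optional[int]:
    """Offset in `content` (the text after "<script>") of the closing
    "</script", following the quoted Mozilla quirks-mode steps."""
    ignoring = False   # step 1: flag that makes the parser ignore </script>
    flag_used = False  # step 3: the flag can only be set once
    i = 0
    while i < len(content):
        if not flag_used and content.startswith("<!--", i):
            ignoring = flag_used = True
            i += 4
        elif ignoring and content.startswith("-->", i):
            ignoring = False  # step 2: "-->" resets the flag
            i += 3
        elif not ignoring and content[i:i + len(END_TAG)].lower() == END_TAG:
            return i
        else:
            i += 1
    # Step 4: EOF with the flag still set. Assumed model of "reparse":
    # rescan ignoring pseudo-comments, so the first "</script" wins.
    # Pass reparse_on_eof=False to drop reparsing, as discussed above.
    if ignoring and reparse_on_eof:
        first = content.lower().find(END_TAG)
        return None if first == -1 else first
    return None


s = "<!-- document.write('</script>'); --></script>"
print(s[:find_script_end(s)])  # <!-- document.write('</script>'); -->
```

With reparse_on_eof=False, an unterminated "<!--" simply leaves the script unclosed, which avoids the EOF dependency Ian objects to.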


> I've tried to figure out exactly what IE does, but I have failed. It
> seems to reparse sometimes but not always; whether --> appears after the
> </script> tag makes a difference, as does whether there are characters
> after the --> (before EOF). The flag can also be set more than once.
>
> Safari seems to do pretty much what IE does.

Can't spec what I can't describe! :-)

If we ignore reparsing, I think I know what Opera, Firefox, IE and Safari do. See these test cases:

  http://simon.html5.org/test/html/parsing/pseudo-comments/

How to interpret the results: if nothing appears outside the tested element, the parser allows multiple pseudo-comments. If "a-->" appears outside the element, the parser doesn't allow any pseudo-comments; if only "b-->" appears outside it, the parser allows exactly one pseudo-comment.
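The interpretation rule can be written down as a tiny Python function. The function name and its input (the text a browser left outside the tested element) are illustrative assumptions; only the "a-->" and "b-->" markers come from the test cases.

```python
def classify(outside_text: str) -> str:
    """Map the text left outside the tested element to a test result."""
    if "a-->" in outside_text:
        # "a-->" leaked out: no pseudo-comments are allowed.
        return "no pseudo-comments"
    if "b-->" in outside_text:
        # Only "b-->" leaked out: exactly one pseudo-comment is allowed.
        return "one pseudo-comment"
    # Nothing leaked out.
    return "multiple pseudo-comments"
```

"a-->" is checked first because when it leaks out, "b-->" may well have leaked out too.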

Below are the results:

opera
  standards mode
  quirks mode
     title
     textarea
     style
     script
     noscript
     noembed (with plugins enabled)
     noframes
        one pseudo-comment

firefox
  standards mode
     title
     textarea
        multiple pseudo-comments
     style
     script
     noscript
     noembed
     noframes
        no pseudo-comments
  quirks mode
     title
     textarea
        multiple pseudo-comments
     style
     noscript
     noembed
     noframes
        no pseudo-comments
     script
        one pseudo-comment

ie
  standards mode
  quirks mode
     title
     textarea
     script
     noscript
     noembed
     noframes
        multiple pseudo-comments
     style
        one pseudo-comment

safari
  standards mode
  quirks mode
     title
     textarea
        no pseudo-comments
     style
     script
     noscript
     noembed
     noframes
        multiple pseudo-comments

I'm not sure what's most sensible to do. I think pseudo-comment handling is needed for <script> parsing at least. My proposal is to allow multiple pseudo-comments in all RCDATA and CDATA elements.

As for the algorithm, I think an extra flag would be sufficient. The flag is initially false. If the parser hits "<!--" while in RCDATA or CDATA, the flag is set to true; if the parser then hits "-->", the flag is set back to false. While the flag is true, the element can't be closed.
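A minimal Python sketch of that flag, assuming the element's content is available as a string and the tokenizer only needs to find where the end tag is. The function names and the simplified end-tag check are illustrative, not taken from any spec draft. Unlike the Mozilla script rule, the flag can be set any number of times.

```python
from typing import Optional

# Characters that may follow the tag name in an end tag (simplified).
_AFTER_TAG_NAME = " \t\n\f/>"


def _is_end_tag(content: str, i: int, tag: str) -> bool:
    end_tag = "</" + tag.lower()
    j = i + len(end_tag)
    return content[i:j].lower() == end_tag and (
        j == len(content) or content[j] in _AFTER_TAG_NAME)


def element_end(content: str, tag: str) -> Optional[int]:
    """Offset of the end tag that closes an RCDATA/CDATA element, or None."""
    in_pseudo_comment = False  # the single extra flag, initially false
    i = 0
    while i < len(content):
        if not in_pseudo_comment and content.startswith("<!--", i):
            in_pseudo_comment = True   # "<!--" sets the flag
            i += 4
        elif in_pseudo_comment and content.startswith("-->", i):
            in_pseudo_comment = False  # "-->" clears it; may happen repeatedly
            i += 3
        elif not in_pseudo_comment and _is_end_tag(content, i, tag):
            return i  # the element can only be closed while the flag is false
        else:
            i += 1
    return None  # EOF reached; no reparsing is attempted


# Hypothetical content in the style of the pseudo-comment tests.
sample = "<!--</title>a--><!--</title>b--></title>"
end = element_end(sample, "title")
print(repr(sample[end:]))  # '</title>'
```

Nothing before the final end tag leaks out, which corresponds to the "multiple pseudo-comments" result. Because the search for "-->" starts after the whole "<!--", "<!-->" does not close a pseudo-comment here either.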

What's also interesting is that Firefox and IE don't replace entities inside pseudo-comments for RCDATA elements (title and textarea), but Opera and Safari do:

  http://simon.html5.org/test/html/parsing/pseudo-comments/rcdata/

Results:

firefox
ie
  standards mode
  quirks mode
     title
     textarea
        entities are not replaced

opera
safari
  standards mode
  quirks mode
     title
     textarea
        entities are replaced

I guess we could follow IE on this one.
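Following IE here could look roughly like this Python sketch: text inside pseudo-comments is kept verbatim and everything else has its entities replaced. html.unescape is an assumed stand-in for the parser's own entity handling, and keeping an unterminated pseudo-comment verbatim up to EOF is also an assumption.

```python
import html


def decode_rcdata(content: str) -> str:
    """Replace entities in RCDATA text except inside pseudo-comments
    (the Firefox/IE behaviour in the results above)."""
    out = []
    i = 0
    while True:
        start = content.find("<!--", i)
        if start == -1:
            out.append(html.unescape(content[i:]))
            return "".join(out)
        out.append(html.unescape(content[i:start]))
        end = content.find("-->", start + 4)
        if end == -1:
            # Unterminated pseudo-comment: keep the rest verbatim.
            out.append(content[start:])
            return "".join(out)
        out.append(content[start:end + 3])  # pseudo-comment: left as-is
        i = end + 3


print(decode_rcdata("&amp;<!--&amp;-->&amp;"))  # &<!--&amp;-->&
```

The Opera/Safari behaviour would correspond to calling html.unescape on the whole content, which gives "&<!--&-->&" for the same input.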

> > p can contain table
>
> I think this might be a good thing. I would also like p to be able to contain
> other struct-inline elements, but perhaps that isn't really possible.

Indeed.

It might also be desirable for a valid HTML4 document to get a conforming HTML4 DOM. If so, then <p> elements shouldn't contain <table>.

> > Safari and IE have special parsing rules for <% ... %> (even in
> > standards mode, though clearly this should be quirks-only).
>
> This wouldn't be a bogus comment, as bogus comments end with > (while
> these end with %>), but I think it would be possible to add this if we
> want to be more compatible with IE.

Oh we could add anything to be compatible with IE... the questions are do
we want to be, and do we need to be.

True.

Like you, I don't know. :-) I want to do some research on this in due
course, but I haven't been able to do it yet.

It would be interesting to see such research. :-)
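On the <% ... %> point: the difference between ending at ">" and ending at "%>" can be shown with a Python sketch. Both functions are hypothetical. As noted above, "<%" is not actually a bogus comment; the first function only shows what would happen if it were treated like one. The EOF handling and starting the "%>" search after the opener are assumptions, since IE's and Safari's exact rules aren't known.

```python
def bogus_comment_end(markup: str, start: int) -> int:
    """A bogus comment runs to the first ">" (or to EOF)."""
    end = markup.find(">", start)
    return len(markup) if end == -1 else end + 1


def percent_block_end(markup: str, start: int) -> int:
    """Hypothetical "<% ... %>" rule: run to the first "%>" after the
    opener; running to EOF when there is none is an assumption."""
    end = markup.find("%>", start + 2)
    return len(markup) if end == -1 else end + 2


doc = "<% if (a > b) { %>text"
print(doc[:bogus_comment_end(doc, 0)])   # <% if (a >
print(doc[:percent_block_end(doc, 0)])   # <% if (a > b) { %>
```

With the bogus-comment rule, " b) { %>" would leak into the document as text, which is why these blocks would need their own rule if we wanted IE compatibility.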

Regards,
Simon Pieters

