> After reading http://www.w3c.org/MarkUp/SGML/sgml-lex/sgml-lex I am
> convinced that <!-----> is a valid SGML (and therefore HTML) comment.
> Therefore, I believe it is a bug if wget does not recognize such a comment.

I don't think so. Actually the rules for SGML "comments" are
somewhat different. First, a comment need not be part of
a comment declaration; it may also appear in other markup
declarations, e.g. in the role of a parameter separator.

Example (from HTML 4 strict):

<!ATTLIST BR
  %coreattrs;                          -- id, class, style, title --
  >

There is at least one comment here, namely between the first
"visible" comment delimiter (-- before " id") and the second -- at
the end of the second line. (The coreattrs entity itself has
some more comments in its replacement text.)

In addition, a declaration may contain only comments, and nothing
else. This is what is usually referred to as "comment" in web pages'
HTML text.

Example of a declaration that contains nothing but comments:

<!-- a tree --
  -- on mars? --
  >

This comment declaration has two comments and a few separators
in it.

The comment declaration rules are numbered 91 and 92 in the SGML
standard.

A comment declaration [91] is a markup declaration open (<!), optionally
followed by a comment (see below), which may in turn be followed by any
number of separators (s) or further comments; the declaration is
terminated by markup declaration close (>).

  comment declaration = mdo, (comment, (s | comment)*)?, mdc

A comment [92] is a comment delimiter (--),
followed by any number of SGML characters, followed
by another comment delimiter (--).

  comment = com, SGML character*, com

(Since the part "followed by..." in [91] is optional (?),
an empty comment declaration is "<!" immediately followed
by ">", i.e. "<!>" is a valid comment declaration, too.)
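
For illustration only, productions [91] and [92] can be transcribed
into a regular expression. This is a sketch, not wget's code. It
assumes the delimiters of the reference concrete syntax (mdo "<!",
com "--", mdc ">") and takes s to be space, tab, CR or LF; the names
COMMENT, S and is_comment_declaration are made up for this example.

```python
import re

# [92] comment = com, SGML character*, com
# The body cannot contain "--", because that would already close the comment.
COMMENT = r"--(?:(?!--).)*--"

# Assumption: s is one of space, tab, carriage return, line feed.
S = r"[ \t\r\n]"

# [91] comment declaration = mdo, (comment, (s | comment)*)?, mdc
COMMENT_DECL = re.compile(rf"<!(?:{COMMENT}(?:{S}|{COMMENT})*)?>", re.DOTALL)


def is_comment_declaration(text: str) -> bool:
    """True if the whole string is one comment declaration per [91]."""
    return COMMENT_DECL.fullmatch(text) is not None


print(is_comment_declaration("<!>"))
print(is_comment_declaration("<!-- a tree --\n  -- on mars? --\n  >"))
```

Both calls should print True: the first is the empty declaration,
the second is the two-comment example from above.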

So in the example <!-----> there are 5 hyphens. The first two
can be interpreted as a comment delimiter, as can the second two,
which together form an empty comment. But then something else
follows the second two, namely a lone '-', which is neither a
separator nor the start of another comment. So this piece of text
is as invalid as <!----z>.
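
To make the failure point visible, here is a small hand-written
scanner that walks a string according to rules [91] and [92] and
reports the first offending position. Again this is only a sketch
under my own assumptions (s is space, tab, CR or LF; first_error is
a made-up name), and it says nothing about how wget actually parses.

```python
def first_error(text: str):
    """Return None for a valid comment declaration, else (index, reason)."""
    if not text.startswith("<!"):
        return (0, "expected mdo '<!'")
    i = 2
    seen_comment = False  # separators are only allowed after a first comment
    while i < len(text):
        if text.startswith("--", i):
            # Opening com found; the next "--" closes this comment.
            end = text.find("--", i + 2)
            if end == -1:
                return (i, "comment opened by '--' is never closed")
            i = end + 2
            seen_comment = True
        elif text[i] == ">":
            if i + 1 != len(text):
                return (i + 1, "text after mdc '>'")
            return None
        elif seen_comment and text[i] in " \t\r\n":
            i += 1
        else:
            return (i, f"unexpected {text[i]!r}; expected '--' or '>'")
    return (len(text), "missing mdc '>'")


print(first_error("<!----->"))
print(first_error("<!----z>"))
print(first_error("<!---->"))
```

For both invalid strings the reported index is 6, i.e. the character
right after the empty comment: the stray '-' in one case and the 'z'
in the other. The four-hyphen form <!----> yields None, since it is
just one empty comment followed by mdc.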


> Note: I haven't studied the source to confirm how it handles such a string.

Neither have I.

Georg

