On 4 Jul 2001, at 23:20, Jacob Burckhardt wrote:
> I run wget on this file:
>
> <! ------------------------------------------------------ >
> <A HREF="a.html">a</a>
> <! ------------------------------------------------------ >
> <A HREF="b.html">b</a>
>
> It downloads b.html, but it does not download a.html.
This is not HTML, nor valid SGML, so you shouldn't be too surprised
at the behavior. What wget is doing is skipping over SGML
declarations and the comments inside those declarations. Inside a
declaration, successive pairs of hyphens alternately open and close
a comment. One of those comments is started by the last two hyphens
on line 1 of your file and terminated by the first two hyphens on
line 3, so the whole of line 2 is commented out.
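
To make the pairing concrete, here is a small Python sketch of the
rule. It is not wget's code: the helper name comment_open_at_end is
made up for illustration, and the hyphen rows are shortened. I am
also assuming your rows contain an odd number of hyphen pairs, which
is what the behavior you describe implies.

```python
def comment_open_at_end(line: str) -> bool:
    """Return True if a '--' comment is still open when the line ends.

    Hypothetical helper: every '--' pair toggles the comment state,
    first opening a comment, then closing it, and so on.
    """
    in_comment = False
    i = 0
    while i < len(line) - 1:
        if line[i:i + 2] == "--":
            in_comment = not in_comment  # a pair opens or closes a comment
            i += 2
        else:
            i += 1
    return in_comment


# Shortened stand-ins for the hyphen rows in the quoted file.
odd_pairs = "<! " + "-" * 6 + " >"   # three pairs: open, close, open
even_pairs = "<! " + "-" * 8 + " >"  # four pairs: every comment is closed

print(comment_open_at_end(odd_pairs))
print(comment_open_at_end(even_pairs))
```

With three pairs the last pair leaves a comment open, so the " >" and
everything after it belongs to the comment until the next pair of
hyphens turns up. With four pairs nothing is left open and the ">"
ends the declaration on the same line.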
> However, if the following file is used, then it does download a.html:
>
> <! ------------------------------------------------------ >
> <A HREF="a.html">a</a>
Here the last two hyphens on line 1 again start a comment, but there
is no later pair of hyphens in the file to terminate it. Because the
comment is not terminated, wget backs out and continues parsing the
document anyway. Perhaps it shouldn't.
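
The difference between your two files can be reproduced with a rough
Python sketch of a scanner that skips declarations and backs out of
unterminated ones. Again, this is not wget's implementation: the
names skip_declaration and find_links, the HREF regular expression,
and the exact back-out step (stepping past the "<!" and carrying on)
are all my assumptions, and the hyphen rows are shortened to three
pairs.

```python
import re

# Simplified, assumed pattern for <A HREF="..."> tags.
HREF = re.compile(r'<a\s+href="([^"]*)"', re.IGNORECASE)


def skip_declaration(text, start):
    """Return the index just past the declaration starting at `start`,
    or None if a comment inside it is never terminated."""
    i = start + 2  # step over "<!"
    while i < len(text):
        if text.startswith("--", i):
            close = text.find("--", i + 2)  # look for the closing pair
            if close == -1:
                return None  # unterminated comment
            i = close + 2
        elif text[i] == ">":
            return i + 1  # end of the declaration
        else:
            i += 1
    return None


def find_links(html):
    """Collect HREF values, skipping anything inside declarations."""
    links = []
    i = 0
    while i < len(html):
        if html.startswith("<!", i):
            end = skip_declaration(html, i)
            # Assumed back-out: treat "<!" as plain text and carry on.
            i = end if end is not None else i + 2
            continue
        m = HREF.match(html, i)
        if m:
            links.append(m.group(1))
            i = m.end()
        else:
            i += 1
    return links


ROW = "<! " + "-" * 6 + " >\n"  # shortened hyphen row, three pairs
first = ROW + '<A HREF="a.html">a</a>\n' + ROW + '<A HREF="b.html">b</a>\n'
second = ROW + '<A HREF="a.html">a</a>\n'

print(find_links(first))
print(find_links(second))
```

On the first file the sketch only reports b.html, because the a.html
line sits inside a comment. On the second it reports a.html, because
the comment is never closed and the scanner backs out. A stricter
parser could instead refuse to back out and treat the rest of the
document as comment.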