----- Original Message -----
From: "Tony Lewis" <[EMAIL PROTECTED]>
To: "George Prekas" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Saturday, May 31, 2003 8:47 AM
Subject: Re: Comment handling


> George Prekas wrote:
>
>
> > I have found a bug in Wget version 1.8.2 concerning comment handling
> > ( <!-- comment --> ). Take a look at the following illegal HTML code:
> > <HTML>
> > <BODY>
> > <a href="test1.html">test1.html</a>
> > <!-->
> > <a href="test2.html">test2.html</a>
> > <!-->
> > </BODY>
> > </HTML>
> >
> > Now, save the above snippet as test.html and try wget -Fi test.html. You
> > will notice that it doesn't recognise the second link. I have found a
> > solution to the above situation and have properly patched html-parse.c
> > and I would like some info on how can I give you the patch.
>
> The HTML code is legitimate, but it only contains one link. The following
> three lines constitute a single comment:
>
> <!-->
> <a href="test2.html">test2.html</a>
> <!-->
>
> A comment begins at "<!--" and ends at "-->". The trailing ">" on the first
> of these lines and the leading "<!" on the third of these lines are part of
> the comment. That is, the comment text is:
>
> >
> <a href="test2.html">test2.html</a>
> <!
>
> At any rate, one should not expect predictable behavior for broken HTML.
> What should wget do with the following?

You are probably right. I pointed this out because I have seen pages that
use <!--------------> (with lots of dashes) as a separator, and although
Internet Explorer shows the page, wget cannot download it correctly. What
do you think about finishing the comment at the >?
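
To make the two readings concrete, here is a rough Python sketch. It is not
the code in html-parse.c. The names strip_comments and links are made up for
illustration, and "finishing the comment at the >" is read as "the comment
ends at the first > after <!--", which is only one possible interpretation.

```python
import re


def strip_comments(html, lenient):
    """Return html with comments removed.

    lenient=False: a comment runs from '<!--' to the next '-->'.
    lenient=True:  a comment runs from '<!--' to the next '>'
                   (an assumed reading of the proposal, not wget's behaviour).
    """
    closer = ">" if lenient else "-->"
    kept = []
    pos = 0
    while True:
        begin = html.find("<!--", pos)
        if begin == -1:
            # No more comments: keep the rest of the document.
            kept.append(html[pos:])
            break
        kept.append(html[pos:begin])
        # Start looking for the closer after the four characters of '<!--'.
        end = html.find(closer, begin + 4)
        if end == -1:
            # Unterminated comment: everything after it is dropped.
            break
        pos = end + len(closer)
    return "".join(kept)


def links(html, lenient):
    """Collect href values that lie outside comments (simplified matching)."""
    return re.findall(r'href="([^"]*)"', strip_comments(html, lenient))


PAGE = """<HTML>
<BODY>
<a href="test1.html">test1.html</a>
<!-->
<a href="test2.html">test2.html</a>
<!-->
</BODY>
</HTML>"""

print(links(PAGE, lenient=False))
print(links(PAGE, lenient=True))
```

With the "<!--" to "-->" rule the sketch reports only test1.html. With the
"end at the first >" rule it reports both test1.html and test2.html. The
second rule would also cut short any comment that contains a > of its own,
so it is a trade-off rather than a clean fix.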

>
> <a href="test1.html">test1.html
> <!-->
> </a>
> <!-->
>
> In one version, it might choose to follow the link to test1.html and in
> another version it might not.
>
> Tony
>
>
