- Original Message -
From: Tony Lewis [EMAIL PROTECTED]
To: George Prekas [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Saturday, May 31, 2003 8:47 AM
Subject: Re: Comment handling
George Prekas wrote:
I have found a bug in Wget version 1.8.2 concerning comment handling (<!-- comment -->). Take a look at the following illegal HTML code:
<HTML>
<BODY>
<a href=test1.html>test1.html</a>
<!-->
<a href=test2.html>test2.html</a>
<!-->
</BODY>
</HTML>
Now save the above snippet as test.html and try wget -Fi test.html. You will notice that it doesn't recognise the second link. I have found a solution to this and patched html-parse.c accordingly, and I would like to know how I can send you the patch.
The HTML code is legitimate, but it only contains one link. The following
three lines constitute a single comment:
<!-->
<a href=test2.html>test2.html</a>
<!-->
A comment begins at <!-- and ends at the next -->; the dashes that open a comment cannot also close it. The trailing > on the first of these lines and the leading <! on the third of these lines are therefore part of the comment. That is, the comment text is:
>
<a href=test2.html>test2.html</a>
<!
At any rate, one should not expect predictable behavior for broken HTML.
What should wget do with the following?
You are probably right. I pointed this out because I have seen pages that use <!-- followed by lots of dashes as a separator, and although Internet Explorer shows such a page, wget cannot download it correctly. What do you think about finishing the comment at the >?
<a href=test1.html>test1.html
<!-->
</a>
<!-->
In one version, it might choose to follow the link to test1.html and in
another version it might not.
Tony