On Wed, 4 Jun 2003, Tony Lewis wrote:
Adding this function to wget seems reasonable to me, but I'd suggest that it
be off by default and enabled from the command line with something
like --quirky_comments.
Aaron S. Hawley wrote:
why not just have the default wget behavior follow comments explicitly
(i've lost track whether wget does that or needs to be amended) /and/
have an option that goes /beyond/ quirky comments and is just
--ignore-comments ? :)
Tony Lewis writes:
The issue we've been discussing is what to do about things that almost
follow the rules for HTML comments, but don't quite get it right. By
default, wget ignores legitimate HTML comments.
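For reference, a minimal sketch of what a strict reading of "legitimate" means here: under SGML rules a comment declaration is `<!`, then one or more comments each delimited by `--` ... `--` (optionally separated by whitespace), then `>`. The helper name `skip_strict_comment` is hypothetical and this is not wget's actual code. Note that under this rule `<!-->` does not close, and a bare run of dashes closes only when its length is a multiple of four.

```python
from typing import Optional


def skip_strict_comment(html: str, start: int) -> Optional[int]:
    """Hypothetical helper: return the index just past a strict SGML comment
    declaration that starts at html[start] with '<!--', or None if the
    declaration is never properly closed."""
    if not html.startswith("<!--", start):
        raise ValueError("expected '<!--' at start")
    i = start + 2  # index of the first '--'
    n = len(html)
    while True:
        # Each comment must open with '--' ...
        if not html.startswith("--", i):
            return None
        # ... and runs to the next '--', which closes it.
        close = html.find("--", i + 2)
        if close == -1:
            return None
        i = close + 2
        # Whitespace may separate comments or precede the final '>'.
        while i < n and html[i].isspace():
            i += 1
        if i < n and html[i] == ">":
            return i + 1
        # Anything else must be the '--' of another comment (checked above).


if __name__ == "__main__":
    print(skip_strict_comment("<!-- a -->", 0))  # 10
    print(skip_strict_comment("<!-->", 0))       # None: its only '--' opens the comment
    print(skip_strict_comment("<!------>", 0))   # None: six dashes leave a comment open
```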
I think the point of the suggested option was to not even try to
identify HTML
i suppose my proposal should have been called --disobey-comments (comments
are already ignored by default).
i'm just saying what's going to happen when someone posts to this list:
My Web Pages have [insert obscure comment format] for comments and Wget
is considering them to (not) be comments.
[...]
i suppose my proposal should have been called --disobey-comments (comments
are already ignored by default).
I suppose that this is a good idea, since it won't be enabled by default and
someone could enable it if the page he wants to download is very buggy
concerning the comments.
On Wed, 4 Jun 2003, George Prekas wrote:
[snip]
i think the idea of quirky comment modes is cool, but is it the best
solution?
Do you think that the current algorithm shouldn't be improved? Even a
little bit, to handle the common mistakes?
i think Wget's default behavior should be
Aaron S. Hawley wrote:
i'm just saying what's going to happen when someone posts to this list:
My Web Pages have [insert obscure comment format] for comments and Wget
is considering them to (not) be comments. Can you change the [insert
Wget comment mode] comment mode to (not) recognize my
start with -- and the
next nonblank character should not be - or >.
That's all for now. Please give me some feedback with your thoughts and tell me
if you would like the comment handling mechanism of Wget to change. By the
way, who wrote the current one? Maybe he can help us with his
experience.
Georg Bauhaus wrote:
I don't think so. Actually the rules for SGML comments are
somewhat different.
, then, about comment handling? I mean, should a comment
finish at the > or not?
Maybe the following will work sufficiently well?
We have a candidate for a declaration, that is, we have seen <!,
and it looks like this is not a DOCTYPE, ENTITY, and so on
declaration. If it is, apply the usual processing
This is what I have tried, leaving out EOF. Basically the algorithm is quite
tolerant and, after <!, either looks for '[[:space:]]*' or for the next
--[[:space:]]*. This will include some very invalid comments, but so what?
I thought it might blend well with typical wget use. It doesn't
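The tolerant approach described above can be sketched as follows. This is one possible reading, not Georg's actual code: the helper name `skip_tolerant_decl` is hypothetical, a letter right after `<!` is assumed to signal a real declaration (DOCTYPE, ENTITY, ...), the comment is assumed to end at the nearest `--`, optional whitespace, `>`, and the fallbacks are the next `>` and then end of input.

```python
import re
from typing import Optional

# '--', optional whitespace, then '>' (one reading of the terminator).
_COMMENT_END = re.compile(r"--\s*>")
# DOCTYPE, ENTITY and other real declarations start with a letter after '<!'.
_KEYWORD = re.compile(r"[A-Za-z]")


def skip_tolerant_decl(html: str, start: int) -> Optional[int]:
    """Hypothetical helper for a '<!' seen at html[start].

    Returns None when this looks like a real declaration (so the usual
    processing applies); otherwise returns the index just past the comment.
    """
    if not html.startswith("<!", start):
        raise ValueError("expected '<!' at start")
    if _KEYWORD.match(html, start + 2):
        return None
    # Search from the opening dashes, so '<!-->' ends at its own '>'.
    m = _COMMENT_END.search(html, start + 2)
    if m:
        return m.end()
    # No '--...>' anywhere: fall back to the next '>'.
    gt = html.find(">", start + 2)
    # Still nothing: treat the rest of the input as the comment.
    return gt + 1 if gt != -1 else len(html)


if __name__ == "__main__":
    print(skip_tolerant_decl("<!DOCTYPE html>", 0))  # None
    print(skip_tolerant_decl("<!-->", 0))            # 5
    print(skip_tolerant_decl("<!------>", 0))        # 9
```

Because the `--...>` search takes priority over the nearest `>`, a malformed comment still runs to the next `-->` if one exists later in the page; the `>` fallback only matters when there is none.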
Georg, I think we're talking about apples and oranges here. I'm talking
about what is legitimate in a comment in an SGML document. I think you're
talking about what is legitimate as a comment in an SGML declaration.
Ah, yes, OK, I was reacting to valid SGML comments, where legitimate
is not
After reading http://www.w3c.org/MarkUp/SGML/sgml-lex/sgml-lex I am
convinced that !- is a valid SGML (and therefore HTML) comment.
Therefore, I believe it is a bug if wget does not recognize such a comment.
I don't think so. Actually the rules for SGML comments are
somewhat different.
- Original Message -
From: Tony Lewis [EMAIL PROTECTED]
To: George Prekas [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Saturday, May 31, 2003 8:47 AM
Subject: Re: Comment handling
George Prekas wrote:
You are probably right. I pointed this out because I have seen pages that
use <!-- with lots of dashes as a separator, and although
Internet Explorer shows the page, wget cannot download it correctly. What
do you think about finishing the comment at the >?
On Fri, 30 May 2003, George Prekas wrote:
I have found a bug in Wget version 1.8.2 concerning comment handling
(<!-- comment -->). Take a look at the following illegal HTML code:
<HTML>
<BODY>
<a href="test1.html">test1.html</a>
<!-->
<a href="test2.html">test2.html</a>
<!-->
</BODY>
</HTML>
Now, save the above snippet as test.html and try wget -Fi test.html
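To reproduce the reported behavior without wget, here is a small link extractor with two hypothetical comment rules: a naive one that ends a comment at the next `-->` (simpler than full SGML, but it gives the same result on this page), and a quirky one that ends it at the first `>`. It assumes the separator lines in the test page are `<!-->`; `extract_links` and the `quirky` flag are illustrative names, not wget code, and the href regex is deliberately simplified.

```python
import re
from typing import List

# Simplified href matcher: double-quoted or bare attribute values only.
_HREF = re.compile(r'<a\s+href\s*=\s*"?([^"\s>]+)', re.IGNORECASE)

# The test page, assuming its separator lines are '<!-->'.
SAMPLE = """<HTML>
<BODY>
<a href="test1.html">test1.html</a>
<!-->
<a href="test2.html">test2.html</a>
<!-->
</BODY>
</HTML>
"""


def extract_links(html: str, quirky: bool = False) -> List[str]:
    """Hypothetical link extractor that skips comments opened by '<!--'.

    quirky=False: the comment runs to the next '-->' (naive rule).
    quirky=True:  the comment ends at the first '>' after '<!--'.
    """
    links: List[str] = []
    pos = 0
    while True:
        start = html.find("<!--", pos)
        visible = html[pos:] if start == -1 else html[pos:start]
        links.extend(_HREF.findall(visible))
        if start == -1:
            return links
        if quirky:
            end, close_len = html.find(">", start + 4), 1
        else:
            end, close_len = html.find("-->", start + 4), 3
        if end == -1:
            return links  # unterminated comment hides the rest of the page
        pos = end + close_len


if __name__ == "__main__":
    print(extract_links(SAMPLE))               # ['test1.html']
    print(extract_links(SAMPLE, quirky=True))  # ['test1.html', 'test2.html']
```

The quirky rule has its own cost: in a legitimate comment such as `<!-- <b>old</b> <a href="y.html">y</a> -->`, it ends the comment at the `>` of `<b>` and picks up `y.html`, which is why keeping it off by default and opt-in seems the safer design.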