Re: Comment handling

2003-06-05 Thread Aaron S. Hawley
On Wed, 4 Jun 2003, Tony Lewis wrote: Adding this function to wget seems reasonable to me, but I'd suggest that it be off by default and enabled from the command line with something like --quirky_comments. why not just have the default wget behavior follow comments explicitly (i've lost track

Re: Comment handling

2003-06-05 Thread Tony Lewis
Aaron S. Hawley wrote: why not just have the default wget behavior follow comments explicitly (i've lost track whether wget does that or needs to be amended) /and/ have an option that goes /beyond/ quirky comments and is just --ignore-comments ? :) The issue we've been discussing is what to

Re: Comment handling

2003-06-05 Thread Larry Jones
Tony Lewis writes: The issue we've been discussing is what to do about things that almost follow the rules for HTML comments, but don't quite get it right. By default, wget ignores legitimate HTML comments. I think the point of the suggested option was to not even try to identify HTML

Re: Comment handling

2003-06-05 Thread George Prekas
Tony Lewis writes: The issue we've been discussing is what to do about things that almost follow the rules for HTML comments, but don't quite get it right. By default, wget ignores legitimate HTML comments. I think the point of the suggested option was to not even try to identify HTML

Re: Comment handling

2003-06-05 Thread Aaron S. Hawley
i suppose my proposal should have been called --disobey-comments (comments are already ignored by default). i'm just saying what's going to happen when someone posts to this list: My Web Pages have [insert obscure comment format] for comments and Wget is considering them to (not) be comments.

Re: Comment handling

2003-06-05 Thread George Prekas
[...] i suppose my proposal should have been called --disobey-comments (comments are already ignored by default). I suppose that this is a good idea, since it won't be enabled by default and someone could enable it if the page he wants to download is very buggy concerning the comments. i'm

Re: Comment handling

2003-06-05 Thread Aaron S. Hawley
On Wed, 4 Jun 2003, George Prekas wrote: snip i think the idea of quirky comments modes are cool, but is it the better solution? Do you think that the current algorithm shouldn't be improved? Even, a little bit to handle the common mistakes? i think Wget's default behavior should be

Re: Comment handling

2003-06-05 Thread Tony Lewis
Aaron S. Hawley wrote: i'm just saying what's going to happen when someone posts to this list: My Web Pages have [insert obscure comment format] for comments and Wget is considering them to (not) be comments. Can you change the [insert Wget comment mode] comment mode to (not) recognize my

Re: Comment handling

2003-06-04 Thread George Prekas
start with -- and the next nonblank character should not be - or . That's all for now. Please give me some feedback with your thoughts and tell me if you would like the comment handling mechanism of Wget to change. By the way, who wrote the current one? Maybe he can help us with his experience

Re: Comment handling

2003-06-03 Thread Tony Lewis
Georg Bauhaus wrote: I don't think so. Actually the rules for SGML comments are somewhat different. Georg, I think we're talking about apples and oranges here. I'm talking about what is legitimate in a comment in an SGML document. I think you're talking about what is legitimate as a comment

Re: Comment handling

2003-06-03 Thread Georg Bauhaus
, then, about comment handling? I mean, should a comment finish at the > or not? Maybe the following will work sufficiently well? We have a candidate for a declaration, that is, we have seen <!, and it looks like this is not a DOCTYPE, ENTITY, and so on declaration. If it is, apply the usual processing

Re: Comment handling

2003-06-03 Thread Georg Bauhaus
This is what I have tried, leaving out EOF. Basically the algorithm is quite tolerant and, after <!, either looks for '>[[:space:]]*' or for the next -->[[:space:]]*. This will include some very invalid comments, but so what? I thought it might blend well with typical wget use. It doesn't
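
The tolerant scan described in this message can be sketched roughly as follows. This is only one reading of a truncated excerpt whose markup characters were lost in archiving, and it is not Wget's code. It assumes that a comment opens at `<!--` and closes at the next `-->` plus any trailing whitespace, that any other `<!` declaration closes at the next `>` plus trailing whitespace, and that an unterminated construct is left in place as ordinary text. The function name `strip_tolerant_comments` is hypothetical.

```python
import re

# Assumed terminators (not taken from Wget): a comment ends at the first
# "-->" and any other "<!" declaration ends at the first ">", each followed
# by optional whitespace.
_COMMENT_END = re.compile(r"-->\s*")
_DECL_END = re.compile(r">\s*")


def strip_tolerant_comments(html: str) -> str:
    """Return html with comments and <!...> declarations removed, tolerantly."""
    out = []
    pos = 0
    while True:
        start = html.find("<!", pos)
        if start == -1:
            # No further declarations: keep the remaining text.
            out.append(html[pos:])
            break
        out.append(html[pos:start])
        if html.startswith("<!--", start):
            match = _COMMENT_END.search(html, start + 4)
        else:
            match = _DECL_END.search(html, start + 2)
        if match is None:
            # Unterminated construct: treat the rest as ordinary text.
            out.append(html[start:])
            break
        pos = match.end()
    return "".join(out)


if __name__ == "__main__":
    print(strip_tolerant_comments("<!---- separator ----><a href=x.html>x</a>"))
```

With this reading, a separator made of many dashes is skipped as a single comment, so a link that follows it stays visible to whatever extracts links afterwards.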

Re: Comment handling

2003-06-03 Thread Georg Bauhaus
Georg, I think we're talking about apples and oranges here. I'm talking about what is legitimate in a comment in an SGML document. I think you're talking about what is legitimate as a comment in an SGML declaration. Ah, yes, o.K., I was reacting to valid SGML comments, where legitimate is not

Re: Comment handling

2003-06-02 Thread Georg Bauhaus
After reading http://www.w3c.org/MarkUp/SGML/sgml-lex/sgml-lex I am convinced that !- is a valid SGML (and therefore HTML) comment. Therefore, I believe it is a bug if wget does not recognize such a comment. I don't think so. Actually the rules for SGML comments are somewhat different.

Re: Comment handling

2003-06-01 Thread George Prekas
On Saturday, May 31, 2003 8:47 AM, Tony Lewis wrote: George Prekas wrote: I have found a bug in Wget version 1.8.2 concerning comment handling

Re: Comment handling

2003-06-01 Thread Tony Lewis
George Prekas wrote: You are probably right. I have pointed this out because I have seen pages that use as a separator <!-- with lots of dashes, and although Internet Explorer shows the page, wget cannot download it correctly. What do you think about finishing the comment at the > ? After

Comment handling

2003-05-31 Thread George Prekas
I have found a bug in Wget version 1.8.2 concerning comment handling ( <!-- comment --> ). Take a look at the following illegal HTML code: <HTML> <BODY> <a href=test1.html>test1.html</a> <!-- <a href=test2.html>test2.html</a> <!-- </BODY> </HTML> Now, save the above snippet as test.html and try wget -Fi test.html

Re: Comment handling

2003-05-31 Thread Aaron S. Hawley
On Fri, 30 May 2003, George Prekas wrote: I have found a bug in Wget version 1.8.2 concerning comment handling ( <!-- comment --> ). Take a look at the following illegal HTML code: <HTML> <BODY> <a href=test1.html>test1.html</a> <!-- <a href=test2.html>test2.html</a> <!-- </BODY> </HTML> Now, save

Re: Comment handling

2003-05-31 Thread Tony Lewis
George Prekas wrote: I have found a bug in Wget version 1.8.2 concerning comment handling ( <!-- comment --> ). Take a look at the following illegal HTML code: <HTML> <BODY> <a href=test1.html>test1.html</a> <!-- <a href=test2.html>test2.html</a> <!-- </BODY> </HTML> Now, save the above snippet

Comment handling

2003-05-30 Thread George Prekas
I have found a bug in Wget version 1.8.2 concerning comment handling ( <!-- comment --> ). Take a look at the following illegal HTML code: <HTML> <BODY> <a href=test1.html>test1.html</a> <!-- <a href=test2.html>test2.html</a> <!-- </BODY> </HTML> Now, save the above snippet as test.html and try wget -Fi test.html
