>----- Forwarded message from Hrvoje Niksic <[EMAIL PROTECTED]> -----
>Definitely.  Debug output should be sent with every bug report.

Well, I'm not sure it's a bug, because it looks intentional. I'm monitoring
some discussion groups that each have a root file, updated with every new
message, containing links to all messages or to all new ones.
The root file usually has no timestamp, and it is a CGI file, so it has no .html
extension.  To get only the new messages, I first execute e.g.

wget -x http://host/cgi-bin/root.cgi

and then

wget -r -nc -k -I /cgi-bin/messages http://host/cgi-bin/root.cgi

so that I get the root file in any case, but only the new messages
(if there is a better way to do this without a timestamped file, I'd love
to hear about it). The problem is that root.cgi is served as text/html,
but this information is lost in the second wget invocation, presumably
because -nc skips the download and the local file has no .html extension.
Unfortunately, in 1.6 -r recurses only into files with the TEXTHTML flag set,
so no recursive retrieval is done.
I thought about providing the -F flag to tell wget it's an HTML file, but this
works only with the -i flag. When I use the -i flag, like

wget -r -nc -k -F -D dom -A '*.html' -I /cgi-bin/messages -i host/cgi-bin/root.cgi

wget ignores the -D, -A and -I parameters and loads all kinds of junk, like
images and ads from completely different domains. This, IMHO, should not
be the case: -D, -I, -A and the like should be obeyed with the -i flag
as well, even when -r is not specified (and all the more so when it is).
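
To make concrete what I would expect "obeying -D, -I and -A" to mean for the
URLs read with -i, here is a rough Python sketch. It is not wget's code and
only approximates its matching rules; the function name url_allowed and the
example hosts are made up for illustration.

```python
from fnmatch import fnmatch
from posixpath import basename
from urllib.parse import urlsplit


def url_allowed(url, domains=(), include_dirs=(), accept=()):
    """Return True if url passes every filter; an empty filter allows everything.

    domains      ~ -D: host must equal or be a subdomain of one entry
    include_dirs ~ -I: path must lie inside one of the directories
    accept       ~ -A: file name must match one of the shell patterns
    (Hypothetical approximation of the options, not wget's actual logic.)
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if domains and not any(host == d or host.endswith("." + d) for d in domains):
        return False
    if include_dirs and not any(
        parts.path == d or parts.path.startswith(d.rstrip("/") + "/")
        for d in include_dirs
    ):
        return False
    if accept and not any(fnmatch(basename(parts.path), pat) for pat in accept):
        return False
    return True
```

With domains=("dom",), include_dirs=("/cgi-bin/messages",) and
accept=("*.html",), a message page under /cgi-bin/messages on a host in dom
passes, while an ad image from another domain is rejected before any download
is attempted.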

The best solution would be to provide a flag telling wget that the command-line
URL should be downloaded in any case, but the pages reached from it recursively
only if they are not already there. I expect many people would use such a feature.
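
The behaviour I have in mind could be sketched like this in Python. It is only
an illustration of the idea, not a proposal for wget's implementation: the
names LinkCollector, local_path and plan_downloads are made up, the local
layout is assumed to be host/path as created by -x, and query strings are
ignored for simplicity.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def local_path(url):
    """Assumed local file name for url: host directory plus URL path."""
    parts = urlsplit(url)
    return os.path.join(parts.netloc, *parts.path.lstrip("/").split("/"))


def plan_downloads(root_url, root_html, include_dir, exists=os.path.exists):
    """List the URLs to fetch: the root always, linked pages only if missing."""
    collector = LinkCollector()
    collector.feed(root_html)
    root_host = urlsplit(root_url).netloc
    prefix = include_dir.rstrip("/") + "/"
    todo = [root_url]  # the command-line URL is fetched in any case
    for href in collector.links:
        url = urljoin(root_url, href)  # resolve relative links
        parts = urlsplit(url)
        if parts.netloc != root_host or not parts.path.startswith(prefix):
            continue  # stay on the same host and inside include_dir
        if url not in todo and not exists(local_path(url)):
            todo.append(url)  # only messages not yet on disk
    return todo
```

The exists argument is there only so the check against the local disk can be
replaced when trying the function out; by default it looks at the real files.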

--Micha
