Hi!

Ian wrote:
> Well I only said the URLs specified on the command line or by the
> --include-file option are always downloaded. I didn't intend this
> to be interpreted as also applying to URLs which Wget finds while
> examining the contents of the downloaded html files. At the moment,
> the domain acceptance/rejection checks are only performed when
> downloaded html files are examined for further URLs to be
> downloaded (for the --recursive and --page-requisites options),
> which is why it behaves as it does.

Ah! Now I understand, thanks for explaining again.


[host wildcards]
> > -Dbar.com behaves strictly: www.bar.com, www2.bar.com
> > -D*bar.com behaves like now: www.bar.com, www2.bar.com, www.foobar.com
> > -D*bar.com* gets www.bar.com, www2.bar.com, www.foobar.com,
> > sex-bar.computer-dating.com
[...]
> It sounds like it should work okay. I'd prefer to let -Dbar.com
> also match fubar.com for compatibility's sake. If you wanted to
> match www.bar.com and www2.bar.com, but not www.fubar.com you
> could use -D.bar.com, but that wouldn't work if you wanted to
> match bar.com without the www (well, a leading . could be treated
> as a special case).

Sounds a bit more complicated to program (that's why I did not suggest
it), but I must admit I am a fan of backwards compatibility :), so your
version sounds like a good idea.
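The scheme discussed above could be sketched roughly as follows (a
hypothetical illustration only; the function name and details are my own,
not Wget's actual implementation): a plain -D pattern keeps the current
tail-substring match, while a leading dot restricts matching to subdomains,
with the bare domain treated as a special case.

```python
def domain_accepted(host, pattern):
    """Hypothetical sketch of the -D matching rules discussed above."""
    if pattern.startswith('.'):
        # leading dot: match subdomains only, plus (as the suggested
        # special case) the bare domain itself
        return host == pattern[1:] or host.endswith(pattern)
    # backwards-compatible behaviour: plain tail-substring match,
    # so -Dbar.com still matches fubar.com
    return host.endswith(pattern)

print(domain_accepted("fubar.com", "bar.com"))       # True  (compat)
print(domain_accepted("www.bar.com", ".bar.com"))    # True
print(domain_accepted("www.fubar.com", ".bar.com"))  # False
print(domain_accepted("bar.com", ".bar.com"))        # True  (special case)
```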

> It would be easiest and more consistent (currently) to use
> "shell-globbing" wildcards (as used for the file-acceptance
> rules) rather than grep/egrep-style wildcards.

Well, you got me once again.
Google found this page:
http://www.mkssoftware.com/docs/man1/grep.1.asp
Do I understand correctly that grep/egrep lets the user/program search
files (strings/records?) for a string expression?
While it appears (to me) to be more powerful than the wildcards you
mentioned, I do not see a compelling reason to use it, as I think
wildcard matching will work just as well (apart from the consistency
reason you mentioned).
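For what it's worth, the two wildcard styles can express the same host
match; here is a small comparison (my own illustration, using Python's
fnmatch for shell-globbing and re for grep/egrep-style patterns, with the
example hosts from earlier in the thread):

```python
from fnmatch import fnmatch
import re

hosts = ["www.bar.com", "www.foobar.com", "sex-bar.computer-dating.com"]

# shell-glob style, as used by Wget's file-acceptance rules
glob_hits = [h for h in hosts if fnmatch(h, "*bar.com")]

# grep/egrep-style regular expression doing the same job
re_hits = [h for h in hosts if re.search(r"bar\.com$", h)]

print(glob_hits)  # ['www.bar.com', 'www.foobar.com']
print(re_hits)    # same result
```

Both pick out the hosts ending in bar.com; only the "*bar.com*" /
"bar\.com" variants would also catch sex-bar.computer-dating.com.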

CU
Jens
