Hi!

Ian wrote:
> Well I only said the URLs specified on the command line or by the
> --include-file option are always downloaded. I didn't intend this
> to be interpreted as also applying to URLs which Wget finds while
> examining the contents of the downloaded html files. At the moment,
> the domain acceptance/rejection checks are only performed when
> downloaded html files are examined for further URLs to be
> downloaded (for the --recursive and --page-requisites options),
> which is why it behaves as it does.
Ah! Now I understand, thanks for explaining again.

[host wildcards]
> > -Dbar.com behaves strictly: www.bar.com, www2.bar.com
> > -D*bar.com behaves like now: www.bar.com, www2.bar.com, www.foobar.com
> > -D*bar.com* gets www.bar.com, www2.bar.com, www.foobar.com,
> > sex-bar.computer-dating.com
[...]
> It sounds like it should work okay. I'd prefer to let -Dbar.com
> also match fubar.com for compatibility's sake. If you wanted to
> match www.bar.com and www2.bar.com, but not www.fubar.com you
> could use -D.bar.com, but that wouldn't work if you wanted to
> match bar.com without the www (well, a leading . could be treated
> as a special case).

Sounds a bit more complicated to programme (that's why I did not
suggest it), but I must admit I am a fan of backwards compatibility :)
so your version sounds like a good idea.

> It would be easiest and more consistent (currently) to use
> "shell-globbing" wildcards (as used for the file-acceptance
> rules) rather than grep/egrep-style wildcards.

Well, you got me once again. Google found this page:

http://www.mkssoftware.com/docs/man1/grep.1.asp

Do I understand correctly that grep/egrep enables the user/programme
to search files (strings/records?) for a string expression? While it
appears (to me) to be more powerful than the mentioned wildcards, I do
not see a compelling reason to use it, as I think wildcard matching
will work just as well (apart from the consistency reason you
mentioned).

CU
Jens
