On 28 Mar 2002 at 18:01, Jens Rösner wrote:

> > > I came across a crash caused by a cookie
> > > two days ago. I disabled cookies and it worked.
> > I'm hoping you had debug output on when it crashed, otherwise this
> > is a different crash to the one I already know about. Can you
> > confirm this, please?
> Yes, I had debug output on.

Thanks for the confirmation.

> > > wget -nc -x -r -l0 -t10 -H -Dstory.de,audi -o example.log -k -d
> > > -R.gif,.exe,*tn*,*thumb*,*small* -F -i example.html
> > >
> > > Result with 1.8.1 and 1.7.1 with -nh:
> > > audistory.com: Only index.html
> > > audistory.de: Everything
> > > audi100-online: only the first page
> > > kolaschnik.de: only the first page
> >
> > Yes, that's how I thought it would behave. Any URLs specified on
> > the command line or in a --include-file file are always downloaded
> > irregardless of the domain acceptance rules.
>
> Well, one page of a rejected URL is downloaded, not more.
> Whereas the only accepted domain audistory.de gets downloaded
> completely.
> Doesn't this differ from what you just said?

Well, I only said that URLs specified on the command line or through the
--input-file (-i) option are always downloaded. I didn't intend this to be
read as also applying to URLs that Wget finds while examining the contents
of the downloaded HTML files. At the moment, the domain acceptance/rejection
checks are performed only when downloaded HTML files are examined for
further URLs to download (for the --recursive and --page-requisites
options). The start URLs are never checked, so each one is fetched once,
and only the links found inside them are filtered by domain. That is why
it behaves as it does.

> Agreed! How about introducing "wildcards" like
> -Dbar.com behaves strictly: www.bar.com, www2.bar.com
> -D*bar.com behaves like now: www.bar.com, www2.bar.com, www.foobar.com
> -D*bar.com* gets www.bar.com, www2.bar.com, www.foobar.com,
> sex-bar.computer-dating.com
> That would leave current command lines operational
> and introduce many possibilities without (too much) fuss.
> Or have I overlooked anything here?

It sounds like it should work okay. I'd prefer to let -Dbar.com also match
fubar.com for compatibility's sake. If you wanted to match www.bar.com and
www2.bar.com, but not www.fubar.com, you could use -D.bar.com. That wouldn't
work if you also wanted to match bar.com without the www, although a leading
"." could be treated as a special case to cover that.
It would be easiest, and for now more consistent, to use "shell-globbing" wildcards (as already used for the file-acceptance rules, -A and -R) rather than grep/egrep-style regular expressions.
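
To make the proposal concrete, here is a rough sketch of the matching rules discussed above, written in Python rather than in Wget's own C. It is only an illustration of the idea, not Wget's actual code: the function name domain_accepted and the exact handling of a leading "." are assumptions made for this example. A pattern containing glob characters is matched shell-style against the whole host name, a pattern with a leading "." matches the bare domain and its subdomains, and any other pattern keeps today's plain suffix match.

```python
from fnmatch import fnmatchcase


def domain_accepted(host, patterns):
    """Return True if host matches any -D style pattern.

    Hypothetical semantics sketched from the discussion; not Wget's code.
    """
    host = host.lower()
    for pat in patterns:
        pat = pat.lower()
        if any(ch in pat for ch in "*?["):
            # Shell-glob pattern: must match the whole host name.
            if fnmatchcase(host, pat):
                return True
        elif pat.startswith("."):
            # Leading dot: subdomains only, plus the bare domain
            # itself as a special case (".bar.com" matches "bar.com").
            if host.endswith(pat) or host == pat[1:]:
                return True
        elif host.endswith(pat):
            # No wildcard: plain suffix match, kept for compatibility,
            # so "bar.com" still matches "www.fubar.com".
            return True
    return False


if __name__ == "__main__":
    hosts = ["bar.com", "www.bar.com", "www.fubar.com",
             "sex-bar.computer-dating.com"]
    for pattern in ["bar.com", ".bar.com", "*bar.com", "*bar.com*"]:
        matched = [h for h in hosts if domain_accepted(h, [pattern])]
        print(pattern, "matches", matched)
```

With these rules existing command lines keep their meaning, since a pattern without wildcards or a leading dot still behaves as a suffix match, and only the new forms opt in to stricter or looser matching.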