Howdy!

> > I came across a crash caused by a cookie two days ago.
> > I disabled cookies and it worked.
>
> I'm hoping you had debug output on when it crashed, otherwise this
> is a different crash to the one I already know about. Can you
> confirm this, please?
Yes, I had debug output on.

> > wget -nc -x -r -l0 -t10 -H -Dstory.de,audi -o example.log -k -d
> > -R.gif,.exe,*tn*,*thumb*,*small* -F -i example.html
> >
> > Result with 1.8.1 and 1.7.1 with -nh:
> > audistory.com: only index.html
> > audistory.de: everything
> > audi100-online: only the first page
> > kolaschnik.de: only the first page
>
> Yes, that's how I thought it would behave. Any URLs specified on
> the command line or in a --include-file file are always downloaded
> regardless of the domain acceptance rules.

Well, one page of each rejected URL is downloaded, but no more, whereas
the only accepted domain, audistory.de, gets downloaded completely.
Doesn't that differ from what you just said?

> One of the changes you desire is that the domain acceptance rules
> should apply to these too, which sounds like a reasonable thing to
> expect.

That's my impression, too (obviously ;)

> > What I would have liked and expected:
> > audistory.com: everything
> > audistory.de: everything
> > audi100-online: everything
> > kolaschnik.de: nothing
>
> That requires the first change and also different domain matching
> rules. I don't think that should be changed without adding extra
> options to do so. The --domains and --exclude-domains lists are
> meant to be actual domain names. I.e. -Dbar.com is meant to match
> bar.com and foo.bar.com, and it's just a happy accident that it
> also matches fubar.com (perhaps it shouldn't, really). I think if
> someone specified -Dbar.com and it matched
> sex-bar.computer-dating.com, they might be a bit surprised!

Agreed! How about introducing "wildcards"?

  -Dbar.com    behaves strictly: www.bar.com, www2.bar.com
  -D*bar.com   behaves like now: www.bar.com, www2.bar.com, www.foobar.com
  -D*bar.com*  gets www.bar.com, www2.bar.com, www.foobar.com,
               sex-bar.computer-dating.com

That would leave current command lines operational and introduce many
possibilities without (too much) fuss. Or have I overlooked anything here?
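To make the proposal concrete, here is a minimal sketch (in Python, just for illustration; wget itself is written in C) of what the suggested -D matching semantics could look like. The function name `domain_accepted` and its shape are hypothetical, not wget's actual code:

```python
# Hypothetical sketch of the proposed -D matching semantics.
from fnmatch import fnmatch

def domain_accepted(host, pattern):
    """Return True if host matches one -D pattern under the proposal."""
    if "*" in pattern:
        # Explicit wildcard: glob-style matching, so -D*bar.com keeps
        # today's loose suffix behaviour and -D*bar.com* matches the
        # string anywhere in the host name.
        return fnmatch(host, pattern)
    # No wildcard: strict matching only. The host must equal the
    # pattern or end with "." + pattern (a real subdomain boundary),
    # so www.foobar.com no longer slips through -Dbar.com.
    return host == pattern or host.endswith("." + pattern)

print(domain_accepted("www.bar.com", "bar.com"))      # True
print(domain_accepted("www.foobar.com", "bar.com"))   # False
print(domain_accepted("www.foobar.com", "*bar.com"))  # True
print(domain_accepted("sex-bar.computer-dating.com", "*bar.com*"))  # True
```

Since patterns without a "*" keep matching everything they should match, existing command lines would indeed stay operational; only the accidental foobar.com-style matches would change.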
> > Independent of the question how the string "audi" should be
> > matched within the URL, I think rejected URLs should not be
> > parsed or retrieved.
>
> Well, they are all parsed before it is decided whether to retrieve
> them or not!

Oopsie again. /me looks up "parse": parse = analyse. Yes, I understand
now!

Kind regards
Jens