Howdy!

> > I came across a crash caused by a cookie two days ago.
> > I disabled cookies and it worked.
>
> I'm hoping you had debug output on when it crashed, otherwise this
> is a different crash to the one I already know about. Can you
> confirm this, please?
Yes, I had debug output on.

> > wget -nc -x -r -l0 -t10 -H -Dstory.de,audi -o example.log -k -d
> > -R.gif,.exe,*tn*,*thumb*,*small* -F -i example.html
> >
> > Result with 1.8.1 and 1.7.1 with -nh:
> > audistory.com: only index.html
> > audistory.de: everything
> > audi100-online: only the first page
> > kolaschnik.de: only the first page
>
> Yes, that's how I thought it would behave. Any URLs specified on
> the command line or in a --include-file file are always downloaded
> regardless of the domain acceptance rules.

Well, one page of each rejected URL is downloaded, but no more, whereas
the only accepted domain, audistory.de, gets downloaded completely.
Doesn't that differ from what you just said?

> One of the changes you desire is that the domain acceptance rules
> should apply to these too, which sounds like a reasonable thing to
> expect.

That's my impression, too (obviously ;)

> > What I would have liked and expected:
> > audistory.com: everything
> > audistory.de: everything
> > audi100-online: everything
> > kolaschnik.de: nothing
>
> That requires the first change and also different domain matching
> rules. I don't think that should be changed without adding extra
> options to do so. The --domains and --exclude-domains lists are
> meant to be actual domain names. I.e. -Dbar.com is meant to match
> bar.com and foo.bar.com, and it's just a happy accident that it
> also matches fubar.com (perhaps it shouldn't, really). I think if
> someone specified -Dbar.com and it matched
> sex-bar.computer-dating.com, they might be a bit surprised!

Agreed! How about introducing "wildcards"?

  -Dbar.com    behaves strictly: www.bar.com, www2.bar.com
  -D*bar.com   behaves like now: www.bar.com, www2.bar.com, www.foobar.com
  -D*bar.com*  gets www.bar.com, www2.bar.com, www.foobar.com,
               sex-bar.computer-dating.com

That would leave current command lines operational and introduce many
possibilities without (too much) fuss. Or have I overlooked anything here?
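To make the proposal concrete, here is a minimal sketch (in Python, just for illustration; wget itself is written in C) of what the suggested -D matching semantics could look like. The function name `domain_accepted` and its shape are hypothetical, not wget's actual code:

```python
# Hypothetical sketch of the proposed -D matching semantics.
from fnmatch import fnmatch

def domain_accepted(host, pattern):
    """Return True if host matches one -D pattern under the proposal."""
    if "*" in pattern:
        # Explicit wildcard: glob-style matching, so -D*bar.com keeps
        # today's loose suffix behaviour and -D*bar.com* matches the
        # string anywhere in the host name.
        return fnmatch(host, pattern)
    # No wildcard: strict matching only. The host must equal the
    # pattern or end with "." + pattern (a real subdomain boundary),
    # so www.foobar.com no longer slips through -Dbar.com.
    return host == pattern or host.endswith("." + pattern)

print(domain_accepted("www.bar.com", "bar.com"))      # True
print(domain_accepted("www.foobar.com", "bar.com"))   # False
print(domain_accepted("www.foobar.com", "*bar.com"))  # True
print(domain_accepted("sex-bar.computer-dating.com", "*bar.com*"))  # True
```

Since patterns without a "*" keep matching everything they should match, existing command lines would indeed stay operational; only the accidental foobar.com-style matches would change.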
> > Independent of the question how the string "audi" should be
> > matched within the URL, I think rejected URLs should not be
> > parsed or retrieved.
>
> Well, they are all parsed before it is decided whether to retrieve
> them or not!

Oopsie again. /me looks up "parse": parse = analyse. Yes, I understand
now!

Kind regards
Jens