2006/10/23, Andrzej Bialecki <[EMAIL PROTECTED]>:
> Tomi NA wrote:
>> 2006/10/18, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
>>
>>> Btw, we have some virtual local hosts; how does db.ignore.external.links
>>> deal with that?
>>
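For what it's worth, my understanding is that this kind of filtering
compares host names, so name-based virtual hosts on one machine would still
count as separate sites. Roughly this idea (an illustrative sketch, not the
actual Nutch code):

  import java.net.URL;

  public class ExternalLinkCheck {
      // True if toUrl points to a different host name than fromUrl.
      // Comparing host *names* means http://site-a.example/ and
      // http://site-b.example/ are external to each other even when
      // both are virtual hosts on the same machine/IP.
      static boolean isExternal(String fromUrl, String toUrl)
              throws java.net.MalformedURLException {
          URL from = new URL(fromUrl);
          URL to = new URL(toUrl);
          return !to.getHost().equalsIgnoreCase(from.getHost());
      }
  }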
>> Update:
>> Setting db.ignore.external.links to true in nutch-site.xml (and later also
>> in nutch-default.xml as a sanity check) *doesn't work*: I feed the crawl
>> process a handful of URLs and can only helplessly watch as the crawl
>> spreads to dozens of other sites.
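For reference, this is the override I mean in conf/nutch-site.xml (the
description text is paraphrased, not copied from nutch-default.xml):

  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
    <description>If true, outlinks that point to a different host than
    the page they were found on are discarded.</description>
  </property>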
> Could you give an example of a root URL that leads to this symptom
> (i.e. a crawl that leaks outside the original site)?
I'll try to find out exactly where the crawler starts to run loose, as I
have several web sites in my initial URL list.
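In the meantime I can confine the crawl explicitly through the URL filter
(conf/crawl-urlfilter.txt or conf/regex-urlfilter.txt, depending on how the
crawl is run); site-a.example and site-b.example below stand in for my real
seed hosts:

  # accept pages on the seed hosts only
  +^http://([a-z0-9-]*\.)*site-a\.example/
  +^http://([a-z0-9-]*\.)*site-b\.example/
  # reject everything else
  -.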
>> In answer to your question, it seems pointless to talk about virtual
>> host handling if the elementary filtering logic doesn't seem to
>> work... :-\
> Well, if this logic doesn't work, it needs to be fixed, that's all.
Won't argue with you there.
t.n.a.