2006/10/23, Andrzej Bialecki <[EMAIL PROTECTED]>:
> Tomi NA wrote:
>> 2006/10/18, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
>>
>>> Btw, we have some virtual local hosts; how does db.ignore.external.links
>>> deal with that?
>>
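For what it's worth, my understanding is that this kind of filtering
compares host names, so name-based virtual hosts on one machine would still
count as separate sites. Roughly this idea (an illustrative sketch, not the
actual Nutch code):

  import java.net.URL;

  public class ExternalLinkCheck {
      // True if toUrl points to a different host name than fromUrl.
      // Comparing host *names* means http://site-a.example/ and
      // http://site-b.example/ are external to each other even when
      // both are virtual hosts on the same machine/IP.
      static boolean isExternal(String fromUrl, String toUrl)
              throws java.net.MalformedURLException {
          URL from = new URL(fromUrl);
          URL to = new URL(toUrl);
          return !to.getHost().equalsIgnoreCase(from.getHost());
      }
  }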
>> Update:
>> Setting db.ignore.external.links to true in nutch-site.xml (and later also
>> in nutch-default.xml as a sanity check) *doesn't work*: I feed the crawl
>> process a handful of URLs and can only helplessly watch as the crawl
>> spreads to dozens of other sites.
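For reference, this is the override I mean in conf/nutch-site.xml (the
description text is paraphrased, not copied from nutch-default.xml):

  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
    <description>If true, outlinks that point to a different host than
    the page they were found on are discarded.</description>
  </property>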
> Could you give an example of a root URL that leads to this symptom
> (i.e. a crawl that leaks outside the original site)?
I'll try to find out exactly where the crawler starts to run loose, as I
have several web sites in my initial URL list.
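In the meantime I can confine the crawl explicitly through the URL filter
(conf/crawl-urlfilter.txt or conf/regex-urlfilter.txt, depending on how the
crawl is run); site-a.example and site-b.example below stand in for my real
seed hosts:

  # accept pages on the seed hosts only
  +^http://([a-z0-9-]*\.)*site-a\.example/
  +^http://([a-z0-9-]*\.)*site-b\.example/
  # reject everything else
  -.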
>> In answer to your question, it seems pointless to talk about virtual
>> host handling if the elementary filtering logic doesn't seem to
>> work... :-\
> Well, if this logic doesn't work, it needs to be fixed, that's all.
Won't argue with you there.
t.n.a.