Tomi NA wrote:
2006/10/18, [EMAIL PROTECTED] <[EMAIL PROTECTED]>:

Btw, we have some virtual local hosts; how does db.ignore.external.links
deal with that?

Update:
setting db.ignore.external.links to true in nutch-site (and later also
in nutch-default as a sanity check) *doesn't work*: I feed the crawl
process a handful of URLs and can only watch helplessly as the crawl
spreads to dozens of other sites.
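
For reference, this is roughly how the property was set (a minimal
sketch; only the relevant entry from conf/nutch-site.xml is shown, the
rest of the file and the description wording are assumptions):

<!-- conf/nutch-site.xml: restrict the crawl to the hosts of the seed URLs -->
<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
  <description>If true, outlinks pointing to a different host are
  ignored when the crawl db is updated, so the crawl should stay on
  the initially injected hosts.</description>
</property>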

Could you give an example of a root URL that leads to this symptom (i.e. the crawl leaking outside the original site)?


In answer to your question, it seems pointless to talk about virtual
host handling if the elementary filtering logic doesn't seem to
work... :-\

Well, if this logic doesn't work it needs to be fixed, that's all.


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

