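Look in nutch-default.xml.
The properties db.max.outlinks.per.page and http.content.limit might
need to have their values increased: the first caps how many outlinks
are processed per page, and the second truncates downloaded content
beyond a byte limit.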
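
As a rough sketch, overrides for these two properties would normally go
in conf/nutch-site.xml rather than being edited in nutch-default.xml
itself. The values below are illustrative assumptions only, not
recommendations; pick numbers that fit the size of your link page and
of your largest document.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: site-specific overrides of nutch-default.xml -->
<configuration>
  <property>
    <name>db.max.outlinks.per.page</name>
    <!-- assumed value: must be at least the number of links on the link page -->
    <value>1000</value>
  </property>
  <property>
    <name>http.content.limit</name>
    <!-- assumed value, in bytes: content longer than this is truncated -->
    <value>1048576</value>
  </property>
</configuration>
```

After changing these, the affected pages would need to be re-fetched
for the new limits to take effect.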
Cheers,
Carl.
Jeff Maki wrote:
Hello everyone,
I'm not going to post my config files, so as not to spam you all, but I
have a general question: I'm trying to index the pages of a website
(obviously), and I've created a special page with a link to all the
pages I want to index. I then pointed nutch to this special link page.
I set max_outlinks appropriately, and I do see all the page URLs I
expect go by in the log for the fetching stage.
When nutch gets to indexing, however, not all the documents appear in
the log--it looks as if not all of the fetched pages are being
indexed. Searching for terms I know are on the missing pages also
turns up nothing--they're not in the index!?
Can anybody tell me what factors affect the indexing stage? I want to
have nutch index *all* documents it fetches. How can I do this?
Any tips/ideas/things to configure?
Thanks in advance,
-Jeff