Re: nutch 0.9, multiple nodes, not fetching topN links to fetch

John Mendenhall Thu, 24 Jan 2008 17:21:24 -0800

Thank you in advance for any assistance you can
provide, or pointers to where I should look.


I am using Nutch 0.9, with 1 master and 4 slaves.
I am crawling a single site with 1.4 million URLs.

I am running the standard generate/fetch/updatedb cycle
with topN at 100000.
All 97 tasks appear to get mapped, but only one of them
sees any action.
That one task crawls about 3% of my topN and eventually
stops with java.lang.OutOfMemoryError: Java heap space
errors.

I believe I have two problems.  One is the heap space
issue.  The other is that the URLs are not being spread
out across multiple map task slots.
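
To illustrate the second problem, here is a minimal sketch of one way
this symptom could arise. It assumes, without my having confirmed it for
this setup, that URLs are assigned to map tasks by hashing only the host
name. The function names, the example.com host, and the CRC32 hash are
all hypothetical and are not taken from Nutch's code.

```python
import zlib
from collections import Counter
from urllib.parse import urlparse


def partition_by_host(url: str, num_tasks: int) -> int:
    """Hypothetical partitioner: choose a task slot from a hash of the host only."""
    host = urlparse(url).hostname or ""
    return zlib.crc32(host.encode("utf-8")) % num_tasks


def slots_used(urls, num_tasks):
    """Count how many URLs each task slot would receive."""
    return Counter(partition_by_host(u, num_tasks) for u in urls)


# Placeholder URLs standing in for a single-site crawl (not the real site).
single_site = [f"http://www.example.com/page/{i}" for i in range(1000)]
counts = slots_used(single_site, num_tasks=97)
print(len(counts), "of 97 task slots receive URLs")  # prints: 1 of 97 task slots receive URLs
```

Under that assumption every URL of a single site shares one host, so one
slot would receive the whole fetch list and the other 96 would sit idle,
which matches what I am seeing.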

What settings do I need to modify to get the generated
topN (100000) URLs spread out among all map task
slots?

Thanks!

JohnM

-- 
john mendenhall
[EMAIL PROTECTED]
surf utopia
internet services
