Thank you in advance for any assistance you can
provide, or pointers to where I should look.

I am using Nutch 0.9 with 1 master and 4 slaves.
I am crawling a single site with 1.4 million URLs.

I am running the standard generate/fetch/updatedb cycle
with topN set to 100000.
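
For reference, one round of that cycle can be sketched roughly as
follows. This is a minimal sketch, not the exact script in use: the
crawl/crawldb and crawl/segments paths and the run_cycle function name
are placeholders, and the ls lookup of the newest segment assumes a
local filesystem (on HDFS the listing would have to go through
bin/hadoop dfs -ls instead).

```shell
#!/bin/sh
# One generate/fetch/updatedb round for Nutch 0.9.
# NUTCH, CRAWLDB, SEGMENTS and TOPN are placeholders; adjust to the local install.
NUTCH="${NUTCH:-bin/nutch}"
CRAWLDB="${CRAWLDB:-crawl/crawldb}"
SEGMENTS="${SEGMENTS:-crawl/segments}"
TOPN="${TOPN:-100000}"

run_cycle() {
    # Select up to TOPN URLs from the crawldb into a new segment.
    $NUTCH generate "$CRAWLDB" "$SEGMENTS" -topN "$TOPN" || return 1
    # Segment directories are named by timestamp, so the newest sorts last.
    segment=$(ls -d "$SEGMENTS"/* | tail -1)
    # Fetch the pages listed in that segment.
    $NUTCH fetch "$segment" || return 1
    # Fold the fetch results back into the crawldb.
    $NUTCH updatedb "$CRAWLDB" "$segment"
}
```

Calling run_cycle from the Nutch install directory would perform one
round; repeating the call gives further rounds.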
It appears that all 97 map tasks are started, but only one
of them does any work.
That one task crawls about 3% of my topN and eventually stops
with java.lang.OutOfMemoryError: Java heap space
errors.

I believe I have two problems.  One is the heap space
issue.  The other is that the URLs are not being spread
across multiple map task slots.

What settings do I need to modify so that the generated
topN (100000) URLs are spread across all map
task slots?

Thanks!

JohnM

-- 
john mendenhall
[EMAIL PROTECTED]
surf utopia
internet services
