Thank you in advance for any assistance you can provide, or pointers to where I should look.
I am using Nutch 0.9 with 1 master and 4 slaves. I am crawling a single site with 1.4 million URLs, running the standard generate/fetch/updatedb cycle with topN set to 100000.

It appears that all 97 tasks get mapped, but only one task sees any action. That one task crawls about 3% of my topN and eventually stops with java.lang.OutOfMemoryError: Java heap space errors.

I believe I have two problems. One is the heap space issue. The other is that the mapping is not spreading the URLs across multiple map task slots.

What settings do I need to modify to get the generated topN (100000) URLs spread out among all map task slots?

Thanks!
JohnM

--
john mendenhall
[EMAIL PROTECTED]
surf utopia internet services
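
P.S. One guess at the second problem, in case it helps whoever answers: if the generator assigns URLs to fetch lists by hashing the host name (an assumption on my part, I have not confirmed it in the 0.9 source), then a single-site crawl would put every URL in the same list no matter how many task slots exist. The sketch below is not Nutch code; the function names, the hash, and the example.com URLs are all made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def partition_by_host(url, num_partitions):
    """Pick a partition from the URL's host only (hypothetical scheme)."""
    host = urlparse(url).hostname or ""
    # Simple deterministic string hash, so results are stable across runs.
    h = 0
    for ch in host:
        h = (31 * h + ord(ch)) & 0x7FFFFFFF
    return h % num_partitions


def partition_counts(urls, num_partitions):
    """Count how many URLs land in each partition."""
    return Counter(partition_by_host(u, num_partitions) for u in urls)


if __name__ == "__main__":
    # 1000 made-up URLs, all on one host, spread over 97 partitions.
    urls = [f"http://www.example.com/page/{i}" for i in range(1000)]
    counts = partition_counts(urls, 97)
    print(len(counts), "of 97 partitions received URLs")
```

Under that assumption only one partition ever receives work, which would match what I am seeing: many tasks, one of them doing everything.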
