(nutch-nightly, hadoop 0.9.1, linux/686, 4GB ram)

At the end of a long index run in a crawl cycle, I got a
java.lang.OutOfMemoryError: Java heap space from the indexer. The machine
has 4GB of RAM, and the index appears to contain 142150 docs.
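
One caveat on the 4GB figure: the heap limit is set per JVM (through -Xmx or the JVM's default), not by physical RAM, so the indexer may be running with far less than 4GB of heap. A minimal sketch for checking the limit a JVM on this machine actually gets; HeapCheck is a hypothetical class name used for illustration, not part of Nutch or Hadoop, and it only reports the limit for the JVM it runs in:

```java
public class HeapCheck {
    // Maximum heap this JVM will attempt to use, in MiB.
    // Runtime.maxMemory() reflects -Xmx or the JVM default, not physical RAM.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
    }
}
```

Running it with the same JVM options that the indexer is launched with should show whether the heap limit is anywhere near the 4GB of RAM.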

Any idea what this could be caused by?


The bin/nutch index command line reported:

Indexer: java.io.IOException: Job failed!
         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
         at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
         at org.apache.nutch.indexer.Indexer.run(Indexer.java:319)
         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
         at org.apache.nutch.indexer.Indexer.main(Indexer.java:302)


And the hadoop log reported:

2007-01-17 05:09:40,257 INFO  indexer.Indexer - merging segments  
_2qf2 (125000 docs) _2sdx (2500 docs) _2ucs (2500 docs) _2wbn (2500  
docs) _2yai (2500 docs) _309d (2500 docs) _3288 (2500 docs) _329n (50  
docs) _32b2 (50 docs) _32ch (50 docs) _32dw (50 docs) _32fb (50 docs)  
_32gq (50 docs) _32i5 (50 docs) _32jk (50 docs) _32kz (50 docs) _32me  
(50 docs) _32nt (50 docs) _32p8 (50 docs) _32qn (50 docs) _32s2 (50  
docs) _32th (50 docs) _32uw (50 docs) _32wb (50 docs) _32xq (50 docs)  
_32z5 (50 docs) _330k (50 docs) _331z (50 docs) _333e (50 docs) _334t  
(50 docs) _3368 (50 docs) _337n (50 docs) _3392 (50 docs) _33ah (50  
docs) _33bw (50 docs) _33db (50 docs) _33eq (50 docs) _33g5 (50 docs)  
_33hk (50 docs) _33iz (50 docs) _33ke (50 docs) _33lt (50 docs) _33n8  
(50 docs) _33on (50 docs) _33q2 (50 docs) _33rh (50 docs) _33sw (50  
docs) _33ub (50 docs) _33vq (50 docs) _33x6 (50 docs) into _33x7  
(142150 docs)
2007-01-17 05:09:40,647 WARN  mapred.LocalJobRunner - job_w1h0ii
java.lang.OutOfMemoryError: Java heap space
2007-01-17 05:09:41,005 FATAL indexer.Indexer - Indexer: java.io.IOException: Job failed!
         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
         at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
         at org.apache.nutch.indexer.Indexer.run(Indexer.java:319)
         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
         at org.apache.nutch.indexer.Indexer.main(Indexer.java:302)



--
http://variogr.am/

Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general