(nutch-nightly, hadoop 0.9.1, linux/686, 4GB ram)
At the end of a long indexing run in a crawl cycle I got a
java.lang.OutOfMemoryError: Java heap space from the indexer. I have
4GB of RAM. There appear to be 142150 docs.
Any idea what could be causing this?
The bin/nutch index command line reported:
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
at org.apache.nutch.indexer.Indexer.run(Indexer.java:319)
at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
at org.apache.nutch.indexer.Indexer.main(Indexer.java:302)
And the hadoop log reported:
2007-01-17 05:09:40,257 INFO indexer.Indexer - merging segments
_2qf2 (125000 docs) _2sdx (2500 docs) _2ucs (2500 docs) _2wbn (2500
docs) _2yai (2500 docs) _309d (2500 docs) _3288 (2500 docs) _329n (50
docs) _32b2 (50 docs) _32ch (50 docs) _32dw (50 docs) _32fb (50 docs)
_32gq (50 docs) _32i5 (50 docs) _32jk (50 docs) _32kz (50 docs) _32me
(50 docs) _32nt (50 docs) _32p8 (50 docs) _32qn (50 docs) _32s2 (50
docs) _32th (50 docs) _32uw (50 docs) _32wb (50 docs) _32xq (50 docs)
_32z5 (50 docs) _330k (50 docs) _331z (50 docs) _333e (50 docs) _334t
(50 docs) _3368 (50 docs) _337n (50 docs) _3392 (50 docs) _33ah (50
docs) _33bw (50 docs) _33db (50 docs) _33eq (50 docs) _33g5 (50 docs)
_33hk (50 docs) _33iz (50 docs) _33ke (50 docs) _33lt (50 docs) _33n8
(50 docs) _33on (50 docs) _33q2 (50 docs) _33rh (50 docs) _33sw (50
docs) _33ub (50 docs) _33vq (50 docs) _33x6 (50 docs) into _33x7
(142150 docs)
2007-01-17 05:09:40,647 WARN mapred.LocalJobRunner - job_w1h0ii
java.lang.OutOfMemoryError: Java heap space
2007-01-17 05:09:41,005 FATAL indexer.Indexer - Indexer:
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
at org.apache.nutch.indexer.Indexer.run(Indexer.java:319)
at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
at org.apache.nutch.indexer.Indexer.main(Indexer.java:302)
--
http://variogr.am/