Hello all,

I am running Nutch in a virtual machine (Debian) with 8 GB RAM and 1.5 TB for the Hadoop temporary folder. Running the index process on a 1.3 GB segments folder, I got an "OutOfMemoryError: GC overhead limit exceeded" (see the log below).
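The step that fails is the plain indexing job; the command is essentially the following (the paths here are placeholders for my actual crawl directories):

  bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*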
I created the segments using slice=50000, and I also set HADOOP_HEAPSIZE to the maximum physical memory (8000 MB). Do I need more memory to run the index process? Are there any limitations to running Nutch in a virtual machine?

Thank you!
Pato

...
...
2010-03-05 19:52:13,864 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2010-03-05 19:52:13,864 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2010-03-05 19:52:13,867 INFO lang.LanguageIdentifier - Language identifier configuration [1-4/2048]
2010-03-05 19:52:22,961 INFO lang.LanguageIdentifier - Language identifier plugin supports: it(1000) is(1000) hu(1000) th(1000) sv(1000) sq(1000) fr(1000) ru(1000) fi(1000) es(1000) en(1000) el(1000) ee(1000) pt(1000) de(1000) da(1000) pl(1000) no(1000) nl(1000)
2010-03-05 19:52:22,961 INFO indexer.IndexingFilters - Adding org.apache.nutch.analysis.lang.LanguageIndexingFilter
2010-03-05 19:52:22,963 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2010-03-05 19:52:22,964 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2010-03-05 19:52:36,278 WARN mapred.LocalJobRunner - job_local_0001
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
        at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:775)
        at org.apache.hadoop.io.Text.encode(Text.java:388)
        at org.apache.hadoop.io.Text.encode(Text.java:369)
        at org.apache.hadoop.io.Text.writeString(Text.java:409)
        at org.apache.nutch.parse.Outlink.write(Outlink.java:52)
        at org.apache.nutch.parse.ParseData.write(ParseData.java:152)
        at org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:135)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:613)
        at org.apache.nutch.indexer.IndexerMapReduce.map(IndexerMapReduce.java:67)
        at org.apache.nutch.indexer.IndexerMapReduce.map(IndexerMapReduce.java:50)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
2010-03-05 19:52:37,277 FATAL indexer.Indexer - Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
        at org.apache.nutch.indexer.Indexer.run(Indexer.java:92)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:101)
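P.S. For completeness, here is how the heap and the segment slicing were set up. I am writing this from memory, so treat the exact file locations and paths as a sketch rather than a verbatim copy of my setup:

  # conf/hadoop-env.sh -- HADOOP_HEAPSIZE is given in MB,
  # so 8000 corresponds to the 8 GB of physical RAM
  export HADOOP_HEAPSIZE=8000

  # the segments were produced with the segment merger, sliced into
  # chunks of 50000 URLs each (the output path is a placeholder):
  bin/nutch mergesegs crawl/segments_sliced -dir crawl/segments -slice 50000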