Hello all,
I am running Nutch in a virtual machine (Debian) with 8 GB of RAM and 1.5 TB
for the Hadoop temporary folder.
When I run the index process on a 1.3 GB segments folder, I get an
"OutOfMemoryError: GC overhead limit exceeded" (see the log below).

I created the segments with slice=50000,
and I also set HADOOP_HEAPSIZE to the maximum physical memory (8000 MB).
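
For completeness, this is approximately how those are set (the mergesegs
syntax and file locations below are my reconstruction, so please treat them
as assumptions rather than exact copies of my setup):

    # segments were sliced while merging (directory names are just examples)
    bin/nutch mergesegs crawl/segments -dir crawl/segments_raw -slice 50000

    # heap size for Hadoop, set in conf/hadoop-env.sh (value is in MB)
    export HADOOP_HEAPSIZE=8000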

Do I need more memory to run the index process?
Are there any limitations to running Nutch in a virtual machine?

Thank you!
Pato

...
2010-03-05 19:52:13,864 INFO  plugin.PluginRepository -         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2010-03-05 19:52:13,864 INFO  plugin.PluginRepository -         Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2010-03-05 19:52:13,867 INFO  lang.LanguageIdentifier - Language identifier configuration [1-4/2048]
2010-03-05 19:52:22,961 INFO  lang.LanguageIdentifier - Language identifier plugin supports: it(1000) is(1000) hu(1000) th(1000) sv(1000) sq(1000) fr(1000) ru(1000) fi(1000) es(1000) en(1000) el(1000) ee(1000) pt(1000) de(1000) da(1000) pl(1000) no(1000) nl(1000)
2010-03-05 19:52:22,961 INFO  indexer.IndexingFilters - Adding org.apache.nutch.analysis.lang.LanguageIndexingFilter
2010-03-05 19:52:22,963 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2010-03-05 19:52:22,964 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2010-03-05 19:52:36,278 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
        at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:775)
        at org.apache.hadoop.io.Text.encode(Text.java:388)
        at org.apache.hadoop.io.Text.encode(Text.java:369)
        at org.apache.hadoop.io.Text.writeString(Text.java:409)
        at org.apache.nutch.parse.Outlink.write(Outlink.java:52)
        at org.apache.nutch.parse.ParseData.write(ParseData.java:152)
        at org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:135)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:613)
        at org.apache.nutch.indexer.IndexerMapReduce.map(IndexerMapReduce.java:67)
        at org.apache.nutch.indexer.IndexerMapReduce.map(IndexerMapReduce.java:50)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
2010-03-05 19:52:37,277 FATAL indexer.Indexer - Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
        at org.apache.nutch.indexer.Indexer.run(Indexer.java:92)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:101)
