Hello Ted, I ran the command 'ps -aux' and confirmed that only 1 GB was defined. I adjusted NUTCH_HEAPSIZE to 8 GB (the physical RAM) and ran it again successfully.
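The check above can be scripted. A minimal sketch of pulling the heap flag out of a process listing; the command line below is hard-coded and hypothetical (in practice it would come from `ps aux`):

```shell
# Hypothetical java command line, as it might appear in `ps aux` output
cmdline='java -Xmx8000m -classpath nutch.jar org.apache.nutch.indexer.Indexer'

# Extract just the -Xmx heap flag
heap=$(echo "$cmdline" | grep -o -e '-Xmx[0-9]*[mMgG]')
echo "$heap"   # prints -Xmx8000m
```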
Do you know which parameters need to be adjusted if not enough physical RAM is available on the server, for example only 2 GB? I ran a web crawl (depth=6) without the topN parameter and the segments grew exponentially. Later I had a lot of problems when merging the segments and when indexing (not enough memory, too many open files, etc.).

Thank you for your help
Pato

----- Original Mail -----
From: Ted Yu <yuzhih...@gmail.com>
To: nutch-user@lucene.apache.org
Sent: Saturday, 6 March 2010, 15:42:38
Subject: Re: By Indexing I get: OutOfMemoryError: GC overhead limit exceeded ...

Can you use 'ps aux' to find out the -Xmx command-line parameter passed to java for the following action?

On Fri, Mar 5, 2010 at 1:14 PM, Patricio Galeas <pgal...@yahoo.de> wrote:
> Hello all,
> I am running Nutch in a Virtual Machine (Debian) with 8 GB RAM and 1.5 TB
> for the Hadoop temporary folder.
> Running the index process with a 1.3 GB segments folder, I got
> "OutOfMemoryError: GC overhead limit exceeded" (see below).
>
> I created the segments using slice=50000
> and I also set HADOOP_HEAPSIZE to the maximum physical memory (8000).
>
> Do I need more memory to run the index process?
> Are there any limitations to running Nutch in a Virtual Machine?
>
> Thank you!
> Pato
>
> ...
> ...
> 2010-03-05 19:52:13,864 INFO  plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
> 2010-03-05 19:52:13,864 INFO  plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
> 2010-03-05 19:52:13,867 INFO  lang.LanguageIdentifier - Language identifier configuration [1-4/2048]
> 2010-03-05 19:52:22,961 INFO  lang.LanguageIdentifier - Language identifier plugin supports: it(1000) is(1000) hu(1000) th(1000) sv(1000) sq(1000) fr(1000) ru(1000) fi(1000) es(1000) en(1000) el(1000) ee(1000) pt(1000) de(1000) da(1000) pl(1000) no(1000) nl(1000)
> 2010-03-05 19:52:22,961 INFO  indexer.IndexingFilters - Adding org.apache.nutch.analysis.lang.LanguageIndexingFilter
> 2010-03-05 19:52:22,963 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2010-03-05 19:52:22,964 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2010-03-05 19:52:36,278 WARN  mapred.LocalJobRunner - job_local_0001
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
>         at java.nio.charset.CharsetEncoder.encode(CharsetEncoder.java:775)
>         at org.apache.hadoop.io.Text.encode(Text.java:388)
>         at org.apache.hadoop.io.Text.encode(Text.java:369)
>         at org.apache.hadoop.io.Text.writeString(Text.java:409)
>         at org.apache.nutch.parse.Outlink.write(Outlink.java:52)
>         at org.apache.nutch.parse.ParseData.write(ParseData.java:152)
>         at org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:135)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:613)
>         at org.apache.nutch.indexer.IndexerMapReduce.map(IndexerMapReduce.java:67)
>         at org.apache.nutch.indexer.IndexerMapReduce.map(IndexerMapReduce.java:50)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> 2010-03-05 19:52:37,277 FATAL indexer.Indexer - Indexer: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>         at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
>         at org.apache.nutch.indexer.Indexer.run(Indexer.java:92)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.indexer.Indexer.main(Indexer.java:101)
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has outstanding protection against mass mail.
> http://mail.yahoo.com
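On the 2 GB question, a hedged sketch of sizing the heap below physical RAM: the 75% headroom factor and the -topN value are illustrative assumptions, not tested recommendations.

```shell
# Rule-of-thumb heap sizing for a 2 GB box (assumption: leave ~25% for the OS).
# NUTCH_HEAPSIZE is in MB and is passed to java as -Xmx by bin/nutch.
phys_mb=2048
heap_mb=$(( phys_mb * 3 / 4 ))
echo "NUTCH_HEAPSIZE=${heap_mb}"   # prints NUTCH_HEAPSIZE=1536

# The crawl itself would then be bounded per round with -topN, so segments
# grow roughly linearly with depth instead of exponentially, e.g.:
#   export NUTCH_HEAPSIZE=$heap_mb
#   bin/nutch crawl urls -dir crawl -depth 6 -topN 1000
```

Capping -topN trades coverage per round for predictable segment sizes, which also eases the later merge and index steps mentioned above.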