Hi Markus,

> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]]
> Sent: Thursday, 27 October 2011 11:33 PM
> To: [email protected]
> Subject: Re: OutOfMemoryError when indexing into Solr
> 
> Interesting, how many records and how large are your records?

There are a bit more than 80,000 documents.

<property>
   <name>http.content.limit</name>
   <value>150000000</value>
</property>

<property>
   <name>indexer.max.tokens</name>
   <value>100000</value>
</property>

> How did you increase JVM heap size?

opts="-XX:+UseConcMarkSweepGC -Xms500m -Xmx6000m -XX:MinHeapFreeRatio=10 
-XX:MaxHeapFreeRatio=30 -XX:MaxPermSize=512m -XX:+CMSClassUnloadingEnabled"
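
Since the log below shows the LocalJobRunner, the whole indexing job runs inside this single client JVM, so these options should apply to the reduce phase as well. If I later move to a real Hadoop cluster, my understanding is that the task heap would instead come from mapred.child.java.opts in mapred-site.xml; a rough sketch, where 2000m is only an illustrative value:

<property>
   <name>mapred.child.java.opts</name>
   <!-- heap options for map/reduce child task JVMs; 2000m is only an example value -->
   <value>-Xmx2000m</value>
</property>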

> Do you have custom indexing filters?

Yes. They add a few fields to each document. These fields are small, within a
hundred bytes or so per document.

> Can you decrease the commit.size?

Yes. Thank you. Good idea. I did not even consider it because, for whatever 
reason, this option was not in my nutch-default.xml. I've set it to 100. I hope 
that a Solr commit is not issued after each batch is sent; otherwise this would 
have a very negative impact on performance, because Solr commits are very expensive.
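
For reference, this is roughly what I have added to nutch-site.xml, assuming solr.commit.size is indeed the right property name (it was missing from my nutch-default.xml) and with 100 simply being the value I am trying:

<property>
   <name>solr.commit.size</name>
   <!-- number of documents buffered per update request sent to Solr; 100 is just the value I am trying -->
   <value>100</value>
</property>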
 

> Do you also index large amounts of anchors (without deduplication) and pass 
> in a very large linkdb?

I do index anchors, but I don't think there is anything extraordinary about 
them. As I index fewer than 100K pages, my linkdb should be nowhere near as 
large as it is when people index millions of documents.
 
> The reducer of IndexerMapReduce is a notorious RAM consumer.

If reducing solr.commit.size helps, it would make sense to lower the default 
value. Sending smaller batches of documents to Solr without committing is not 
expensive enough to justify risking memory problems.

Thanks again.

Regards,

Arkadi


> 
> On Thursday 27 October 2011 05:54:54 [email protected] wrote:
> > Hi,
> >
> > I am working with a Nutch 1.4 snapshot and having a very strange problem
> > that makes the system run out of memory when indexing into Solr. This does
> > not look like a trivial lack of memory problem that can be solved by
> > giving more memory to the JVM. I've increased the max memory size from 2Gb
> > to 3Gb, then to 6Gb, but this did not make any difference.
> >
> > A log extract is included below.
> >
> > Would anyone have any idea of how to fix this problem?
> >
> > Thanks,
> >
> > Arkadi
> >
> >
> > 2011-10-27 07:08:22,162 INFO  solr.SolrWriter - Adding 1000 documents
> > 2011-10-27 07:08:42,248 INFO  solr.SolrWriter - Adding 1000 documents
> > 2011-10-27 07:13:54,110 WARN  mapred.LocalJobRunner - job_local_0254
> > java.lang.OutOfMemoryError: Java heap space
> >        at java.util.Arrays.copyOfRange(Arrays.java:3209)
> >        at java.lang.String.<init>(String.java:215)
> >        at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
> >        at java.nio.CharBuffer.toString(CharBuffer.java:1157)
> >        at org.apache.hadoop.io.Text.decode(Text.java:350)
> >        at org.apache.hadoop.io.Text.decode(Text.java:322)
> >        at org.apache.hadoop.io.Text.readString(Text.java:403)
> >        at org.apache.nutch.parse.ParseText.readFields(ParseText.java:50)
> >        at org.apache.nutch.util.GenericWritableConfigurable.readFields(GenericWritableConfigurable.java:54)
> >        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> >        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> >        at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:991)
> >        at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:931)
> >        at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:241)
> >        at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:237)
> >        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:81)
> >        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
> >        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
> >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> >        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> > 2011-10-27 07:13:54,382 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
> 
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
