Hi,
I am working with a Nutch 1.4 snapshot and hitting a strange problem: the system
runs out of memory while indexing into Solr. This does not look like a simple
shortage of heap that could be fixed by giving the JVM more memory. I increased
the maximum heap size from 2 GB to 3 GB, and then to 6 GB, but it made no
difference.
A log extract is included below.
Does anyone have an idea of how to fix this problem?
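In case it is relevant, these are the settings in conf/nutch-site.xml that I understand can bound how much data reaches the indexer. The values below are illustrative, not what I am currently running:

```xml
<!-- Sketch of conf/nutch-site.xml overrides; values are illustrative. -->
<property>
  <name>http.content.limit</name>
  <!-- Maximum bytes of content fetched per page; -1 means unlimited,
       which can produce very large ParseText records at indexing time. -->
  <value>65536</value>
</property>
<property>
  <name>solr.commit.size</name>
  <!-- Number of documents buffered before they are sent to Solr
       (the "Adding 1000 documents" batches in the log below). -->
  <value>250</value>
</property>
```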
Thanks,
Arkadi
2011-10-27 07:08:22,162 INFO solr.SolrWriter - Adding 1000 documents
2011-10-27 07:08:42,248 INFO solr.SolrWriter - Adding 1000 documents
2011-10-27 07:13:54,110 WARN mapred.LocalJobRunner - job_local_0254
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:215)
at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
at java.nio.CharBuffer.toString(CharBuffer.java:1157)
at org.apache.hadoop.io.Text.decode(Text.java:350)
at org.apache.hadoop.io.Text.decode(Text.java:322)
at org.apache.hadoop.io.Text.readString(Text.java:403)
at org.apache.nutch.parse.ParseText.readFields(ParseText.java:50)
at org.apache.nutch.util.GenericWritableConfigurable.readFields(GenericWritableConfigurable.java:54)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:991)
at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:931)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:241)
at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:237)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:81)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2011-10-27 07:13:54,382 ERROR solr.SolrIndexer - java.io.IOException: Job failed!