I'm running into an error trying to run solrdedup:

bin/nutch solrdedup http://127.0.0.1:8080/solr-nutch/

2010-09-23 18:37:16,119 INFO  mapred.JobClient - Running job: job_local_0001
2010-09-23 18:37:17,123 INFO  mapred.JobClient -  map 0% reduce 0%
2010-09-23 18:52:17,801 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.OutOfMemoryError: Java heap space
        at org.apache.solr.common.util.JavaBinCodec.readSolrDocument(JavaBinCodec.java:323)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:204)
        at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:405)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:171)
        at org.apache.solr.common.util.JavaBinCodec.readSolrDocumentList(JavaBinCodec.java:339)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:206)
        at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:110)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:173)
        at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:101)
        at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:466)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRecordReader(SolrDeleteDuplicates.java:233)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)


I'm running Solr via Tomcat. Tomcat is being started with the memory
parameters:
-Xms2048m -Xmx2048m
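For reference, this is roughly how I pass those flags to Tomcat. The setenv.sh approach is one common convention for a standard Tomcat layout; the exact file and path here are illustrative, not necessarily what every installation uses:

```shell
# Illustrative sketch: set the heap flags for Tomcat via CATALINA_OPTS
# in $CATALINA_HOME/bin/setenv.sh (picked up automatically by catalina.sh).
export CATALINA_OPTS="-Xms2048m -Xmx2048m"
```

Note these flags only size the Tomcat JVM's heap, not the separate JVM that bin/nutch launches.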

So basically there are 2 GB of memory allocated to heap space. I have
noticed that changing the parameters can shift where the error occurs,
but the bottom line is I still run out of heap space.

Nutch runs for about 15 minutes and then the error occurs.

I only have one Solr index, and the data/index directory is about 85 GB.
I'm using the stock solrconfig.xml file as delivered.

Is there something else I need to do?  Is there some change to the Solr or
Tomcat config that I have missed?


Config:
Nutch Release 1.2 - 08/07/2010
CentOS Linux 5.5 
Linux 2.6.18-194.3.1.el5 on x86_64 
Intel(R) Xeon(R) CPU X3220 @ 2.40GHz
8gb of ram


Thanks
Brad

