>Secondly, it's adding my documents in small chunks.  I was fetching in 100
>document cycles and when I run solrindex I get messages such as the
>following.

In the nutch-default.xml file, there is a parameter that controls the Solr
commit batch size. If you want to send documents to the index in larger
batches, you should increase this parameter:


<property>
  <name>solr.commit.size</name>
  <value>250</value>
  <description>
  Defines the number of documents to send to Solr in a single update batch.
  Decrease when handling very large documents to prevent Nutch from running
  out of memory. NOTE: It does not explicitly trigger a server side commit.
  </description>
</property>
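
For example, rather than editing nutch-default.xml directly, you could
override it in conf/nutch-site.xml, which takes precedence over the
defaults. A minimal sketch (the value of 1000 is just an illustration;
tune it to your document sizes and available heap):


<property>
  <name>solr.commit.size</name>
  <!-- Example override; 1000 is an arbitrary illustrative value.
       Larger batches mean fewer, bigger updates sent to Solr, but
       more memory used per batch on the Nutch side. -->
  <value>1000</value>
</property>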


On Thu, Apr 25, 2013 at 4:35 PM, Bai Shen <[email protected]> wrote:

> I'm having two problems with the solrindex job in Nutch 2.1.
>
> When I run it with -all, it indexes every single parsed document, not just
> the newly generated ones, as fetch and parse do.
>
> Secondly, it's adding my documents in small chunks.  I was fetching in 100
> document cycles and when I run solrindex I get messages such as the
> following.
>
> Adding 87 documents
> Adding 5 documents
> Adding 2 documents
> Adding 3 documents
> Adding 14 documents
> Adding 34 documents
> Adding 233 documents
>
> Any ideas what causes this and how to fix it?
>
> Thanks.
>