In nutch-default.xml, there is a parameter that controls the Solr commit size. If you want to send larger batches to the index, you should increase this parameter:

<property>
  <name>solr.commit.size</name>
  <value>250</value>
  <description>
  Defines the number of documents to send to Solr in a single update batch.
  Decrease when handling very large documents to prevent Nutch from running
  out of memory.
  NOTE: It does not explicitly trigger a server side commit.
  </description>
</property>

On Thu, Apr 25, 2013 at 4:35 PM, Bai Shen <[email protected]> wrote:
> I'm having two problems with the solrindex job in Nutch 2.1.
>
> When I run it with -all, it indexes every single parsed document, not just
> the newly generated ones, as fetch and parse do.
>
> Secondly, it's adding my documents in small chunks. I was fetching in
> 100-document cycles, and when I run solrindex I get messages such as the
> following:
>
> Adding 87 documents
> Adding 5 documents
> Adding 2 documents
> Adding 3 documents
> Adding 14 documents
> Adding 34 documents
> Adding 233 documents
>
> Any ideas what causes this and how to fix it?
>
> Thanks.
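As a side note, the usual convention is to leave nutch-default.xml untouched and put overrides in conf/nutch-site.xml, which takes precedence. A minimal sketch of such an override, assuming the same property name as above (the value 1000 is only an illustrative choice, not something from this thread):

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: properties set here override nutch-default.xml. -->
<configuration>
  <property>
    <name>solr.commit.size</name>
    <!-- Illustrative value; raise or lower to match your document sizes
         and available memory. -->
    <value>1000</value>
  </property>
</configuration>
```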

