Here's the easiest way to figure out where to concentrate your energies: just comment out the server.add call in your SolrJ program (along with any commits you're doing from SolrJ).
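In code, the experiment looks roughly like this. This is a sketch, not SolrJ API: buildDoc() and the loop are hypothetical stand-ins for however you acquire data and build your SolrInputDocuments. If the timing with the add call commented out roughly matches your full indexing run, acquisition is the bottleneck.

```java
// A minimal sketch of the experiment: time the acquisition/doc-building loop
// with the indexing call disabled. buildDoc() is a hypothetical stand-in for
// fetching a row from your sources and constructing a SolrInputDocument.
public class AcquisitionTimer {

    static long timeAcquisitionPass(int docCount) {
        long start = System.nanoTime();
        for (int i = 0; i < docCount; i++) {
            Object doc = buildDoc(i);   // data acquisition + document construction
            // server.add(doc);         // <-- commented out: no indexing, no commits
        }
        return (System.nanoTime() - start) / 1_000_000; // elapsed milliseconds
    }

    static Object buildDoc(int i) {
        // Hypothetical: fetch record i from the db/other sources and build the doc.
        return "doc-" + i;
    }

    public static void main(String[] args) {
        System.out.println("acquisition-only pass: "
                + timeAcquisitionPass(100_000) + " ms");
    }
}
```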
My bet: your program will run at about the same speed it does when you actually index the docs, indicating that your problem is on the data-acquisition side. Of course, the older I get, the more times I've been wrong :).

You can also monitor the CPU usage on the box running Solr. I often see it idling along at < 30% when indexing, or even < 10%, again indicating that the bottleneck is on the acquisition side.

Note I haven't mentioned any solutions; I'm a believer in identifying the _problem_ before worrying about a solution.

Best,
Erick

On Wed, Mar 5, 2014 at 4:29 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> Make sure you're not doing a commit on each individual document add.
> Committing every few minutes, or every few hundred or few thousand
> documents, is sufficient. You can set up auto commit in solrconfig.xml.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Rallavagu
> Sent: Wednesday, March 5, 2014 2:37 PM
> To: solr-user@lucene.apache.org
> Subject: Indexing huge data
>
> All,
>
> Wondering about best/common practices for indexing or re-indexing a huge
> amount of data in Solr. The data is about 6 million entries spread across
> a database and other sources (the data is not located in one resource).
> I am trying a SolrJ-based solution that collects data from the different
> sources and indexes it into Solr, but indexing takes hours.
>
> Thanks in advance
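Jack's auto-commit suggestion maps to the <autoCommit> section of the update handler in solrconfig.xml. A sketch; the interval and document-count values here are illustrative placeholders, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flush to stable storage every 60 seconds or every
       10,000 added docs, whichever comes first. openSearcher=false keeps
       these commits cheap by not reopening a searcher on each one. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <maxDocs>10000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With this in place, the SolrJ client can drop its per-document commit calls entirely and let the server decide when to commit.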