Here's the easiest way to figure out where to concentrate
your energies: just comment out the server.add call in your
SolrJ program, along with any commits you're doing from
SolrJ.
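
To make that concrete, here's a rough sketch of the experiment, assuming
SolrJ 4.x and HttpSolrServer; fetchNextBatch() is just a stand-in for
whatever code pulls rows from the db and your other sources, it's not a
real API:

// A minimal sketch of the "skip the add" experiment, SolrJ 4.x assumed.
// fetchNextBatch() is hypothetical -- substitute your real data-acquisition code.
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexingProbe {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    long start = System.currentTimeMillis();
    long count = 0;
    List<SolrInputDocument> batch;
    while ((batch = fetchNextBatch()) != null) { // hypothetical acquisition call
      count += batch.size();
      // server.add(batch);   // <-- comment out the add ...
      // server.commit();     // <-- ... and any explicit commits
    }
    System.out.println(count + " docs acquired in "
        + (System.currentTimeMillis() - start) + " ms");
    server.shutdown();
  }

  // Placeholder: replace with the code that actually reads from your sources.
  private static List<SolrInputDocument> fetchNextBatch() {
    return null; // returning null ends the loop in this sketch
  }
}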

My bet: your program will run at about the same speed as it
does when you actually index the docs, indicating that your
problem is on the data-acquisition side. Of course, the older
I get, the more times I've been wrong :).

You can also monitor the CPU usage on the box running
Solr. I often see it idling along at < 30% when indexing, or
even < 10%, again indicating that the bottleneck is on the
acquisition side.

Note that I haven't mentioned any solutions; I'm a believer in
identifying the _problem_ before worrying about a solution.

Best,
Erick

On Wed, Mar 5, 2014 at 4:29 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> Make sure you're not doing a commit on each individual document add. Committing
> every few minutes, or every few hundred or few thousand documents, is
> sufficient. You can set up autoCommit in solrconfig.xml. (A rough sketch of this
> batching appears after the quoted thread below.)
>
> -- Jack Krupansky
>
> -----Original Message----- From: Rallavagu
> Sent: Wednesday, March 5, 2014 2:37 PM
> To: solr-user@lucene.apache.org
> Subject: Indexing huge data
>
>
> All,
>
> Wondering about best practices/common practices for indexing/re-indexing a huge
> amount of data in Solr. The data is about 6 million entries spread across the db
> and other sources (the data is not located in one resource). I am trying a
> SolrJ-based solution to collect data from the different resources and index it
> into Solr, but it takes hours to complete.
>
> Thanks in advance
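
For what it's worth, here's a rough sketch of the batched-add /
infrequent-commit approach Jack describes above, again assuming SolrJ 4.x;
the batch sizes, MyRecord type, and buildDoc() helper are purely
illustrative:

// Illustrative only: send documents in batches and commit rarely, or let
// autoCommit in solrconfig.xml handle commits entirely. SolrJ 4.x assumed.
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchedIndexer {
  private static final int BATCH_SIZE = 1000;        // docs per add() request
  private static final int DOCS_PER_COMMIT = 50000;  // or rely on autoCommit instead

  public static void index(HttpSolrServer server, Iterable<MyRecord> records)
      throws Exception {
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    long sinceCommit = 0;
    for (MyRecord r : records) {
      batch.add(buildDoc(r));
      if (batch.size() >= BATCH_SIZE) {
        server.add(batch);            // many docs per request, not one at a time
        sinceCommit += batch.size();
        batch.clear();
      }
      if (sinceCommit >= DOCS_PER_COMMIT) {
        server.commit();              // infrequent explicit commit
        sinceCommit = 0;
      }
    }
    if (!batch.isEmpty()) {
      server.add(batch);
    }
    server.commit();                  // final commit so everything is searchable
  }

  // Hypothetical record type and field mapping; substitute your own.
  public static class MyRecord { public String id; public String text; }

  private static SolrInputDocument buildDoc(MyRecord r) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", r.id);
    doc.addField("text", r.text);
    return doc;
  }
}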
