On Tue, Jun 14, 2011 at 8:31 PM, Mark <static.void....@gmail.com> wrote:
> Hello all,
>
> We are using DIH to index our data (~6M documents) and it's taking an
> extremely long time (~24 hours). I am trying to find ways that we can speed
> this up. I've been reading through older posts and it's my understanding
> this should not take that long.
[...]

What is the size of the data, and which database are you using?
We saw significant improvements in indexing speed in going from
MS-SQL to MySQL. Also, if you are using MS-SQL, try the
open-source jTDS JDBC driver rather than the Microsoft one. While
this probably does not affect speed, we found it more reliable
at maintaining connections.

Another thing to try is indexing simultaneously into multiple Solr cores
and merging them at the end, assuming that you can shard your SELECT
from the database. If the database server does not get overloaded,
indexing speed seems to scale linearly with the number of cores.
If the database server is getting overloaded, consider clustering and
replication for the database.
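A minimal Python sketch of the sharding idea, assuming a non-negative integer primary key and a base SELECT that has no WHERE clause of its own. Each core's DIH query takes one residue class of "key % N", so the shards are disjoint and together cover every row (the % operator works in both MySQL and MS-SQL). The table, column and core names and the Solr URL are hypothetical. The merge step only builds a CoreAdmin MERGEINDEXES request URL; the srcCore parameter assumes a Solr release that supports it (older releases take indexDir instead), and the target core typically needs a commit afterwards.

```python
from typing import List
from urllib.parse import urlencode


def sharded_queries(base_query: str, key_column: str, num_shards: int) -> List[str]:
    """Split one SELECT into num_shards disjoint SELECTs, one per Solr core.

    Assumes key_column holds non-negative integers and base_query has no
    WHERE clause, so appending one here yields valid SQL.
    """
    return [
        f"{base_query} WHERE {key_column} % {num_shards} = {shard}"
        for shard in range(num_shards)
    ]


def merge_url(solr_base: str, target_core: str, source_cores: List[str]) -> str:
    """Build a CoreAdmin MERGEINDEXES URL merging source_cores into target_core.

    Assumes the Solr version accepts srcCore; this only builds the URL and
    does not send the request.
    """
    params = [("action", "MERGEINDEXES"), ("core", target_core)]
    params += [("srcCore", core) for core in source_cores]
    return f"{solr_base}/admin/cores?{urlencode(params)}"


# Hypothetical example: four cores, each indexing one quarter of "docs".
for q in sharded_queries("SELECT id, title FROM docs", "id", 4):
    print(q)
print(merge_url("http://localhost:8983/solr", "main", ["shard0", "shard1"]))
```

Each generated query would go into the corresponding core's DIH data-config, with the full imports run in parallel before the merge.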

Regards,
Gora
