Re: howto increase indexing speed?

2013-10-16 Thread primoz . skale
I think DIH uses only one core per instance. IMHO 300 docs/sec is quite
good. If you would like to use more cores, you need to use SolrJ, or maybe
run more than one DIH, with more cores of course.
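
To illustrate the "feed in parallel" idea, here is a minimal sketch, not the
SolrJ API itself. It assumes the documents are already available as dicts and
that the Solr update handler accepts JSON at the URL shown; that URL, the
collection name and the names batches, post_batch and feed_parallel are all
made up for the example. Commits are left to autoCommit or a separate request.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def post_batch(docs, update_url="http://localhost:8983/solr/collection1/update"):
    """POST one batch as a JSON array of documents.

    The URL is an assumption; adjust host, port and collection to your setup.
    """
    req = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return len(docs)


def feed_parallel(rows, send=post_batch, workers=4, batch_size=500):
    """Send batches from several threads; return the number of documents sent.

    Note: pool.map submits every batch up front, so a very large import
    would need a bounded queue instead of this simple form.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send, batches(rows, batch_size)))
```

Several update requests are then in flight at once, so Solr can work on them
in more than one thread. Whether that helps depends on the Solr box having
more than one CPU to run those threads on.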

Primoz



From: Giovanni Bricconi giovanni.bricc...@banzai.it
To: solr-user solr-user@lucene.apache.org
Date: 16.10.2013 16:25
Subject: howto increase indexing speed?



I have a small Solr setup, not even on a physical machine but on a VMware
virtual machine with a single CPU, which reads data from a database using
DIH. The machine has no physical disks attached; it stores its data on a
NetApp NAS.

Currently this machine indexes 320 documents/sec. That is not bad, but we
plan to double the size of the index and would like to keep the indexing
time nearly the same.

Doing some basic checks during indexing, I found with iostat that disk
utilization is only about 8% and the source database is running fine,
whereas the virtual CPU is at 95%, used by Solr.

Now I can quite easily add another virtual CPU to the Solr box, but as far
as I know this won't help because DIH doesn't work in parallel. Am I wrong?

What would you do? Rewrite the feeding process, dropping DIH and using
SolrJ to feed data in parallel? Or would you keep DIH and switch to a
sharded configuration?

Thank you for any hints

Giovanni



Re: howto increase indexing speed?

2013-10-16 Thread Walter Underwood
You might consider local disks. I once ran Solr with the indexes on an 
NFS-mounted volume and the slowdown was severe.

wunder

On Oct 16, 2013, at 7:40 AM, primoz.sk...@policija.si wrote:

 I think DIH uses only one core per instance. IMHO 300 doc/sec is quite 
 good. If you would like to use more cores you need to use solrj. Or maybe 
 more than one DIH and more cores of course.
 
 Primoz
 
 
 

--
Walter Underwood
wun...@wunderwood.org