Thanks Erick. For the record, we are using 1.4.1 and SolrJ.
On 31 October 2010 01:54, Erick Erickson erickerick...@gmail.com wrote:
What version of Solr are you using?
About committing: I'd just let the Solr defaults handle that. You configure this in the autocommit section of solrconfig.xml. I'm pretty sure this gets triggered even if you're using SolrJ.
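For reference, the autocommit section Erick mentions looks something like the fragment below. The thresholds here are only illustrative placeholders, not recommendations; tune them for your own ingest rate:

```xml
<!-- solrconfig.xml: commit automatically after either threshold is hit -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>   <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>   <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```

With this in place, documents added via SolrJ are committed by the server on its own schedule, so the client does not need to issue intermediate commits.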
That said, it's probably wise to issue a commit after all your data is indexed too, just to flush any remaining documents since the last autocommit.
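A minimal sketch of that pattern, batch adds plus a single commit at the end, is below. The `IndexTarget` interface is a hypothetical stand-in for a SolrJ server so the batching logic can be shown self-contained; in real SolrJ 1.4 code the two calls would be `CommonsHttpSolrServer.add(Collection<SolrInputDocument>)` and `server.commit()`:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SolrJ's server: add a batch, commit once at the end.
interface IndexTarget {
    void add(List<String> docs);
    void commit();
}

class BatchIndexer {
    private final IndexTarget target;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    BatchIndexer(IndexTarget target, int batchSize) {
        this.target = target;
        this.batchSize = batchSize;
    }

    // Buffer documents and send them in batches; intermediate commits
    // are left to Solr's autocommit configuration.
    void index(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // One explicit commit after everything is indexed, to flush any
    // documents added since the last autocommit.
    void finish() {
        flush();
        target.commit();
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            target.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

The batch size here only controls network round-trips, not commit frequency, so it can be chosen for throughput (a few hundred to a few thousand docs per add is a common starting point) independently of the autocommit settings.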
Optimize should not be issued until you're all done, if at all. If you're not deleting (or updating) documents, don't bother to optimize unless the number of files in your index directory gets really large. Recent Solr code almost removes the need to optimize unless you delete documents, but I confess I don't know which revision "recent" refers to; perhaps only trunk...
HTH
Erick
On Thu, Oct 28, 2010 at 9:56 AM, Savvas-Andreas Moysidis
savvas.andreas.moysi...@googlemail.com wrote:
Hello,
We currently index our data through a SQL-DIH setup, but because our model (and therefore our SQL query) is becoming complex, we need to index our data programmatically. As we didn't have to deal with commit/optimise before, we are now wondering whether there is an optimal approach. Is there a batch size after which we should fire a commit, or should we execute a single commit after indexing all of our data? What about optimise?
Our document corpus is 4m documents, and through DIH the resulting index is around 1.5GB.
We have searched previous posts but couldn't find a definitive answer. Any input much appreciated!
Regards,
-- Savvas