Re: Commit/Optimise question

2010-10-31 Thread Savvas-Andreas Moysidis
Thanks Eric. For the record, we are using 1.4.1 and SolrJ.

On 31 October 2010 01:54, Erick Erickson erickerick...@gmail.com wrote:

 What version of Solr are you using?

 About committing: I'd just let the Solr defaults handle that. You configure
 this in the autoCommit section of solrconfig.xml. I'm pretty sure this gets
 triggered even if you're using SolrJ.

 That said, it's probably wise to issue a commit after all your data is
 indexed too, just to flush any remaining documents since the last autocommit.

 Optimize should not be issued until you're all done, if at all. If
 you're not deleting (or updating) documents, don't bother to optimize
 unless the number of files in your index directory gets really large.
 Recent Solr code almost removes the need to optimize unless you
 delete documents, but I confess I don't know the revision number
 "recent" refers to; perhaps only trunk...

 HTH
 Erick

 On Thu, Oct 28, 2010 at 9:56 AM, Savvas-Andreas Moysidis 
 savvas.andreas.moysi...@googlemail.com wrote:

  Hello,
 
  We currently index our data through a SQL-DIH setup, but because our model
  (and therefore SQL query) is becoming complex, we need to index our data
  programmatically. As we didn't have to deal with commit/optimise before, we
  are now wondering whether there is an optimal approach. Is there a batch
  size after which we should fire a commit, or should we execute a commit
  after indexing all of our data? What about optimise?
 
  Our document corpus is 4m documents, and through DIH the resulting index
  is around 1.5G.
 
  We have searched previous posts but couldn't find a definite answer. Any
  input much appreciated!
 
  Regards,
  -- Savvas
 



Re: Commit/Optimise question

2010-10-30 Thread Erick Erickson
What version of Solr are you using?

About committing: I'd just let the Solr defaults handle that. You configure
this in the autoCommit section of solrconfig.xml. I'm pretty sure this gets
triggered even if you're using SolrJ.

That said, it's probably wise to issue a commit after all your data is
indexed too, just to flush any remaining documents since the last autocommit.

Optimize should not be issued until you're all done, if at all. If
you're not deleting (or updating) documents, don't bother to optimize
unless the number of files in your index directory gets really large.
Recent Solr code almost removes the need to optimize unless you
delete documents, but I confess I don't know the revision number
"recent" refers to; perhaps only trunk...
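In SolrJ (the 1.4.x API), the final commit, and an optional optimize, can be issued explicitly once indexing finishes. A minimal sketch, assuming a local Solr instance at the default URL; the field names and document values are hypothetical:

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexAndCommit {
    public static void main(String[] args) throws Exception {
        // URL is an assumption; point this at your own Solr instance.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Add documents as you index; autocommit (if configured in
        // solrconfig.xml) fires along the way.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "example-1");          // hypothetical field
        doc.addField("title", "Example document"); // hypothetical field
        server.add(doc);

        // One explicit commit at the end flushes anything the last
        // autocommit missed.
        server.commit();

        // Only optimize if you really need to, e.g. after heavy deletes.
        // server.optimize();
    }
}
```

Note this requires the SolrJ client jar on the classpath and a running Solr server, so it is a sketch of the call sequence rather than a standalone program.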

HTH
Erick

On Thu, Oct 28, 2010 at 9:56 AM, Savvas-Andreas Moysidis 
savvas.andreas.moysi...@googlemail.com wrote:

 Hello,

 We currently index our data through a SQL-DIH setup, but because our model
 (and therefore SQL query) is becoming complex, we need to index our data
 programmatically. As we didn't have to deal with commit/optimise before, we
 are now wondering whether there is an optimal approach. Is there a batch
 size after which we should fire a commit, or should we execute a commit
 after indexing all of our data? What about optimise?

 Our document corpus is 4m documents, and through DIH the resulting index
 is around 1.5G.

 We have searched previous posts but couldn't find a definite answer. Any
 input much appreciated!

 Regards,
 -- Savvas