Optimizing cores in SolrCloud

2013-11-14 Thread michael.boom
A few weeks ago, optimization in SolrCloud was discussed in this thread:
http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimization-of-another-td4097499.html#a4098020

That thread covered distributed optimization across a collection.
My use case requires manually running an optimize every week or so,
because I delete by query often, the deletedDocs count grows huge,
and the only way to reclaim that space is by optimizing.

Since I have a pretty steady, high load, I can't do it overnight, so I was
thinking of doing it one core at a time - optimizing shard1_replica1,
then shard1_replica2, and so on, using
curl
'http://localhost:8983/solr/collection1_shard1_replica1/update?optimize=true&distrib=false'

My question is: how would this affect the performance of the system? I
assume all queries routed to that shard replica would be very slow.

Would there be any problems if one replica is optimized and another is not?
Has anybody tried something like this? Any tips or stories?
Thank you!



-
Thanks,
Michael


Re: Optimizing cores in SolrCloud

2013-11-14 Thread Erick Erickson
I'm going to answer with something completely different <G>

First, though, optimization happens in the background, so it
shouldn't have too big an impact on query performance outside of
I/O contention. There also shouldn't be any problem with one
shard being optimized and one not.

Second, have you considered tweaking some of the TieredMergePolicy
knobs? In particular,
reclaimDeletesWeight
which defaults to 2.0. You can set this in your solrconfig.xml. Through
a clever bit of reflection, you can actually set most (all?) of the
member vars in TieredMergePolicy.java.

Bumping up the weight might cause the segment merges to merge-away
the deleted docs frequently enough to satisfy you.
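For example, here's a rough sketch of what that could look like in the
<indexConfig> section of solrconfig.xml (Solr 4.x style; the 3.0 value is
just an illustration of bumping the weight above the 2.0 default, not a
tested recommendation):

<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- standard TieredMergePolicy knobs (10/10 are the defaults) -->
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <!-- favor merges that reclaim deleted docs; default is 2.0 -->
    <double name="reclaimDeletesWeight">3.0</double>
  </mergePolicy>
</indexConfig>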

Best,
Erick



Re: Optimizing cores in SolrCloud

2013-11-14 Thread michael.boom
Thanks Erick!

That's a really interesting idea, I'll try it!
Another question: when does the merging actually happen? Is it
triggered or conditioned by something?

Currently I have a core with ~13M maxDocs and ~3M deleted docs, and although
I see a lot of merges in SPM, the deleted documents aren't really going
anywhere.
For merging I'm using the example settings; I haven't changed them.




-
Thanks,
Michael


Re: Optimizing cores in SolrCloud

2013-11-14 Thread Walter Underwood
Earlier, you said that optimize is the only way that deleted documents are 
expunged. That is false: they are expunged when the segment they are in is 
merged. A forced merge (optimize) merges all segments, so it will expunge all 
deleted documents. But those documents will be expunged by normal merges eventually.

When you have deleted docs in the largest segment, you have to wait for a merge 
of that segment.

My best advice is to stop looking at the deleted documents count and worry 
about something that makes a difference to your users.

For about 10 years, I worked on Ultraseek Server, a search engine with the same 
design for merging and document deletion. With over 10K installations, we never 
had a customer who had a problem caused by deleted documents.

wunder
