Re: 'Optimizing' Solr Index Size

2013-08-07 Thread Erick Erickson
The general advice is to not merge (optimize) unless your
index is relatively static. You're quite correct: optimizing
simply recovers the space from deleted documents. Otherwise
it won't change much (except that you end up with fewer segments).

Here's a _great_ video that Mike McCandless put together:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

But in general _whenever_ segments are merged, the
resulting segment will have all the data from deleted docs
removed, and segments are merged continually when
data is being added to the index.
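
To make the point about merges concrete, here is a toy model in Python. It is only a sketch and not Lucene code: the Segment class, the merge function and the document counts are all made up for illustration. The one behaviour it models is the one described above, that a merge copies live documents into the new segment and leaves the deleted ones behind.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    live: int     # documents still visible to searches
    deleted: int  # documents marked deleted but still occupying space

    @property
    def max_doc(self) -> int:
        # Everything the segment physically holds, live or not.
        return self.live + self.deleted


def merge(segments):
    """Toy merge: only live documents are carried into the new segment."""
    return Segment(live=sum(s.live for s in segments), deleted=0)


# Hypothetical segments, one of them half full of deletes.
segments = [Segment(live=900, deleted=100), Segment(live=500, deleted=500)]
merged = merge(segments)

before = sum(s.max_doc for s in segments)
print(before, "->", merged.max_doc)  # 2000 -> 1400
```

The live document count is unchanged by the merge; only the space held by deleted documents goes away.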

Here's a quick-n-dirty way to estimate the space savings
optimize will give you: look at the admin page for the core.
The ratio of deleted docs to numDocs is about the unused
space that would be regained by an optimize. From there it's
your call.
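
As a rough worked example of that estimate, here is a small Python sketch. The function name and the document counts are hypothetical; only the 200 GB index size comes from this thread. It assumes documents are of broadly similar size, and it measures deleted documents against the total held in the index (live plus deleted), which is one reading of the ratio described above.

```python
def reclaimable_fraction(num_docs: int, deleted_docs: int) -> float:
    """Rough share of the index occupied by deleted documents."""
    max_doc = num_docs + deleted_docs  # live + deleted documents
    if max_doc == 0:
        return 0.0
    return deleted_docs / max_doc


index_size_gb = 200        # index size mentioned in the thread
num_docs = 3_000_000       # hypothetical value read from the admin page
deleted_docs = 1_000_000   # hypothetical value read from the admin page

fraction = reclaimable_fraction(num_docs, deleted_docs)
print(f"~{fraction:.0%} of the index, roughly {fraction * index_size_gb:.0f} GB")
```

With a small share of deletes, the ratio against numDocs and the ratio against the total are nearly the same, so either works as a back-of-the-envelope figure.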

Best
Erick


On Tue, Aug 6, 2013 at 12:02 PM, Brendan Grainger 
brendan.grain...@gmail.com wrote:

 To maybe answer another one of my questions about the 50Gb recovered when
 running:

 curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

 It looks to me that it was from deleted docs being completely removed from
 the index.

 Thanks



 On Tue, Aug 6, 2013 at 11:45 AM, Brendan Grainger 
 brendan.grain...@gmail.com wrote:

  Well, I guess I can answer one of my questions which I didn't exactly
  explicitly state, which is: how do I force solr to merge segments to a
  given maximum. I forgot about doing this:
 
  curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'
 
  which reduced the number of segments in my index from 12 to 10. Amazingly,
  it also reduced the space used by almost 50Gb. Is that even possible?
 
  Thanks again
  Brendan
 
 
 
  On Tue, Aug 6, 2013 at 10:55 AM, Brendan Grainger 
  brendan.grain...@gmail.com wrote:
 
  Hi All,
 
  First of all, what I was actually trying to do is get a little
  space back. So if there is a better way to do this by adjusting the
  MergePolicy or something else please let me know. My index is currently
  200Gb. In the past (Solr 1.4) we've found that optimizing the index will
  double the size of the index temporarily, then usually when it's done we
  end up with a smaller index and slightly faster search query times.
 
  Should I even bother optimizing? My impression was that with the
  TieredMergePolicy this would be less necessary. Would merging segments
  into larger ones save any space and if so is there a way to tell solr to
  do that?
 
  Thanks
  Brendan
 
 
 
 
  --
  Brendan Grainger
  www.kuripai.com
 






Re: 'Optimizing' Solr Index Size

2013-08-07 Thread Brendan Grainger
Thanks Erick, our index is relatively static. I think the deletes must be
coming from 'reindexing' the same documents, so it's definitely handy to
recover the space. I've seen that video before; definitely very interesting.

Brendan


 




-- 
Brendan Grainger
www.kuripai.com


Re: 'Optimizing' Solr Index Size

2013-08-06 Thread Brendan Grainger
Well, I guess I can answer one of my questions, which I didn't
explicitly state: how do I force Solr to merge segments down to a
given maximum? I forgot about doing this:

curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

which reduced the number of segments in my index from 12 to 10. Amazingly,
it also reduced the space used by almost 50Gb. Is that even possible?
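
For anyone scripting this, here is a minimal Python sketch that builds the same request URL. The optimize_url helper is hypothetical; the host, path and the three parameters (optimize, maxSegments, waitFlush) are the ones from the command above, and the only point of the sketch is that they are separate query parameters joined with '&'.

```python
from urllib.parse import urlencode


def optimize_url(base="http://localhost:8983/solr", max_segments=10):
    """Build the update URL that asks Solr to merge down to max_segments."""
    params = {
        "optimize": "true",
        "maxSegments": max_segments,
        "waitFlush": "false",
    }
    # urlencode joins the key=value pairs with '&' and escapes them as needed.
    return f"{base}/update?{urlencode(params)}"


print(optimize_url())
```

When passing the URL to curl on a command line, keep it inside quotes so the shell does not interpret the '&' characters.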

Thanks again
Brendan







-- 
Brendan Grainger
www.kuripai.com


Re: 'Optimizing' Solr Index Size

2013-08-06 Thread Brendan Grainger
To maybe answer another one of my questions about the 50Gb recovered when
running:

curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

It looks to me like it came from deleted docs being completely removed from
the index.

Thanks







-- 
Brendan Grainger
www.kuripai.com