Ha! Searching for "partial optimize" on
http://www.lucidimagination.com/search turns up SOLR-603, which adds a
'maxSegments' option to the <optimize> command. The issue text never
actually uses the word 'partial'.

It's documented at http://wiki.apache.org/solr/UpdateXmlMessages. The
command takes a target number of Lucene segments, and I have no idea how
that will translate into disk space. To keep peak disk usage down, you
could run it repeatedly, stepping the segment count down until you reach one.
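
For example, something like this should step an index down in stages - a
rough sketch, assuming a Solr build that includes SOLR-603 and the default
/update handler on localhost:8983 (adjust the host, port, and segment
counts for your install):

    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
         --data-binary '<optimize maxSegments="16"/>'
    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
         --data-binary '<optimize maxSegments="4"/>'
    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
         --data-binary '<optimize maxSegments="1"/>'

My guess is that each smaller pass needs less scratch space than a single
all-at-once optimize, but I have not measured that.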

On Thu, Oct 1, 2009 at 11:49 AM, Lance Norskog <goks...@gmail.com> wrote:
> I've heard there is a new "partial optimize" feature in Lucene, but it
> is not mentioned in the Solr or Lucene wikis so I cannot advise you
> how to use it.
>
> On a previous project we had a 500GB index for 450m documents. It took
> 14 hours to optimize. We found that Solr worked well (given enough RAM
> for sorting and faceting requests) but that the IT logistics of a 500G
> fileset were too much.
>
> Also, if you want your query servers to continue serving while
> propagating the newly optimized index, you need 2X the space to hold both
> copies on the slave during the transfer. For us that transfer took 35
> minutes over 1G Ethernet.
>
> On Thu, Oct 1, 2009 at 7:36 AM, Walter Underwood <wun...@wunderwood.org> 
> wrote:
>> I've now worked on three different search engines, and they all have a
>> 3X worst case on space, so I'm familiar with this case. --wunder
>>
>> On Oct 1, 2009, at 7:15 AM, Mark Miller wrote:
>>
>>> Nice one ;) It's not technically a case where optimize requires > 2x,
>>> though, in case the user asking gets confused. It's a case unrelated to
>>> optimize that can grow your index. Then you need < 2x for the optimize
>>> itself, since the deleted documents won't be copied into the new index.
>>>
>>> It also requires that you jump through hoops to delete everything. If
>>> you delete everything with a *:* query, Solr is smart enough not to do a
>>> delete on every individual document - it just creates a new empty index,
>>> which lets it remove the old one very efficiently.
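
For reference, the delete-everything message Mark is describing is just the
standard XML update message - a minimal sketch, assuming the default
/update handler on localhost:8983:

    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
         --data-binary '<delete><query>*:*</query></delete>'
    curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
         --data-binary '<commit/>'
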
>>>
>>> Definitely agree on the more disk space.
>>>
>>> Walter Underwood wrote:
>>>>
>>>> Here is how you need 3X. First, index everything and optimize. Then
>>>> delete everything and reindex without any merges.
>>>>
>>>> You have one full-size index containing only deleted docs, one
>>>> full-size index containing reindexed docs, and need that much space
>>>> for a third index.
>>>>
>>>> Honestly, disk is cheap, and there is no way to make Lucene work
>>>> reliably with less disk. 1TB is a few hundred dollars. You have a free
>>>> search engine; buy some disk.
>>>>
>>>> wunder
>>>>
>>>> On Oct 1, 2009, at 6:25 AM, Grant Ingersoll wrote:
>>>>
>>>>>> 151GB or as little as from 183GB to 182GB.  Is that size after a
>>>>>> commit close to the size the index would be after an optimize?  For
>>>>>> that matter, are there cases where optimization can take more than
>>>>>> 2x?  I've heard of cases but have not observed them in my system.
>>>>>
>>>>> I seem to recall a case where it can be 3x, but I don't know that it
>>>>> has been observed much.
>>>>
>>>
>>>
>>> --
>>> - Mark
>>>
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com
