Phillip Farber wrote:
> I am trying to automate a build process that adds documents to 10
> shards over 5 machines and need to limit the size of a shard to no
> more than 200GB because I only have 400GB of disk available to
> optimize a given shard.
>
> Why does the size (du) of an index typically decrease after a commit? 
> I've observed a decrease in size of as much as from 296GB down to
> 151GB or as little as from 183GB to 182GB.  Is that size after a
> commit close to the size the index would be after an optimize?  
Likely. Until you commit or close the Writer, the unoptimized index is
still the "live" index, and the newly optimized index sits on disk next
to it. Once you commit and make the optimized index the "live" index, the
unoptimized index can be removed (depending on your deletion policy, which
by default keeps only the latest commit point).
> For that matter, are there cases where optimization can take more than
> 2x?  I've heard of cases but have not observed them in my system.  I
> only do adds to the shards, never query them. An LVM snapshot of the
> shard receives the queries.
There are cases where it takes over 2x, but they involve using reopen.
If you have more than one Reader on the index and reopen only some of
them, the newly created Readers can hold open the partially optimized
segments that existed at that moment, so more than 2x can be needed.
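
To make the space arithmetic concrete, here is a rough sketch in Python.
The function name and the `factor` parameter are only illustrative and
are not part of Lucene or Solr. It assumes the usual 2x worst case; raise
`factor` if stale Readers might hold extra segments open.

```python
GB = 1024 ** 3

def fits_optimize_budget(shard_bytes, disk_bytes, factor=2.0):
    """Return True if an optimize that may need `factor` times the
    committed shard size still fits in the available disk space."""
    return shard_bytes * factor <= disk_bytes

# A 200GB shard on a 400GB volume just fits at 2x.
print(fits_optimize_budget(200 * GB, 400 * GB))   # True
# A hypothetical 250GB shard would not.
print(fits_optimize_budget(250 * GB, 400 * GB))   # False
```

With only 400GB available, a 200GB shard leaves no headroom at all, so
anything above 2x would run out of disk.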
>
> Is doing a commit before I take a du a reliable way to gauge the size
> of the shard?  It is really bad news to allow a shard to go over 200GB
> in my use case.  How do others manage this problem of 2x space needed
> to optimize with "limited" disk space?
Get more disk space ;) Or don't optimize. A lower mergeFactor keeps the
segment count down as you index, which can make optimizing less necessary.
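
If you stay with the 200GB cap, the build script could check the shard
size itself after each commit. This is only a minimal sketch: the helper
names and the default limit are assumptions for illustration, and it sums
apparent file sizes, which can differ slightly from what du reports in
disk blocks. It also assumes a commit has already completed, so that
superseded segment files have been removed.

```python
import os

def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path` (roughly `du -sb`)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Segment files can disappear mid-walk when a merge or
                # commit finishes; skip anything that vanished.
                pass
    return total

def shard_is_full(index_dir, limit_bytes=200 * 1024 ** 3):
    """Return True once the index directory has reached the size limit."""
    return dir_size_bytes(index_dir) >= limit_bytes
```

The build loop would call shard_is_full() on the shard's index directory
after each commit and stop adding documents to that shard once it
returns True.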
>
> Advice greatly appreciated.
>
> Phil
>


-- 
- Mark

http://www.lucidimagination.com


