In my experience, disks are not always cheap :) Running in AWS, I have found
several contexts that require local storage for cost-effective performance
of SOLR, but local storage means scaling the instance as a whole to increase
capacity (hence my particular motivation for this question).

Generally, the use cases I am considering don't re-index the whole index
in place. Instead, I have used an A/B strategy: stand up a parallel
cluster, index into it, then cut over using some other method (aliases or
DNS draining, depending on the context). So the disk cost of a re-indexing
operation seems controllable by favoring certain methodologies.
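For concreteness, the alias flavor of that cut-over can be sketched in Python. This is a minimal sketch under assumptions: the base URL and the collection names `products_a`/`products_b` are hypothetical, and the helper only builds the Collections API CREATEALIAS request rather than sending it. My understanding is that issuing CREATEALIAS for an alias that already exists repoints it to the new collection, which is what makes the swap cheap for clients, but check the Collections API docs for your Solr version.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base: str, alias: str, collection: str) -> str:
    """Build (but do not send) a Collections API CREATEALIAS request that
    points `alias` at `collection`. Names passed in are hypothetical."""
    params = urlencode({
        "action": "CREATEALIAS",
        "name": alias,            # the stable name clients query
        "collections": collection,  # the freshly indexed "B" collection
    })
    return f"{solr_base.rstrip('/')}/admin/collections?{params}"


if __name__ == "__main__":
    # Cut clients over from the hypothetical products_a to products_b.
    print(create_alias_url("http://localhost:8983/solr/", "products", "products_b"))
```

Once the alias points at the B collection, the A collection can be deleted to reclaim its disk, so at no point does a single cluster have to hold two copies of its own index.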

The merging considerations are certainly interesting and nuanced. Has there
been any investigation into a "minimum number of segments" setting, which
could force the index to keep at least some number of segments (say 5 or 10)
so that no single merge operation could involve the entire index?
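To make the question concrete, here is a back-of-envelope model in Python. It is my own simplification, not how Lucene actually schedules merges: assume each in-flight merge temporarily needs extra space equal to its output segment, and that normal merging never produces a segment larger than a cap (the role maxMergedSegmentMB plays in TieredMergePolicy). The function names and parameters are hypothetical.

```python
import math


def merge_headroom_gb(segment_sizes_gb: list[float],
                      max_merged_segment_gb: float,
                      max_concurrent_merges: int = 1) -> float:
    """Rough upper bound on transient extra disk needed by normal merges.

    Simplified model (an assumption, not Lucene's real scheduler): a merge
    writes its new segment before the input segments are deleted, so each
    in-flight merge needs up to its output size in extra space, and outputs
    are capped at max_merged_segment_gb.
    """
    total = sum(segment_sizes_gb)
    # In this model the extra space never exceeds one full copy of the index.
    return min(total, max_merged_segment_gb * max_concurrent_merges)


def min_segment_count(index_gb: float, max_merged_segment_gb: float) -> int:
    """Under the same model, a segment-size cap implies at least this many segments."""
    return math.ceil(index_gb / max_merged_segment_gb)
```

With a 100 GB index in twenty 5 GB segments, a 5 GB cap and three concurrent merges, this gives 15 GB of headroom, versus roughly a full extra 100 GB if everything is rewritten into one segment. In this model a size cap already implies a floor of about total/cap segments, which is close to the "minimum number of segments" behavior I have in mind.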

On Tue, Jun 22, 2021 at 1:37 PM Dave <[email protected]> wrote:

> The 3x index size has been around for a long time. Usually it’s for a full
> optimize.  When this happens the original index stays in place, 1x, and is
> being reconstructed, 2x, then merged into the replacement 3x, once it’s all
> done you are back to less than 1x but you need the space or the optimize
> will fail.  The new rules are that you never optimize, but you will always
> want that extra space just in case, and disks are cheap.
>
> > On Jun 22, 2021, at 4:24 PM, Stephen Lewis Bianamara <[email protected]> wrote:
> >
> > Thanks Shawn! That is really helpful to know. Can you say more about what
> > circumstances might cause an index to triple in size? Is it connected with
> > bulk operations like "optimize", which can be avoided, or is it inherent to
> > situations like merging segments? And if so, can this requirement be
> > adjusted by an appropriate setting of maxMergedSegmentMB or something
> > similar?
> >
> > I guess I'm wondering if there is any info or references I could look at to
> > determine what the limit should be for a given case, even if the general
> > guidance is that 3x is needed.
> >
> > Thanks!
> >
> >> On Tue, Jun 22, 2021 at 1:05 PM Shawn Heisey <[email protected]> wrote:
> >>
> >>> On 6/22/2021 11:45 AM, Stephen Lewis Bianamara wrote:
> >>> However, SOLR 8 looks to have a different behavior wherein the index is
> >>> perhaps updated in place, and thus a 100GB / shard index might only need
> >>> a bit more headroom (call it 110GB, say). Is this always the case with
> >>> recovery on SOLR 8+? Or are there some situations where you might need
> >>> 200GB for the recovery?
> >>
> >>
> >> The general recommendation, for normal operation and not just recovery,
> >> is to ensure you have enough space available so that the index can
> >> triple in size temporarily.  The 3x requirement only comes about with a
> >> very specific set of circumstances involving reindexing in-place on an
> >> existing index -- for MOST usage, you want enough space for the index to
> >> double in size temporarily. But because we cannot be sure how you are
> >> going to use Solr, we always err on the side of caution and tell people
> >> the index could triple in size before it goes back down.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
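Applying Shawn's rule of thumb to the 100GB-per-shard example above, a trivial Python helper (hypothetical name, my own sketch) gives the disk targets: size the disk so the index can double temporarily in normal use, or triple if you may reindex in place over the existing index. "Free headroom" here is simply total disk minus the current index size.

```python
def disk_plan_gb(index_gb: float, reindex_in_place: bool = False) -> tuple[float, float]:
    """Return (total_disk_gb, free_headroom_gb) per the rule of thumb in this
    thread: allow the index to double temporarily in normal use, or triple
    when reindexing in place over an existing index."""
    multiplier = 3 if reindex_in_place else 2
    total = index_gb * multiplier
    # Headroom is whatever the disk must hold beyond the current index.
    return total, total - index_gb


if __name__ == "__main__":
    print(disk_plan_gb(100.0))                         # normal operation
    print(disk_plan_gb(100.0, reindex_in_place=True))  # in-place reindex
```

For a 100GB shard that works out to 200GB of disk (100GB free) in the normal case and 300GB (200GB free) for an in-place reindex, which is why an A/B reindex into a separate cluster avoids the 3x case.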
