In my experience, disks are not always cheap :) Running in AWS, I have found several contexts which require local storage for cost-effective SOLR performance, and there the only way to increase capacity is to scale up the instance as a whole (hence the particular motivation for this question).
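To make the sizing question concrete, here is a rough sketch of the headroom arithmetic using the multipliers from the guidance quoted below (2x for most usage, 3x for in-place reindexing or a full optimize) and the 100GB shard from my original question. The function name `required_disk_gb` is purely illustrative and not part of any Solr API, and the multipliers are rules of thumb from this thread rather than measurements.

```python
def required_disk_gb(index_gb: float, peak_multiplier: float) -> float:
    """Disk needed so an index can temporarily grow to peak_multiplier x its size.

    peak_multiplier is an assumption taken from the thread's rules of thumb
    (2.0 for most usage, 3.0 for in-place reindex or full optimize).
    """
    if index_gb < 0:
        raise ValueError("index_gb must be non-negative")
    if peak_multiplier < 1:
        raise ValueError("peak_multiplier must be at least 1")
    return index_gb * peak_multiplier


if __name__ == "__main__":
    shard_gb = 100.0  # example shard size from the original question
    scenarios = (
        ("most usage (2x)", 2.0),
        ("in-place reindex / full optimize (3x)", 3.0),
    )
    for label, multiplier in scenarios:
        total = required_disk_gb(shard_gb, multiplier)
        print(f"{label}: provision {total:.0f} GB, i.e. {total - shard_gb:.0f} GB headroom")
```

On instance types where local storage only grows with the instance size, the gap between the 2x and 3x rows is what decides which instance size is needed.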
Generally, the use cases I am considering don't re-index the whole index in place. Instead, I have used an A/B strategy: stand up a parallel cluster, index to that, then cut over using some other method (aliases or DNS draining, depending on the context). So as far as a re-indexing operation is concerned, this seems controllable by favoring certain methodologies.

The merging considerations are certainly interesting and nuanced. Has there been any investigation into a "minimum number of segments" setting which could force a minimum number of segments (say 5 or 10) so that no single segment operation could involve the entire index?

On Tue, Jun 22, 2021 at 1:37 PM Dave <[email protected]> wrote:

> The 3x index size has been around for a long time. Usually it’s for a full
> optimize. When this happens the original index stays in place, 1x, and is
> being reconstructed, 2x, then merged into the replacement 3x, once it’s all
> done you are back to less than 1x but you need the space or the optimize
> will fail. The new rules are that you never optimize but you will always
> want that extra space just in case, and disks are cheap,
>
> > On Jun 22, 2021, at 4:24 PM, Stephen Lewis Bianamara <[email protected]> wrote:
> >
> > Thanks Shawn! That is really helpful to know. Can you say more about what
> > circumstance might cause an index to triple in size? Is it connected with
> > bulk operations like "optimize" which can be avoided, or is it inherent to
> > situations like merging segments? And if so, can this requirement be
> > adjusted by an appropriate setting of maxMergedSegmentMB or something
> > similar?
> >
> > I guess I'm wondering if there is any info or references I could look at to
> > determine what the limit should be for a given case even if the general
> > guidance is that 3x is needed.
> >
> > Thanks!
> >
> >> On Tue, Jun 22, 2021 at 1:05 PM Shawn Heisey <[email protected]> wrote:
> >>
> >>> On 6/22/2021 11:45 AM, Stephen Lewis Bianamara wrote:
> >>> However, SOLR 8 looks to have a different behavior wherein the index is
> >>> perhaps updated in place, and thus a 100GB / shard index might only need a
> >>> bit more headroom (call it 110GB say). Is this always the case with
> >>> recovery on SOLR 8+? Or are there some situations where you might need
> >>> 200GB for the recovery?
> >>
> >> The general recommendation, for normal operation and not just recovery,
> >> is to ensure you have enough space available so that the index can
> >> triple in size temporarily. The 3x requirement only comes about with a
> >> very specific set of circumstances involving reindexing in-place on an
> >> existing index -- for MOST usage, you want enough space for the index to
> >> double in size temporarily. But because we cannot be sure how you are
> >> going to use Solr, we always err on the side of caution and tell people
> >> the index could triple in size before it goes back down.
> >>
> >> Thanks,
> >> Shawn
