Sorry, I meant I am trying to reduce the index size... I am not using the
index optimize feature at this point.

Experiment one:
Index a ~10KB document once. Total index size across multiple shards:
~117KB

Experiment two:
Index the same ~10KB document 10,000 times. Total index size across
multiple shards: ~250MB

I am assuming that the terms (keys) in the inverted index wouldn't grow
when the same document is indexed multiple times. Therefore I would expect
the increase in index size to be minimal compared to indexing a totally
different document. Can you tell me what I am missing?
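A toy sketch of that assumption, using a simplified in-memory inverted index in Python. The class and method names are hypothetical, and this is not Lucene's on-disk format; depending on the schema, real segments also keep per-document data such as positions, norms, and doc values. It suggests the term dictionary does stay the same size no matter how often the document is added, but each added copy still gets its own entry in every term's postings list, so postings grow linearly with the number of documents.

```python
from collections import defaultdict


class ToyInvertedIndex:
    """Hypothetical, simplified inverted index (not Lucene's real format)."""

    def __init__(self):
        # term -> list of (doc_id, term_frequency) postings
        self.postings = defaultdict(list)
        self.next_doc_id = 0

    def add(self, text):
        # Every add gets a fresh doc id, even for identical text.
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        counts = defaultdict(int)
        for token in text.lower().split():
            counts[token] += 1
        # One posting per distinct term per document.
        for term, freq in counts.items():
            self.postings[term].append((doc_id, freq))
        return doc_id

    def num_terms(self):
        # Size of the term dictionary (distinct keys).
        return len(self.postings)

    def num_postings(self):
        # Total postings entries across all terms.
        return sum(len(p) for p in self.postings.values())


doc = "the quick brown fox jumps over the lazy dog"

once = ToyInvertedIndex()
once.add(doc)

many = ToyInvertedIndex()
for _ in range(10_000):
    many.add(doc)

print(once.num_terms(), once.num_postings())  # 8 8
print(many.num_terms(), many.num_postings())  # 8 80000
```

In this toy model the 8 distinct terms never change, while the postings go from 8 to 80,000 entries.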

On Tue, May 18, 2021 at 12:48 PM Dave <[email protected]> wrote:

> At a certain point the index size doesn't matter. When you reindex a
> document you do not delete the existing document; it is marked as
> deleted and the replacement is added. An optimize is what removes the
> documents marked as deleted, but optimizing is no longer a recommended
> process, since Solr is very good at merging and disk is inexpensive. I'm
> guessing the index grew because, even though the field is only indexed,
> that data is still written to the index and, of course, duplicated. If
> performance has not been adversely affected, I would never run the
> optimize command. I've pushed an index that is naturally 450GB all the
> way to 800GB+ and it ran great, assuming you have the disk space
> available.
>
> > On May 18, 2021, at 12:37 PM, Kudrettin Güleryüz <[email protected]> wrote:
> >
> > Hello,
> >
> > Experimenting with optimizing the index size.
> >
> > Can you help me understand why indexing (but not storing) a file 10,000
> > times increases the index size by roughly 2,500 times? I'm on Solr 7.3.
> > Schema and all other conditions are kept constant.
> >
> > Thanks
>
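Dave's point about reindexing can be sketched with a toy model in Python. The class and its methods are hypothetical and only loosely mirror how Lucene segments behave: re-adding a document under the same unique key leaves the old copy in place, marked as deleted, until a merge rewrites the data without it. This only applies when the repeated documents share a uniqueKey; with distinct IDs, all 10,000 copies are live documents and no merge would remove them.

```python
class ToySegmentStore:
    """Hypothetical model of append-only storage with delete markers."""

    def __init__(self):
        self.docs = []        # every version ever written: (key, body)
        self.deleted = set()  # positions in self.docs marked as deleted
        self.live = {}        # key -> position of the live version

    def index(self, key, body):
        # Reindexing an existing key does not overwrite it in place:
        # the old version is only marked deleted and the new one appended.
        if key in self.live:
            self.deleted.add(self.live[key])
        self.docs.append((key, body))
        self.live[key] = len(self.docs) - 1

    def stored_count(self):
        # All versions still occupying space, deleted or not.
        return len(self.docs)

    def live_count(self):
        return len(self.docs) - len(self.deleted)

    def merge(self):
        # A merge rewrites only the live versions, dropping deleted ones.
        survivors = [d for i, d in enumerate(self.docs) if i not in self.deleted]
        self.docs = survivors
        self.deleted = set()
        self.live = {key: i for i, (key, _) in enumerate(survivors)}


store = ToySegmentStore()
for _ in range(10_000):
    store.index("doc-1", "same ~10KB body")
before = (store.stored_count(), store.live_count())
store.merge()
after = (store.stored_count(), store.live_count())
print(before, after)  # (10000, 1) (1, 1)
```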
