Sorry, I meant I am trying to reduce the index size... I am not using the index optimize feature at this point.
Experiment one: Index a document of size ~10KB only once.
Total index size across multiple shards: ~117KB

Experiment two: Index the same ~10KB document 10,000 times.
Total index size across multiple shards: ~250MB

I am assuming that the terms (keys) in the inverted index wouldn't increase by indexing the same document multiple times. Therefore I would expect the increase in index size to be minimal compared to indexing a totally different document. Can you tell me what I am missing?

On Tue, May 18, 2021 at 12:48 PM Dave <[email protected]> wrote:

> At a certain point the index size doesn't matter. When you reindex a
> document you do not delete the actual residing document, you mark it as
> deleted and add on the replacement. An optimize is what removes the
> documents marked as deleted, but an optimize is really no longer a
> recommended process since Solr is very good at merging, as well as the
> fact that disk is inexpensive. The reason the index increased, I'm
> guessing, is that even though it's only indexed, that data is still
> stored and of course duplicated. If its performance has not been
> adversely affected I would not ever run the optimize command. I've
> pushed an index that is naturally 450gb all the way to 800gb+ and it ran
> great, assuming you have the disk space available
>
> > On May 18, 2021, at 12:37 PM, Kudrettin Güleryüz <[email protected]>
> > wrote:
> >
> > Hello,
> >
> > Experimenting with optimizing the index size.
> >
> > Can you help me understand why indexing but not storing a file 10,000
> > times increases the index size by 2,500 times? 7.3 here. Schema and
> > all other conditions are kept constant.
> >
> > Thanks
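
For reference, here is a quick back-of-the-envelope check of the numbers from the two experiments. It is only a sketch: it assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes) and treats the approximate sizes as exact, and the variable names are mine.

```python
# Back-of-the-envelope check of the two experiments.
# Assumptions: decimal units, and the approximate (~) sizes taken as exact.
KB = 1_000
MB = 1_000_000

doc_size = 10 * KB      # size of the source document (~10KB)
index_one = 117 * KB    # experiment one: document indexed once
index_many = 250 * MB   # experiment two: document indexed 10,000 times
copies = 10_000

# How much larger the index became between the two experiments.
growth_factor = index_many / index_one

# Average index space consumed per indexed copy in experiment two.
bytes_per_copy = index_many / copies

print(f"index growth factor: {growth_factor:.0f}x")
print(f"index size per indexed copy: {bytes_per_copy / KB:.0f} KB")
print(f"per-copy index size relative to the document: {bytes_per_copy / doc_size:.1f}x")
```

Under these assumptions the index grows by a bit more than 2,100 times, and each indexed copy accounts for about 25 KB of index, which is larger than the ~10KB document itself.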
