At a certain point the index size doesn't matter. When you re-index a document you do not delete the existing copy; Solr marks it as deleted and adds the replacement alongside it. An optimize is what purges those marked-as-deleted documents, but optimize is no longer a recommended routine process, since Solr is very good at merging segments on its own and disk is inexpensive. The reason the index grew, I'm guessing, is that even though the field is only indexed (not stored), that data still takes space in the index, and re-indexing duplicates it until the segments are merged. If performance has not been adversely affected I would not run the optimize command at all. I've pushed an index that is naturally 450GB all the way to 800GB+ and it ran great, assuming you have the disk space available.
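A minimal sketch of how you could check how much of your index is deleted-but-not-yet-merged documents before deciding whether an optimize is worth it. This assumes a local Solr at http://localhost:8983 and a core named "mycore" (both placeholders), and uses the standard Luke admin handler and the update handler's optimize parameter:

```python
import requests

SOLR = "http://localhost:8983/solr"   # assumed Solr base URL
CORE = "mycore"                        # assumed core name

# The Luke handler reports index-level stats; maxDoc includes documents
# that are only marked as deleted and not yet merged away.
resp = requests.get(f"{SOLR}/{CORE}/admin/luke",
                    params={"numTerms": 0, "wt": "json"})
index = resp.json()["index"]
num_docs = index["numDocs"]    # live (searchable) documents
max_doc = index["maxDoc"]      # live + deleted-but-not-yet-merged
deleted = max_doc - num_docs
frac = deleted / max_doc if max_doc else 0.0

print(f"live={num_docs} deleted={deleted} ({frac:.1%} of the index)")

# Only force a full merge if you really need the space back right now;
# normal background segment merging usually reclaims it over time.
if frac > 0.5:
    requests.get(f"{SOLR}/{CORE}/update",
                 params={"optimize": "true", "wt": "json"})
```

That 0.5 threshold is arbitrary; the point is that a modest fraction of deleted documents is normal and not worth forcing a merge over.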
> On May 18, 2021, at 12:37 PM, Kudrettin Güleryüz <[email protected]> wrote:
>
> Hello,
>
> Experimenting with optimizing the index size.
>
> Can you help me understand why indexing but not storing a file 10,000
> increases the index size by 2,500 times? 7.3 here. Schema and all other
> conditions are kept constant.
>
> Thanks
