Thanks! We update each index nightly. We don't clear the index; we bring in new and changed documents (deltas) and delete expired/404 pages. All our data are basically webpages, so none of the documents are very large. There are some PDFs, but again, they're not too large. We are running Solr 7.5; hopefully you can access the links.
https://www.dropbox.com/s/lzd6hkoikhagujs/CoreOne.png?dl=0
https://www.dropbox.com/s/ae6rayb38q39u9c/CoreTwo.png?dl=0

Brett

-----Original Message-----
From: Erick Erickson <erickerick...@gmail.com>
Sent: Thursday, August 8, 2019 5:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexed Data Size

On the surface, this makes no sense at all, so there's something I don't understand here ;).

How often do you update your index? Having files from a long time ago is perfectly reasonable if you're not updating regularly. But your statement that some of these are huge for just a 50K document index is odd unless they're _huge_ documents.

I wouldn't optimize unless you're on Solr 7.5+; before 7.5, optimizing creates a single segment. See:
https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
and
https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/

The extensions you mentioned are perfectly reasonable. Each segment is made up of multiple files; .fdt, for instance, contains stored data. See:
https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/codecs/lucene62/package-summary.html

Can you give us a long listing of one of your index directories?

Best,
Erick

> On Aug 8, 2019, at 5:17 PM, Moyer, Brett <bmo...@tiaa.org> wrote:
>
> In our data/solr/<shard_replica>/data/index on the filesystem, we have files
> that go back 1 year. I don't understand why and I doubt they are in use.
> Files with extensions like .fdx, .cfe, .doc, .pos, .tip, .dvm, etc. Some of these
> are very large and are running us out of server space. Our search indexes
> themselves are not large; in total we might have 50k documents. How can I reduce
> this /data/solr space? Is this what the Solr Optimize command is for? Thanks!
>
> Brett
>
> **********************************************************************
> This e-mail may contain confidential or privileged information.
> If you are not the intended recipient, please notify the sender immediately
> and then delete it.
>
> TIAA
> **********************************************************************
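Erick's request for a long listing of an index directory is easier to act on when the files are grouped by segment. A minimal sketch, assuming Python 3 and read access to the index directory; the script name `index_summary.py` and the `summarize`/`report` functions are hypothetical, not part of Solr or Lucene. It sums file sizes per segment and per extension and shows each segment's oldest and newest file dates. The extension descriptions follow the Lucene codec package summary linked in the thread.

```python
import os
import re
import sys
import time
from collections import defaultdict

# Lucene 6.x/7.x index file extensions, per the Lucene codec package summary.
EXTENSIONS = {
    "si": "segment info",
    "fnm": "field infos",
    "cfs": "compound file",
    "cfe": "compound file entries",
    "fdt": "stored fields data",
    "fdx": "stored fields index",
    "tim": "term dictionary",
    "tip": "term index",
    "doc": "postings (doc ids, frequencies)",
    "pos": "positions",
    "pay": "payloads and offsets",
    "nvd": "norms data",
    "nvm": "norms metadata",
    "dvd": "doc values data",
    "dvm": "doc values metadata",
    "tvd": "term vector data",
    "tvx": "term vector index",
    "liv": "live (non-deleted) docs",
    "dii": "points index",
    "dim": "points data",
}

# Segment files are named "_" plus a base-36 counter, e.g. "_1a.fdt" or
# "_1a_Lucene50_0.doc"; group 1 captures the "_1a" part.
SEGMENT_RE = re.compile(r"^(_[a-z0-9]+)")


def summarize(index_dir):
    """Return (per_segment, per_extension) file counts, sizes and mtimes."""
    per_segment = defaultdict(
        lambda: {"files": 0, "bytes": 0, "oldest": None, "newest": None})
    per_extension = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue
        st = os.stat(path)
        match = SEGMENT_RE.match(name)
        # segments_N and write.lock don't belong to any one segment.
        info = per_segment[match.group(1) if match else "(other)"]
        info["files"] += 1
        info["bytes"] += st.st_size
        if info["oldest"] is None or st.st_mtime < info["oldest"]:
            info["oldest"] = st.st_mtime
        if info["newest"] is None or st.st_mtime > info["newest"]:
            info["newest"] = st.st_mtime
        # Key by extension without the dot; "" for files like segments_N.
        per_extension[os.path.splitext(name)[1].lstrip(".")] += st.st_size
    return dict(per_segment), dict(per_extension)


def day(timestamp):
    """Format an mtime as YYYY-MM-DD in local time."""
    return time.strftime("%Y-%m-%d", time.localtime(timestamp))


def report(index_dir):
    """Print segments and extensions, largest first."""
    per_segment, per_extension = summarize(index_dir)
    print(f"{'segment':<10} {'files':>5} {'MB':>10}  oldest      newest")
    for seg, info in sorted(per_segment.items(), key=lambda kv: -kv[1]["bytes"]):
        print(f"{seg:<10} {info['files']:>5} {info['bytes'] / 2**20:>10.1f}  "
              f"{day(info['oldest'])}  {day(info['newest'])}")
    print()
    for ext, size in sorted(per_extension.items(), key=lambda kv: -kv[1]):
        label = "." + ext if ext else "(none)"
        print(f"{label:<8} {size / 2**20:>10.1f} MB  {EXTENSIONS.get(ext, '')}")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python index_summary.py /path/to/<shard_replica>/data/index")
    else:
        report(sys.argv[1])
```

For example: `python index_summary.py /path/to/data/solr/<shard_replica>/data/index`. Because Lucene writes a segment's files once and never modifies them (deletes against an existing segment go into new .liv files), a large segment that has not been merged recently can legitimately carry year-old dates. This script cannot tell whether a segment is still referenced by the latest `segments_N` file.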