Brett, it’s probably because you hit the default 5 GB maximum segment size in Solr. Once a segment reaches that size, it only becomes eligible for merging again when a large share of the docs within it are marked as deleted. So even if a large number of the docs in the segment are deleted, the segment is still there, happily taking up space. That could theoretically be a reason for an optimize, but you’d want to specify maxSegments so that you don’t merge the entire index down to a single segment. Ideally you should just keep as many of the logs as you actually use (which is hopefully fewer than you are keeping now). Since the segments will be somewhat time based, they should eventually disappear or merge over time, hopefully removing any reason to consider an optimize.
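If you do end up running an optimize, here is a rough sketch of how the request could be built with maxSegments set. The host, port, collection name (`system_logs`, guessed from your core name) and the segment count of 10 are all assumptions for illustration, so adjust them to your setup. It only builds the URL and does not send anything.

```python
from urllib.parse import urlencode


def optimize_url(base_url: str, collection: str, max_segments: int) -> str:
    """Build a Solr update URL that requests an optimize down to at most
    max_segments segments, rather than one segment for the whole index."""
    if max_segments < 2:
        # The point here is to avoid collapsing the index into a single segment.
        raise ValueError("max_segments should be 2 or more")
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url.rstrip('/')}/{collection}/update?{params}"


# Assumed values: local Solr on the default port, collection "system_logs".
url = optimize_url("http://localhost:8983/solr", "system_logs", 10)
print(url)
```

You could then hit that URL with curl or any HTTP client during a quiet period, since an optimize on an index this size will be heavy on disk I/O.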
Greg

On Tue, Aug 13, 2019 at 3:31 PM Moyer, Brett <bmo...@tiaa.org> wrote:

> Turns out this is due to a job that indexes logs. We were able to clear
> some with another job. We are working through the value of these indexed
> logs. Thanks for all your help!
>
> Brett Moyer
> Manager, Sr. Technical Lead | TFS Technology
> Public Production Support
> Digital Search & Discovery
>
> 8625 Andrew Carnegie Blvd | 4th floor
> Charlotte, NC 28263
> Tel: 704.988.4508
> Fax: 704.988.4907
> bmo...@tiaa.org
>
> -----Original Message-----
> From: Shawn Heisey <apa...@elyograg.org>
> Sent: Friday, August 9, 2019 2:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Indexed Data Size
>
> On 8/9/2019 12:17 PM, Moyer, Brett wrote:
> > The biggest is /data/solr/system_logs_shard1_replica_n1/data/index,
> > files with the extensions I stated previously. Each is 5gb and there are a
> > few hundred. Dated by to last 3 months. I don’t understand why there are so
> > many files with such small indexes. Not sure how to clean them up.
>
> Can you get a screenshot of the core overview for that particular core?
> Solr should correctly calculate the size on the overview based on what
> files are actually in the index directory.
>
> Thanks,
> Shawn
> *************************************************************************
> This e-mail may contain confidential or privileged information.
> If you are not the intended recipient, please notify the sender
> immediately and then delete it.
>
> TIAA
> *************************************************************************