Brett, it’s probably because you hit Solr’s default 5 GB limit on merged
segment size. Once a segment is that large, it only becomes eligible for
merging again when a large share of its docs are marked as deleted. So even
if a segment is mostly deleted docs, it is still there, happily taking
up space. That could theoretically be a reason for an optimize, but you’d
want to specify maxSegments so that you don’t merge the entire index down
to a single segment. Ideally you should keep only as many of the
logs as you actually use (which is hopefully fewer than what you are
keeping). Since the segments will be somewhat time based, they should
eventually disappear or merge away over time, hopefully removing any reason
to consider an optimize.
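
As a rough sketch of the two options above (a bounded optimize, and trimming old logs), something like the following could build the requests. The base URL, the collection name `system_logs`, the date field `timestamp`, and the 90-day cutoff are all assumptions for illustration, not values confirmed in this thread. The script only builds the URL and the JSON body and sends nothing.

```python
import json
from urllib.parse import urlencode


def optimize_url(base_url, collection, max_segments):
    """Build an update-handler URL for an optimize capped at max_segments.

    Requires max_segments >= 2 so the index is never forced down to a
    single segment.
    """
    if max_segments < 2:
        raise ValueError("max_segments should be at least 2")
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url.rstrip('/')}/{collection}/update?{params}"


def delete_older_than(field, days):
    """Build a JSON delete-by-query body for docs older than `days` days.

    `field` is a hypothetical date field; use whatever the schema defines.
    """
    query = f"{field}:[* TO NOW-{days}DAYS]"
    return json.dumps({"delete": {"query": query}})


# Assumed values for illustration only.
print(optimize_url("http://localhost:8983/solr", "system_logs", 10))
print(delete_older_than("timestamp", 90))
```

The delete body would be POSTed to the same /update handler with a commit. Trying both against a test copy of the collection first is safer, since an optimize rewrites large parts of the index.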

Greg

On Tue, Aug 13, 2019 at 3:31 PM Moyer, Brett <bmo...@tiaa.org> wrote:

> Turns out this is due to a job that indexes logs. We were able to clear
> some with another job. We are working through the value of these indexed
> logs. Thanks for all your help!
>
> Brett Moyer
> Manager, Sr. Technical Lead | TFS Technology
>   Public Production Support
>   Digital Search & Discovery
>
> 8625 Andrew Carnegie Blvd | 4th floor
> Charlotte, NC 28263
> Tel: 704.988.4508
> Fax: 704.988.4907
> bmo...@tiaa.org
>
> -----Original Message-----
> From: Shawn Heisey <apa...@elyograg.org>
> Sent: Friday, August 9, 2019 2:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Indexed Data Size
>
> On 8/9/2019 12:17 PM, Moyer, Brett wrote:
> > The biggest is /data/solr/system_logs_shard1_replica_n1/data/index,
> > files with the extensions I stated previously. Each is 5 GB and there
> > are a few hundred, dated within the last 3 months. I don’t understand
> > why there are so many files with such small indexes. Not sure how to
> > clean them up.
>
> Can you get a screenshot of the core overview for that particular core?
> Solr should correctly calculate the size on the overview based on what
> files are actually in the index directory.
>
> Thanks,
> Shawn
