Dave:

Yeah, every time there’s generic advice, there are some situations where it’s 
not the best choice ;).

In your situation, you’re trading off some space savings against moving up to 
450G all at once. That sounds like it is worthwhile to you, although I’d check 
perf numbers sometime....

You may want to check out expungeDeletes. That will deal only with segments 
with more than 10% deleted docs, and may get you most all of the benefits of 
optimize without the problems. Specifically, let’s say you have a segment right 
at the limit (5G by default) that has exactly one deleted doc. Optimize will 
rewrite that, expungeDeletes will not. It’s an open question whether there’s 
any practical difference, ‘cause if all the segments in your index have > 10% 
deleted documents, they all get rewritten in either case….
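
As a rough sketch, the expungeDeletes request could look like the following. The host xxxxxxx and core my_core are the placeholders used further down this thread, and the helper name expunge_url is made up for illustration; the curl call itself is left commented out so nothing is sent by accident.

```shell
#!/bin/sh
# Hypothetical helper: print the update URL for a commit with expungeDeletes.
# $1 = host, $2 = core or collection name (both placeholders here).
expunge_url() {
  printf 'http://%s:8983/solr/%s/update?commit=true&expungeDeletes=true\n' "$1" "$2"
}

url=$(expunge_url xxxxxxx my_core)
echo "$url"

# To send it, quote the URL so the shell does not treat & as "run in background":
# curl "$url"
```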

The mechanism for optimize changed pretty significantly in Solr 7.5. The short 
form is that before 7.5 the result was a single massive segment, whereas from 
7.5 on the default max segment size of 5G is respected (although you can force 
the index down to one segment if you take explicit action).
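
A hedged sketch of what that explicit action can look like: as far as I know, passing maxSegments=1 along with optimize=true asks for a single segment, while a plain optimize=true respects the max segment size in 7.5 and later. Host and core are the placeholders from this thread, and optimize_url is a made-up helper name, so check the behavior against your own Solr version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print an optimize URL, optionally with maxSegments.
# $1 = host, $2 = core or collection, $3 = optional segment count.
optimize_url() {
  url="http://$1:8983/solr/$2/update?optimize=true"
  if [ -n "${3:-}" ]; then
    url="${url}&maxSegments=$3"
  fi
  printf '%s\n' "$url"
}

optimize_url xxxxxxx my_core      # plain optimize: max segment size respected (7.5+)
optimize_url xxxxxxx my_core 1    # explicit request for a single segment

# To send one of them: curl "$(optimize_url xxxxxxx my_core 1)"
```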

Here are two articles that explain it all:
Solr 7.4 and earlier: 
https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
Solr 7.5 and later: 
https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/

Best,
Erick

> On Dec 2, 2020, at 11:05 PM, Dave <hastings.recurs...@gmail.com> wrote:
> 
> I’m going to go against the advice SLIGHTLY. It really depends on how your 
> Solr server hosting is set up. If you’re searching off the same Solr server 
> you’re indexing to, then yeah, don’t ever optimize; it will take care of 
> itself. People much smarter than us, like Erick/Walter/Yonik, have spent time 
> on this, and if they say don’t do it, don’t do it. 
> 
> In my particular use case I do see a measured improvement from optimizing 
> every three or four months. In my case a large portion of the documents, over 
> 75%, each measuring around 500 KB to 3 MB, get reindexed every month because 
> the fields in the documents change every month, while documents are also 
> added daily. So when I can go from a 650 GB index to a 450 GB one once in a 
> while, it makes a difference if I only have 500 GB of memory to work with on 
> the searchers and can fit all the segments straight into memory. Also, I use 
> the old master/slave setup, so my indexing server has no impact on the 
> searching servers while it’s optimizing. Once the optimized index gets warmed 
> back up in the searcher I do notice an improvement in my qtimes (I like to 
> think). However, I’ve been using the same integration process of occasional 
> hard optimizations since 1.4, and it might just be that I like to watch the 
> index inflate to three times its size and then shrivel up. Old habits die hard. 
> 
>> On Dec 2, 2020, at 10:28 PM, Matheo Software <i...@matheo-software.com> 
>> wrote:
>> 
>> Hi Erick,
>> Hi Walter,
>> 
>> Thanks for this information.
>> 
>> I will study the Solr article you gave me carefully. 
>> I thought it was important to always delete and optimize the collection.
>> 
>> More information about my collection:
>> The index size is about 390 GB for 130M docs (3-5 KB per doc), with around 25 
>> fields (indexed, stored).
>> Every Tuesday I update around 1M docs, and every Thursday I add new docs 
>> (around 50,000). 
>> 
>> Many thanks !
>> 
>> Regards,
>> Bruno
>> 
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com] 
>> Sent: Wednesday, December 2, 2020 14:07
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr8.7 - How to optmize my index ?
>> 
>> expungeDeletes is unnecessary; optimize is a superset of expungeDeletes.
>> The key difference is commit=true. I suspect if you’d waited until your 
>> indexing process added another doc and committed, you’d have seen the index 
>> size drop.
>> 
>> Just to check, you send the command to my_core but talk about collections.
>> Specifying the collection is sufficient, but I’ll assume that’s a typo and 
>> you’re really saying my_collection.
>> 
>> I agree with Walter like I always do, you shouldn’t be running optimize 
>> without some proof that it’s helping. About the only time I think it’s 
>> reasonable is when you have a static index, unless you can demonstrate 
>> improved performance. The optimize button was removed precisely because it 
>> was so tempting. In much earlier versions of Lucene, it made a demonstrable 
>> difference so was put front and center. In more recent versions of Solr 
>> optimize doesn’t help nearly as much so it was removed.
>> 
>> You say you have 38M deleted documents. How many documents total? If this is 
>> 50% of your index, that’s one thing. If it’s 5%, it’s certainly not worth 
>> the effort. You’re rewriting 466G of index, if you’re not seeing 
>> demonstrable performance improvements, that’s a lot of wasted effort…
>> 
>> See: https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/
>> and the linked article for what happens in pre 7.5 solr versions.
>> 
>> Best,
>> Erick
>> 
>>> On Dec 1, 2020, at 2:31 PM, Info MatheoSoftware <i...@matheosoftware.com> 
>>> wrote:
>>> 
>>> Hi All,
>>> 
>>> I found the solution. I must run:
>>> 
>>> curl 'http://xxxxxxx:8983/solr/my_core/update?commit=true&expungeDeletes=true'
>>> 
>>> It works fine.
>>> 
>>> Thanks,
>>> Bruno
>>> 
>>> From: Matheo Software [mailto:i...@matheo-software.com]
>>> Sent: Tuesday, December 1, 2020 13:28
>>> To: solr-user@lucene.apache.org
>>> Subject: Solr8.7 - How to optmize my index ?
>>> 
>>> Hi All,
>>> 
>>> With Solr 5.4, I used the UI button, but in the Solr 8.7 UI this button is 
>>> missing.
>>> 
>>> So I decided to use the command line:
>>> 
>>> curl http://xxxxxxx:8983/solr/my_core/update?optimize=true
>>> 
>>> My collection my_core exists, of course.
>>> 
>>> The response to the command is:
>>> 
>>> {
>>> "responseHeader":{
>>>  "status":0,
>>>  "QTime":18}
>>> }
>>> 
>>> But nothing changes.
>>> 
>>> I still have 38M deleted docs in my collection, and the directory size does 
>>> not change the way it did with Solr 5.4.
>>> 
>>> The size of the collection stays at 466.33 GB.
>>> 
>>> Could you tell me how I can purge deleted docs?
>>> 
>>> Best regards,
>>> 
>>> Bruno Mannina
>>> 
>>> <http://www.matheo-software.com> www.matheo-software.com
>>> 
>>> <http://www.patent-pulse.com> www.patent-pulse.com
>>> 
>>> Tél. +33 0 970 738 743
>>> 
>>> Mob. +33 0 634 421 817
>>> 
>>> This email has been checked for viruses by Avast antivirus software.
>>> www.avast.com <https://www.avast.com/antivirus>
>> 