So the “boost files” are files for an external file field? Ages ago I accidentally put search indexes on an NFS volume. Solr was 100X slower.
wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/  (my blog)

> On Oct 27, 2021, at 7:37 AM, Dominic Humphries <[email protected]> wrote:
>
> At last, I think we've got it!
>
> Our external boost files live on an NFS volume so they can be updated once
> by a worker machine and all the followers will get the update. Which is all
> very nice.
>
> But if we instead source those files from the local filesystem instead of
> one mounted from the network, the performance issue goes away!
>
> I've tested this manually and it looks good; I'm now in the process of
> updating our terraform etc so the instances will be able to use local
> copies of these files. Assuming the update works, the matter will finally
> be fixed!
>
> So the reason we were seeing performance issues was that we were using
> NFS-mounted external files to update our boosts - which is probably
> edge-case enough to be why nobody else was reporting it!
>
> I'll update one last time to confirm all is well with the new images, and
> hopefully this issue can be put to bed at last.
>
> Thanks all for your help!
>
> Dominic
>
> On Tue, 26 Oct 2021 at 15:31, Dominic Humphries <[email protected]> wrote:
>
>> No problem, I've been trying to get my head around how it all works myself!
>>
>> As per
>> https://solr.apache.org/guide/8_9/working-with-external-files-and-processes.html
>> our schema defines a field type:
>> <fieldType name="fileboost" keyField="id" defVal="1" stored="false"
>> indexed="false" class="solr.ExternalFileField"/>
>> which is then used to define a field:
>> <field name="boostvalue" type="fileboost"/>
>> which pulls data from a file, external_boostvalue, living
>> in $SOLR_HOME/data
>>
>> This is used to set a boost value that increases the visibility of some
>> search results.
>>
>> Setting this file to be empty completely removes the performance hit we
>> see taking several minutes to resolve after each replication.
>> But we do need the functionality still, and I'm unclear on why this is
>> an issue for 8.9 when it wasn't for 8.3
>>
>> Hope this clarifies the problem!
>>
>> Dominic
>>
>> On Mon, 25 Oct 2021 at 19:03, Charlie Hull <[email protected]> wrote:
>>
>>> Hi Dominic,
>>>
>>> Could you clarify what you mean by boost files in this context? Just
>>> curious....
>>>
>>> Charlie
>>>
>>> On 25/10/2021 17:11, Dominic Humphries wrote:
>>>> Performance with the replica pulling from 8.3.1 was actually worse. And
>>>> looking at the data in the databases and the boost file contents, I'm
>>>> dubious it's a problem of incompatible boost files. I think the
>>>> performance of importing/applying the boosts really is what's
>>>> responsible for the issue we see. Not sure what else to test to verify
>>>> or disprove this..
>>>>
>>>> On Mon, 25 Oct 2021 at 14:56, Dominic Humphries <[email protected]> wrote:
>>>>
>>>>> I think I found it!
>>>>>
>>>>> I didn't realise, but we have boost files for the core I'm testing and
>>>>> the boost is applied after replication! Setting the contents of the
>>>>> files to empty completely removes the post-replication performance
>>>>> problem we were seeing.
>>>>>
>>>>> So now my question becomes "Why is boosting taking so much longer for
>>>>> the upgrade?"
>>>>>
>>>>> Since the upgrade has its own independent set of data, I'm wondering if
>>>>> it's as simple as the IDs it's trying to boost don't exist and it takes
>>>>> longer to find out an item is missing than it does to find one that
>>>>> does? I believe I can point an 8.9.0 follower at an 8.3.1 leader, that
>>>>> seems like the next logical step - if there's no performance hit when
>>>>> it has the same data as the 8.3.1 replica, then that's almost certainly
>>>>> the problem.
>>>>>
>>>>> Fingers crossed!
>>>>>
>>>>> On Sun, 24 Oct 2021 at 10:26, Deepak Goel <[email protected]> wrote:
>>>>>
>>>>>> There could be some testing and cooling happening post-replication.
>>>>>> will have to dig a bit more into the code.
>>>>>>
>>>>>> Deepak
>>>>>> "The greatness of a nation can be judged by the way its animals are
>>>>>> treated - Mahatma Gandhi"
>>>>>>
>>>>>> +91 73500 12833
>>>>>> [email protected]
>>>>>>
>>>>>> Facebook: https://www.facebook.com/deicool
>>>>>> LinkedIn: www.linkedin.com/in/deicool
>>>>>>
>>>>>> "Plant a Tree, Go Green"
>>>>>>
>>>>>> Make In India : http://www.makeinindia.com/home
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 21, 2021 at 9:57 PM Dominic Humphries
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> One more tidbit: I just tried leaving replication off for a few hours
>>>>>>> and then triggering a "big" replication run so I could see the
>>>>>>> distinct stages.
>>>>>>>
>>>>>>> - Beginning replication didn't cause any performance degradation.
>>>>>>> - Several minutes of downloading the replication files saw no
>>>>>>>   degradation
>>>>>>> - Only after downloading had completed did we start to see
>>>>>>>   performance issues in our tests
>>>>>>> - But we saw the "number of docs/timestamp of latest file" both jump
>>>>>>>   almost immediately after downloading completed and never move again
>>>>>>> - But the performance degradation continued for about seven more
>>>>>>>   minutes even though replication was clearly finished at this point
>>>>>>>
>>>>>>>
>>>>>>> Is there some kind of re-indexing optimization thing that solr can
>>>>>>> run post-replication? At this point it's about my only remaining
>>>>>>> suspect..
>>>>>>>
>>>
>>> --
>>> Charlie Hull - Managing Consultant at OpenSource Connections Limited
>>> <www.o19s.com>
>>> Founding member of The Search Network <https://thesearchnetwork.com/>
>>> and co-author of Searching the Enterprise
>>> <https://opensourceconnections.com/about-us/books-resources/>
>>> tel/fax: +44 (0)8700 118334
>>> mobile: +44 (0)7767 825828
>>>
>>> OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
>>> Amtsgericht Charlottenburg | HRB 230712 B
>>> Geschäftsführer: John M. Woodell | David E. Pugh
>>> Finanzamt: Berlin Finanzamt für Körperschaften II
>>>
>>
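
For anyone landing on this thread later: the fix described above (serve the
external boost file from local disk rather than straight off the NFS mount)
could be approximated with a small sync step on each follower. The following
is only a rough sketch, not what Dominic's terraform change actually does.
The function name sync_external_file and both paths in the usage note are
made up for illustration; only the file name external_boostvalue comes from
the thread.

```python
import os
import shutil
import tempfile
from pathlib import Path


def sync_external_file(source: Path, dest: Path) -> bool:
    """Copy source over dest if source looks newer; return True if copied.

    Hypothetical helper: 'source' would be the copy on the NFS mount and
    'dest' the local file that Solr reads.
    """
    src_stat = source.stat()
    if dest.exists():
        dst_stat = dest.stat()
        # copy2 preserves mtime, so an unchanged source compares as equal.
        if (dst_stat.st_size == src_stat.st_size
                and dst_stat.st_mtime >= src_stat.st_mtime):
            return False

    dest.parent.mkdir(parents=True, exist_ok=True)
    # The temp name deliberately does not start with "external_", so a
    # half-written copy cannot be mistaken for a boost file.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=".sync-")
    os.close(fd)
    try:
        shutil.copy2(source, tmp_name)  # copies content, mode and mtime
        os.replace(tmp_name, dest)      # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True
```

Usage would be something like
sync_external_file(Path("/mnt/nfs/boosts/external_boostvalue"),
Path("/var/solr/data/external_boostvalue")), run from cron or a deploy hook
(both paths are assumptions). Writing to a temp file and renaming means Solr
should never see a partially copied file.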
