Very good question, for which I currently have no answer!

On Wed, 27 Oct 2021 at 17:15, Deepak Goel <[email protected]> wrote:
> Why wouldn't the performance hit not happen for 8.3.1?
>
> On Wed, 27 Oct 2021, 20:07 Dominic Humphries, <[email protected]> wrote:
>
> > At last, I think we've got it!
> >
> > Our external boost files live on an NFS volume so they can be updated once
> > by a worker machine and all the followers will get the update. Which is all
> > very nice.
> >
> > But if we instead source those files from the local filesystem instead of
> > one mounted from the network, the performance issue goes away!
> >
> > I've tested this manually and it looks good; I'm now in the process of
> > updating our terraform etc so the instances will be able to use local
> > copies of these files. Assuming the update works, the matter will finally
> > be fixed!
> >
> > So the reason we were seeing performance issues was that we were using
> > NFS-mounted external files to update our boosts - which is probably
> > edge-case enough to be why nobody else was reporting it!
> >
> > I'll update one last time to confirm all is well with the new images, and
> > hopefully this issue can be put to bed at last.
> >
> > Thanks all for your help!
> >
> > Dominic
> >
> > On Tue, 26 Oct 2021 at 15:31, Dominic Humphries <[email protected]> wrote:
> >
> > > No problem, I've been trying to get my head around how it all works
> > > myself!
> > >
> > > As per
> > > https://solr.apache.org/guide/8_9/working-with-external-files-and-processes.html
> > > our schema defines a field type:
> > > <fieldType name="fileboost" keyField="id" defVal="1" stored="false"
> > > indexed="false" class="solr.ExternalFileField"/>
> > > which is then used to define a field:
> > > <field name="boostvalue" type="fileboost"/>
> > > which pulls data from a file, external_boostvalue, living
> > > in $SOLR_HOME/data
> > >
> > > This is used to set a boost value that increases the visibility of some
> > > search results.
> > >
> > > Setting this file to be empty completely removes the performance hit we
> > > see taking several minutes to resolve after each replication. But we do
> > > need the functionality still, and I'm unclear on why this is an issue for
> > > 8.9 when it wasn't for 8.3
> > >
> > > Hope this clarifies the problem!
> > >
> > > Dominic
> > >
> > > On Mon, 25 Oct 2021 at 19:03, Charlie Hull <[email protected]> wrote:
> > >
> > >> Hi Dominic,
> > >>
> > >> Could you clarify what you mean by boost files in this context? Just
> > >> curious....
> > >>
> > >> Charlie
> > >>
> > >> On 25/10/2021 17:11, Dominic Humphries wrote:
> > >> > Performance with the replica pulling from 8.3.1 was actually worse. And
> > >> > looking at the data in the databases and the boost file contents, I'm
> > >> > dubious it's a problem of incompatible boost files. I think the
> > >> > performance of importing/applying the boosts really is what's
> > >> > responsible for the issue we see. Not sure what else to test to verify
> > >> > or disprove this..
> > >> >
> > >> > On Mon, 25 Oct 2021 at 14:56, Dominic Humphries <[email protected]>
> > >> > wrote:
> > >> >
> > >> >> I think I found it!
> > >> >>
> > >> >> I didn't realise, but we have boost files for the core I'm testing and
> > >> >> the boost is applied after replication! Setting the contents of the
> > >> >> files to empty completely removes the post-replication performance
> > >> >> problem we were seeing.
> > >> >>
> > >> >> So now my question becomes "Why is boosting taking so much longer for
> > >> >> the upgrade?"
> > >> >>
> > >> >> Since the upgrade has its own independent set of data, I'm wondering if
> > >> >> it's as simple as the IDs it's trying to boost don't exist and it takes
> > >> >> longer to find out an item is missing than it does to find one that
> > >> >> does? I believe I can point an 8.9.0 follower at an 8.3.1 leader, that
> > >> >> seems like the next logical step - if there's no performance hit when
> > >> >> it has the same data as the 8.3.1 replica, then that's almost certainly
> > >> >> the problem.
> > >> >>
> > >> >> Fingers crossed!
> > >> >>
> > >> >> On Sun, 24 Oct 2021 at 10:26, Deepak Goel <[email protected]> wrote:
> > >> >>
> > >> >>> There could be some testing and cooling happening post-replication.
> > >> >>> will have to dig a bit more into the code.
> > >> >>>
> > >> >>> Deepak
> > >> >>> "The greatness of a nation can be judged by the way its animals are
> > >> >>> treated - Mahatma Gandhi"
> > >> >>>
> > >> >>> +91 73500 12833
> > >> >>> [email protected]
> > >> >>>
> > >> >>> Facebook: https://www.facebook.com/deicool
> > >> >>> LinkedIn: www.linkedin.com/in/deicool
> > >> >>>
> > >> >>> "Plant a Tree, Go Green"
> > >> >>>
> > >> >>> Make In India : http://www.makeinindia.com/home
> > >> >>>
> > >> >>> On Thu, Oct 21, 2021 at 9:57 PM Dominic Humphries
> > >> >>> <[email protected]> wrote:
> > >> >>>
> > >> >>>> One more tidbit: I just tried leaving replication off for a few hours
> > >> >>>> and then triggering a "big" replication run so I could see the
> > >> >>>> distinct stages.
> > >> >>>>
> > >> >>>> - Beginning replication didn't cause any performance degradation.
> > >> >>>> - Several minutes of downloading the replication files saw no
> > >> >>>>   degradation
> > >> >>>> - Only after downloading had completed did we start to see
> > >> >>>>   performance issues in our tests
> > >> >>>> - But we saw the "number of docs/timestamp of latest file" both jump
> > >> >>>>   almost immediately after downloading completed and never move again
> > >> >>>> - But the performance degradation continued for about seven more
> > >> >>>>   minutes even though replication was clearly finished at this point
> > >> >>>>
> > >> >>>> Is there some kind of re-indexing optimization thing that solr can
> > >> >>>> run post-replication? At this point it's about my only remaining
> > >> >>>> suspect..
> > >>
> > >> --
> > >> Charlie Hull - Managing Consultant at OpenSource Connections Limited
> > >> <www.o19s.com>
> > >> Founding member of The Search Network <https://thesearchnetwork.com/>
> > >> and co-author of Searching the Enterprise
> > >> <https://opensourceconnections.com/about-us/books-resources/>
> > >> tel/fax: +44 (0)8700 118334
> > >> mobile: +44 (0)7767 825828
> > >>
> > >> OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
> > >> Amtsgericht Charlottenburg | HRB 230712 B
> > >> Geschäftsführer: John M. Woodell | David E. Pugh
> > >> Finanzamt: Berlin Finanzamt für Körperschaften II
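
The fix described in the thread (serving the external boost file from local disk rather than from the NFS mount) could be automated along the following lines. This is only a minimal sketch, not what was actually deployed: the function name sync_boost_file, the temp-file prefix, and the 0o644 permission are all assumptions made for illustration. It copies the file into a temporary file inside the local data directory and then renames it into place, so a reader never sees a half-written copy.

```python
import os
import shutil
import tempfile


def sync_boost_file(nfs_path: str, local_dir: str) -> str:
    """Copy the boost file at nfs_path into local_dir and return the local path.

    Hypothetical helper: the copy goes to a temp file in local_dir first and is
    then renamed over the target, so the target is replaced in a single step.
    """
    os.makedirs(local_dir, exist_ok=True)
    target = os.path.join(local_dir, os.path.basename(nfs_path))

    # Create the temp file in the destination directory so that os.replace
    # stays on one filesystem (a cross-filesystem rename would not be atomic).
    fd, tmp_path = tempfile.mkstemp(dir=local_dir, prefix=".boost-")
    os.close(fd)
    try:
        shutil.copyfile(nfs_path, tmp_path)
        # mkstemp creates the file as 0600; widening to 0644 is an assumption
        # that the Solr process may run as a different user than this script.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, target)
    except BaseException:
        # Do not leave a stray temp file behind if the copy fails.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target
```

A call might look like sync_boost_file("/mnt/nfs/boosts/external_boostvalue", "/var/solr/data"), where both paths are placeholders for the shared mount and for $SOLR_HOME/data on the follower. When Solr re-reads the file is a separate matter covered by the guide linked in the thread.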
