So the “boost files” are files for an external file field?

Ages ago I accidentally put search indexes on an NFS volume. Solr was 100X 
slower.

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/  (my blog)

> On Oct 27, 2021, at 7:37 AM, Dominic Humphries <[email protected]> 
> wrote:
> 
> At last, I think we've got it!
> 
> Our external boost files live on an NFS volume so they can be updated once
> by a worker machine and all the followers will get the update. Which is all
> very nice.
> 
> But if we source those files from the local filesystem instead of
> one mounted from the network, the performance issue goes away!
> 
> I've tested this manually and it looks good; I'm now in the process of
> updating our terraform etc so the instances will be able to use local
> copies of these files. Assuming the update works, the matter will finally
> be fixed!
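
For anyone following along, here is a minimal sketch of the "local copy" idea: copy the boost file from the shared mount into a local directory and swap it into place atomically, so Solr never reads a half-written file. The function name and the paths you pass in are hypothetical; this is not what our terraform does, just an illustration in Python.

```python
import os
import shutil
import tempfile


def sync_to_local(src_path, local_dir):
    """Copy src_path into local_dir, replacing any existing copy atomically."""
    os.makedirs(local_dir, exist_ok=True)
    dest = os.path.join(local_dir, os.path.basename(src_path))

    # Write to a temp file in the SAME directory so os.replace() stays on one
    # filesystem and the swap is atomic.
    fd, tmp = tempfile.mkstemp(dir=local_dir, prefix=".tmp_")
    os.close(fd)
    try:
        shutil.copyfile(src_path, tmp)
        # mkstemp creates the file as 0600; loosen it so another user
        # (assumed here: the Solr process) can read it.
        os.chmod(tmp, 0o644)
        os.replace(tmp, dest)
    except BaseException:
        # Clean up the temp file if the copy or the swap failed.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return dest
```

Something like this could be run from cron or a deploy hook on each follower, with src_path pointing at the NFS copy and local_dir at the core's local data directory.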
> 
> So the reason we were seeing performance issues was that we were using
> NFS-mounted external files to update our boosts - which is probably
> edge-case enough to be why nobody else was reporting it!
> 
> I'll update one last time to confirm all is well with the new images, and
> hopefully this issue can be put to bed at last.
> 
> Thanks all for your help!
> 
> Dominic
> 
> On Tue, 26 Oct 2021 at 15:31, Dominic Humphries <[email protected]> wrote:
> 
>> No problem, I've been trying to get my head around how it all works myself!
>> 
>> As per
>> https://solr.apache.org/guide/8_9/working-with-external-files-and-processes.html
>> our schema defines a field type:
>>    <fieldType name="fileboost" keyField="id" defVal="1" stored="false" indexed="false" class="solr.ExternalFileField"/>
>> which is then used to define a field:
>>    <field name="boostvalue" type="fileboost"/>
>> which pulls data from a file, external_boostvalue, living
>> in $SOLR_HOME/data
>> 
>> This is used to set a boost value that increases the visibility of some
>> search results.
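
To make the file format concrete: as far as I understand the ref guide page linked above, the external file is plain text with one key=value pair per line, where the key matches the keyField (here, id) and the value is the float to use. A rough Python sketch of generating such a file follows; the function name and the example IDs are made up for illustration.

```python
import os


def write_boost_file(boosts, data_dir, field_name="boostvalue"):
    """Write boosts as key=value lines to external_<field_name> in data_dir."""
    path = os.path.join(data_dir, "external_" + field_name)
    with open(path, "w", encoding="utf-8") as fh:
        # Sorting by key is optional; the Solr guide suggests a sorted file
        # can be looked up faster.
        for doc_id, boost in sorted(boosts.items()):
            fh.write(f"{doc_id}={boost}\n")
    return path
```

Calling write_boost_file({"doc2": 2.5, "doc1": 1.0}, data_dir) would produce a file named external_boostvalue containing the lines doc1=1.0 and doc2=2.5. Documents not listed in the file fall back to defVal.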
>> 
>> Emptying this file completely removes the performance hit, which otherwise
>> takes several minutes to resolve after each replication. But we do
>> need the functionality still, and I'm unclear on why this is an issue for
>> 8.9 when it wasn't for 8.3.
>> 
>> Hope this clarifies the problem!
>> 
>> Dominic
>> 
>> On Mon, 25 Oct 2021 at 19:03, Charlie Hull <
>> [email protected]> wrote:
>> 
>>> Hi Dominic,
>>> 
>>> Could you clarify what you mean by boost files in this context? Just
>>> curious....
>>> 
>>> Charlie
>>> 
>>> On 25/10/2021 17:11, Dominic Humphries wrote:
>>>> Performance with the replica pulling from 8.3.1 was actually worse. And
>>>> looking at the data in the databases and the boost file contents, I'm
>>>> dubious it's a problem of incompatible boost files. I think the performance
>>>> of importing/applying the boosts really is what's responsible for the issue
>>>> we see. Not sure what else to test to verify or disprove this.
>>>> 
>>>> On Mon, 25 Oct 2021 at 14:56, Dominic Humphries <[email protected]> wrote:
>>>> 
>>>>> I think I found it!
>>>>> 
>>>>> I didn't realise, but we have boost files for the core I'm testing and the
>>>>> boost is applied after replication! Setting the contents of the files to
>>>>> empty completely removes the post-replication performance problem we were
>>>>> seeing.
>>>>> 
>>>>> So now my question becomes "Why is boosting taking so much longer for the
>>>>> upgrade?"
>>>>> 
>>>>> Since the upgrade has its own independent set of data, I'm wondering if
>>>>> it's as simple as the IDs it's trying to boost don't exist and it takes
>>>>> longer to find out an item is missing than it does to find one that does? I
>>>>> believe I can point an 8.9.0 follower at an 8.3.1 leader, that seems like
>>>>> the next logical step - if there's no performance hit when it has the same
>>>>> data as the 8.3.1 replica, then that's almost certainly the problem.
>>>>> 
>>>>> Fingers crossed!
>>>>> 
>>>>> On Sun, 24 Oct 2021 at 10:26, Deepak Goel <[email protected]> wrote:
>>>>> 
>>>>>> There could be some testing and cooling happening post-replication. Will
>>>>>> have to dig a bit more into the code.
>>>>>> 
>>>>>> Deepak
>>>>>> "The greatness of a nation can be judged by the way its animals are
>>>>>> treated
>>>>>> - Mahatma Gandhi"
>>>>>> 
>>>>>> +91 73500 12833
>>>>>> [email protected]
>>>>>> 
>>>>>> Facebook: https://www.facebook.com/deicool
>>>>>> LinkedIn: www.linkedin.com/in/deicool
>>>>>> 
>>>>>> "Plant a Tree, Go Green"
>>>>>> 
>>>>>> Make In India : http://www.makeinindia.com/home
>>>>>> 
>>>>>> 
>>>>>> On Thu, Oct 21, 2021 at 9:57 PM Dominic Humphries
>>>>>> <[email protected]> wrote:
>>>>>> 
>>>>>>> One more tidbit: I just tried leaving replication off for a few hours and
>>>>>>> then triggering a "big" replication run so I could see the distinct stages.
>>>>>>> 
>>>>>>>    - Beginning replication didn't cause any performance degradation.
>>>>>>>    - Several minutes of downloading the replication files saw no
>>>>>>>    degradation
>>>>>>>    - Only after downloading had completed did we start to see performance
>>>>>>>    issues in our tests
>>>>>>>    - But we saw the "number of docs/timestamp of latest file" both jump
>>>>>>>    almost immediately after downloading completed and never move again
>>>>>>>    - But the performance degradation continued for about seven more minutes
>>>>>>>    even though replication was clearly finished at this point
>>>>>>> 
>>>>>>> 
>>>>>>> Is there some kind of re-indexing optimization thing that solr can run
>>>>>>> post-replication? At this point it's about my only remaining suspect.
>>>>>>> 
>>> 
>>> --
>>> Charlie Hull - Managing Consultant at OpenSource Connections Limited
>>> <www.o19s.com>
>>> Founding member of The Search Network <https://thesearchnetwork.com/>
>>> and co-author of Searching the Enterprise
>>> <https://opensourceconnections.com/about-us/books-resources/>
>>> tel/fax: +44 (0)8700 118334
>>> mobile: +44 (0)7767 825828
>>> 
>>> OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
>>> Amtsgericht Charlottenburg | HRB 230712 B
>>> Geschäftsführer: John M. Woodell | David E. Pugh
>>> Finanzamt: Berlin Finanzamt für Körperschaften II
>>> 
>> 
