Re: Facet Performance

Erick Erickson Tue, 16 Jun 2020 07:30:15 -0700
Did you try the autowarming like I mentioned in my previous e-mail?
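
In case it helps, here is a minimal sketch of the kind of warming entry I described in that e-mail (quoted below). It assumes the field names from your example queries and that the entry goes in the <query> section of solrconfig.xml. I'm also assuming firstSearcher is the event that matters for you, since you copy a finished index into fresh instances rather than committing in production. Treat it as a starting point, not a tested config for your setup.

```xml
<!-- Sketch only: place inside the <query> section of solrconfig.xml. -->
<!-- Field names come from the example queries; adjust to your schema. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <!-- Not a "real" query: it just touches the docValues of each facet field -->
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.field">D_DepartureAirport</str>
      <str name="facet.field">D_Destination</str>
      <str name="facet.limit">-1</str>
    </lst>
  </arr>
</listener>
```

If you ever commit on the live instances, the same block with event="newSearcher" would cover searchers opened after a commit.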

> On Jun 16, 2020, at 10:18 AM, James Bodkin <james.bod...@loveholidays.com> 
> wrote:
> 
> We've changed the schema to enable docValues for these fields and this led to 
> an improvement in the response time. We found a further improvement by also 
> switching off indexed as these fields are used for faceting and filtering 
> only.
> Since those changes, we've found that the first-execution penalty for 
> queries is really noticeable. I thought this would be the filterCache, based 
> on what I saw in NewRelic; however, it is probably trying to read the 
> docValues from disk. How can we use the autowarming to improve this?
> 
> For example, I've run the following queries in sequence and each query has a 
> first-execution penalty.
> 
> Query 1:
> 
> q=*:*
> facet=true
> facet.field=D_DepartureAirport
> facet.field=D_Destination
> facet.limit=-1
> rows=0
> 
> Query 2:
> 
> q=*:*
> fq=D_DepartureAirport:(2660) 
> facet=true
> facet.field=D_Destination
> facet.limit=-1
> rows=0
> 
> Query 3:
> 
> q=*:*
> fq=D_DepartureAirport:(2661)
> facet=true
> facet.field=D_Destination
> facet.limit=-1
> rows=0
> 
> Query 4:
> 
> q=*:*
> fq=D_DepartureAirport:(2660+OR+2661)
> facet=true
> facet.field=D_Destination
> facet.limit=-1
> rows=0
> 
> We've kept the field type as a string, as the value is mapped by the 
> application that accesses Solr. In the examples above, the values are mapped 
> to airports and destinations.
> Is it possible to prewarm the above queries without having to define all the 
> potential filters manually in the auto warming?
> 
> At the moment, we update and optimise our index in a different environment 
> and then copy the index to our production instances by using a rolling 
> deployment in Kubernetes.
> 
> Kind Regards,
> 
> James Bodkin
> 
> On 12/06/2020, 18:58, "Erick Erickson" <erickerick...@gmail.com> wrote:
> 
>    I question whether filterCache has anything to do with it. I suspect what’s 
> really happening is that the first time, you’re reading the relevant bits from 
> disk into memory. And to double check, you should have docValues enabled for 
> all these fields. The “uninverting” process can be very expensive, and 
> docValues bypasses that.
> 
>    As of Solr 7.6, you can define “uninvertible=false” on your field(Type) to 
> “fail fast” if Solr needs to uninvert the field.
> 
>    But that’s an aside. In either case, my claim is that first-time execution 
> does “something”: it either reads the serialized docValues from disk or 
> uninverts the field on Solr’s heap.
> 
>    You can have this autowarmed by any combination of
>    1> specifying an autowarm count on your queryResultCache. That’s hit or 
> miss, as it replays the most recent N queries which may or may not contain 
> the sorts. That said, specifying 10-20 for autowarm count is usually a good 
> idea, assuming you’re not committing more than, say, every 30 seconds. I’d 
> add the same to filterCache too.
> 
>    2> specifying a newSearcher or firstSearcher query in solrconfig.xml. The 
> difference is that newSearcher is fired every time a commit happens, while 
> firstSearcher is only fired when Solr starts, the theory being that there’s 
> no cache autowarming available when Solr first powers up. Usually, people 
> don’t bother with firstSearcher or just make it the same as newSearcher. Note 
> that a query doesn’t have to be “real” at all. You can just add all the facet 
> fields to a *:* query in a single go.
> 
>    BTW, Trie fields will stay around for a long time even though deprecated. 
> Or at least until we find something to replace them with that doesn’t have 
> this penalty, so I’d feel pretty safe using those and they’ll be more 
> efficient than strings.
> 
>    Best,
>    Erick
>