On top of the already good suggestions to reduce the scope of your
experiment, let's see:

boost:def(boostFieldA,1) // boostFieldA is docValue float type

The first part looks all right to me. It is expensive though, independently
of the number of rows returned, because the boost request parameter is parsed
as an additional query that affects the score.
Enabling doc-values on such a field is probably the best option you have.
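
To measure the cost of the boost on its own, it may help to send the request
with only that parameter and a small rows value. Below is a minimal sketch of
building such a request in Python. The host, the collection name
("mycollection") and defType=edismax are assumptions of mine and are not
stated anywhere in this thread; the remaining values come from the example
query.

```python
from urllib.parse import urlencode

# Parameters taken from the example query in this thread.
# defType=edismax is an assumption: the thread never names the parser.
params = {
    "defType": "edismax",
    "q": "hello world",
    "qf": "title description keywords",
    "fq": "type:P",
    "boost": "def(boostFieldA,1)",  # passed as boost=..., not boost:...
    "rows": 10,  # smaller than 500, to isolate the cost of the boost
    "fl": "id,score",
}

query_string = urlencode(params)

# Hypothetical host and collection name, for illustration only.
url = "http://localhost:8983/solr/mycollection/select?" + query_string
print(url)
```

Dropping the "boost" entry from the dictionary and comparing the qTime of the
two requests gives a rough idea of what the boost alone costs.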

Regarding the second part:
bf=mul(termfreq(termScoreFieldB,$q),1000.0) // termScoreFieldB is a
textField. No docValue, just indexed

This *adds* to the score. termfreq is documented as:

    Returns the number of times the term appears in the field for that
    document.

    termfreq(text,'memory')

So I am not even sure how a multi-term query is handled (of course this also
depends on the tokenization of termScoreFieldB).
The *1000* there smells a lot like bad practice: you are adding to your
score, and your score is not probabilistic, nor limited to a constant range
of values (the main Lucene score value depends on the query and the index).
I suspect you will get better behaviour by modelling this requirement as an
additional boost query rather than a boost function, but I am curious to know
what it is that you are attempting to do.
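
To illustrate why an additive constant of that size worries me, here is a toy
sketch in Python. It is plain arithmetic, not Solr's scoring code, and the
document ids, base scores and term frequencies are made up for the example.

```python
def with_bf(base_score, term_freq, weight=1000.0):
    """Additive function boost: base score plus weight * term frequency."""
    return base_score + weight * term_freq


# (doc id, base relevance score, term frequency in the boosted field)
# All values are invented for illustration.
docs = [("A", 12.4, 0), ("B", 3.1, 1), ("C", 0.7, 2)]

# Ranking by the base score alone.
order_base = [d[0] for d in sorted(docs, key=lambda d: d[1], reverse=True)]

# Ranking after adding 1000 * termfreq to each score.
order_bf = [
    d[0]
    for d in sorted(docs, key=lambda d: with_bf(d[1], d[2]), reverse=True)
]

print(order_base)
print(order_bf)
```

With these numbers the additive term swamps the base score, so the final
order is decided by the term frequency alone and the original relevance
ranking is reversed.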

Cheers
--------------------------
Alessandro Benedetti
Apache Lucene/Solr PMC member and Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Wed, 19 Jan 2022 at 13:44, Joel Bernstein <[email protected]> wrote:

> Testing out a smaller "rows" param is key. Then you can isolate the
> performance difference due to the 500 rows. Adding more shards is going to
> increase the penalty for having 500 rows, so it's good to understand how
> big that penalty is.
>
> Then test out smaller result sets by adjusting the query. Gradually
> increase the result set size by adjusting the query. You then can get a
> feel for how result set size affects performance. This will give you an
> indication how much it will help to have more shards.
>
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, Jan 19, 2022 at 6:19 AM Charlie Hull <
> [email protected]> wrote:
>
> > Hi Ashwin,
> >
> > What happens if you reduce the number of rows requested? Do you really
> > need 500 results each time? I think this will ask for 500 results from
> > *each shard* too.
> > https://solr.apache.org/guide/8_7/pagination-of-results.html
> >
> > Also it looks like you mean boost=def(boostFieldA,1) not
> > boost:def(boostFieldA,1), am I right?
> >
> > Cheers
> >
> > Charlie
> >
> > On 19/01/2022 02:43, Ashwin Ramesh wrote:
> > > Gentle ping! Promise it's my final one! :)
> > >
> > > On Thu, Jan 13, 2022 at 8:01 AM Ashwin Ramesh <[email protected]> wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I have a few questions about how we can improve our solr query
> > >> performance, especially for boosts (BF, BQ, boost, etc).
> > >>
> > >> *System Specs:*
> > >> Solr Version: 7.7.x
> > >> Heap Size: 31gb
> > >> Num Docs: >100M
> > >> Shards: 8
> > >> Replication Factor: 6
> > >> Index is completely mapped into memory
> > >>
> > >>
> > >> Example query:
> > >> {
> > >> q=hello world
> > >> qf=title description keywords
> > >> pf=title^0.5
> > >> ps=0
> > >> fq=type:P
> > >> boost:def(boostFieldA,1) // boostFieldA is docValue float type
> > >> bf=mul(termfreq(termScoreFieldB,$q),1000.0) // termScoreFieldB is a
> > >> textField. No docValue, just indexed
> > >> rows:500
> > >> fl=id,score
> > >> }
> > >>
> > >> numFound: >21M
> > >> qTime: 800ms
> > >>
> > >> Experimentation of params:
> > >>
> > >>     - When I remove the boost parameter, the qTime drops to 525ms
> > >>     - When I remove the bf parameter, the qTime drops to 650ms
> > >>     - When I remove both the boost & bf parameters, the qTime drops to
> > >>     400ms
> > >>
> > >>
> > >> Questions:
> > >>
> > >>     1. Is there any way to improve the performance of the boosts
> > >>     (specific field types, etc)?
> > >>     2. Will sharding further such that each core only has to score a
> > >>     smaller subset of documents help with query performance?
> > >>     3. Is there any performance impact when boosting/querying against
> > >>     sparse fields, both indexed=true or docValues=true?
> > >>     4. It seems the base case scoring is 400ms, which is already quite
> > >>     high. Is this because the query (hello world) implicitly gets
> > >>     parsed as (hello OR world)? Thus it would be more computationally
> > >>     expensive?
> > >>     5. Any other advice :) ?
> > >>
> > >>
> > >> Thanks in advance,
> > >>
> > >> Ash
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > --
> > Charlie Hull - Managing Consultant at OpenSource Connections Limited
> > Founding member of The Search Network <http://www.thesearchnetwork.com>
> > and co-author of Searching the Enterprise
> > <https://opensourceconnections.com/wp-content/uploads/2020/08/ES_book_final_journal_version.pdf>
> > tel/fax: +44 (0)8700 118334
> > mobile: +44 (0)7767 825828
> >
> > OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
> > Amtsgericht Charlottenburg | HRB 230712 B
> > Geschäftsführer: John M. Woodell | David E. Pugh
> > Finanzamt: Berlin Finanzamt für Körperschaften II
> >
> >
>
