In fact, the terms in the NOT seem to result in wildly different response
times for no apparent reason:

AND+NOT+(internship+OR+intern+OR+graduate)            23.966s
AND+NOT+(internship+OR+intern+OR+welder)               3.368s
AND+NOT+(internship+OR+welder+OR+graduate)            23.958s
AND+NOT+(welder+OR+intern+OR+graduate)                24.465s
AND+NOT+(internship+OR+intern+OR+graduate+OR+welder)   2.062s
AND+NOT+(internship+OR+intern+OR+welder+OR+graduate)   1.722s
AND+NOT+(internship+OR+graduate+OR+intern)            25.353s
AND+NOT+(internship+OR+graduate+OR+welder)            24.473s
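One way to make this comparison systematic is to generate every ordering of the excluded terms and time each request the same way. The Python below is a minimal sketch, not something from this thread: `SOLR_SELECT` is a hypothetical core URL, and only `q`, `qt`, `rows` and `timeAllowed` are carried over from the real request, so facets, boosts and filter queries are left out.

```python
import itertools
import time
import urllib.parse
import urllib.request

# Hypothetical endpoint: substitute the real host and core name.
SOLR_SELECT = "http://localhost:8983/solr/jobs/select"
# Positive clause copied from the slow query in this thread.
POSITIVE = "(carroll_county OR Aldi OR Cashier OR Kohls)"


def not_clause_queries(terms):
    """Yield one q string per ordering of the excluded terms."""
    for order in itertools.permutations(terms):
        yield f"{POSITIVE} AND NOT ({' OR '.join(order)})"


def build_url(q, time_allowed=4900):
    """Encode q with a small subset of the original request parameters."""
    params = {"q": q, "qt": "edismax", "rows": 20, "timeAllowed": time_allowed}
    return SOLR_SELECT + "?" + urllib.parse.urlencode(params)


def time_query(url):
    """Return wall-clock seconds for one request (needs a running Solr)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    return time.perf_counter() - start
```

Looping over `not_clause_queries(["internship", "intern", "graduate"])` and calling `time_query(build_url(q))` for each covers all six orderings of the three-term clause; passing "welder" as a fourth term gives all 24 four-term orderings. Running each ordering more than once would also help separate a real ordering effect from cache warm-up.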

On Mon, 11 Nov 2024 at 14:03, Dominic Humphries <domi...@adzuna.com> wrote:

> Hm.
>
> An update: making the exact same request, but adding a single term (in
> this case, "welder") to the "AND NOT", results in massively smaller debug
> output, and the search finishes in under half a second. I can't understand
> why *adding* a term makes such a massive difference.
>
>   "debug":{
>     "rawquerystring":"(carroll_county OR Aldi OR Cashier OR Kohls) AND NOT
> (internship OR intern OR graduate OR welder)",
>     "querystring":"(carroll_county OR Aldi OR Cashier OR Kohls) AND NOT
> (internship OR intern OR graduate OR welder)",
>
> "parsedquery":"FunctionScoreQuery(FunctionScoreQuery(+(+(((+(company:carrollcounti
> company:\"carrol counti\")) | keywords:carroll_counti |
> (+(description:carrollcounti description:\"carrol counti\"))^2.0 |
> (+(title:carrollcounti title:\"carrol counti\"))^5.0)~0.01
> ((title:aldi)^5.0 | (description:aldi)^2.0 | keywords:Aldi |
> company:aldi)~0.01 ((+(company:walmart company:target company:costco
> company:\"best buy\" company:\"home depot\" company:\"whole food\"
> company:\"shop assist\" company:\"picker packer\" company:checker
> company:\"retail sale assist\" company:\"store associ\" company:cashier)) |
> (+(description:walmart description:target description:costco
> description:\"best buy\" description:\"home depot\" description:\"whole
> food\" description:\"shop assist\" description:\"picker packer\"
> description:checker description:\"retail sale assist\" description:\"store associ\"
> description:cashier))^2.0 | (+(title:walmart title:target title:costco
> title:\"best buy\" title:\"home depot\" title:\"whole food\" title:\"shop
> assist\" title:\"picker packer\" title:checker
> title:\"retail sale assist\" title:\"store associ\" title:cashier))^5.0 |
> keywords:Cashier)~0.01 ((+(title:\"tj maxx\" title:maci title:h&m
> title:nike title:kohl))^5.0 | (+(description:\"tj maxx\" description:maci
> description:h&m description:nike description:kohl))^2.0 | keywords:Kohl |
> (+(company:\"tj maxx\" company:maci company:h&m company:nike
> company:kohl)))~0.01) -((keywords:internship | Synonym(company:apprentic
> company:apprenticeship company:intern company:internship) |
> (Synonym(description:apprentic description:apprenticeship
> description:intern description:internship))^2.0 | (Synonym(title:apprentic
> title:apprenticeship title:intern title:internship))^5.0)~0.01
> (keywords:intern | Synonym(company:apprentic company:apprenticeship
> company:intern company:internship) | (Synonym(description:apprentic
> description:apprenticeship description:intern description:internship))^2.0 |
> (Synonym(title:apprentic title:apprenticeship title:intern
> title:internship))^5.0)~0.01 ((Synonym(description:grad
> description:graduat))^2.0 | Synonym(company:grad company:graduat) |
> (Synonym(title:grad title:graduat))^5.0 | keywords:graduat)~0.01
> ((+(title:\"spot welder\"
> title:metalwork title:metallurgist title:welder))^5.0 |
> (+(description:\"spot welder\" description:metalwork
> description:metallurgist description:welder))^2.0 | keywords:welder |
> (+(company:\"spot welder\" company:metalwork company:metallurgist
> company:welder)))~0.01)) ()
> (+(+(reply_on_adzuna:T)^0.5)), scored by boost(float(boost_factor))))",
>
> "parsedquery_toString":"FunctionScoreQuery(+(+(((+(company:carrollcounti
> company:\"carrol counti\")) | keywords:carroll_counti |
> (+(description:carrollcounti description:\"carrol counti\"))^2.0 |
> (+(title:carrollcounti title:\"carrol counti\"))^5.0)~0.01
> ((title:aldi)^5.0 | (description:aldi)^2.0 | keywords:Aldi | company:aldi)~0.01
> ((+(company:walmart company:target company:costco company:\"best buy\"
> company:\"home depot\" company:\"whole food\" company:\"shop assist\"
> company:\"picker packer\" company:checker company:\"retail sale assist\"
> company:\"store associ\" company:cashier)) | (+(description:walmart
> description:target description:costco description:\"best buy\"
> description:\"home depot\" description:\"whole food\" description:\"shop
> assist\" description:\"picker packer\" description:checker
> description:\"retail sale assist\" description:\"store associ\"
> description:cashier))^2.0 | (+(title:walmart title:target title:costco
> title:\"best buy\" title:\"home depot\" title:\"whole food\" title:\"shop
> assist\" title:\"picker packer\" title:checker title:\"retail sale
> assist\" title:\"store associ\" title:cashier))^5.0 |
> keywords:Cashier)~0.01 ((+(title:\"tj maxx\" title:maci title:h&m
> title:nike title:kohl))^5.0 | (+(description:\"tj maxx\" description:maci
> description:h&m description:nike description:kohl))^2.0 | keywords:Kohl |
> (+(company:\"tj maxx\" company:maci company:h&m company:nike
> company:kohl)))~0.01) -((keywords:internship | Synonym(company:apprentic
> company:apprenticeship company:intern company:internship) |
> (Synonym(description:apprentic description:apprenticeship
> description:intern description:internship))^2.0 |
> (Synonym(title:apprentic title:apprenticeship
> title:intern title:internship))^5.0)~0.01 (keywords:intern |
> Synonym(company:apprentic company:apprenticeship company:intern
> company:internship) | (Synonym(description:apprentic
> description:apprenticeship description:intern description:internship))^2.0 |
> (Synonym(title:apprentic title:apprenticeship title:intern
> title:internship))^5.0)~0.01 ((Synonym(description:grad
> description:graduat))^2.0 | Synonym(company:grad company:graduat) |
> (Synonym(title:grad title:graduat))^5.0 | keywords:graduat)~0.01
> ((+(title:\"spot welder\" title:metalwork
> title:metallurgist title:welder))^5.0 | (+(description:\"spot welder\"
> description:metalwork description:metallurgist description:welder))^2.0 |
> keywords:welder | (+(company:\"spot welder\" company:metalwork
> company:metallurgist company:welder)))~0.01)) ()
> (+(+(reply_on_adzuna:T)^0.5)), scored by boost(float(boost_factor)))",
>     "explain":{ },
>
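To put a number on how much the expanded query grows, a rough sketch is to count the field:term clauses in each `parsedquery_toString` and compare the three-term and four-term variants. This Python helper is an assumption of mine rather than a Solr tool: it only counts clauses on the four fields visible in the output above, treats a phrase such as `title:\"spot welder\"` as one clause, and says nothing about how expensive each clause is to execute.

```python
import re
from collections import Counter

# The four fields that appear in the parsedquery output above.
FIELDS = ("title", "description", "company", "keywords")
CLAUSE_RE = re.compile(r"\b(" + "|".join(FIELDS) + r"):")


def clause_counts(parsed_query):
    """Count field:term clauses per field in a debug parsedquery string."""
    return Counter(CLAUSE_RE.findall(parsed_query))


def clause_total(parsed_query):
    """Total number of field:term clauses across the four fields."""
    return sum(clause_counts(parsed_query).values())
```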
> On Fri, 8 Nov 2024 at 14:41, Dominic Humphries <domi...@adzuna.com> wrote:
>
>> It also apparently doesn't allow emails big enough for the debug output.
>> Here's a link to a Google Doc with the output in:
>> https://docs.google.com/document/d/1TUPE4Qkc-zjKGCJnn0_YVMOfaLgzlF2YCFF9sz4LNcQ/edit?usp=sharing
>>
>> I hope that works well enough; if not, we'll have to work out some other
>> option.
>>
>> On Fri, 8 Nov 2024 at 14:01, Gus Heck <gus.h...@gmail.com> wrote:
>>
>>> The mailing list usually strips out attachments. You'll need to paste it
>>> into the body of the email.
>>>
>>> On Fri, Nov 8, 2024 at 7:16 AM Dominic Humphries
>>> <domi...@adzuna.com.invalid>
>>> wrote:
>>>
>>> > Fair enough! See attached; if that doesn't work, I'll send it inline...
>>> >
>>> > On Thu, 7 Nov 2024 at 18:40, Gus Heck <gus.h...@gmail.com> wrote:
>>> >
>>> >> Yes, seeing the final expanded query may shed light on where the time is
>>> >> going, so voluminous output is good. Feel free to anonymize any customer
>>> >> names or sensitive information with "<REDACTED>" or similar.
>>> >>
>>> >> On Thu, Nov 7, 2024 at 12:21 PM Dominic Humphries
>>> >> <domi...@adzuna.com.invalid> wrote:
>>> >>
>>> >> > Yes, sorry, not cloud; afaik it's single-sharded.
>>> >> >
>>> >> > The same query with the facet fields removed takes just as long to run.
>>> >> > Adding debug to the request generates a rather large amount of output,
>>> >> > I believe due to synonyms. I can send it if it's useful, but it's
>>> >> > rather a lot?
>>> >> >
>>> >> > On Thu, 7 Nov 2024 at 15:37, Gus Heck <gus.h...@gmail.com> wrote:
>>> >> >
>>> >> > > OK, so that's 7M docs at 3k/doc... a relatively reasonable index (at
>>> >> > > least if the hardware is reasonable, and you say it did work on 8.11,
>>> >> > > so that's probably fine).
>>> >> > >
>>> >> > > From your reply I assume it's single-sharded and not using
>>> >> > > cloud/ZooKeeper?
>>> >> > >
>>> >> > > The request you showed has a lot of facets on it. How much difference
>>> >> > > does it make if you just send the query without the facets?
>>> >> > >
>>> >> > > Also add &debug=query and send us the debug output from the header
>>> >> > > when you do that...
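Both experiments suggested here (dropping the facets and adding `debug=query`) can be applied mechanically to the request URL quoted later in the thread. A minimal Python sketch using only the standard library; which parameters count as facet-related (`facet`, `facet.*`, `f.<field>.facet.*`, and the `stats` ones) is an assumption, so check the filtered list before relying on it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def is_facet_param(name):
    """Treat facet, facet.*, f.<field>.facet.* and stats params as facet-related."""
    return (
        name == "facet"
        or name.startswith(("facet.", "stats"))
        or (name.startswith("f.") and ".facet." in name)
    )


def debug_without_facets(url):
    """Drop facet/stats params from a Solr request URL and add debug=query."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not is_facet_param(name)
    ]
    kept.append(("debug", "query"))
    return parts._replace(query=urlencode(kept)).geturl()
```

Everything else in the request (`q`, `fq`, `bq`, `timeAllowed`, highlighting params and so on) is passed through unchanged, so any remaining slowness cannot be blamed on faceting.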
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > On Thu, Nov 7, 2024 at 9:31 AM Dominic Humphries
>>> >> > > <domi...@adzuna.com.invalid>
>>> >> > > wrote:
>>> >> > >
>>> >> > > > Sure:
>>> >> > > >       "index":{
>>> >> > > >         "numDocs":7349353,
>>> >> > > >         "maxDoc":7834951,
>>> >> > > >         "deletedDocs":485598,
>>> >> > > >         "segmentCount":31,
>>> >> > > >         "segmentsFileSizeInBytes":2727,
>>> >> > > >         "sizeInBytes":22066572844,
>>> >> > > >         "size":"20.55 GB"
>>> >> > > >
>>> >> > > > On Thu, 7 Nov 2024 at 13:27, Gus Heck <gus.h...@gmail.com> wrote:
>>> >> > > >
>>> >> > > > > This is interesting. Can you give us a feel for the size/structure
>>> >> > > > > of the index (# of documents, size of index, # of shards)?
>>> >> > > > >
>>> >> > > > > On Thu, Nov 7, 2024 at 7:52 AM Dominic Humphries
>>> >> > > > > <domi...@adzuna.com.invalid>
>>> >> > > > > wrote:
>>> >> > > > >
>>> >> > > > > > An update: I've found the part of the query that's making
>>> >> > > > > > everything so slow. It's the q param.
>>> >> > > > > >
>>> >> > > > > > When we have
>>> >> > > > > >       "q":"(carroll_county OR Aldi OR Cashier OR Kohls) AND NOT (internship OR intern OR graduate)",
>>> >> > > > > > the search is very slow, taking 20-something seconds.
>>> >> > > > > >
>>> >> > > > > > When it's just
>>> >> > > > > >       "q":"(carroll_county OR Aldi OR Cashier OR Kohls)",
>>> >> > > > > > the search is blazing fast, coming back in under a second. So it
>>> >> > > > > > appears that something triggered by the NOT is both taking all
>>> >> > > > > > the time and not getting caught by the timeAllowed limit.
>>> >> > > > > >
>>> >> > > > > > Full query below:
>>> >> > > > > >
>>> >> > > > > > select?f.contract_type.facet.limit=2&fl=*&f.company_id.facet.mincount=1&qt=edismax&f.contract_time.facet.missing=false&f.location_struct.facet.limit=50&facet.date.end=NOW%2FDAY%2B1DAYS&ps=2&f.description.hl.snippets=2&stats.field=salary_avg_stats&facet.date.gap=%2B1DAY&pf=title&stats=true&_qtags=api_id%3Ab02dbf6d~784741%7CFCGI%3A%3AModel%3A%3AWWW%3A%3AJobsBase%3A%3ASearch%7C2781%7CCHOMkO6R7xGwlQ6bKp_SoQ&qs=5&f.contract_time.facet.limit=2&f.contract_time.facet.mincount=1&f.company_id.facet.missing=false&facet.date=%7B!key%3Dfreshness%7Dcreated&bq=(reply_on_adzuna%3Atrue%5E0.5)&f.contract_type.facet.mincount=1&wt=json&f.location_struct.facet.mincount=1&facet.date.hardend=true&f.category_id.facet.limit=50&timeAllowed=4900&f.contract_type.facet.missing=false&f.category_id.facet.mincount=1&sort=score+desc&q.alt=*%3A*&boost=boost_factor&f.company_id.facet.limit=50&facet.date.start=NOW%2FDAY-7DAYS&facet=false&facet.field=%7B!key%3Dlocation%3Aid%7Dlocation_struct&facet.field=%7B!key%3Dcategory%3Aid%7Dcategory_id&facet.field=contract_type&facet.field=contract_time&facet.field=%7B!key%3Dcompany%3Aid%7Dcompany_id&f.description.hl.fragsize=180&hl=false&rows=20&start=0&q=(carroll_county+OR+Aldi+OR+Cashier+OR+Kohls)+AND+NOT+(internship+OR+intern+OR+graduate)&fq=location_id%3A151946&fq=boosted%3A1&fq=%7B!cost%3D200%7Dsearch_category%3A0&fq=created%3A%5BNOW%2FDAY-14DAYS+TO+*%5D
>>> >> > > > > >
>>> >> > > > > > On Wed, 6 Nov 2024 at 17:00, Dominic Humphries <domi...@adzuna.com> wrote:
>>> >> > > > > >
>>> >> > > > > > > I spoke too soon: I figured out how to get VisualVM talking to
>>> >> > > > > > > Solr. Now I'm just not sure what to do with it. What sorts of
>>> >> > > > > > > things am I looking for?
>>> >> > > > > > >
>>> >> > > > > > > On Wed, 6 Nov 2024 at 16:40, Dominic Humphries <domi...@adzuna.com> wrote:
>>> >> > > > > > >
>>> >> > > > > > >> Unfortunately I don't know Java anywhere near well enough to
>>> >> > > > > > >> know my way around a profiler or jstack. I've confirmed JMX is
>>> >> > > > > > >> enabled and I can telnet to the port, but VisualVM fails to
>>> >> > > > > > >> connect and gives me no reason why.
>>> >> > > > > > >>
>>> >> > > > > > >> I can post the query and result if that's useful. It doesn't
>>> >> > > > > > >> return any records, so there's nothing to censor.
>>> >> > > > > > >>
>>> >> > > > > > >> On Wed, 6 Nov 2024 at 15:36, Gus Heck <gus.h...@gmail.com> wrote:
>>> >> > > > > > >>
>>> >> > > > > > >>> If you have access to a test instance where the problem can
>>> >> > > > > > >>> be reproduced, attaching a profiler would be one way. Another,
>>> >> > > > > > >>> cruder method is to use jstack to dump all the threads.
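For the jstack route, taking a few dumps while the slow query is running and looking at which Solr/Lucene frames sit at the top of the busy threads can show where the time goes. The Python below is a rough sketch, assuming the JDK's `jstack` is on the PATH and that `pid` is the Solr JVM's process id; the parsing assumes jstack's usual layout (a quoted thread name, a `java.lang.Thread.State:` line, then `at ...` frames, with threads separated by blank lines) and may need adjusting for other JDK versions.

```python
import subprocess


def jstack(pid):
    """Capture one thread dump of the given JVM using the JDK's jstack tool."""
    result = subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=True
    )
    return result.stdout


def runnable_stacks(dump, depth=5):
    """Return (thread name, top frames) for each RUNNABLE thread in a dump."""
    stacks = []
    for block in dump.split("\n\n"):
        lines = block.strip().splitlines()
        # Thread sections start with the quoted thread name.
        if not lines or not lines[0].startswith('"'):
            continue
        if not any("java.lang.Thread.State: RUNNABLE" in line for line in lines):
            continue
        name = lines[0].split('"')[1]
        frames = [
            line.strip()[len("at "):]
            for line in lines
            if line.strip().startswith("at ")
        ]
        stacks.append((name, frames[:depth]))
    return stacks
```

Calling `runnable_stacks(jstack(pid))` a few seconds apart during one slow request and seeing the same frames repeatedly would point at the code worth reporting back to the list.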
>>> >> > > > > > >>>
>>> >> > > > > > >>> Another way to tackle this is to help us reproduce your
>>> >> > > > > > >>> problem. Can you share details about your query? Obviously,
>>> >> > > > > > >>> please don't post anything your company wouldn't want public,
>>> >> > > > > > >>> but if you can share some details, that would be a start.
>>> >> > > > > > >>>
>>> >> > > > > > >>> The ideal thing would be to provide a minimum working example
>>> >> > > > > > >>> of the problem you are experiencing.
>>> >> > > > > > >>>
>>> >> > > > > > >>> On Wed, Nov 6, 2024 at 9:55 AM Dominic Humphries
>>> >> > > > > > >>> <domi...@adzuna.com.invalid>
>>> >> > > > > > >>> wrote:
>>> >> > > > > > >>>
>>> >> > > > > > >>> > I've tried both timeAllowed and cpuAllowed, and neither is
>>> >> > > > > > >>> > restricting the amount of time the queries take to run. I have
>>> >> > > > > > >>> > a test query that's reliably taking 20-30 seconds. If there
>>> >> > > > > > >>> > are any useful debug params or such I can run to provide the
>>> >> > > > > > >>> > information you want, I'm happy to run them. I'm not sure how
>>> >> > > > > > >>> > to usefully interrogate Solr for where its time is being
>>> >> > > > > > >>> > spent, sorry.
>>> >> > > > > > >>> > Thanks
>>> >> > > > > > >>> >
>>> >> > > > > > >>> > On Wed, 6 Nov 2024 at 14:25, Gus Heck <gus.h...@gmail.com> wrote:
>>> >> > > > > > >>> >
>>> >> > > > > > >>> > > There are unit tests that seem to suggest that timeAllowed
>>> >> > > > > > >>> > > still works. Can you provide some more information about your
>>> >> > > > > > >>> > > use case? Particularly important is any information about
>>> >> > > > > > >>> > > where (in what code) your queries are spending a lot of time,
>>> >> > > > > > >>> > > if you have it.
>>> >> > > > > > >>> > >
>>> >> > > > > > >>> > > On Wed, Nov 6, 2024 at 6:18 AM Dominic Humphries
>>> >> > > > > > >>> > > <domi...@adzuna.com.invalid>
>>> >> > > > > > >>> > > wrote:
>>> >> > > > > > >>> > >
>>> >> > > > > > >>> > > > Hi folks,
>>> >> > > > > > >>> > > >
>>> >> > > > > > >>> > > > We're testing Solr 9.7 to upgrade our existing 8.11 stack.
>>> >> > > > > > >>> > > > We're seeing a problem with long requests: we send
>>> >> > > > > > >>> > > > `timeAllowed=4900`, which works fine on the existing 8.11
>>> >> > > > > > >>> > > > and keeps requests to just a few seconds.
>>> >> > > > > > >>> > > >
>>> >> > > > > > >>> > > > With 9.7, however, the flag is basically ignored: requests
>>> >> > > > > > >>> > > > can take over 30 seconds whether the flag is present or
>>> >> > > > > > >>> > > > not, which is causing higher CPU load and slower response
>>> >> > > > > > >>> > > > times.
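When timeAllowed does trip, Solr flags it in the response header with `partialResults: true`, so one way to check whether the limit is being honoured on 9.7 is to look at `QTime` against the 4900 ms budget alongside that flag. A minimal Python sketch, assuming a JSON (`wt=json`) response; the function names are hypothetical.

```python
import json
import urllib.request


def summarize_header(body, budget_ms=4900):
    """Pull QTime and partialResults out of a Solr JSON responseHeader."""
    header = body.get("responseHeader", {})
    qtime = header.get("QTime")
    return {
        "qtime_ms": qtime,
        # Solr sets partialResults=true when timeAllowed stopped the search early.
        "partial": bool(header.get("partialResults", False)),
        # True when the reported time exceeds the budget sent in timeAllowed.
        "over_budget": qtime is not None and qtime > budget_ms,
    }


def check_time_allowed(url, budget_ms=4900):
    """Fetch a Solr JSON response (needs a running Solr) and summarize it."""
    with urllib.request.urlopen(url) as resp:
        return summarize_header(json.load(resp), budget_ms)
```

A response with `over_budget` true and `partial` false would suggest the time is being spent somewhere the limit isn't checked, which is the symptom described in this thread.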
>>> >> > > > > > >>> > > >
>>> >> > > > > > >>> > > > I've tried setting the flag suggested in
>>> >> > > > > > >>> > > > https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-9.html#use-of-timeallowed
>>> >> > > > > > >>> > > > but even with solr.useExitableDirectoryReader set, we still
>>> >> > > > > > >>> > > > don't get the desired behaviour.
>>> >> > > > > > >>> > > >
>>> >> > > > > > >>> > > > Is there anything else I can try to get the old behaviour back?
>>> >> > > > > > >>> > > >
>>> >> > > > > > >>> > > > Thanks
>>> >> > > > > > >>> > > >
>>> >> > > > > > >>> > >
>>> >> > > > > > >>> > >
>>> >> > > > > > >>> > > --
>>> >> > > > > > >>> > > http://www.needhamsoftware.com (work)
>>> >> > > > > > >>> > > https://a.co/d/b2sZLD9 (my fantasy fiction book)
>>> >> > > > > > >>> > >
>>> >> > > > > > >>> >
>>> >> > > > > > >>>
>>> >> > > > > > >>>
>>> >> > > > > > >>>
>>> >> > > > > > >>
>>> >> > > > > >
>>> >> > > > >
>>> >> > > > >
>>> >> > > > >
>>> >> > > >
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >
>>>
>>>
>>
