Hm.

An update: making the exact same request, but adding a single term (in this
case, "welder") to the "AND NOT" clause, results in a massively smaller debug
output, and the search completes in under half a second. Why would *adding*
a term make such a massive difference?

  "debug":{
    "rawquerystring":"(carroll_county OR Aldi OR Cashier OR Kohls) AND NOT
(internship OR intern OR graduate OR welder)",
    "querystring":"(carroll_county OR Aldi OR Cashier OR Kohls) AND NOT
(internship OR intern OR graduate OR welder)",

"parsedquery":"FunctionScoreQuery(FunctionScoreQuery(+(+(((+(company:carrollcounti
company:\"carrol counti\")) | keywords:carroll_counti |
(+(description:carrollcounti description:\"carrol counti\"))^2.0 |
(+(title:carrollcounti title:\"carrol counti\"))^5.0)~0.01
((title:aldi)^5.0 | (description:aldi)^2.0 | keywords:Aldi | company:aldi)~0.01
((+(company:walmart company:target company:costco company:\"best buy\"
company:\"home depot\" company:\"whole food\" company:\"shop assist\"
company:\"picker packer\" company:checker
company:\"retail sale assist\" company:\"store associ\" company:cashier)) |
(+(description:walmart description:target description:costco
description:\"best buy\" description:\"home depot\" description:\"whole
food\" description:\"shop assist\" description:\"picker packer\"
description:checker description:\"retail sale assist\" description:\"store associ\"
description:cashier))^2.0 | (+(title:walmart title:target title:costco
title:\"best buy\" title:\"home depot\" title:\"whole food\" title:\"shop
assist\" title:\"picker packer\" title:checker
title:\"retail sale assist\" title:\"store associ\" title:cashier))^5.0 |
keywords:Cashier)~0.01 ((+(title:\"tj maxx\" title:maci title:h&m
title:nike title:kohl))^5.0 | (+(description:\"tj maxx\" description:maci
description:h&m description:nike description:kohl))^2.0 | keywords:Kohl |
(+(company:\"tj maxx\" company:maci company:h&m company:nike
company:kohl)))~0.01) -((keywords:internship | Synonym(company:apprentic
company:apprenticeship company:intern company:internship) |
(Synonym(description:apprentic description:apprenticeship
description:intern description:internship))^2.0 | (Synonym(title:apprentic
title:apprenticeship title:intern title:internship))^5.0)~0.01
(keywords:intern | Synonym(company:apprentic company:apprenticeship
company:intern company:internship) | (Synonym(description:apprentic
description:apprenticeship description:intern description:internship))^2.0 |
(Synonym(title:apprentic title:apprenticeship title:intern
title:internship))^5.0)~0.01 ((Synonym(description:grad
description:graduat))^2.0 | Synonym(company:grad company:graduat) |
(Synonym(title:grad title:graduat))^5.0 | keywords:graduat)~0.01
((+(title:\"spot welder\"
title:metalwork title:metallurgist title:welder))^5.0 |
(+(description:\"spot welder\" description:metalwork
description:metallurgist description:welder))^2.0 | keywords:welder |
(+(company:\"spot welder\" company:metalwork
company:metallurgist company:welder)))~0.01)) ()
(+(+(reply_on_adzuna:T)^0.5)), scored by boost(float(boost_factor))))",

"parsedquery_toString":"FunctionScoreQuery(+(+(((+(company:carrollcounti
company:\"carrol counti\")) | keywords:carroll_counti |
(+(description:carrollcounti description:\"carrol counti\"))^2.0 |
(+(title:carrollcounti title:\"carrol counti\"))^5.0)~0.01
((title:aldi)^5.0 | (description:aldi)^2.0 | keywords:Aldi | company:aldi)~0.01
((+(company:walmart company:target company:costco company:\"best buy\"
company:\"home depot\" company:\"whole food\" company:\"shop assist\"
company:\"picker packer\" company:checker company:\"retail sale assist\"
company:\"store associ\" company:cashier)) | (+(description:walmart
description:target description:costco description:\"best buy\"
description:\"home depot\" description:\"whole food\" description:\"shop
assist\" description:\"picker packer\" description:checker
description:\"retail sale assist\" description:\"store associ\"
description:cashier))^2.0 | (+(title:walmart title:target title:costco
title:\"best buy\" title:\"home depot\" title:\"whole food\" title:\"shop
assist\" title:\"picker packer\" title:checker title:\"retail sale
assist\" title:\"store associ\" title:cashier))^5.0 |
keywords:Cashier)~0.01 ((+(title:\"tj maxx\" title:maci title:h&m
title:nike title:kohl))^5.0 | (+(description:\"tj maxx\" description:maci
description:h&m description:nike description:kohl))^2.0 | keywords:Kohl |
(+(company:\"tj maxx\" company:maci company:h&m company:nike
company:kohl)))~0.01) -((keywords:internship | Synonym(company:apprentic
company:apprenticeship company:intern company:internship) |
(Synonym(description:apprentic description:apprenticeship
description:intern description:internship))^2.0 |
(Synonym(title:apprentic title:apprenticeship
title:intern title:internship))^5.0)~0.01 (keywords:intern |
Synonym(company:apprentic company:apprenticeship company:intern
company:internship) | (Synonym(description:apprentic
description:apprenticeship description:intern description:internship))^2.0 |
(Synonym(title:apprentic title:apprenticeship title:intern
title:internship))^5.0)~0.01 ((Synonym(description:grad
description:graduat))^2.0 | Synonym(company:grad company:graduat) |
(Synonym(title:grad title:graduat))^5.0 | keywords:graduat)~0.01
((+(title:\"spot welder\" title:metalwork
title:metallurgist title:welder))^5.0 | (+(description:\"spot welder\"
description:metalwork description:metallurgist description:welder))^2.0 |
keywords:welder | (+(company:\"spot welder\" company:metalwork
company:metallurgist company:welder)))~0.01)) ()
(+(+(reply_on_adzuna:T)^0.5)), scored by boost(float(boost_factor)))",
    "explain":{ },
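
A minimal sketch of how the slow and fast variants could be requested and
timed side by side, so the only difference between them is the extra
"welder" term. The host, port, and core name in BASE_URL are hypothetical,
and the helper names are made up for illustration; the parameters (q,
qt=edismax, timeAllowed=4900, debug=query, facet=false) are the ones
mentioned in this thread.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical host and core name; substitute your own.
BASE_URL = "http://localhost:8983/solr/jobs/select"

INCLUDE = "(carroll_county OR Aldi OR Cashier OR Kohls)"
SLOW_EXCLUDE = ["internship", "intern", "graduate"]
FAST_EXCLUDE = SLOW_EXCLUDE + ["welder"]


def build_q(include, exclude_terms):
    """Return a q string shaped like the queries quoted in this thread."""
    return f"{include} AND NOT ({' OR '.join(exclude_terms)})"


def build_select_url(q, base_url=BASE_URL, time_allowed_ms=4900):
    """Build a select URL with query debugging on and facets off."""
    params = {
        "q": q,
        "qt": "edismax",          # copied from the request in this thread
        "timeAllowed": time_allowed_ms,
        "debug": "query",         # return the parsed query in the response
        "facet": "false",
        "rows": 20,
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


def time_query(url):
    """Fetch the URL and return (wall-clock seconds, QTime reported by Solr)."""
    start = time.perf_counter()
    with urlopen(url) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    return elapsed, body.get("responseHeader", {}).get("QTime")
```

Calling time_query(build_select_url(build_q(INCLUDE, SLOW_EXCLUDE))) and then
the same with FAST_EXCLUDE against a test instance would give wall-clock and
QTime figures for both variants under identical parameters.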

On Fri, 8 Nov 2024 at 14:41, Dominic Humphries <domi...@adzuna.com> wrote:

> It also apparently doesn't allow emails big enough for the debug output.
> Here's a link to a Google Doc with the output in:
> https://docs.google.com/document/d/1TUPE4Qkc-zjKGCJnn0_YVMOfaLgzlF2YCFF9sz4LNcQ/edit?usp=sharing
>
> I hope that works well enough, if not we'll have to work out some other
> option..
>
> On Fri, 8 Nov 2024 at 14:01, Gus Heck <gus.h...@gmail.com> wrote:
>
>> The mailing list usually strips out attachments. You'll need to paste it
>> into the body of the email.
>>
>> On Fri, Nov 8, 2024 at 7:16 AM Dominic Humphries
>> <domi...@adzuna.com.invalid>
>> wrote:
>>
>> > Fair enough! See attached, if that doesn't work I'll send it inline...
>> >
>> > On Thu, 7 Nov 2024 at 18:40, Gus Heck <gus.h...@gmail.com> wrote:
>> >
>> >> Yes, seeing the final expanded query may shed light on where the time
>> >> is going, so voluminous output is good. Feel free to anonymize any
>> >> customer names or sensitive information with "<REDACTED>" or similar.
>> >>
>> >> On Thu, Nov 7, 2024 at 12:21 PM Dominic Humphries
>> >> <domi...@adzuna.com.invalid> wrote:
>> >>
>> >> > Yes, sorry, not cloud, afaik it's single-sharded.
>> >> >
>> >> > Same query with facet fields removed takes just as long to run. Adding
>> >> > the debug to the request generates a rather large amount of output, I
>> >> > believe due to synonyms - I can send them if it's useful, but it's
>> >> > rather a lot?
>> >> >
>> >> > On Thu, 7 Nov 2024 at 15:37, Gus Heck <gus.h...@gmail.com> wrote:
>> >> >
>> >> > > Ok so that's 7M docs at 3k/doc...  a relatively reasonable index (at
>> >> > > least if the hardware is reasonable, and you say it did work on 8.11
>> >> > > so that's probably fine).
>> >> > >
>> >> > > By your reply I assume it's single sharded and not using
>> >> > > cloud/zookeeper?
>> >> > >
>> >> > > The request you showed has a lot of facets on it. How much difference
>> >> > > does it make to the situation if you just send the query without the
>> >> > > facets?
>> >> > >
>> >> > > Also add &debug=query and send us the debug output from the header
>> >> > > when you do that...
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Thu, Nov 7, 2024 at 9:31 AM Dominic Humphries
>> >> > > <domi...@adzuna.com.invalid>
>> >> > > wrote:
>> >> > >
>> >> > > > Sure:
>> >> > > >       "index":{
>> >> > > >         "numDocs":7349353,
>> >> > > >         "maxDoc":7834951,
>> >> > > >         "deletedDocs":485598,
>> >> > > >         "segmentCount":31,
>> >> > > >         "segmentsFileSizeInBytes":2727,
>> >> > > >         "sizeInBytes":22066572844,
>> >> > > >         "size":"20.55 GB"
>> >> > > >
>> >> > > > On Thu, 7 Nov 2024 at 13:27, Gus Heck <gus.h...@gmail.com> wrote:
>> >> > > >
>> >> > > > > This is interesting, can you give us a feel for the size/structure
>> >> > > > > of the index (# of documents, size of index, # of shards)?
>> >> > > > >
>> >> > > > > On Thu, Nov 7, 2024 at 7:52 AM Dominic Humphries
>> >> > > > > <domi...@adzuna.com.invalid>
>> >> > > > > wrote:
>> >> > > > >
>> >> > > > > > An update, I found the part of the query that's making
>> >> > > > > > everything so slow: the q param
>> >> > > > > >
>> >> > > > > > When we have
>> >> > > > > >       "q":"(carroll_county OR Aldi OR Cashier OR Kohls) AND NOT (internship OR intern OR graduate)",
>> >> > > > > > the search is very slow, taking 20-something seconds
>> >> > > > > >
>> >> > > > > > When it's just
>> >> > > > > >       "q":"(carroll_county OR Aldi OR Cashier OR Kohls)",
>> >> > > > > > the search is blazing fast, coming back in under a second. So it
>> >> > > > > > appears it's something triggered by the NOT that's both taking
>> >> > > > > > all the time, and not getting caught by the timeAllowed limit
>> >> > > > > >
>> >> > > > > > Full query below:
>> >> > > > > >
>> select?f.contract_type.facet.limit=2&fl=*&f.company_id.facet.mincount=1&qt=edismax&f.contract_time.facet.missing=false&f.location_struct.facet.limit=50&facet.date.end=NOW%2FDAY%2B1DAYS&ps=2&f.description.hl.snippets=2&stats.field=salary_avg_stats&facet.date.gap=%2B1DAY&pf=title&stats=true&_qtags=api_id%3Ab02dbf6d~784741%7CFCGI%3A%3AModel%3A%3AWWW%3A%3AJobsBase%3A%3ASearch%7C2781%7CCHOMkO6R7xGwlQ6bKp_SoQ&qs=5&f.contract_time.facet.limit=2&f.contract_time.facet.mincount=1&f.company_id.facet.missing=false&facet.date=%7B!key%3Dfreshness%7Dcreated&bq=(reply_on_adzuna%3Atrue%5E0.5)&f.contract_type.facet.mincount=1&wt=json&f.location_struct.facet.mincount=1&facet.date.hardend=true&f.category_id.facet.limit=50&timeAllowed=4900&f.contract_type.facet.missing=false&f.category_id.facet.mincount=1&sort=score+desc&q.alt=*%3A*&boost=boost_factor&f.company_id.facet.limit=50&facet.date.start=NOW%2FDAY-7DAYS&facet=false&facet.field=%7B!key%3Dlocation%3Aid%7Dlocation_struct&facet.field=%7B!key%3Dcategory%3Aid%7Dcategory_id&facet.field=contract_type&facet.field=contract_time&facet.field=%7B!key%3Dcompany%3Aid%7Dcompany_id&f.description.hl.fragsize=180&hl=false&rows=20&start=0&q=(carroll_county+OR+Aldi+OR+Cashier+OR+Kohls)+AND+NOT+(internship+OR+intern+OR+graduate)&fq=location_id%3A151946&fq=boosted%3A1&fq=%7B!cost%3D200%7Dsearch_category%3A0&fq=created%3A%5BNOW%2FDAY-14DAYS+TO+*%5D
>> >> > > > > >
>> >> > > > > > On Wed, 6 Nov 2024 at 17:00, Dominic Humphries
>> >> > > > > > <domi...@adzuna.com> wrote:
>> >> > > > > >
>> >> > > > > > > I spoke too soon, I figured out how to get VisualVM talking to
>> >> > > > > > > solr. Now I'm just not sure what to do with it - what sorts of
>> >> > > > > > > things am I looking for?
>> >> > > > > > >
>> >> > > > > > > On Wed, 6 Nov 2024 at 16:40, Dominic Humphries
>> >> > > > > > > <domi...@adzuna.com> wrote:
>> >> > > > > > >
>> >> > > > > > >> Unfortunately I don't know Java anywhere near well enough
>> >> > > > > > >> to know my way around a profiler or jstack. I've confirmed
>> >> > > > > > >> JMX is enabled and I can telnet to the port, but VisualVM
>> >> > > > > > >> fails to connect and gives me no reason as to why.
>> >> > > > > > >>
>> >> > > > > > >> I can post the query and result if that's useful - it
>> >> > > > > > >> doesn't return any records so there's nothing to censor
>> >> > > > > > >>
>> >> > > > > > >> On Wed, 6 Nov 2024 at 15:36, Gus Heck <gus.h...@gmail.com>
>> >> > > > > > >> wrote:
>> >> > > > > > >>
>> >> > > > > > >>> If you have access to a test instance where the problem
>> >> > > > > > >>> can be reproduced, attaching a profiler would be one way.
>> >> > > > > > >>> Another cruder method is to use jstack to dump all the
>> >> > > > > > >>> threads.
>> >> > > > > > >>>
>> >> > > > > > >>> Another way to tackle this is to help us reproduce your
>> >> > > > > > >>> problem. Can you share details about your query?
>> >> > > > > > >>> Obviously, please don't post anything your company
>> >> > > > > > >>> wouldn't want public, but if you can share some details
>> >> > > > > > >>> that would be a start.
>> >> > > > > > >>>
>> >> > > > > > >>> The ideal thing would be to provide a minimum working
>> >> > > > > > >>> example of the problem you are experiencing.
>> >> > > > > > >>>
>> >> > > > > > >>> On Wed, Nov 6, 2024 at 9:55 AM Dominic Humphries
>> >> > > > > > >>> <domi...@adzuna.com.invalid>
>> >> > > > > > >>> wrote:
>> >> > > > > > >>>
>> >> > > > > > >>> > I've tried both timeAllowed and cpuAllowed and neither
>> >> > > > > > >>> > are restricting the amount of time the queries take to
>> >> > > > > > >>> > run. I have a test query that's reliably taking 20-30
>> >> > > > > > >>> > seconds, if there's any useful debug params or such I
>> >> > > > > > >>> > can run to provide the information you want I'm happy
>> >> > > > > > >>> > to run them - I'm not sure how to usefully interrogate
>> >> > > > > > >>> > solr for where its time is being spent, sorry
>> >> > > > > > >>> >
>> >> > > > > > >>> > Thanks
>> >> > > > > > >>> >
>> >> > > > > > >>> > On Wed, 6 Nov 2024 at 14:25, Gus Heck <gus.h...@gmail.com>
>> >> > > > > > >>> > wrote:
>> >> > > > > > >>> >
>> >> > > > > > >>> > > There are unit tests that seem to suggest that
>> >> > > > > > >>> > > timeAllowed still works, can you provide some more
>> >> > > > > > >>> > > information about your use case? Particularly
>> >> > > > > > >>> > > important is any information about where (what code)
>> >> > > > > > >>> > > your queries are spending a lot of time in if you
>> >> > > > > > >>> > > have it.
>> >> > > > > > >>> > >
>> >> > > > > > >>> > > On Wed, Nov 6, 2024 at 6:18 AM Dominic Humphries
>> >> > > > > > >>> > > <domi...@adzuna.com.invalid>
>> >> > > > > > >>> > > wrote:
>> >> > > > > > >>> > >
>> >> > > > > > >>> > > > Hi folks,
>> >> > > > > > >>> > > >
>> >> > > > > > >>> > > > we're testing Solr 9.7 to upgrade our existing 8.11
>> >> > > > > > >>> > > > stack. We're seeing a problem with long requests: we
>> >> > > > > > >>> > > > send `timeAllowed=4900` which works fine on the
>> >> > > > > > >>> > > > existing 8.11 and keeps requests to just a few
>> >> > > > > > >>> > > > seconds.
>> >> > > > > > >>> > > >
>> >> > > > > > >>> > > > With 9.7, however, the flag is basically ignored -
>> >> > > > > > >>> > > > requests can take over 30 seconds whether the flag is
>> >> > > > > > >>> > > > present or not, which is causing higher CPU load and
>> >> > > > > > >>> > > > slowing response times.
>> >> > > > > > >>> > > >
>> >> > > > > > >>> > > > I've tried setting the flag suggested in
>> >> > > > > > >>> > > >
>> https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-9.html#use-of-timeallowed
>> >> > > > > > >>> > > > - but even with solr.useExitableDirectoryReader set
>> >> > > > > > >>> > > > we still don't get the desired behaviour.
>> >> > > > > > >>> > > >
>> >> > > > > > >>> > > > Is there anything else I can try to get the old
>> >> behaviour
>> >> > > > back?
>> >> > > > > > >>> > > >
>> >> > > > > > >>> > > > Thanks
>> >> > > > > > >>> > > >
>> >> > > > > > >>> > >
>> >> > > > > > >>> > >
>> >> > > > > > >>> > > --
>> >> > > > > > >>> > > http://www.needhamsoftware.com (work)
>> >> > > > > > >>> > > https://a.co/d/b2sZLD9 (my fantasy fiction book)
