In fact, the terms in the NOT clause seem to produce wildly different response times for no apparent reason:

    NOT clause                                               Time
    AND+NOT+(internship+OR+intern+OR+graduate)            23.966s
    AND+NOT+(internship+OR+intern+OR+welder)               3.368s
    AND+NOT+(internship+OR+welder+OR+graduate)            23.958s
    AND+NOT+(welder+OR+intern+OR+graduate)                24.465s
    AND+NOT+(internship+OR+intern+OR+graduate+OR+welder)   2.062s
    AND+NOT+(internship+OR+intern+OR+welder+OR+graduate)   1.722s
    AND+NOT+(internship+OR+graduate+OR+intern)            25.353s
    AND+NOT+(internship+OR+graduate+OR+welder)            24.473s

On Mon, 11 Nov 2024 at 14:03, Dominic Humphries <domi...@adzuna.com> wrote:

Hm.

An update: making the exact same request, but adding a single term (in this case, "welder") to the AND NOT clause, produces a massively smaller debug output, and the search completes in under half a second. Why would *adding* a term make such a massive difference?

    "debug":{
      "rawquerystring":"(carroll_county OR Aldi OR Cashier OR Kohls) AND NOT (internship OR intern OR graduate OR welder)",
      "querystring":"(carroll_county OR Aldi OR Cashier OR Kohls) AND NOT (internship OR intern OR graduate OR welder)",
      "parsedquery":"FunctionScoreQuery(FunctionScoreQuery(+(+(((+(company:carrollcounti company:\"carrol counti\")) | keywords:carroll_counti | (+(description:carrollcounti description:\"carrol counti\"))^2.0 | (+(title:carrollcounti title:\"carrol counti\"))^5.0)~0.01 ((title:aldi)^5.0 | (description:aldi)^2.0 | keywords:Aldi | company:aldi)~0.01 ((+(company:walmart company:target company:costco company:\"best buy\" company:\"home depot\" company:\"whole food\" company:\"shop assist\" company:\"picker packer\" company:checker company:\"retail sale assist\" company:\"store associ\" company:cashier)) | (+(description:walmart description:target description:costco description:\"best buy\" description:\"home depot\" description:\"whole food\" description:\"shop assist\" description:\"picker packer\" description:checker description:\"retail sale assist\" description:\"store associ\" description:cashier))^2.0 | (+(title:walmart title:target title:costco title:\"best buy\" title:\"home depot\" title:\"whole food\" title:\"shop assist\" title:\"picker packer\" title:checker title:\"retail sale assist\" title:\"store associ\" title:cashier))^5.0 | keywords:Cashier)~0.01 ((+(title:\"tj maxx\" title:maci title:h&m title:nike title:kohl))^5.0 | (+(description:\"tj maxx\" description:maci description:h&m description:nike description:kohl))^2.0 | keywords:Kohl | (+(company:\"tj maxx\" company:maci company:h&m company:nike company:kohl)))~0.01) -((keywords:internship | Synonym(company:apprentic company:apprenticeship company:intern company:internship) | (Synonym(description:apprentic description:apprenticeship description:intern description:internship))^2.0 | (Synonym(title:apprentic title:apprenticeship title:intern title:internship))^5.0)~0.01 (keywords:intern | Synonym(company:apprentic company:apprenticeship company:intern company:internship) | (Synonym(description:apprentic description:apprenticeship description:intern description:internship))^2.0 | (Synonym(title:apprentic title:apprenticeship title:intern title:internship))^5.0)~0.01 ((Synonym(description:grad description:graduat))^2.0 | Synonym(company:grad company:graduat) | (Synonym(title:grad title:graduat))^5.0 | keywords:graduat)~0.01 ((+(title:\"spot welder\" title:metalwork title:metallurgist title:welder))^5.0 | (+(description:\"spot welder\" description:metalwork description:metallurgist description:welder))^2.0 | keywords:welder | (+(company:\"spot welder\" company:metalwork company:metallurgist company:welder)))~0.01)) () (+(+(reply_on_adzuna:T)^0.5)), scored by boost(float(boost_factor))))",
      "parsedquery_toString":"FunctionScoreQuery(+(+(((+(company:carrollcounti company:\"carrol counti\")) | keywords:carroll_counti | (+(description:carrollcounti description:\"carrol counti\"))^2.0 | (+(title:carrollcounti title:\"carrol counti\"))^5.0)~0.01 ((title:aldi)^5.0 | (description:aldi)^2.0 | keywords:Aldi | company:aldi)~0.01 ((+(company:walmart company:target company:costco company:\"best buy\" company:\"home depot\" company:\"whole food\" company:\"shop assist\" company:\"picker packer\" company:checker company:\"retail sale assist\" company:\"store associ\" company:cashier)) | (+(description:walmart description:target description:costco description:\"best buy\" description:\"home depot\" description:\"whole food\" description:\"shop assist\" description:\"picker packer\" description:checker description:\"retail sale assist\" description:\"store associ\" description:cashier))^2.0 | (+(title:walmart title:target title:costco title:\"best buy\" title:\"home depot\" title:\"whole food\" title:\"shop assist\" title:\"picker packer\" title:checker title:\"retail sale assist\" title:\"store associ\" title:cashier))^5.0 | keywords:Cashier)~0.01 ((+(title:\"tj maxx\" title:maci title:h&m title:nike title:kohl))^5.0 | (+(description:\"tj maxx\" description:maci description:h&m description:nike description:kohl))^2.0 | keywords:Kohl | (+(company:\"tj maxx\" company:maci company:h&m company:nike company:kohl)))~0.01) -((keywords:internship | Synonym(company:apprentic company:apprenticeship company:intern company:internship) | (Synonym(description:apprentic description:apprenticeship description:intern description:internship))^2.0 | (Synonym(title:apprentic title:apprenticeship title:intern title:internship))^5.0)~0.01 (keywords:intern | Synonym(company:apprentic company:apprenticeship company:intern company:internship) | (Synonym(description:apprentic description:apprenticeship description:intern description:internship))^2.0 | (Synonym(title:apprentic title:apprenticeship title:intern title:internship))^5.0)~0.01 ((Synonym(description:grad description:graduat))^2.0 | Synonym(company:grad company:graduat) | (Synonym(title:grad title:graduat))^5.0 | keywords:graduat)~0.01 ((+(title:\"spot welder\" title:metalwork title:metallurgist title:welder))^5.0 | (+(description:\"spot welder\" description:metalwork description:metallurgist description:welder))^2.0 | keywords:welder | (+(company:\"spot welder\" company:metalwork company:metallurgist company:welder)))~0.01)) () (+(+(reply_on_adzuna:T)^0.5)), scored by boost(float(boost_factor)))",
      "explain":{ },

On Fri, 8 Nov 2024 at 14:41, Dominic Humphries <domi...@adzuna.com> wrote:

The list also apparently doesn't allow emails big enough to hold the debug output. Here's a link to a Google Doc containing it:
https://docs.google.com/document/d/1TUPE4Qkc-zjKGCJnn0_YVMOfaLgzlF2YCFF9sz4LNcQ/edit?usp=sharing

I hope that works well enough. If not, we'll have to work out some other option.

On Fri, 8 Nov 2024 at 14:01, Gus Heck <gus.h...@gmail.com> wrote:

The mailing list usually strips out attachments. You'll need to paste it into the body of the email.

On Fri, Nov 8, 2024 at 7:16 AM Dominic Humphries <domi...@adzuna.com.invalid> wrote:

Fair enough! See attached. If that doesn't work, I'll send it inline.

On Thu, 7 Nov 2024 at 18:40, Gus Heck <gus.h...@gmail.com> wrote:

Yes, seeing the final expanded query may shed light on where the time is going, so voluminous output is good. Feel free to anonymize any customer names or sensitive information with "<REDACTED>" or similar.

On Thu, Nov 7, 2024 at 12:21 PM Dominic Humphries <domi...@adzuna.com.invalid> wrote:

Yes, sorry: not cloud, and AFAIK it's single-sharded.

The same query with the facet fields removed takes just as long to run. Adding debug to the request generates a rather large amount of output, I believe because of synonyms. I can send it if it's useful, but it's rather a lot.
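The timing comparison at the top of this thread can be rerun with a small harness. This is a sketch, not the poster's actual setup. The endpoint (`http://localhost:8983/solr/jobs/select`) is hypothetical. Only the `q`, `rows` and `timeAllowed` values come from the request quoted in this thread. The other parameters from the real request (handler, `fq` filters, boosts and so on) would need to be added to reproduce it faithfully.

```python
import time
import urllib.parse
import urllib.request

# Hypothetical endpoint: point this at a test instance, not production.
SOLR_SELECT = "http://localhost:8983/solr/jobs/select"

# Positive part of q, as quoted in this thread.
POSITIVE = "(carroll_county OR Aldi OR Cashier OR Kohls)"

# NOT-clause term lists, in the same order as the timing table above.
CANDIDATES = [
    ("internship", "intern", "graduate"),
    ("internship", "intern", "welder"),
    ("internship", "welder", "graduate"),
    ("welder", "intern", "graduate"),
    ("internship", "intern", "graduate", "welder"),
    ("internship", "intern", "welder", "graduate"),
    ("internship", "graduate", "intern"),
    ("internship", "graduate", "welder"),
]


def build_q(not_terms):
    """Return the q string with the given NOT terms, in the given order."""
    return f"{POSITIVE} AND NOT ({' OR '.join(not_terms)})"


def build_url(not_terms, time_allowed_ms=4900):
    # Only a subset of the real request's parameters; add fq, boost, etc. as needed.
    params = {
        "q": build_q(not_terms),
        "rows": "20",
        "timeAllowed": str(time_allowed_ms),
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urllib.parse.urlencode(params)


def time_request(url):
    """Wall-clock seconds to fetch and read the full response body."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    return time.perf_counter() - start


def run_benchmark(repeats=3):
    # Several runs per clause, so cold- and warm-cache timings can be told apart.
    for terms in CANDIDATES:
        url = build_url(terms)
        timings = [time_request(url) for _ in range(repeats)]
        print(f"NOT ({' OR '.join(terms)}): " + ", ".join(f"{t:.3f}s" for t in timings))
```

Calling `run_benchmark()` against a test core prints one line per NOT clause. Each clause is timed more than once on purpose. Solr's caches can make a repeat of an identical query much faster, so a single measurement may not show whether the slowness is consistent.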
On Thu, 7 Nov 2024 at 15:37, Gus Heck <gus.h...@gmail.com> wrote:

OK, so that's 7M docs at about 3 KB/doc: a relatively reasonable index (at least if the hardware is reasonable, and you say it worked on 8.11, so that's probably fine).

From your reply, I assume it's single-sharded and not using SolrCloud/ZooKeeper?

The request you showed has a lot of facets on it. How much difference does it make if you send the query without the facets?

Also, add &debug=query and send us the debug output from the header when you do that.

On Thu, Nov 7, 2024 at 9:31 AM Dominic Humphries <domi...@adzuna.com.invalid> wrote:

Sure:

    "index":{
      "numDocs":7349353,
      "maxDoc":7834951,
      "deletedDocs":485598,
      "segmentCount":31,
      "segmentsFileSizeInBytes":2727,
      "sizeInBytes":22066572844,
      "size":"20.55 GB"

On Thu, 7 Nov 2024 at 13:27, Gus Heck <gus.h...@gmail.com> wrote:

This is interesting. Can you give us a feel for the size and structure of the index (number of documents, size of index, number of shards)?
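Gus's "7M docs at 3k/doc" estimate can be checked directly against the index stats quoted above. The snippet below does only that arithmetic, with the numbers copied from the stats; the variable names are illustrative.

```python
# Figures copied from the "index" stats quoted above.
num_docs = 7_349_353
max_doc = 7_834_951
deleted_docs = 485_598
size_in_bytes = 22_066_572_844

# maxDoc counts live documents plus deleted ones not yet merged away.
assert max_doc - num_docs == deleted_docs

bytes_per_doc = size_in_bytes / num_docs   # roughly 3,000 bytes per live document
deleted_share = deleted_docs / max_doc     # roughly 6% of maxDoc is deleted
size_gib = size_in_bytes / 1024**3         # binary units, matching "size":"20.55 GB"

print(f"{bytes_per_doc:.0f} bytes/doc, {deleted_share:.1%} deleted, {size_gib:.2f} GiB")
# prints: 3003 bytes/doc, 6.2% deleted, 20.55 GiB
```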
On Thu, Nov 7, 2024 at 7:52 AM Dominic Humphries <domi...@adzuna.com.invalid> wrote:

An update: I found the part of the query that's making everything so slow. It's the q param.

When we have

    "q":"(carroll_county OR Aldi OR Cashier OR Kohls) AND NOT (internship OR intern OR graduate)",

the search is very slow, taking 20-something seconds.

When it's just

    "q":"(carroll_county OR Aldi OR Cashier OR Kohls)",

the search is blazing fast, coming back in under a second. So it appears that something triggered by the NOT is both taking all the time and not getting caught by the timeAllowed limit.

Full query below:
    select?f.contract_type.facet.limit=2&fl=*&f.company_id.facet.mincount=1&qt=edismax&f.contract_time.facet.missing=false&f.location_struct.facet.limit=50&facet.date.end=NOW%2FDAY%2B1DAYS&ps=2&f.description.hl.snippets=2&stats.field=salary_avg_stats&facet.date.gap=%2B1DAY&pf=title&stats=true&_qtags=api_id%3Ab02dbf6d~784741%7CFCGI%3A%3AModel%3A%3AWWW%3A%3AJobsBase%3A%3ASearch%7C2781%7CCHOMkO6R7xGwlQ6bKp_SoQ&qs=5&f.contract_time.facet.limit=2&f.contract_time.facet.mincount=1&f.company_id.facet.missing=false&facet.date=%7B!key%3Dfreshness%7Dcreated&bq=(reply_on_adzuna%3Atrue%5E0.5)&f.contract_type.facet.mincount=1&wt=json&f.location_struct.facet.mincount=1&facet.date.hardend=true&f.category_id.facet.limit=50&timeAllowed=4900&f.contract_type.facet.missing=false&f.category_id.facet.mincount=1&sort=score+desc&q.alt=*%3A*&boost=boost_factor&f.company_id.facet.limit=50&facet.date.start=NOW%2FDAY-7DAYS&facet=false&facet.field=%7B!key%3Dlocation%3Aid%7Dlocation_struct&facet.field=%7B!key%3Dcategory%3Aid%7Dcategory_id&facet.field=contract_type&facet.field=contract_time&facet.field=%7B!key%3Dcompany%3Aid%7Dcompany_id&f.description.hl.fragsize=180&hl=false&rows=20&start=0&q=(carroll_county+OR+Aldi+OR+Cashier+OR+Kohls)+AND+NOT+(internship+OR+intern+OR+graduate)&fq=location_id%3A151946&fq=boosted%3A1&fq=%7B!cost%3D200%7Dsearch_category%3A0&fq=created%3A%5BNOW%2FDAY-14DAYS+TO+*%5D

On Wed, 6 Nov 2024 at 17:00, Dominic Humphries <domi...@adzuna.com> wrote:

I spoke too soon: I figured out how to get VisualVM talking to Solr. Now I'm just not sure what to do with it. What sorts of things am I looking for?
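The full request quoted above is hard to scan as a single URL-encoded line. A few lines of standard-library Python can decode it into per-parameter value lists. Repeated parameters such as `fq` and `facet.field` keep every value, in order. The name `decode_params` is illustrative, and the sample is a shortened excerpt of the real request.

```python
from urllib.parse import parse_qsl, urlsplit


def decode_params(request_path):
    """Map each query parameter to the list of its URL-decoded values, in order."""
    params = {}
    for key, value in parse_qsl(urlsplit(request_path).query, keep_blank_values=True):
        params.setdefault(key, []).append(value)
    return params


if __name__ == "__main__":
    # Paste the full "select?..." line here; this is a shortened excerpt.
    sample = (
        "select?timeAllowed=4900&rows=20"
        "&q=(carroll_county+OR+Aldi+OR+Cashier+OR+Kohls)+AND+NOT+(internship+OR+intern+OR+graduate)"
        "&fq=location_id%3A151946&fq=%7B!cost%3D200%7Dsearch_category%3A0"
    )
    for key, values in decode_params(sample).items():
        for value in values:
            print(f"{key} = {value}")
```

Decoded this way, the `fq` filters read as plain Solr syntax, for example `{!cost=200}search_category:0` and `created:[NOW/DAY-14DAYS TO *]`. That makes it easier to compare the slow and fast variants of the request parameter by parameter.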
On Wed, 6 Nov 2024 at 16:40, Dominic Humphries <domi...@adzuna.com> wrote:

Unfortunately I don't know Java anywhere near well enough to find my way around a profiler or jstack. I've confirmed JMX is enabled and I can telnet to the port, but VisualVM fails to connect and gives me no reason why.

I can post the query and result if that's useful. It doesn't return any records, so there's nothing to censor.

On Wed, 6 Nov 2024 at 15:36, Gus Heck <gus.h...@gmail.com> wrote:

If you have access to a test instance where the problem can be reproduced, attaching a profiler would be one way. Another, cruder method is to use jstack to dump all the threads.

Another way to tackle this is to help us reproduce your problem. Can you share details about your query? Obviously, please don't post anything your company wouldn't want public, but if you can share some details, that would be a start.

The ideal thing would be a minimal working example of the problem you are experiencing.
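Gus's cruder jstack method can be scripted so that several thread dumps are taken while one slow query is running. Stack frames that recur across dumps in the request-handling threads hint at where the time goes. In a Jetty-based Solr those threads typically have names beginning with `qtp`.

This sketch makes several assumptions:
- `jstack` from the same JDK that runs Solr is on the PATH.
- It runs as the same OS user as the Solr process.
- The Solr process ID is known.

`jstack_command` and `sample_thread_dumps` are illustrative helper names, not part of any Solr tooling.

```python
import subprocess
import time
from pathlib import Path


def jstack_command(pid, include_locks=True):
    """Build the jstack command line for the given JVM process ID."""
    cmd = ["jstack"]
    if include_locks:
        cmd.append("-l")  # long listing: also prints lock ownership information
    cmd.append(str(pid))
    return cmd


def sample_thread_dumps(pid, count=5, interval_s=2.0, out_dir="thread-dumps"):
    """Write `count` thread dumps of JVM `pid`, `interval_s` seconds apart; return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        result = subprocess.run(
            jstack_command(pid), capture_output=True, text=True, check=True
        )
        path = out / f"dump-{i:02d}.txt"
        path.write_text(result.stdout)
        paths.append(path)
        if i < count - 1:
            time.sleep(interval_s)
    return paths
```

Start `sample_thread_dumps(<solr pid>)` just after sending one of the 20-plus-second queries. With the defaults, five dumps two seconds apart cover the first several seconds of its run.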
On Wed, Nov 6, 2024 at 9:55 AM Dominic Humphries <domi...@adzuna.com.invalid> wrote:

I've tried both timeAllowed and cpuAllowed, and neither is restricting the amount of time the queries take to run. I have a test query that reliably takes 20-30 seconds. If there are any useful debug params or such that I can run to provide the information you want, I'm happy to run them. I'm just not sure how to usefully interrogate Solr for where its time is being spent, sorry.

Thanks

On Wed, 6 Nov 2024 at 14:25, Gus Heck <gus.h...@gmail.com> wrote:

There are unit tests that seem to suggest timeAllowed still works. Can you provide some more information about your use case? Particularly important is any information you have about where (in what code) your queries are spending their time.

--
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)

On Wed, Nov 6, 2024 at 6:18 AM Dominic Humphries <domi...@adzuna.com.invalid> wrote:

Hi folks,

We're testing Solr 9.7 as an upgrade to our existing 8.11 stack, and we're seeing a problem with long requests. We send `timeAllowed=4900`, which works fine on the existing 8.11 and keeps requests to just a few seconds.

With 9.7, however, the parameter is basically ignored. Requests can take over 30 seconds whether it is present or not, which is causing higher CPU load and slower response times.

I've tried setting the flag suggested in https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-9.html#use-of-timeallowed, but even with solr.useExitableDirectoryReader set we still don't get the desired behaviour.

Is there anything else I can try to get the old behaviour back?

Thanks
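When a timeAllowed (or cpuAllowed) limit actually trips, Solr is expected to return whatever it has gathered so far and to set `"partialResults":true` in the `responseHeader`. Checking that flag alongside the measured wall-clock time separates two cases: the limit never fired, or it fired but the request was still slow. A minimal sketch that reads only the JSON response body; the function names and sample bodies are illustrative, not output from the thread.

```python
import json


def header_summary(body):
    """Return (QTime in ms, partialResults flag) from a Solr JSON response body."""
    header = json.loads(body).get("responseHeader", {})
    return header.get("QTime"), bool(header.get("partialResults", False))


def limit_ignored(body, wall_clock_ms, time_allowed_ms=4900):
    """True if the request overran timeAllowed without Solr reporting partial results."""
    _, partial = header_summary(body)
    return wall_clock_ms > time_allowed_ms and not partial


if __name__ == "__main__":
    # Illustrative response bodies only.
    cut_off = '{"responseHeader":{"status":0,"QTime":4950,"partialResults":true}}'
    overran = '{"responseHeader":{"status":0,"QTime":24100}}'
    print(header_summary(cut_off), limit_ignored(cut_off, 5000))
    print(header_summary(overran), limit_ignored(overran, 24100))
```

With the 9.7 behaviour described in this thread, the slow NOT queries would be expected to look like the second case: a long wall-clock time with no `partialResults` flag.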