There are other clever ways to do it too, such as using the within parameter and other things I don’t remember off the top of my head, but I gave a presentation a few years ago that used it. It relies on more raw Solr parameters: you take in a phrase, tokenize it, and find documents that contain the words of that phrase even when other words appear between them. That restricts the results to documents that have all the words in the phrase within the phrase’s word count plus 2 or 3 positions, to allow for stop words that may show up. For example, “red house hill” would still find “red house on top of the hill”, since those words fall within a proximity of about 7 of each other.
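The proximity idea above can be approximated with the standard parser’s sloppy phrase syntax, `field:"words"~N`. This is a stand-in, not the within-parameter approach from the presentation, whose exact parameters aren’t given here. The field name `body`, the default `extra=3`, and the helpers `sloppy_phrase_query` and `in_order_slop` are hypothetical. Note that Lucene’s phrase slop counts how far terms have to move, not the width of a window, so the two numbers differ slightly.

```python
from typing import List, Optional


def sloppy_phrase_query(field: str, phrase: str, extra: int = 3) -> str:
    """Hypothetical helper: standard-parser sloppy phrase with slop = word count + extra."""
    words = phrase.split()
    joined = " ".join(words)
    return f'{field}:"{joined}"~{len(words) + extra}'


def in_order_slop(doc_tokens: List[str], phrase: str) -> Optional[int]:
    """Hypothetical helper: slop an in-order match needs (first occurrences only)."""
    positions = []
    start = 0
    for word in phrase.lower().split():
        if word not in doc_tokens[start:]:
            return None  # a phrase word is missing, so no match at any slop
        pos = doc_tokens.index(word, start)
        positions.append(pos)
        start = pos + 1
    # Shift each position back by the word's offset in the phrase; an exact
    # phrase match would leave all shifted positions equal.
    shifted = [pos - i for i, pos in enumerate(positions)]
    return max(shifted) - min(shifted)


tokens = "red house on top of the hill".split()
print(sloppy_phrase_query("body", "red house hill"))  # body:"red house hill"~6
print(in_order_slop(tokens, "red house hill"))        # 4
```

Here “hill” lands four positions past where an exact phrase would put it, so a slop of 4 is the minimum for that example, and 3 words plus 3 extra gives 6.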
> On Sep 7, 2025, at 7:15 PM, Gus Heck <gus.h...@gmail.com> wrote:
>
> Or
> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#surround-query-parser
>
>> On Sun, Sep 7, 2025 at 4:32 PM Mikhail Khludnev <m...@apache.org> wrote:
>>
>> Hi,
>> I might be missing the point, but the ways to create spans in Solr are:
>>
>> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#xml-query-parser
>>
>> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#complex-phrase-query-parser
>>
>>> On Fri, Sep 5, 2025 at 6:32 PM mtn search <search...@gmail.com> wrote:
>>>
>>> I may have found what I am running up against, if ChatGPT's diagnosis
>>> is correct.
>>>
>>> *My sample query*
>>> /select?debug=true&indent=true&q={!lucene}spanNear(
>>> spanNear(spanTerm(body:separate),spanTerm(body:email),0,true),
>>> spanNear(spanTerm(body:will),spanTerm(body:be),0,true),
>>> 10,false)
>>>
>>> *Text from the body field of a message returned by the spanNear query
>>> above (incorrectly, I believe)*
>>> "separate device there will not be any load on the email servers"
>>>
>>> *Same text through the analyzer*
>>> text      raw_bytes                  start  end
>>> separate  [73 65 70 61 72 61 74 65]      5   13
>>> device    [64 65 76 69 63 65]           14   20
>>> there     [74 68 65 72 65]              21   26
>>> will      [77 69 6c 6c]                 27   31
>>> not       [6e 6f 74]                    32   35
>>> be        [62 65]                       36   38
>>> any       [61 6e 79]                    39   42
>>> load      [6c 6f 61 64]                 43   47
>>> on        [6f 6e]                       48   50
>>> the       [74 68 65]                    51   54
>>> email     [65 6d 61 69 6c]              55   60
>>> server    [73 65 72 76 65 72]           61   68
>>>
>>> *ChatGPT assessment*
>>>
>>> Now, let’s check the spans:
>>>
>>> - Inner spanNear(separate, email, 0, true) is *not* going to match
>>>   directly, because email isn’t right after separate.
>>> - But Lucene is allowed to *reposition* the spans when used as children
>>>   of the outer spanNear. Each child span doesn’t need to be contiguous
>>>   unless it resolves to a valid match somewhere in the text.
>>>
>>> *Conclusion:* This last point may explain why the message above was
>>> returned by the query, even though that result appears to be incorrect.
>>> While the words/tokens in the query are in the message, they do not honor
>>> the specified proximity. Apparently child spans do not have to honor the
>>> proximity rules specified. AI suggested this query for proximity; I am now
>>> concluding it is not a valid approach.
>>>
>>> I am not seeing a Solr/Lucene HTTP query approach for a proximity search
>>> between phrases, other than possibly using the Lucene Java API for more
>>> control.
>>>
>>> If others have found a workable solution, please let me know.
>>>
>>> Thanks,
>>> Matt
>>>
>>>> On Thu, Sep 4, 2025 at 3:26 PM mtn search <search...@gmail.com> wrote:
>>>
>>>> Also, I am using the Solr Admin Analysis UI to verify how Solr is
>>>> tokenizing the messages and to check the positions between tokens
>>>> manually.
>>>>
>>>> Debug view of the query side:
>>>> For query:
>>>> "params":{
>>>>   "q":"{!lucene}SpanNearQuery(body,(money question),5,true)",
>>>>   "df":"body",
>>>>   "debug":"true",
>>>>   "indent":"true",
>>>>   "q.op":"OR",
>>>>   "wt":"json"}},
>>>>
>>>> It seems odd that in the parsed query the "body" field name is
>>>> prepended to the value 5 and the text true.
>>>> "debug":{
>>>>   "rawquerystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
>>>>   "querystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
>>>>   "parsedquery":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
>>>>   "parsedquery_toString":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
>>>>   "explain":{
>>>>
>>>> On Thu, Sep 4, 2025 at 12:04 PM mtn search <search...@gmail.com> wrote:
>>>>
>>>>> Thanks Tim! Yes, I have tried a variety of values and am aware of
>>>>> ordered vs. unordered matching. I am getting more results than
>>>>> expected, and some that do not match the proximity criteria. So when I
>>>>> set it to a small value like 2, I expected the result count to drop
>>>>> significantly, since many documents would not match the criteria.
>>>>> Unfortunately, the count does not drop. It looks like a fundamental
>>>>> problem with how I am using the syntax. Still researching, and open to
>>>>> suggestions.
>>>>>
>>>>> Matt
>>>>>
>>>>> On Thu, Sep 4, 2025 at 11:54 AM Tim Casey <tca...@gmail.com> wrote:
>>>>>
>>>>>> Usually span and proximity problems are off-by-one issues.
>>>>>> Specifically, the order of the tokens changes the distance
>>>>>> calculation. I do not have an example off the top of my head, but when
>>>>>> I was doing this, I usually started with a larger span and brought it
>>>>>> down by looking at the results.
>>>>>>
>>>>>> This is also the case for the old "phrase words"~5 syntax.
>>>>>>
>>>>>> As an aside, I take "not working" to mean you are not getting results
>>>>>> even though the query parses. "Not working" could mean a lot more in
>>>>>> this context. So I am suggesting: instead of 2, try 10.
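The debug output above points at a simpler cause than the one ChatGPT suggested: `parsedquery` shows the standard `{!lucene}` parser reading `SpanNearQuery(...)` as ordinary terms (`body:spannearquery`, `body:money`, `body:5`, and so on), because that parser has no span-function syntax. With `q.op=OR`, a document containing any one of those terms can match, which would explain why lowering the slop never changed the hit count. The sketch below is a simplified model, not Lucene’s span classes: the helpers `term_spans`, `near_ordered`, `near_unordered`, `phrase_spans` and `or_of_terms` are hypothetical and ignore edge cases such as overlapping or repeated spans. It applies phrase-near-phrase logic to the analyzer output from the Sep 5 message and shows that a real span query should not match that text, while a flat OR of its words does.

```python
from itertools import product
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token positions, end exclusive


def term_spans(tokens: List[str], term: str) -> List[Span]:
    """One single-position span per occurrence of `term`."""
    return [(i, i + 1) for i, tok in enumerate(tokens) if tok == term]


def near_ordered(a: List[Span], b: List[Span], slop: int) -> List[Span]:
    """A span from `a` followed by one from `b`, at most `slop` positions between them."""
    return [(sa, eb) for (sa, ea), (sb, eb) in product(a, b)
            if sb >= ea and sb - ea <= slop]


def near_unordered(a: List[Span], b: List[Span], slop: int) -> List[Span]:
    """Either order; slop = covered width minus the lengths of both spans."""
    out = []
    for (sa, ea), (sb, eb) in product(a, b):
        start, end = min(sa, sb), max(ea, eb)
        if (end - start) - ((ea - sa) + (eb - sb)) <= slop:
            out.append((start, end))
    return out


def phrase_spans(tokens: List[str], phrase: str) -> List[Span]:
    """Exact phrase: fold the words together with ordered slop 0 (adjacent)."""
    words = phrase.split()
    spans = term_spans(tokens, words[0])
    for word in words[1:]:
        spans = near_ordered(spans, term_spans(tokens, word), 0)
    return spans


def or_of_terms(tokens: List[str], terms: List[str]) -> bool:
    """Roughly what a flat OR of the query's words amounts to."""
    return any(term in tokens for term in terms)


# Analyzer output from the Sep 5 message, in position order.
tokens = ["separate", "device", "there", "will", "not", "be",
          "any", "load", "on", "the", "email", "server"]
a = phrase_spans(tokens, "separate email")  # []: email is not next to separate
b = phrase_spans(tokens, "will be")         # []: "not" sits between will and be
print(near_unordered(a, b, 10))             # []
print(or_of_terms(tokens, ["separate", "email", "will", "be"]))  # True

# Hypothetical text where the phrase-near-phrase query should match.
hit = "please use a separate email and it will be fine".split()
print(near_unordered(phrase_spans(hit, "separate email"),
                     phrase_spans(hit, "will be"), 10))  # [(3, 9)]
```

In this model, matching "will not be" against will/be takes an ordered slop of 1, because slop counts the positions between the spans rather than the distance between their starts; that is the kind of off-by-one Tim describes.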
>>>>>>
>>>>>> On Thu, Sep 4, 2025 at 10:43 AM mtn search <search...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Looking for guidance on approaches to implementing a proximity search
>>>>>>> between phrases.
>>>>>>>
>>>>>>> Initially I tried:
>>>>>>>
>>>>>>> "q":"{!lucene}spanNear(spanNear(spanNear(spanTerm(body:off),spanTerm(body:the),0,true),
>>>>>>> spanTerm(body: record),0,true), spanNear(spanTerm(body:new),spanTerm(body:
>>>>>>> information),0,true) , 2N,false)",
>>>>>>> "defType":"lucene",
>>>>>>> "df":"body",
>>>>>>>
>>>>>>> However, I then simplified it to just two terms:
>>>>>>>
>>>>>>> "q":"{!lucene}spanNear(spanTerm(body:off),spanTerm(body:call),2,true)",
>>>>>>> "defType":"lucene",
>>>>>>> "df":"body",
>>>>>>>
>>>>>>> Neither is working. Any tips? I am currently on Solr 9.4, but will
>>>>>>> likely need to run for some time on a Solr 6 instance.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Matt
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>
> --
> http://www.needhamsoftware.com (work)
> https://a.co/d/b2sZLD9 (my fantasy fiction book)
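Following Mikhail’s and Gus’s pointers, here is a sketch that builds the Sep 5 query ("separate email" within 10 positions of "will be", either order) for the XML query parser and the surround query parser, plus a URL-encoded `/select` request. Assumptions, all to be checked against the `parsedquery` in the debug output: the base URL and core name `http://localhost:8983/solr/messages` are hypothetical; the element and attribute names (`SpanNear`, `slop`, `inOrder`, `SpanTerm`, `fieldName`) follow the XML query parser documentation; surround picks up its default field from `df`; and, as far as I know, neither parser runs the field’s analyzer on these terms, so they must be written in indexed form (lowercased, stemmed, and so on). In surround syntax `1W(a, b)` means adjacent and in order; the sketch assumes a surround distance of n corresponds to a span slop of n - 1 and therefore passes `slop + 1`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/messages"  # hypothetical host and core


def xml_phrase_near(field: str, phrase_a: str, phrase_b: str, slop: int) -> str:
    """{!xmlparser} query: two exact phrases within `slop` positions, either order."""
    outer = ET.Element("SpanNear", slop=str(slop), inOrder="false")
    for phrase in (phrase_a, phrase_b):
        # Each phrase is an ordered SpanNear with slop 0 (adjacent terms).
        inner = ET.SubElement(outer, "SpanNear", slop="0", inOrder="true")
        for word in phrase.split():
            ET.SubElement(inner, "SpanTerm", fieldName=field).text = word
    return "{!xmlparser}" + ET.tostring(outer, encoding="unicode")


def surround_phrase_near(phrase_a: str, phrase_b: str, slop: int) -> str:
    """{!surround} query; ASSUMES distance n == span slop n - 1 (verify with debug)."""
    a = "1W(" + ", ".join(phrase_a.split()) + ")"
    b = "1W(" + ", ".join(phrase_b.split()) + ")"
    return f"{{!surround}}{slop + 1}N({a}, {b})"


def select_url(q: str, df: str = "body") -> str:
    """URL-encoded /select request with debug on, so parsedquery can be checked."""
    return BASE + "/select?" + urlencode({"q": q, "df": df, "debug": "true"})


xml_q = xml_phrase_near("body", "separate email", "will be", 10)
surround_q = surround_phrase_near("separate email", "will be", 10)
print(xml_q)
print(surround_q)  # {!surround}11N(1W(separate, email), 1W(will, be))
print(select_url(surround_q))
```

Either string can also go straight into the `q` field of the Admin UI; the important check is that `parsedquery` now shows a span query rather than a list of OR’ed terms.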