Thanks for the feedback!

Mikhail - I did not see the Complex Phrase Query Parser supporting proximity between two phrases; however, the XML Query Parser (XmlQParser) might via spans. Thanks for the tip!
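
A minimal sketch of what the XML Query Parser route could look like, assuming the parser accepts nested `SpanNear` elements (with `slop` and `inOrder` attributes) wrapping `SpanTerm` children, and that it is invoked with the `{!xmlparser}` local-params prefix. The helper names `phrase_span` and `phrase_proximity_query` are hypothetical and exist only for illustration; the field and phrases are taken from the sample query quoted below. Each phrase becomes an in-order `SpanNear` with slop 0, and the two phrases are wrapped in an outer `SpanNear` carrying the proximity distance.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def phrase_span(parent, field, words):
    """Append an exact-phrase span: an in-order SpanNear with slop 0."""
    near = ET.SubElement(parent, "SpanNear", slop="0", inOrder="true")
    for word in words:
        # fieldName is set on every SpanTerm rather than relying on inheritance.
        ET.SubElement(near, "SpanTerm", fieldName=field).text = word
    return near


def phrase_proximity_query(field, phrase_a, phrase_b, slop, in_order=False):
    """Build XML for 'phrase_a within `slop` positions of phrase_b'."""
    outer = ET.Element(
        "SpanNear",
        fieldName=field,
        slop=str(slop),
        inOrder="true" if in_order else "false",
    )
    phrase_span(outer, field, phrase_a.split())
    phrase_span(outer, field, phrase_b.split())
    return ET.tostring(outer, encoding="unicode")


# Same phrases and distance as the sample spanNear query in the thread.
xml_query = phrase_proximity_query("body", "separate email", "will be", 10)

# URL-encode the query so it can be appended to a /select request.
params = urlencode({"q": "{!xmlparser}" + xml_query, "debug": "true"})
print(xml_query)
print("/select?" + params)
```

As far as I know, `SpanTerm` values are not run through the field's analyzer, so the words would need to be given in their indexed form (the analysis output quoted below shows "server" for "servers", for example). Running the request with `debug=true` and checking `parsedquery` should show whether real span queries were built.
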

Gus - We currently use the Surround query parser for proximity between two terms. Do you know of a way to use it for proximity between phrases? That would be ideal, as we already have a search client tool using this syntax.

Dave - This type of approach might work for us (possibly like the Complex Phrase Query Parser): it does not find proximity between two phrases exactly, but verifies that all the words in the two phrases fall within a proximity range. As you say, this could keep stop words that may still be in the index from blocking a match.

Matt

On Mon, Sep 8, 2025 at 7:29 AM Dave <hastings.recurs...@gmail.com> wrote:

> There are other clever ways to do it too, using the within parameter, and
> other things I don’t remember off the top of my head but I gave a
> presentation a few years ago that utilized it. It uses more raw solr
> parameters that you can take in a phrase but tokenize them and find out
> documents that have that phrase but may have words inside them, so you
> restrict the results to only documents that have all the words in the
> phrase but within that number of words plus 2 or 3 to take care of stop
> words that may show up, like “red house hill” would still find “red house
> on top of the hill” within a proximity to each other of about 7.
>
> > On Sep 7, 2025, at 7:15 PM, Gus Heck <gus.h...@gmail.com> wrote:
> >
> > Or
> > https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#surround-query-parser
> >
> >> On Sun, Sep 7, 2025 at 4:32 PM Mikhail Khludnev <m...@apache.org> wrote:
> >>
> >> Hi
> >> I might be missing a point.
> >> But the ways to create spans in Solr are:
> >>
> >> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#xml-query-parser
> >>
> >> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#complex-phrase-query-parser
> >>
> >>> On Fri, Sep 5, 2025 at 6:32 PM mtn search <search...@gmail.com> wrote:
> >>>
> >>> I may have found what I am running up against - if ChatGPT is correct
> >>> on diagnosis?
> >>>
> >>> *My sample query*
> >>> /select?debug=true&indent=true&q={!lucene}spanNear(
> >>> spanNear(spanTerm(body:separate),spanTerm(body:email),0,true),
> >>> spanNear(spanTerm(body:will),spanTerm(body:be),0,true),
> >>> 10,false)
> >>>
> >>> *Text from body field from a message where the message is returned from
> >>> the spanNear query above (I believe incorrectly)*
> >>> "separate device there will not be any load on the email servers"
> >>>
> >>> *Same text through analyzer*
> >>> text      raw_bytes                  start  end
> >>> separate  [73 65 70 61 72 61 74 65]  5      13
> >>> device    [64 65 76 69 63 65]        14     20
> >>> there     [74 68 65 72 65]           21     26
> >>> will      [77 69 6c 6c]              27     31
> >>> not       [6e 6f 74]                 32     35
> >>> be        [62 65]                    36     38
> >>> any       [61 6e 79]                 39     42
> >>> load      [6c 6f 61 64]              43     47
> >>> on        [6f 6e]                    48     50
> >>> the       [74 68 65]                 51     54
> >>> email     [65 6d 61 69 6c]           55     60
> >>> server    [73 65 72 76 65 72]        61     68
> >>>
> >>> *ChatGPT assessment*
> >>>
> >>> Now, let’s check the spans:
> >>>
> >>> - Inner spanNear(separate, email, 0, true) is *not* going to match
> >>>   directly, because email isn’t right after separate.
> >>> - But Lucene is allowed to *reposition* the spans when used as children of
> >>>   the outer spanNear. Each child span doesn’t need to be contiguous unless
> >>>   it resolves to a valid match somewhere in the text.
> >>>
> >>> *Conclusion:* This last line may explain why the message above was returned
> >>> by the query above, but appears to be incorrect. While the words/tokens in
> >>> the query are in the message they do not honor the proximity specified.
> >>> But apparently children spans do not have to honor the proximity rules
> >>> specified. AI suggested this query for proximity, I am now concluding it
> >>> is not a valid approach.
> >>>
> >>> I am not seeing a Solr/Lucene http query approach for a proximity search
> >>> between phrases, other than possibly to use the Lucene Java API for more
> >>> control.
> >>>
> >>> If others have found a workable solution, please let me know.
> >>>
> >>> Thanks,
> >>> Matt
> >>>
> >>>> On Thu, Sep 4, 2025 at 3:26 PM mtn search <search...@gmail.com> wrote:
> >>>>
> >>>> Also, I am using the SolrAdmin Analysis UI to verify how Solr is
> >>>> tokenizing the messages and verifying manually position between tokens.
> >>>>
> >>>> Debug view of the query side:
> >>>> For query:
> >>>> "*params*":{
> >>>> "q":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> >>>> "df":"body",
> >>>> "debug":"true",
> >>>> "indent":"true",
> >>>> "q.op":"OR",
> >>>> "wt":"json"}},
> >>>>
> >>>> It seems odd that in the parsed query that the "body" field named is
> >>>> pre-appended to the value 5 and the text true.
> >>>> "*debug*":{
> >>>> "rawquerystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> >>>> "querystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> >>>> "parsedquery":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
> >>>> "*parsedquery_toString*":*"body:spannearquery *(body:body (body:money body:question)* (body:5 body:true*))",
> >>>> "explain":{
> >>>>
> >>>> On Thu, Sep 4, 2025 at 12:04 PM mtn search <search...@gmail.com> wrote:
> >>>>
> >>>>> Thanks Tim! Yes I have tried a variety of values and am aware
> >>>>> of ordering vs non ordering. I am getting more results than expected and
> >>>>> some that do not match the proximity criteria. So when I set it to a
> >>>>> small value like 2, I was seeking to see the result count drop
> >>>>> significantly as many would not match criteria. Unfortunately, the count
> >>>>> does not drop. Looks like a fundamental problem with how I am using the
> >>>>> syntax. Still researching, and open to suggestions.
> >>>>>
> >>>>> Matt
> >>>>>
> >>>>> On Thu, Sep 4, 2025 at 11:54 AM Tim Casey <tca...@gmail.com> wrote:
> >>>>>
> >>>>>> usually the span and proximities are off-by-one issues. Specifically the
> >>>>>> order of the tokens will change the distance calculation. I do not have an
> >>>>>> example off the top of my head. But, when I was doing this, I usually
> >>>>>> started with a larger span and brought it down through looking at results.
> >>>>>>
> >>>>>> This is the case for the old 5~"phrase words" syntax.
> >>>>>>
> >>>>>> As an aside, "Not working" is taken by me to mean you are not getting
> >>>>>> results but the query passes parse. Not working could mean a lot more in
> >>>>>> this context. So I am suggesting, instead of 2, try 10.
> >>>>>>
> >>>>>> On Thu, Sep 4, 2025 at 10:43 AM mtn search <search...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> Looking for guidance on approaches to implement a proximity search between
> >>>>>>> phrases.
> >>>>>>>
> >>>>>>> Initially tried:
> >>>>>>>
> >>>>>>> "q":"{!lucene}spanNear(spanNear(spanNear(spanTerm(body:off),spanTerm(body:the),0,true),
> >>>>>>> spanTerm(body: record),0,true),
> >>>>>>> spanNear(spanTerm(body:new),spanTerm(body:
> >>>>>>> information),0,true) , 2N,false)",
> >>>>>>> "defType":"lucene",
> >>>>>>> "df":"body",
> >>>>>>>
> >>>>>>> However then simplified to just two terms:
> >>>>>>> "q":"{!lucene}spanNear(spanTerm(body:off),spanTerm(body:call),2,true)",
> >>>>>>> "defType":"lucene",
> >>>>>>> "df":"body",
> >>>>>>>
> >>>>>>> Both are not working. Any tips? Currently on Solr 9.4, but will likely
> >>>>>>> need to run for some time on a Solr 6 instance.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Matt
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >
> > --
> > http://www.needhamsoftware.com (work)
> > https://a.co/d/b2sZLD9 (my fantasy fiction book)