Or https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#surround-query-parser
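
A minimal sketch of how the phrase-proximity query from the quoted thread might look with the surround parser. The host, collection name (`localhost:8983`, `mycollection`) and the helper function names are placeholders made up for illustration. The `w` (ordered) and `n` (unordered) operators with a numeric distance prefix are surround syntax as described in the reference guide linked above. Two things to verify before relying on it: as far as I know, surround terms are not run through the field's analyzer, so they have to match the indexed tokens (for example `server`, not `servers`); and the distance prefix is not the same number as span slop (an unprefixed `w` means adjacent), so the `10n` below is only an approximate translation of the slop of 10 in the original query. The debug output quoted further down also suggests why the original attempts returned too much: `{!lucene}` has no `spanNear(...)` function syntax, so the text appears to have been parsed as ordinary terms against `df` (`body:spannearquery`, `body:5`, `body:true`).

```python
from urllib.parse import urlencode


def ordered_phrase(*terms: str) -> str:
    """Adjacent, in-order terms: prefix form of the surround `w` operator."""
    return "w(" + ", ".join(terms) + ")"


def unordered_near(distance: int, *clauses: str) -> str:
    """Clauses within `distance` positions, in any order: `<distance>n(...)`."""
    return f"{distance}n(" + ", ".join(clauses) + ")"


def surround_select_url(base_url: str, field: str, query: str) -> str:
    """Build a /select URL that sends `query` to the surround parser."""
    params = {
        "q": f"{{!surround df={field}}}{query}",
        "debug": "true",  # keep debug on to inspect the parsed query
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Phrase "separate email" near phrase "will be", either order.
query = unordered_near(
    10,
    ordered_phrase("separate", "email"),
    ordered_phrase("will", "be"),
)

# Hypothetical host and collection; replace with your own.
url = surround_select_url("http://localhost:8983/solr/mycollection", "body", query)

print(query)
print(url)
```

With `debug=true`, the `parsedquery` in the response should show span queries rather than a list of `body:` terms if the parser accepted the syntax.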
On Sun, Sep 7, 2025 at 4:32 PM Mikhail Khludnev <m...@apache.org> wrote:

> Hi
> I might be missing a point. But the ways to create spans in Solr are:
>
> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#xml-query-parser
>
> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#complex-phrase-query-parser
>
>
> On Fri, Sep 5, 2025 at 6:32 PM mtn search <search...@gmail.com> wrote:
>
> > I may have found what I am running up against, if ChatGPT's
> > diagnosis is correct.
> >
> > *My sample query*
> > /select?debug=true&indent=true&q={!lucene}spanNear(
> > spanNear(spanTerm(body:separate),spanTerm(body:email),0,true),
> > spanNear(spanTerm(body:will),spanTerm(body:be),0,true),
> > 10,false)
> >
> > *Text from the body field of a message that is returned by
> > the spanNear query above (I believe incorrectly)*
> > "separate device there will not be any load on the email servers"
> >
> > *Same text through analyzer*
> >
> > text      raw_bytes                  start  end
> > separate  [73 65 70 61 72 61 74 65]  5      13
> > device    [64 65 76 69 63 65]        14     20
> > there     [74 68 65 72 65]           21     26
> > will      [77 69 6c 6c]              27     31
> > not       [6e 6f 74]                 32     35
> > be        [62 65]                    36     38
> > any       [61 6e 79]                 39     42
> > load      [6c 6f 61 64]              43     47
> > on        [6f 6e]                    48     50
> > the       [74 68 65]                 51     54
> > email     [65 6d 61 69 6c]           55     60
> > server    [73 65 72 76 65 72]        61     68
> >
> > *ChatGPT assessment*
> >
> > Now, let’s check the spans:
> >
> > - Inner spanNear(separate, email, 0, true) is *not* going to match
> >   directly, because email isn’t right after separate.
> > - But Lucene is allowed to *reposition* the spans when used as children of
> >   the outer spanNear.
> >   Each child span doesn’t need to be contiguous unless
> >   it resolves to a valid match somewhere in the text.
> >
> > *Conclusion:* This last line may explain why the message above was returned
> > by the query above, but the result appears to be incorrect. While the words/tokens in
> > the query are in the message, they do not honor the proximity specified.
> > But apparently child spans do not have to honor the proximity rules
> > specified. AI suggested this query for proximity; I am now concluding it
> > is not a valid approach.
> >
> > I am not seeing a Solr/Lucene HTTP query approach for a proximity search
> > between phrases, other than possibly using the Lucene Java API for more
> > control.
> >
> > If others have found a workable solution, please let me know.
> >
> > Thanks,
> > Matt
> >
> > On Thu, Sep 4, 2025 at 3:26 PM mtn search <search...@gmail.com> wrote:
> >
> > > Also, I am using the SolrAdmin Analysis UI to verify how Solr is
> > > tokenizing the messages and manually verifying the positions between tokens.
> > >
> > > Debug view of the query side:
> > > For query:
> > > "params":{
> > >   "q":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> > >   "df":"body",
> > >   "debug":"true",
> > >   "indent":"true",
> > >   "q.op":"OR",
> > >   "wt":"json"}},
> > >
> > > It seems odd that in the parsed query the "body" field name is
> > > prepended to the value 5 and the text true.
> > > "debug":{
> > >   "rawquerystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> > >   "querystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> > >   "parsedquery":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
> > >   "parsedquery_toString":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
> > >   "explain":{
> > >
> > > On Thu, Sep 4, 2025 at 12:04 PM mtn search <search...@gmail.com> wrote:
> > >
> > >> Thanks Tim!
> > >> Yes, I have tried a variety of values and am aware
> > >> of ordered vs. unordered matching. I am getting more results than expected,
> > >> and some do not match the proximity criteria. So when I set it to a
> > >> small value like 2, I expected to see the result count drop
> > >> significantly, as many would not match the criteria. Unfortunately, the count
> > >> does not drop. Looks like a fundamental problem with how I am using the
> > >> syntax. Still researching, and open to suggestions.
> > >>
> > >> Matt
> > >>
> > >> On Thu, Sep 4, 2025 at 11:54 AM Tim Casey <tca...@gmail.com> wrote:
> > >>
> > >>> Usually span and proximity problems are off-by-one issues. Specifically, the
> > >>> order of the tokens will change the distance calculation. I do not have an
> > >>> example off the top of my head. But when I was doing this, I usually
> > >>> started with a larger span and brought it down by looking at the results.
> > >>>
> > >>> This is the case for the old "phrase words"~5 syntax.
> > >>>
> > >>> As an aside, I take "not working" to mean you are not getting
> > >>> results but the query passes parse. "Not working" could mean a lot more in
> > >>> this context. So I am suggesting, instead of 2, try 10.
> > >>>
> > >>> On Thu, Sep 4, 2025 at 10:43 AM mtn search <search...@gmail.com> wrote:
> > >>>
> > >>> > Hello,
> > >>> >
> > >>> > Looking for guidance on approaches to implement a proximity search between
> > >>> > phrases.
> > >>> >
> > >>> > Initially tried:
> > >>> >
> > >>> > "q":"{!lucene}spanNear(spanNear(spanNear(spanTerm(body:off),spanTerm(body:the),0,true),
> > >>> > spanTerm(body: record),0,true), spanNear(spanTerm(body:new),spanTerm(body:
> > >>> > information),0,true) , 2N,false)",
> > >>> > "defType":"lucene",
> > >>> > "df":"body",
> > >>> >
> > >>> > However, I then simplified to just two terms:
> > >>> >
> > >>> > "q":"{!lucene}spanNear(spanTerm(body:off),spanTerm(body:call),2,true)",
> > >>> > "defType":"lucene",
> > >>> > "df":"body",
> > >>> >
> > >>> > Neither is working. Any tips? Currently on Solr 9.4, but will likely
> > >>> > need to run for some time on a Solr 6 instance.
> > >>> >
> > >>> > Thanks,
> > >>> > Matt
>
> --
> Sincerely yours
> Mikhail Khludnev

--
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)
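
P.S. For checking results by hand, here is a small sketch of what "phrase A within N positions of phrase B" would require of the sample text quoted earlier in the thread. It is plain Python, not Lucene's span implementation: the whitespace split stands in for the real analyzer (using the stemmed token `server` from the analysis output), positions are assumed to be consecutive with no gaps, and the gap rule is a simplified approximation of span slop. The function names are made up for illustration.

```python
# Tokens as shown in the analysis output quoted in the thread.
TEXT = "separate device there will not be any load on the email server"
tokens = TEXT.split()  # stand-in for the field's analyzer (assumption)


def phrase_spans(tokens, phrase):
    """Return (start, end) positions where `phrase` occurs as adjacent, in-order tokens."""
    n = len(phrase)
    return [
        (i, i + n - 1)
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == phrase
    ]


def phrases_near(tokens, first, second, slop):
    """True if the two phrases occur, in either order, with at most `slop` tokens between them."""
    for a_start, a_end in phrase_spans(tokens, first):
        for b_start, b_end in phrase_spans(tokens, second):
            # Number of tokens strictly between the two phrase occurrences;
            # negative means they overlap, which this sketch rejects.
            gap = max(b_start - a_end, a_start - b_end) - 1
            if 0 <= gap <= slop:
                return True
    return False


# Neither inner phrase occurs as adjacent tokens in the sample text,
# so a true phrase-proximity match should not return this message.
print(phrase_spans(tokens, ["separate", "email"]))
print(phrase_spans(tokens, ["will", "be"]))
print(phrases_near(tokens, ["separate", "email"], ["will", "be"], 10))

# A pair of phrases that does occur, for comparison.
print(phrases_near(tokens, ["separate", "device"], ["email", "server"], 10))
```

Under these assumptions the sample message fails the check, which is consistent with the suspicion in the thread that the `{!lucene}spanNear(...)` query was never evaluated as a span query at all.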