Hi, I might be missing the point, but the ways to create span queries in Solr are the XML query parser ({!xmlparser}) and the complex phrase query parser ({!complexphrase}):
https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#xml-query-parser
https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#complex-phrase-query-parser
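Below is a minimal sketch of the XML query parser route. It is written in Python only to build the HTTP request and rests on several assumptions:

- It uses the Lucene XML query parser element and attribute names SpanNear, SpanTerm, slop, inOrder and fieldName.
- A fieldName set on the outer element is assumed to be inherited by the nested elements.
- SpanTerm text is assumed to be compared as-is with indexed terms rather than analyzed. Terms should therefore be passed lowercased and, if the field stems, stemmed.

Please check these against the XML query parser page linked above for your Solr version. The names phrases_near and parsed_as_span are hypothetical helpers, not Solr APIs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def phrases_near(field, first, second, slop, in_order=False):
    """Build XML for: phrase `first` within `slop` positions of phrase `second`.

    Each multi-word phrase becomes a zero-slop, in-order SpanNear of SpanTerm
    elements; a one-word phrase becomes a bare SpanTerm. fieldName is set once
    on the outer element (assumed to be inherited by the children).
    """
    outer = ET.Element("SpanNear", fieldName=field, slop=str(slop),
                       inOrder=str(in_order).lower())
    for phrase in (first, second):
        words = phrase.split()
        if len(words) == 1:
            ET.SubElement(outer, "SpanTerm").text = words[0]
            continue
        inner = ET.SubElement(outer, "SpanNear", slop="0", inOrder="true")
        for word in words:
            ET.SubElement(inner, "SpanTerm").text = word
    return ET.tostring(outer, encoding="unicode")


def parsed_as_span(parsedquery):
    """Heuristic: does Solr's debug 'parsedquery' look like a SpanNearQuery?"""
    return "spanNear(" in parsedquery


# "separate email" within 10 positions of "will be", in either order.
query_xml = phrases_near("body", "separate email", "will be", 10)
params = {"defType": "xmlparser", "q": query_xml, "debug": "true"}
print("/select?" + urlencode(params))
```

With debug=true, the parsedquery of a request like this should start with spanNear( if the XML was turned into span queries. A parsedquery made of plain OR'd terms, like the body:spannearquery ... output in the Sep 4 debug view quoted below, means the span-style text was read as ordinary words.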
On Fri, Sep 5, 2025 at 6:32 PM mtn search <search...@gmail.com> wrote:
> I may have found what I am running up against, if ChatGPT's diagnosis is
> correct.
>
> *My sample query*
> /select?debug=true&indent=true&q={!lucene}spanNear(
> spanNear(spanTerm(body:separate),spanTerm(body:email),0,true),
> spanNear(spanTerm(body:will),spanTerm(body:be),0,true),
> 10,false)
>
> *Text from the body field of a message that the spanNear query above
> returns (I believe incorrectly)*
> "separate device there will not be any load on the email servers"
>
> *Same text through the analyzer*
> text      raw_bytes                  start  end
> separate  [73 65 70 61 72 61 74 65]      5   13
> device    [64 65 76 69 63 65]           14   20
> there     [74 68 65 72 65]              21   26
> will      [77 69 6c 6c]                 27   31
> not       [6e 6f 74]                    32   35
> be        [62 65]                       36   38
> any       [61 6e 79]                    39   42
> load      [6c 6f 61 64]                 43   47
> on        [6f 6e]                       48   50
> the       [74 68 65]                    51   54
> email     [65 6d 61 69 6c]              55   60
> server    [73 65 72 76 65 72]           61   68
>
> *ChatGPT assessment*
>
> Now, let’s check the spans:
>
> - Inner spanNear(separate, email, 0, true) is *not* going to match
>   directly, because email isn’t right after separate.
> - But Lucene is allowed to *reposition* the spans when used as children
>   of the outer spanNear. Each child span doesn’t need to be contiguous
>   unless it resolves to a valid match somewhere in the text.
>
> *Conclusion:* This last point may explain why the query above returned
> the message above, even though the match appears to be incorrect. While
> the words/tokens in the query are in the message, they do not honor the
> specified proximity. Apparently, though, child spans do not have to honor
> the specified proximity rules. AI suggested this query for proximity
> search; I am now concluding it is not a valid approach.
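The analyzer output above can be checked mechanically. Below is a minimal sketch. It assumes the tokens occupy consecutive positions in the order listed, which seems likely since no words appear to be dropped.

With zero slop and in-order matching, neither inner span occurs in this text. A SpanNearQuery built from the sample query should therefore not match this message at all. That is consistent with the query never being parsed as spans (compare the Sep 4 debug output quoted below), rather than with child spans being "repositioned".

This is a toy model of exact adjacency, not Lucene's span implementation. exact_phrase_starts is a hypothetical helper.

```python
# Tokens in analyzer order; assumed to occupy positions 0, 1, 2, ...
TOKENS = ["separate", "device", "there", "will", "not", "be",
          "any", "load", "on", "the", "email", "server"]


def exact_phrase_starts(tokens, phrase):
    """Start positions where `phrase` occurs with zero slop, in order
    (what spanNear over single spanTerms with slop 0, inOrder=true should match)."""
    words = phrase.split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


print(exact_phrase_starts(TOKENS, "separate email"))  # → []
print(exact_phrase_starts(TOKENS, "will be"))         # → []
```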
>
> I am not seeing a Solr/Lucene HTTP query approach for a proximity search
> between phrases, other than possibly using the Lucene Java API for more
> control.
>
> If others have found a workable solution, please let me know.
>
> Thanks,
> Matt
>
> On Thu, Sep 4, 2025 at 3:26 PM mtn search <search...@gmail.com> wrote:
>
> > Also, I am using the Solr Admin Analysis UI to verify how Solr is
> > tokenizing the messages, and I am manually verifying the positions
> > between tokens.
> >
> > Debug view of the query side:
> > For query:
> > "params":{
> >   "q":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> >   "df":"body",
> >   "debug":"true",
> >   "indent":"true",
> >   "q.op":"OR",
> >   "wt":"json"}},
> >
> > It seems odd that in the parsed query the "body" field name is
> > prepended to the value 5 and to the text true.
> > "debug":{
> >   "rawquerystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> >   "querystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> >   "parsedquery":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
> >   "parsedquery_toString":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
> >   "explain":{
> >
> > On Thu, Sep 4, 2025 at 12:04 PM mtn search <search...@gmail.com> wrote:
> >
> >> Thanks Tim! Yes, I have tried a variety of values and am aware of
> >> ordered vs. unordered matching. I am getting more results than
> >> expected, and some do not match the proximity criteria. So when I set
> >> the slop to a small value like 2, I expected the result count to drop
> >> significantly, since many documents would not match the criteria.
> >> Unfortunately, the count does not drop. It looks like a fundamental
> >> problem with how I am using the syntax. Still researching, and open to
> >> suggestions.
> >>
> >> Matt
> >>
> >> On Thu, Sep 4, 2025 at 11:54 AM Tim Casey <tca...@gmail.com> wrote:
> >>
> >>> Usually span and proximity problems are off-by-one issues.
Specifically > the > >>> order of the tokens will change the distance calculation. I do not > have > >>> an > >>> example off the top of my head. But, when I was doing this, I usually > >>> started with a larger span and brought it down through looking at > >>> results. > >>> > >>> This is the case for the old 5~"phrase words" syntax. > >>> > >>> As an aside, "Not working" is taken by me to mean you are not getting > >>> results but the query passes parse. Not working could mean a lot more > in > >>> this context. So I am suggesting, instead of 2, try 10. > >>> > >>> On Thu, Sep 4, 2025 at 10:43 AM mtn search <search...@gmail.com> > wrote: > >>> > >>> > Hello, > >>> > > >>> > Looking for guidance on approaches to implement a proximity search > >>> between > >>> > phrases. > >>> > > >>> > Initially tried: > >>> > > >>> > > >>> > "q":"{!lucene}spanNear(spanNear(spanNear(spanTerm(body:off),spanTerm(body:the),0,true), > >>> > spanTerm(body: record),0,true), > >>> spanNear(spanTerm(body:new),spanTerm(body: > >>> > information),0,true) , 2N,false)", > >>> > "defType":"lucene", > >>> > "df":"body", > >>> > > >>> > However then simplified to just two terms: > >>> > > "q":"{!lucene}spanNear(spanTerm(body:off),spanTerm(body:call),2,true)", > >>> > "defType":"lucene", > >>> > "df":"body", > >>> > > >>> > Both are not working. Any tips? Currently on Solr 9.4, but will > >>> likely > >>> > need to run for some time on a Solr 6 instance. > >>> > > >>> > Thanks, > >>> > Matt > >>> > > >>> > >> > -- Sincerely yours Mikhail Khludnev