Hi
I might be missing the point, but the ways to create span queries in Solr are:
https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#xml-query-parser
https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#complex-phrase-query-parser
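
For the phrase-to-phrase case from the thread, a request for the XML query parser might be assembled roughly as below. This is only a sketch under assumptions: the element and attribute names (SpanNear, SpanTerm, slop, inOrder, fieldName) are the ones the XML query parser is understood to accept, while the host, the collection name mycollection, and the helper names span_phrase and phrases_within are made up for illustration. SpanTerm values are probably not analyzed, so they would need to match the indexed tokens (lowercased, stemmed).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def span_phrase(field, terms):
    """Ordered, slop-0 SpanNear over already-analyzed terms (an exact phrase)."""
    near = ET.Element("SpanNear", {"slop": "0", "inOrder": "true"})
    for term in terms:
        ET.SubElement(near, "SpanTerm", {"fieldName": field}).text = term
    return near


def phrases_within(field, phrases, slop, in_order=False):
    """Outer SpanNear requiring all phrase spans within `slop` positions."""
    outer = ET.Element(
        "SpanNear", {"slop": str(slop), "inOrder": str(in_order).lower()}
    )
    for terms in phrases:
        outer.append(span_phrase(field, terms))
    return ET.tostring(outer, encoding="unicode")


xml_query = phrases_within("body", [["separate", "email"], ["will", "be"]], slop=10)

# Hypothetical host and collection name; adjust to the actual deployment.
params = urlencode({"q": "{!xmlparser}" + xml_query, "debug": "true"})
url = "http://localhost:8983/solr/mycollection/select?" + params
print(xml_query)
print(url)
```

With debug=true, the parsedquery in the response should show whether a real span query was built. Whether the xmlparser is available on the Solr 6 instance would need to be checked separately.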


On Fri, Sep 5, 2025 at 6:32 PM mtn search <search...@gmail.com> wrote:

> I may have found what I am running up against, if ChatGPT's diagnosis is
> correct.
>
> *My sample query*
> /select?debug=true&indent=true&q={!lucene}spanNear(
>   spanNear(spanTerm(body:separate),spanTerm(body:email),0,true),
>   spanNear(spanTerm(body:will),spanTerm(body:be),0,true),
>   10,false)
>
> *Text from the body field of a message that is returned by the spanNear
> query above (I believe incorrectly)*
>        "separate device there will not be any load on the email servers"
>
> *Same text through analyzer*
> text      raw_bytes                  start  end
> separate  [73 65 70 61 72 61 74 65]  5      13
> device    [64 65 76 69 63 65]        14     20
> there     [74 68 65 72 65]           21     26
> will      [77 69 6c 6c]              27     31
> not       [6e 6f 74]                 32     35
> be        [62 65]                    36     38
> any       [61 6e 79]                 39     42
> load      [6c 6f 61 64]              43     47
> on        [6f 6e]                    48     50
> the       [74 68 65]                 51     54
> email     [65 6d 61 69 6c]           55     60
> server    [73 65 72 76 65 72]        61     68
>
> *ChatGPT assessment*
>
>     Now, let’s check the spans:
>
>    - Inner spanNear(separate, email, 0, true) is *not* going to match
>      directly, because email isn’t right after separate.
>    - But Lucene is allowed to *reposition* the spans when used as children of
>      the outer spanNear. Each child span doesn’t need to be contiguous unless
>      it resolves to a valid match somewhere in the text.
>
> *Conclusion:* This last point may explain why the message above was returned
> by the query above, even though the result appears to be incorrect.  The
> words/tokens in the query are present in the message, but they do not honor
> the proximity specified.  Apparently child spans do not have to honor the
> proximity rules specified.  AI suggested this query for proximity; I am now
> concluding it is not a valid approach.
>
> I am not seeing a Solr/Lucene HTTP query approach for a proximity search
> between phrases, other than possibly using the Lucene Java API for more
> control.
>
> If others have found a workable solution, please let me know.
>
> Thanks,
> Matt
>
>
>
>
>
> On Thu, Sep 4, 2025 at 3:26 PM mtn search <search...@gmail.com> wrote:
>
> > Also, I am using the Solr Admin Analysis UI to verify how Solr is
> > tokenizing the messages, and manually verifying the positions between tokens.
> >
> > Debug view of the query side:
> > For query:
> > "params":{
> >       "q":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> >       "df":"body",
> >       "debug":"true",
> >       "indent":"true",
> >       "q.op":"OR",
> >       "wt":"json"}},
> >
> > It seems odd that in the parsed query the "body" field name is
> > prepended to the value 5 and the text true.
> >   "debug":{
> >     "rawquerystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> >     "querystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> >     "parsedquery":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
> >     "parsedquery_toString":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
> >     "explain":{
> >
> > On Thu, Sep 4, 2025 at 12:04 PM mtn search <search...@gmail.com> wrote:
> >
> >> Thanks Tim!  Yes I have tried a variety of values and am aware
> >> of ordering vs non ordering.  I am getting more results than expected and
> >> some that do not match the proximity criteria.   So when I set it to a
> >> small value like 2, I was seeking to see the result count drop
> >> significantly as many would not match criteria.  Unfortunately, the count
> >> does not drop.   Looks like a fundamental problem with how I am using the
> >> syntax.  Still researching, and open to suggestions.
> >>
> >> Matt
> >>
> >> On Thu, Sep 4, 2025 at 11:54 AM Tim Casey <tca...@gmail.com> wrote:
> >>
> >>> usually the span and proximities are off-by-one issues.  Specifically the
> >>> order of the tokens will change the distance calculation.  I do not have an
> >>> example off the top of my head.   But, when I was doing this, I usually
> >>> started with a larger span and brought it down through looking at
> >>> results.
> >>>
> >>> This is the case for the old "phrase words"~5 syntax.
> >>>
> >>> As an aside, "Not working" is taken by me to mean you are not getting
> >>> results but the query passes parse.  Not working could mean a lot more in
> >>> this context.  So I am suggesting, instead of 2, try 10.
> >>>
> >>> On Thu, Sep 4, 2025 at 10:43 AM mtn search <search...@gmail.com> wrote:
> >>>
> >>> > Hello,
> >>> >
> >>> > Looking for guidance on approaches to implement a proximity search between
> >>> > phrases.
> >>> >
> >>> > Initially tried:
> >>> >
> >>> >
> >>> > "q":"{!lucene}spanNear(spanNear(spanNear(spanTerm(body:off),spanTerm(body:the),0,true),
> >>> > spanTerm(body: record),0,true), spanNear(spanTerm(body:new),spanTerm(body:
> >>> > information),0,true) , 2N,false)",
> >>> >       "defType":"lucene",
> >>> >       "df":"body",
> >>> >
> >>> > However then simplified to just two terms:
> >>> >
> >>> > "q":"{!lucene}spanNear(spanTerm(body:off),spanTerm(body:call),2,true)",
> >>> >       "defType":"lucene",
> >>> >       "df":"body",
> >>> >
> >>> > Both are not working.  Any tips?  Currently on Solr 9.4, but will likely
> >>> > need to run for some time on a Solr 6 instance.
> >>> >
> >>> > Thanks,
> >>> > Matt
> >>> >
> >>>
> >>
>
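
To sanity-check what a real span match would do with the sample text quoted above, the positions can be walked by hand or with a few lines of Python. This is a simplified model, not Lucene's implementation: it assumes whitespace tokenization with one position per token, no stop-word removal, and no stemming (the real analyzer emitted "server"), and it treats unordered slop as the gap between two phrase spans. The helper names phrase_spans and within are hypothetical.

```python
TEXT = "separate device there will not be any load on the email servers"

# Assumed analysis: lowercase, split on whitespace, one position per token.
positions = {}
for pos, token in enumerate(TEXT.lower().split()):
    positions.setdefault(token, []).append(pos)


def phrase_spans(terms):
    """Return (start, end) spans where the terms sit at consecutive positions."""
    spans = []
    for start in positions.get(terms[0], []):
        if all(start + i in positions.get(t, []) for i, t in enumerate(terms)):
            spans.append((start, start + len(terms)))
    return spans


def within(spans_a, spans_b, slop):
    """Unordered proximity: gap between the two spans is at most `slop`."""
    for a_start, a_end in spans_a:
        for b_start, b_end in spans_b:
            # Overlapping spans count as a gap of 0 in this simplified model.
            gap = max(b_start - a_end, a_start - b_end, 0)
            if gap <= slop:
                return True
    return False


separate_email = phrase_spans(["separate", "email"])
will_be = phrase_spans(["will", "be"])
print(separate_email, will_be, within(separate_email, will_be, 10))
```

Under these assumptions neither "separate email" nor "will be" occurs as an exact phrase in the text, so both span lists are empty and the proximity check is False. That agrees with the expectation that this message should not match a genuine span query.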

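The debug output quoted above shows body:spannearquery, body:5 and body:true as plain term clauses, which suggests that the {!lucene} parser tokenized the span text as ordinary terms instead of building a span query. A small check along these lines could flag that situation in a parsedquery string. It is a hypothetical heuristic, and the sample of what a real span query prints as is an assumption about Lucene's toString format.

```python
import re

# parsedquery string copied from the debug output quoted in this thread.
PARSED = (
    "body:spannearquery (body:body (body:money body:question) "
    "(body:5 body:true))"
)

# Hypothetical heuristic: a span keyword appearing as a field:term clause means
# the parser tokenized the function name instead of building a span query.
SPAN_AS_TERM = re.compile(r"\b\w+:span(?:near|term)\w*", re.IGNORECASE)


def span_syntax_parsed_as_terms(parsedquery):
    """True when span function names show up as ordinary terms."""
    return SPAN_AS_TERM.search(parsedquery) is not None


print(span_syntax_parsed_as_terms(PARSED))
# Assumed shape of a genuine span query in debug output.
print(span_syntax_parsed_as_terms("spanNear([body:separate, body:email], 0, true)"))
```

If the first check is true for a query, every word in it is being matched as an independent term against the default field, so the slop value has no effect on the result count.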

-- 
Sincerely yours
Mikhail Khludnev
