I may have found what I am running up against, if ChatGPT's diagnosis
is correct.

*My sample query*
/select?debug=true&indent=true&q={!lucene}spanNear(
  spanNear(spanTerm(body:separate),spanTerm(body:email),0,true),
  spanNear(spanTerm(body:will),spanTerm(body:be),0,true),
  10,false)

*Text from the body field of a message that is returned by the spanNear
query above (I believe incorrectly)*
       "separate device there will not be any load on the email servers"

*Same text through analyzer*

text      raw_bytes                  start  end
separate  [73 65 70 61 72 61 74 65]  5      13
device    [64 65 76 69 63 65]        14     20
there     [74 68 65 72 65]           21     26
will      [77 69 6c 6c]              27     31
not       [6e 6f 74]                 32     35
be        [62 65]                    36     38
any       [61 6e 79]                 39     42
load      [6c 6f 61 64]              43     47
on        [6f 6e]                    48     50
the       [74 68 65]                 51     54
email     [65 6d 61 69 6c]           55     60
server    [73 65 72 76 65 72]        61     68

*ChatGPT assessment*

    Now, let’s check the spans:

   - Inner spanNear(separate, email, 0, true) is *not* going to match
     directly, because email isn’t right after separate.
   - But Lucene is allowed to *reposition* the spans when used as children of
     the outer spanNear. Each child span doesn’t need to be contiguous unless
     it resolves to a valid match somewhere in the text.

*Conclusion:* This last point may explain why the message above was returned
by the query, even though the match appears to be incorrect.  The words/tokens
in the query are all in the message, but they do not honor the proximity
specified.  Apparently child spans do not have to honor the proximity rules
specified.  AI suggested this query syntax for proximity; I am now concluding
it is not a valid approach.
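
As a rough sanity check of the claim that the tokens do not honor the
proximity, here is a minimal Python sketch.  It assumes token positions
increase by 1 in the order shown in the analyzer output above (the output
lists character offsets, not positions), and that an ordered slop of 0 means
"no tokens in between".  The helper names ordered_gap and
matches_ordered_near are hypothetical, for illustration only; this is not
how Lucene evaluates spans internally.

```python
# Tokens in the order reported by the analyzer output above.
# Assumption: each token advances the position by exactly 1.
tokens = ["separate", "device", "there", "will", "not", "be",
          "any", "load", "on", "the", "email", "server"]
positions = {tok: i for i, tok in enumerate(tokens, start=1)}


def ordered_gap(first, second):
    """Tokens sitting between first and second, or None if second is not after first."""
    gap = positions[second] - positions[first] - 1
    return gap if gap >= 0 else None


def matches_ordered_near(first, second, slop):
    """True if second follows first with at most `slop` tokens in between."""
    gap = ordered_gap(first, second)
    return gap is not None and gap <= slop


# The two inner clauses of the sample query, both with slop 0, in order.
for first, second in [("separate", "email"), ("will", "be")]:
    print(first, second, ordered_gap(first, second),
          matches_ordered_near(first, second, 0))
```

Under those assumptions neither inner pair is adjacent in this message, so I
would not expect either inner spanNear(..., 0, true) to match it.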

I am not seeing a Solr/Lucene HTTP query approach for a proximity search
between phrases, other than possibly using the Lucene Java API for more
control.

If others have found a workable solution, please let me know.

Thanks,
Matt





On Thu, Sep 4, 2025 at 3:26 PM mtn search <search...@gmail.com> wrote:

> Also, I am using the Solr Admin Analysis UI to verify how Solr is
> tokenizing the messages, and manually verifying the positions of the tokens.
>
> Debug view of the query side:
> For query:
> "params":{
>       "q":"{!lucene}SpanNearQuery(body,(money question),5,true)",
>       "df":"body",
>       "debug":"true",
>       "indent":"true",
>       "q.op":"OR",
>       "wt":"json"}},
>
> It seems odd that in the parsed query the "body" field name is
> prepended to the value 5 and the text true.
>   "debug":{
>     "rawquerystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
>     "querystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
>     "parsedquery":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
>     "parsedquery_toString":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
>     "explain":{
>
> On Thu, Sep 4, 2025 at 12:04 PM mtn search <search...@gmail.com> wrote:
>
>> Thanks Tim!  Yes I have tried a variety of values and am aware
>> of ordering vs non ordering.  I am getting more results than expected and
>> some that do not match the proximity criteria.   So when I set it to a
>> small value like 2, I was seeking to see the result count drop
>> significantly as many would not match criteria.  Unfortunately, the count
>> does not drop.   Looks like a fundamental problem with how I am using the
>> syntax.  Still researching, and open to suggestions.
>>
>> Matt
>>
>> On Thu, Sep 4, 2025 at 11:54 AM Tim Casey <tca...@gmail.com> wrote:
>>
>>> Usually the span and proximity problems are off-by-one issues.
>>> Specifically, the order of the tokens will change the distance
>>> calculation.  I do not have an example off the top of my head.  But, when
>>> I was doing this, I usually started with a larger span and brought it
>>> down through looking at results.
>>>
>>> This is the case for the old "phrase words"~5 syntax.
>>>
>>> As an aside, "Not working" is taken by me to mean you are not getting
>>> results but the query passes parse.  Not working could mean a lot more in
>>> this context.  So I am suggesting, instead of 2, try 10.
>>>
>>> On Thu, Sep 4, 2025 at 10:43 AM mtn search <search...@gmail.com> wrote:
>>>
>>> > Hello,
>>> >
>>> > Looking for guidance on approaches to implement a proximity search
>>> > between phrases.
>>> >
>>> > Initially tried:
>>> >
>>> > "q":"{!lucene}spanNear(spanNear(spanNear(spanTerm(body:off),spanTerm(body:the),0,true),
>>> > spanTerm(body: record),0,true),
>>> > spanNear(spanTerm(body:new),spanTerm(body: information),0,true) , 2N,false)",
>>> >       "defType":"lucene",
>>> >       "df":"body",
>>> >
>>> > However then simplified to just two terms:
>>> > "q":"{!lucene}spanNear(spanTerm(body:off),spanTerm(body:call),2,true)",
>>> >       "defType":"lucene",
>>> >       "df":"body",
>>> >
>>> > Neither is working.  Any tips?  Currently on Solr 9.4, but will
>>> > likely need to run for some time on a Solr 6 instance.
>>> >
>>> > Thanks,
>>> > Matt
>>> >
>>>
>>
