Or
https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#surround-query-parser
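
For the phrase-near-phrase case in the quoted messages, a surround query might be built roughly as in this Python sketch. Treat all of it as an assumption to check against the guide linked above, not a tested recipe. The helper name surround_phrase_near is hypothetical, and the field name body and the terms are taken from the quoted sample query. The operator forms (W for ordered, N for unordered, with an optional numeric prefix for the distance) follow my reading of the surround syntax. As far as I know the surround parser does not analyze query terms, so they would need to match the indexed tokens exactly.

```python
from urllib.parse import urlencode


def surround_phrase_near(field, phrase_a, phrase_b, distance):
    """Return a surround-parser query string (hypothetical helper).

    Each phrase becomes an ordered W(...) group; the two groups are
    combined with an unordered <distance>N(...) operator.
    """
    def ordered(words):
        return "W(" + ", ".join(words) + ")"

    return "{!surround df=%s}%dN(%s, %s)" % (
        field, distance, ordered(phrase_a), ordered(phrase_b))


q = surround_phrase_near("body", ["separate", "email"], ["will", "be"], 10)
print(q)
# URL-encode the parameters for a /select request.
print("/select?" + urlencode({"q": q, "debug": "true"}))
```

Running the result with debug=true and reading parsedquery should show whether Solr actually built span queries from it.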

On Sun, Sep 7, 2025 at 4:32 PM Mikhail Khludnev <m...@apache.org> wrote:

> Hi
> I might be missing the point, but the ways to create spans in Solr are:
>
> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#xml-query-parser
>
> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#complex-phrase-query-parser
>
>
> On Fri, Sep 5, 2025 at 6:32 PM mtn search <search...@gmail.com> wrote:
>
> > I may have found what I am running up against, if ChatGPT's diagnosis
> > is correct.
> >
> > *My sample query*
> > /select?debug=true&indent=true&q={!lucene}spanNear(
> >   spanNear(spanTerm(body:separate),spanTerm(body:email),0,true),
> >   spanNear(spanTerm(body:will),spanTerm(body:be),0,true),
> >   10,false)
> >
> > *Text from the body field of a message that is returned by the spanNear
> > query above (I believe incorrectly)*
> >        "separate device there will not be any load on the email servers"
> >
> > *Same text through analyzer*
> >
> > text      raw_bytes                  start  end
> > separate  [73 65 70 61 72 61 74 65]  5      13
> > device    [64 65 76 69 63 65]        14     20
> > there     [74 68 65 72 65]           21     26
> > will      [77 69 6c 6c]              27     31
> > not       [6e 6f 74]                 32     35
> > be        [62 65]                    36     38
> > any       [61 6e 79]                 39     42
> > load      [6c 6f 61 64]              43     47
> > on        [6f 6e]                    48     50
> > the       [74 68 65]                 51     54
> > email     [65 6d 61 69 6c]           55     60
> > server    [73 65 72 76 65 72]        61     68
> >
> > *ChatGPT assessment*
> >
> >     Now, let’s check the spans:
> >
> >    - Inner spanNear(separate, email, 0, true) is *not* going to match
> >      directly, because email isn’t right after separate.
> >    - But Lucene is allowed to *reposition* the spans when used as children
> >      of the outer spanNear. Each child span doesn’t need to be contiguous
> >      unless it resolves to a valid match somewhere in the text.
> >
> > *Conclusion:* This last point may explain why the message above was
> > returned by the query above, but that match appears to be incorrect.
> > While the words/tokens in the query are in the message, they do not honor
> > the proximity specified. Apparently child spans do not have to honor the
> > proximity rules specified. AI suggested this query for proximity; I am
> > now concluding it is not a valid approach.
> >
> > I am not seeing a Solr/Lucene HTTP query approach for a proximity search
> > between phrases, other than possibly using the Lucene Java API for more
> > control.
> >
> > If others have found a workable solution, please let me know.
> >
> > Thanks,
> > Matt
> >
> >
> >
> >
> >
> > On Thu, Sep 4, 2025 at 3:26 PM mtn search <search...@gmail.com> wrote:
> >
> > > Also, I am using the Solr Admin Analysis UI to verify how Solr is
> > > tokenizing the messages, and manually checking the positions of the
> > > tokens.
> > >
> > > Debug view of the query side:
> > > For query:
> > > "params":{
> > >       "q":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> > >       "df":"body",
> > >       "debug":"true",
> > >       "indent":"true",
> > >       "q.op":"OR",
> > >       "wt":"json"}},
> > >
> > > It seems odd that in the parsed query the "body" field name is
> > > prepended to the value 5 and the text true.
> > >   "debug":{
> > >     "rawquerystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> > >     "querystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
> > >     "parsedquery":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
> > >     "parsedquery_toString":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
> > >     "explain":{
> > >
> > > On Thu, Sep 4, 2025 at 12:04 PM mtn search <search...@gmail.com> wrote:
> > >
> > >> Thanks Tim!  Yes, I have tried a variety of values and am aware of
> > >> ordered vs. unordered matching.  I am getting more results than
> > >> expected, and some do not match the proximity criteria.  When I set the
> > >> slop to a small value like 2, I expected the result count to drop
> > >> significantly, as many messages would not match the criteria.
> > >> Unfortunately, the count does not drop.  Looks like a fundamental
> > >> problem with how I am using the syntax.  Still researching, and open to
> > >> suggestions.
> > >>
> > >> Matt
> > >>
> > >> On Thu, Sep 4, 2025 at 11:54 AM Tim Casey <tca...@gmail.com> wrote:
> > >>
> > >>> Usually span and proximity problems are off-by-one issues.
> > >>> Specifically, the order of the tokens will change the distance
> > >>> calculation.  I do not have an example off the top of my head.  But
> > >>> when I was doing this, I usually started with a larger span and
> > >>> brought it down by looking at the results.
> > >>>
> > >>> This is the case for the old "phrase words"~5 syntax.
> > >>>
> > >>> As an aside, I take "not working" to mean you are not getting results
> > >>> but the query parses.  "Not working" could mean a lot more in this
> > >>> context.  So I am suggesting, instead of 2, try 10.
> > >>>
> > >>> On Thu, Sep 4, 2025 at 10:43 AM mtn search <search...@gmail.com> wrote:
> > >>>
> > >>> > Hello,
> > >>> >
> > >>> > Looking for guidance on approaches to implement a proximity search
> > >>> > between phrases.
> > >>> >
> > >>> > Initially tried:
> > >>> >
> > >>> > "q":"{!lucene}spanNear(spanNear(spanNear(spanTerm(body:off),spanTerm(body:the),0,true),
> > >>> > spanTerm(body: record),0,true), spanNear(spanTerm(body:new),spanTerm(body:
> > >>> > information),0,true) , 2N,false)",
> > >>> >       "defType":"lucene",
> > >>> >       "df":"body",
> > >>> >
> > >>> > However then simplified to just two terms:
> > >>> >
> > >>> > "q":"{!lucene}spanNear(spanTerm(body:off),spanTerm(body:call),2,true)",
> > >>> >       "defType":"lucene",
> > >>> >       "df":"body",
> > >>> >
> > >>> > Neither is working.  Any tips?  Currently on Solr 9.4, but will
> > >>> > likely need to run for some time on a Solr 6 instance.
> > >>> >
> > >>> > Thanks,
> > >>> > Matt
> > >>> >
> > >>>
> > >>
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
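
As a sanity check on the quoted example, the token order from the analyzer output can be used to work out what a real span query would require. This is only a Python sketch under stated assumptions: it models ordered two-term matching, where the slop is the number of tokens allowed between the two terms, and it assumes every token advances the position by exactly 1. The helper names positions and ordered_near are hypothetical and are not part of Solr or Lucene.

```python
# Tokens in the order listed in the quoted analyzer output.
TOKENS = ["separate", "device", "there", "will", "not", "be",
          "any", "load", "on", "the", "email", "server"]


def positions(term):
    # Assumption: each token advances the position by exactly 1.
    return [i for i, tok in enumerate(TOKENS) if tok == term]


def ordered_near(first, second, slop=0):
    """Spans (start, end) where `second` follows `first` with at most
    `slop` tokens in between. A simplified two-term model only."""
    return [(a, b + 1)
            for a in positions(first)
            for b in positions(second)
            if 0 <= b - a - 1 <= slop]


separate_email = ordered_near("separate", "email", slop=0)
will_be = ordered_near("will", "be", slop=0)
print(separate_email)  # [] : nine tokens sit between the two terms
print(will_be)         # [] : "not" sits between the two terms
```

Under this model neither inner clause matches with a slop of 0, so a genuine nested span query should not return the quoted message. The parsedquery in the quoted debug output lists plain body: terms, which suggests the function-style text was parsed as ordinary terms and never became a span query.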


-- 
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)
