I've checked the surround parser. It turns out that it lacks braces support. I've also added a reproducer for the nested spans issue, which intervals are able to handle: https://github.com/mkhludnev/solr-flexible-qparser/blob/860e17c16153b1d3ef337f099b0d9f572620e9b1/src/test/java/org/apache/solr/flexibleqp/TestCompeteWithSpans.java#L49
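
For anyone who wants to check the sample document from this thread by hand, here is a minimal sketch in plain Python. It is not Lucene code and does not reproduce Lucene's span or interval algorithms; it only counts token positions. It assumes the analyzed tokens sit at consecutive positions with no position gaps, which is what the analyzer output quoted below suggests. The function names ordered_phrase_spans and unordered_near are made up for this illustration.

```python
from typing import List, Tuple

# Tokens in index order, copied from the analyzer output quoted in this thread.
# Assumption: positions are consecutive (no position increments > 1).
TOKENS = ["separate", "device", "there", "will", "not", "be",
          "any", "load", "on", "the", "email", "server"]

Span = Tuple[int, int]  # (start position, end position), both inclusive


def ordered_phrase_spans(tokens: List[str], first: str, second: str, slop: int) -> List[Span]:
    """Spans where `first` is followed by `second` with at most `slop` tokens in between."""
    spans = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # Only look at positions that leave at most `slop` tokens between the two terms.
        for j in range(i + 1, min(i + slop + 2, len(tokens))):
            if tokens[j] == second:
                spans.append((i, j))
    return spans


def unordered_near(left: List[Span], right: List[Span], slop: int) -> bool:
    """True if some non-overlapping pair of spans has at most `slop` tokens in the gap."""
    for a in left:
        for b in right:
            lo, hi = (a, b) if a[0] <= b[0] else (b, a)
            gap = hi[0] - lo[1] - 1
            if 0 <= gap <= slop:
                return True
    return False


# The two inner "phrases" from the sample query, both with slop 0 and in order.
separate_email = ordered_phrase_spans(TOKENS, "separate", "email", 0)
will_be = ordered_phrase_spans(TOKENS, "will", "be", 0)

# Outer proximity: the two phrases within 10 positions of each other, any order.
print(separate_email, will_be, unordered_near(separate_email, will_be, 10))
```

Under these assumptions neither inner phrase occurs in the sample text ("device" follows "separate", and "not" sits between "will" and "be"), so a query that really enforced the nested proximity should not return that document.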

On Tue, Sep 9, 2025 at 1:12 PM Mikhail Khludnev <m...@apache.org> wrote:

> Right. complexphrase is not an option for nesting.
> I'm wondering if you encounter
> https://issues.apache.org/jira/browse/LUCENE-7398 Let us know please if
> you do.
> I'm interested in whether intervals are an option for such cases.
>
> On Mon, Sep 8, 2025 at 6:31 PM Matt Kuiper <kuipe...@gmail.com> wrote:
>
>> Thanks for the feedback!
>>
>> Mikhail - I did not see the complex query parser supporting proximity
>> between 2 phrases, however the XmlQParser might via spans. Thanks for the
>> tip!
>>
>> Gus - we currently use the Surround query parser for proximity between two
>> terms. Do you know of a means to use it for proximity between phrases?
>> This would be ideal as we have a search client tool already using this
>> syntax.
>>
>> Dave - This type of approach might work for us (possibly like the complex
>> query parser) where it is not exactly finding proximity between two
>> phrases. But verifying that all the words within two phrases are within a
>> proximity range. As you say this could handle stop words that may still be
>> in the index from not blocking a match.
>>
>> Matt
>>
>> On Mon, Sep 8, 2025 at 7:29 AM Dave <hastings.recurs...@gmail.com> wrote:
>>
>> > There are other clever ways to do it too, using the within parameter, and
>> > other things I don’t remember off the top of my head but I gave a
>> > presentation a few years ago that utilized it. It uses more raw solr
>> > parameters that you can take in a phrase but tokenize them and find out
>> > documents that have that phrase but may have words inside them, so you
>> > restrict the results to only documents that have all the words in the
>> > phrase but within that number of words plus 2 or 3 to take care of stop
>> > words that may show up, like “red house hill” would still find “red house
>> > on top of the hill” within a proximity to each other of about 7.
>> >
>> > > On Sep 7, 2025, at 7:15 PM, Gus Heck <gus.h...@gmail.com> wrote:
>> > >
>> > > Or
>> > >
>> > >> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#surround-query-parser
>> > >
>> > >> On Sun, Sep 7, 2025 at 4:32 PM Mikhail Khludnev <m...@apache.org> wrote:
>> > >>
>> > >> Hi
>> > >> I might be missing a point. But the way to create spans in Solr are:
>> > >>
>> > >> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#xml-query-parser
>> > >>
>> > >> https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#complex-phrase-query-parser
>> > >>
>> > >>> On Fri, Sep 5, 2025 at 6:32 PM mtn search <search...@gmail.com> wrote:
>> > >>>
>> > >>> I may have found what I am running up against - if Chatgpt is correct
>> > >>> on diagnosis?
>> > >>>
>> > >>> *My sample query*
>> > >>> /select?debug=true&indent=true&q={!lucene}spanNear(
>> > >>> spanNear(spanTerm(body:separate),spanTerm(body:email),0,true),
>> > >>> spanNear(spanTerm(body:will),spanTerm(body:be),0,true),
>> > >>> 10,false)
>> > >>>
>> > >>> *Text from body field from a message where the message is returned from
>> > >>> the spanNear query above (I believe incorrectly)*
>> > >>> "separate device there will not be any load on the email servers"
>> > >>>
>> > >>> *Same text through analyzer*
>> > >>> text      raw_bytes                  start  end
>> > >>> separate  [73 65 70 61 72 61 74 65]  5      13
>> > >>> device    [64 65 76 69 63 65]        14     20
>> > >>> there     [74 68 65 72 65]           21     26
>> > >>> will      [77 69 6c 6c]              27     31
>> > >>> not       [6e 6f 74]                 32     35
>> > >>> be        [62 65]                    36     38
>> > >>> any       [61 6e 79]                 39     42
>> > >>> load      [6c 6f 61 64]              43     47
>> > >>> on        [6f 6e]                    48     50
>> > >>> the       [74 68 65]                 51     54
>> > >>> email     [65 6d 61 69 6c]           55     60
>> > >>> server    [73 65 72 76 65 72]        61     68
>> > >>>
>> > >>> *Chatgpt assessment*
>> > >>>
>> > >>> Now, let’s check the spans:
>> > >>>
>> > >>> - Inner spanNear(separate, email, 0, true) is *not* going to match
>> > >>>   directly, because email isn’t right after separate.
>> > >>> - But Lucene is allowed to *reposition* the spans when used as children of
>> > >>>   the outer spanNear. Each child span doesn’t need to be contiguous unless
>> > >>>   it resolves to a valid match somewhere in the text.
>> > >>>
>> > >>> *Conclusion:* This last line may explain why the message above was returned
>> > >>> by the query above, but appears to be incorrect. While the words/tokens in
>> > >>> the query are in the message they do not honor the proximity specified.
>> > >>> But apparently children spans do not have to honor the proximity rules
>> > >>> specified. AI suggested this query for proximity, I am now concluding it
>> > >>> is not a valid approach.
>> > >>>
>> > >>> I am not seeing a Solr/Lucene http query approach for a proximity search
>> > >>> between phrases, other than possibly to use the Lucene Java API for more
>> > >>> control.
>> > >>>
>> > >>> If others have found a workable solution, please let me know.
>> > >>>
>> > >>> Thanks,
>> > >>> Matt
>> > >>>
>> > >>>> On Thu, Sep 4, 2025 at 3:26 PM mtn search <search...@gmail.com> wrote:
>> > >>>
>> > >>>> Also, I am using the SolrAdmin Analysis UI to verify how Solr is
>> > >>>> tokenizing the messages and verifying manually position between tokens.
>> > >>>>
>> > >>>> Debug view of the query side:
>> > >>>> For query:
>> > >>>> "*params*":{
>> > >>>>   "q":"{!lucene}SpanNearQuery(body,(money question),5,true)",
>> > >>>>   "df":"body",
>> > >>>>   "debug":"true",
>> > >>>>   "indent":"true",
>> > >>>>   "q.op":"OR",
>> > >>>>   "wt":"json"}},
>> > >>>>
>> > >>>> It seems odd that in the parsed query the "body" field name is
>> > >>>> prepended to the value 5 and the text true.
>> > >>>> "*debug*":{
>> > >>>>   "rawquerystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
>> > >>>>   "querystring":"{!lucene}SpanNearQuery(body,(money question),5,true)",
>> > >>>>   "parsedquery":"body:spannearquery (body:body (body:money body:question) (body:5 body:true))",
>> > >>>>   "*parsedquery_toString*":*"body:spannearquery *(body:body (body:money body:question)* (body:5 body:true*))",
>> > >>>>   "explain":{
>> > >>>>
>> > >>>> On Thu, Sep 4, 2025 at 12:04 PM mtn search <search...@gmail.com> wrote:
>> > >>>>
>> > >>>>> Thanks Tim! Yes I have tried a variety of values and am aware
>> > >>>>> of ordering vs non ordering. I am getting more results than expected and
>> > >>>>> some that do not match the proximity criteria. So when I set it to a
>> > >>>>> small value like 2, I was seeking to see the result count drop
>> > >>>>> significantly as many would not match criteria. Unfortunately, the count
>> > >>>>> does not drop. Looks like a fundamental problem with how I am using the
>> > >>>>> syntax. Still researching, and open to suggestions.
>> > >>>>>
>> > >>>>> Matt
>> > >>>>>
>> > >>>>> On Thu, Sep 4, 2025 at 11:54 AM Tim Casey <tca...@gmail.com> wrote:
>> > >>>>>
>> > >>>>>> usually the span and proximities are off-by-one issues. Specifically the
>> > >>>>>> order of the tokens will change the distance calculation. I do not have
>> > >>>>>> an example off the top of my head. But, when I was doing this, I usually
>> > >>>>>> started with a larger span and brought it down through looking at
>> > >>>>>> results.
>> > >>>>>>
>> > >>>>>> This is the case for the old 5~"phrase words" syntax.
>> > >>>>>>
>> > >>>>>> As an aside, "Not working" is taken by me to mean you are not getting
>> > >>>>>> results but the query passes parse. Not working could mean a lot more in
>> > >>>>>> this context. So I am suggesting, instead of 2, try 10.
>> > >>>>>>
>> > >>>>>> On Thu, Sep 4, 2025 at 10:43 AM mtn search <search...@gmail.com> wrote:
>> > >>>>>>
>> > >>>>>>> Hello,
>> > >>>>>>>
>> > >>>>>>> Looking for guidance on approaches to implement a proximity search
>> > >>>>>>> between phrases.
>> > >>>>>>>
>> > >>>>>>> Initially tried:
>> > >>>>>>>
>> > >>>>>>> "q":"{!lucene}spanNear(spanNear(spanNear(spanTerm(body:off),spanTerm(body:the),0,true),
>> > >>>>>>> spanTerm(body: record),0,true), spanNear(spanTerm(body:new),spanTerm(body:
>> > >>>>>>> information),0,true) , 2N,false)",
>> > >>>>>>> "defType":"lucene",
>> > >>>>>>> "df":"body",
>> > >>>>>>>
>> > >>>>>>> However then simplified to just two terms:
>> > >>>>>>>
>> > >>>>>>> "q":"{!lucene}spanNear(spanTerm(body:off),spanTerm(body:call),2,true)",
>> > >>>>>>> "defType":"lucene",
>> > >>>>>>> "df":"body",
>> > >>>>>>>
>> > >>>>>>> Both are not working. Any tips? Currently on Solr 9.4, but will
>> > >>>>>>> likely need to run for some time on a Solr 6 instance.
>> > >>>>>>>
>> > >>>>>>> Thanks,
>> > >>>>>>> Matt
>> > >>>>>>>
>> > >>
>> > >> --
>> > >> Sincerely yours
>> > >> Mikhail Khludnev
>> > >>
>> > >
>> > > --
>> > > http://www.needhamsoftware.com (work)
>> > > https://a.co/d/b2sZLD9 (my fantasy fiction book)
>> >
>
> --
> Sincerely yours
> Mikhail Khludnev

--
Sincerely yours
Mikhail Khludnev