Opened a reproducer: https://github.com/apache/lucene/pull/12157

On Mon, Feb 13, 2023 at 6:46 PM Mikhail Khludnev <m...@apache.org> wrote:

> It's time to summon Lucene devs
> https://issues.apache.org/jira/browse/SOLR-16652?focusedCommentId=17687998&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17687998
>
> It seems to be by design:
> https://github.com/apache/lucene/blob/main/lucene/queryparser/src/test/org/apache/lucene/queryparser/classic/TestQueryParser.java#L591
> It sets a multi-word synonym: "guinea pig => cavy"
> dumb.parse("guinea pig") => ((+field:guinea +field:pig) field:cavy)
> Doesn't match just 'guinea' as expected in this ticket.
>
> On Mon, Feb 13, 2023 at 5:33 PM Rudi Seitz <r...@rudiseitz.com> wrote:
>
>> Thanks Mikhail.
>> I think your directional approach ("foo bar=>baz,foo,bar") would work,
>> but we'd also need "baz=>baz,foo bar" for a complete workaround.
>> I've added your message as a comment on the ticket.
>> Rudi
>>
>> On Sat, Feb 11, 2023 at 12:34 PM Mikhail Khludnev <m...@apache.org> wrote:
>>
>> > Thanks for raising a ticket. Here are just two considerations:
>> > > we could change the synonym rule to "foo bar,baz,foo,bar" but this
>> > > would mean that a query for "foo" could now match a document
>> > > containing only "bar", which is not the intent of the original rule.
>> > Ok. The latter issue can probably be fixed by directing synonyms:
>> > foo bar=>baz,foo,bar
>> > Right, it seems like a weird band-aid.
>> >
>> > I stepped through the Lucene code; the MUST occur for synonyms is
>> > defined here:
>> > https://github.com/apache/lucene/blob/7baa01b3c2f93e6b172e986aac8ef577a87ebceb/lucene/core/src/java/org/apache/lucene/util/QueryBuilder.java#L534
>> > Presumably, the original terms could go with defaultOperator, and the
>> > synonym replacement could keep MUST.
>> >
>> > On Sat, Feb 11, 2023 at 12:17 AM Rudi Seitz <r...@rudiseitz.com> wrote:
>> >
>> > > Thanks Mikhail and Michael.
>> > > Based on your feedback, I created a ticket:
>> > > https://issues.apache.org/jira/browse/SOLR-16652
>> > > In the ticket, I mentioned why updating the synonym rule or setting
>> > > sow=true causes other problems in this case, unfortunately. I haven't
>> > > yet looked through the code to see where the behavior could be changed.
>> > > Rudi
>> > >
>> > > On Fri, Feb 10, 2023 at 11:26 AM Michael Gibney <mich...@michaelgibney.net> wrote:
>> > >
>> > > > Rudi,
>> > > >
>> > > > I agree, this does not seem like how it should behave. Probably
>> > > > something that could be fixed in edismax, not something lower-level
>> > > > (Lucene)?
>> > > >
>> > > > Michael
>> > > >
>> > > > On Fri, Feb 10, 2023 at 9:38 AM Mikhail Khludnev <m...@apache.org> wrote:
>> > > > >
>> > > > > Hello, Rudi.
>> > > > > Well, it doesn't seem perfect. Probably it can be fixed via
>> > > > > foo bar,zzz,foo,bar
>> > > > > And in some sort of sense this behavior is reasonable.
>> > > > > Also you can experiment with the sow and pf params (the latter
>> > > > > param is described on the dismax page only).
>> > > > >
>> > > > > On Thu, Feb 9, 2023 at 8:19 PM Rudi Seitz <r...@rudiseitz.com> wrote:
>> > > > >
>> > > > > > Is this known behavior or is it worth a JIRA ticket?
>> > > > > >
>> > > > > > Searching against a text_general field in Solr 9.1, if my
>> > > > > > edismax query is "foo bar" I should be able to get matches for
>> > > > > > "foo" without "bar" and vice versa. However, if there happens
>> > > > > > to be a synonym rule applied at query time, like "foo bar,zzz"
>> > > > > > I can no longer get single-term matches against "foo" or "bar."
>> > > > > > Both terms are now required, but can occur in either order.
>> > > > > > If we change the text_general analysis chain to apply synonyms
>> > > > > > at index time instead of query time, this behavior goes away
>> > > > > > and single-term matches are again possible.
>> > > > > >
>> > > > > > To reproduce, use the _default configset with "foo bar,zzz"
>> > > > > > added to synonyms.txt. Index these four docs:
>> > > > > >
>> > > > > > {"id":"1", "title_txt":"foo"}
>> > > > > > {"id":"2", "title_txt":"bar"}
>> > > > > > {"id":"3", "title_txt":"foo bar"}
>> > > > > > {"id":"4", "title_txt":"bar foo"}
>> > > > > >
>> > > > > > Issue a query for "foo bar" (i.e.
>> > > > > > defType=edismax&q.op=OR&qf=title_txt&q=foo bar)
>> > > > > > Result: Only docs 3 and 4 come back
>> > > > > >
>> > > > > > Issue a query for "bar foo"
>> > > > > > Result: All four docs come back; the synonym rule is not invoked
>> > > > > >
>> > > > > > Looking at the explain output for "foo bar" we see:
>> > > > > >
>> > > > > > +((title_txt:zzz (+title_txt:foo +title_txt:bar)))
>> > > > > >
>> > > > > > Looking at the explain output for "bar foo" we see:
>> > > > > >
>> > > > > > +((title_txt:bar) (title_txt:foo))
>> > > > > >
>> > > > > > So, the observed behavior makes sense according to the
>> > > > > > low-level query structure. But -- is this how it's "supposed"
>> > > > > > to work?
>> > > > > >
>> > > > > > Why not expand the "foo bar" query like this instead?
>> > > > > >
>> > > > > > +((title_txt:zzz (title_txt:foo title_txt:bar)))
>> > > > > >
>> > > > > > Rudi
>> > > > > >
>> > > > >
>> > > > > --
>> > > > > Sincerely yours
>> > > > > Mikhail Khludnev
>> > > > > https://t.me/MUST_SEARCH
>> > > > > A caveat: Cyrillic!
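
For anyone following along, here is a rough Python sketch of why the two query structures quoted above return different documents. It is only a toy model of Boolean matching, not Lucene or Solr code: the helper names (term, boolean, search) are made up for illustration, and it assumes the usual rule that SHOULD clauses become optional once a MUST clause is present (minimum-should-match of 0). Scoring and analysis are ignored.

```python
# Toy model of Boolean query matching. NOT Lucene/Solr code; all helper
# names here are hypothetical and exist only for this illustration.

# The four documents from the reproduction steps (id -> title_txt).
DOCS = {
    "1": "foo",
    "2": "bar",
    "3": "foo bar",
    "4": "bar foo",
}


def term(token):
    """Leaf clause: matches when the token occurs in the document."""
    return lambda tokens: token in tokens


def boolean(must=(), should=()):
    """Boolean clause (assumed semantics): if there are MUST children, all
    of them have to match and SHOULD children are optional; otherwise at
    least one SHOULD child has to match."""
    def matches(tokens):
        if must:
            return all(clause(tokens) for clause in must)
        return any(clause(tokens) for clause in should)
    return matches


def search(query):
    """Return the sorted ids of all documents the query matches."""
    return sorted(
        doc_id for doc_id, text in DOCS.items() if query(set(text.split()))
    )


# Observed for "foo bar": +((title_txt:zzz (+title_txt:foo +title_txt:bar)))
observed = boolean(must=[
    boolean(should=[term("zzz"), boolean(must=[term("foo"), term("bar")])])
])

# Proposed for "foo bar": +((title_txt:zzz (title_txt:foo title_txt:bar)))
proposed = boolean(must=[
    boolean(should=[term("zzz"), boolean(should=[term("foo"), term("bar")])])
])

# Observed for "bar foo": +((title_txt:bar) (title_txt:foo))
reversed_query = boolean(must=[boolean(should=[term("bar"), term("foo")])])

print("observed :", search(observed))        # ['3', '4']
print("proposed :", search(proposed))        # ['1', '2', '3', '4']
print("bar foo  :", search(reversed_query))  # ['1', '2', '3', '4']
```

Under these assumptions the observed structure returns only docs 3 and 4, matching the result reported in the thread, while the proposed structure and the "bar foo" query both return all four docs.
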

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!