This is not related to the word-delimiter token filter, but a while ago I
did a study on the sow parameter, identified a couple of bugs, and fixed one
(the other was discussed but ultimately not accepted as an improvement
because it was controversial).

https://sease.io/2021/05/apache-solr-sow-parameter-split-on-whitespace-and-multi-field-full-text-search.html

Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: [email protected]


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>


On Wed, 7 Sept 2022 at 14:19, Markus Jelsma <[email protected]>
wrote:

> Hello Stephen,
>
> Using Solr 8.8.1, I tried to reproduce your strange problem: I copied your
> schema and indexed a single document. As expected, I got exactly one result
> for all four combinations, with both the default Lucene QParser and
> the Edismax QParser.
>
> So it appears to work just fine here on 8.8.1. The WordDelimiterGraph filter
> is relatively new and has had only a few issues. Maybe you can check whether
> it works without the Graph-type token filters, using the old WordDelimiter
> filter. That one is tried and tested.
>
> Regards,
> Markus
>
> On Fri, 2 Sep 2022 at 21:57, Stephen Lewis Bianamara <
> [email protected]> wrote:
>
> > Hey Solr Users,
> >
> > I've noticed an odd interaction between the word delimiter graph filter and
> > the sow parameter. When the word delimiter graph filter is invoked and
> > sow=true, it is possible to miss results that involve alphanumeric splitting
> > but aren't exact matches. So if I have a document with "ABC123 DEF456_GHI",
> > the combination of sow=true and WordDelimiterGraph seems to break queries
> > for "def456". See the full repro below.
> >
> > I believe this is a bug. Could someone please take a look at my repro and
> > confirm it, or let me know if something is misconfigured here?
> >
> > *Repro*
> >
> >    - Solr 9 with this field type definition for field "test_en"
> >
> > <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"
> >            autoGeneratePhraseQueries="true">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
> >             generateNumberParts="1" catenateAll="1" preserveOriginal="1"
> >             splitOnCaseChange="1"/>
> >     <filter class="solr.FlattenGraphFilterFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
> >             generateNumberParts="1" catenateAll="1" preserveOriginal="1"
> >             splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> >    - Create document {"id": 1, "test_en": ["ABC123 DEF456_GHI"]}
> >    - Query the following; all should hit, but one combination misses
> >       - sow=true, q=def456
> >          - misses
> >       - sow=true, q=abc123
> >          - hits
> >       - sow=false, q=def456
> >          - hits
> >       - sow=false, q=abc123
> >          - hits
> >
>
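
For anyone who wants to re-run the four combinations from the repro quoted
above, here is a minimal sketch. It is not taken from the thread: the host,
port, and the collection name "repro" are assumptions, as are the helper names
build_query_url and repro_urls. It only uses standard Solr request parameters
(q, df, sow, debugQuery, wt) and prints numFound plus the parsed query so the
sow=true and sow=false cases can be compared side by side.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Solr with a collection named "repro" that uses the
# schema from the quoted message. Adjust to your own setup.
SOLR_SELECT = "http://localhost:8983/solr/repro/select"


def build_query_url(term, sow, base=SOLR_SELECT, field="test_en"):
    """Build a select URL for one (term, sow) combination (hypothetical helper)."""
    params = {
        "q": term,
        "df": field,                       # default field to search
        "sow": "true" if sow else "false",  # split-on-whitespace flag
        "debugQuery": "true",              # include the parsed query in the response
        "wt": "json",
    }
    return base + "?" + urlencode(params)


def repro_urls():
    """Return the four URLs in the same order as the list in the repro."""
    return [
        build_query_url(term, sow)
        for sow in (True, False)
        for term in ("def456", "abc123")
    ]


if __name__ == "__main__":
    # Requires a running Solr instance; prints hit counts and parsed queries.
    for url in repro_urls():
        with urlopen(url) as resp:
            body = json.load(resp)
        print(url)
        print("  numFound:   ", body["response"]["numFound"])
        print("  parsedquery:", body["debug"]["parsedquery"])
```

Comparing the parsedquery output for q=def456 under both sow values should show
whether the query-time token graph differs between the two cases.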
