FYI
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: [email protected]


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>


On Fri, 9 Sept 2022 at 15:49, Alessandro Benedetti <[email protected]>
wrote:

> Not related to the word delimiter token filter, but a while ago I did a study
> on the sow parameter, identified a couple of bugs, and fixed one (the other
> was discussed but in the end not accepted as an improvement, because it was
> controversial).
>
>
> https://sease.io/2021/05/apache-solr-sow-parameter-split-on-whitespace-and-multi-field-full-text-search.html
>
> Cheers
>
>
> On Wed, 7 Sept 2022 at 14:19, Markus Jelsma <[email protected]>
> wrote:
>
>> Hello Stephen,
>>
>> Using Solr 8.8.1, I tried to reproduce your strange problem: I copied your
>> schema and indexed a single document. As expected, I got exactly one result
>> for all four combinations, with both the default Lucene QParser and the
>> Edismax QParser.
>>
>> So it appears to work just fine here on 8.8.1. The WordDelimiterGraph filter
>> is relatively new and has had only a few issues. Maybe you can check whether
>> it works without the Graph-type token filters, using the old WordDelimiter
>> filter instead. That one is tried and tested.
>>
>> Regards,
>> Markus
>>
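A minimal sketch of Markus's suggestion, assuming your Solr version still ships the deprecated `solr.WordDelimiterFilterFactory`: the same chain as in the repro, with the non-graph filter swapped in. The type name `text_en_wdf` is hypothetical. Because the old filter is not graph-aware, `FlattenGraphFilterFactory` is not needed, so a single analyzer can serve both index and query time. You would then point the field at this type and reindex.

```xml
<!-- Hypothetical variant of text_en using the older, non-graph WordDelimiterFilterFactory -->
<fieldType name="text_en_wdf" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <!-- No type attribute: the same chain is used at index time and query time -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same options as in the repro; this filter does not emit a token graph,
         so no FlattenGraphFilterFactory is required at index time -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateAll="1" preserveOriginal="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>
```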
>> On Fri, 2 Sept 2022 at 21:57, Stephen Lewis Bianamara <
>> [email protected]> wrote:
>>
>> > Hey Solr Users,
>> >
>> > I've noticed odd behavior in the interaction between the word delimiter
>> > graph filter and the sow parameter. When the word delimiter graph filter is
>> > invoked and sow=true, results that involve alphanumeric splitting but aren't
>> > exact matches can be missed. For example, if I have a document containing
>> > "ABC123 DEF456_GHI", the combination of sow=true and WordDelimiterGraph
>> > seems to break queries for "def456". See the full repro below.
>> >
>> > I believe this is a bug. Could someone please take a look and confirm my
>> > repro, or let me know if something is misconfigured here?
>> >
>> > *Repro*
>> >
>> >    - Solr 9, with this field type definition for field "test_en":
>> >
>> >      <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"
>> >                 autoGeneratePhraseQueries="true">
>> >        <analyzer type="index">
>> >          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >          <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
>> >                  generateNumberParts="1" catenateAll="1" preserveOriginal="1"
>> >                  splitOnCaseChange="1"/>
>> >          <filter class="solr.FlattenGraphFilterFactory"/>
>> >          <filter class="solr.LowerCaseFilterFactory"/>
>> >          <filter class="solr.SnowballPorterFilterFactory"/>
>> >        </analyzer>
>> >        <analyzer type="query">
>> >          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >          <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
>> >                  generateNumberParts="1" catenateAll="1" preserveOriginal="1"
>> >                  splitOnCaseChange="1"/>
>> >          <filter class="solr.LowerCaseFilterFactory"/>
>> >          <filter class="solr.SnowballPorterFilterFactory"/>
>> >        </analyzer>
>> >      </fieldType>
>> >
>> >    - Create document {"id": 1, "test_en": ["ABC123 DEF456_GHI"]}
>> >    - Query the following; all should hit, but one combination misses
>> >       - sow=true, q=def456
>> >          - misses
>> >       - sow=true, q=abc123
>> >          - hits
>> >       - sow=false, q=def456
>> >          - hits
>> >       - sow=false, q=abc123
>> >          - hits
>> >
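The repro shows the field type but not the field itself. A minimal declaration that would fit it, assuming `test_en` is a stored, multiValued field (the document sends its value as an array), might look like this; the attributes are a guess, not taken from the original schema.

```xml
<!-- Assumed declaration: the repro names the field "test_en" but does not show it -->
<field name="test_en" type="text_en" indexed="true" stored="true" multiValued="true"/>
```

The repro also doesn't say how the queries target the field; presumably something like `q=def456&df=test_en&sow=true` (or `qf=test_en` with `defType=edismax`), but that is an assumption.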
>>
>
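Since every sow=false combination hits, one hedged workaround until the underlying behavior is understood is to stop clients from sending sow=true. sow already defaults to false in recent Solr versions, so this only matters if some client sets it explicitly. A sketch that pins it for a `/select` handler (the handler name and the `df` value are assumptions) could be:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- defaults: used only when the request does not supply the parameter -->
  <lst name="defaults">
    <str name="df">test_en</str>
  </lst>
  <!-- invariants: override whatever the client sends, so sow=true is ignored -->
  <lst name="invariants">
    <str name="sow">false</str>
  </lst>
</requestHandler>
```

Pinning sow as an invariant also means no query on this handler can opt back into splitting on whitespace, which may matter for multi-field queries that depend on it.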
