Not related to the word-delimiter token filter, but a while ago I did a study on the sow parameter. I identified a couple of bugs and fixed one of them. The other was discussed but, being controversial, was in the end not accepted as an improvement.
https://sease.io/2021/05/apache-solr-sow-parameter-split-on-whitespace-and-multi-field-full-text-search.html

Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: [email protected]

*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter <https://twitter.com/seaseltd> | Youtube <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github <https://github.com/seaseltd>


On Wed, 7 Sept 2022 at 14:19, Markus Jelsma <[email protected]> wrote:

> Hello Stephen,
>
> Using Solr 8.8.1, I tried to reproduce your strange problem: I copied your
> schema and indexed a single document. As expected, I got exactly one result
> for all four combinations, with both the default Lucene QParser and the
> Edismax QParser.
>
> So it appears to work just fine here on 8.8.1. The WordDelimiterGraphFilter
> is relatively new and has had only a few issues. Maybe you can check whether
> it works without the Graph-type token filters, using the old
> WordDelimiterFilter instead. That one is tried and tested.
>
> Regards,
> Markus
>
> On Fri, 2 Sep 2022 at 21:57, Stephen Lewis Bianamara <
> [email protected]> wrote:
>
> > Hey Solr Users,
> >
> > I've noticed odd behavior in the interaction between the
> > WordDelimiterGraphFilter and the sow parameter. When the
> > WordDelimiterGraphFilter splits a term and sow=true, queries can miss
> > documents that match only through alphanumeric splitting rather than
> > through an exact term match. For example, if I have a document with
> > "ABC123 DEF456_GHI", the combination of sow=true and
> > WordDelimiterGraphFilter seems to break queries for "def456". See the
> > full repro below.
> >
> > I believe this is a bug. Could someone please take a look at my repro and
> > confirm it, or let me know if something is misconfigured here?
> >
> > *Repro*
> >
> > - Solr 9, with the field "test_en" using this field type definition:
> >
> >   <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100"
> >              autoGeneratePhraseQueries="true">
> >     <analyzer type="index">
> >       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >       <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
> >               generateNumberParts="1" catenateAll="1" preserveOriginal="1"
> >               splitOnCaseChange="1"/>
> >       <filter class="solr.FlattenGraphFilterFactory"/>
> >       <filter class="solr.LowerCaseFilterFactory"/>
> >       <filter class="solr.SnowballPorterFilterFactory"/>
> >     </analyzer>
> >     <analyzer type="query">
> >       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >       <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
> >               generateNumberParts="1" catenateAll="1" preserveOriginal="1"
> >               splitOnCaseChange="1"/>
> >       <filter class="solr.LowerCaseFilterFactory"/>
> >       <filter class="solr.SnowballPorterFilterFactory"/>
> >     </analyzer>
> >   </fieldType>
> >
> > - Create the document {"id": 1, "test_en": ["ABC123 DEF456_GHI"]}
> > - Run the following queries. All should hit, but one combination misses:
> >   - sow=true, q=def456: misses
> >   - sow=true, q=abc123: hits
> >   - sow=false, q=def456: hits
> >   - sow=false, q=abc123: hits
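Markus's suggestion to test without the graph-type filters could be tried with a variant of the field type from the repro. The following is a sketch only and was not tested in this thread. The type name `text_en_wdf` is hypothetical. In both analyzers, the deprecated `solr.WordDelimiterFilterFactory` replaces `solr.WordDelimiterGraphFilterFactory` with the same attributes. `solr.FlattenGraphFilterFactory` is dropped because the non-graph filter does not produce a token graph that needs flattening.

```xml
<!-- Hypothetical diagnostic field type ("text_en_wdf" is an illustrative name).
     Same chain as the repro's "text_en", but using the older, deprecated
     non-graph WordDelimiterFilterFactory, so no FlattenGraphFilterFactory. -->
<fieldType name="text_en_wdf" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateAll="1" preserveOriginal="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateAll="1" preserveOriginal="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>
```

To use it, the `test_en` field would need to point at this type, and the document would need to be reindexed, because the index-time analysis changes. Running the same four sow/q combinations against both field types should then show whether the graph filter is involved in the sow=true miss.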
