Hi, I'm trying to build an autosuggest field for project ids using edge n-grams and WordDelimiterGraphFilterFactory to split on letter/number boundaries.
The ids have various formats, such as nihr123456, 12/34/567 and DRF-2018-11-ST2-062. What I'm trying to do is allow the user to enter the number parts, the alphabetical parts, or both, and match on all of them.

The basic autosuggestion is working, but I have an issue where the query matches some but not all of the component parts. For example, I enter DRF-2018-11 and it matches:

  DRF-2018-11-ST2-062
  PB-PG-0909-20188
  CS-2018-18-ST2-005

The first one is correct because it matches the DRF, the 2018 and the 11. The second and third ones I don't want, because there's no DRF or 11 in those ids.

Is there any way to get around this problem in the Solr configuration, or do I have to split the id manually in code and construct a query where the id is DRF AND id is 2018 AND id is 11?

Here is my field type configuration:

<fieldType name="ngram_award_id" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0" splitOnNumerics="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.FlattenGraphFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="7"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0" splitOnNumerics="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thanks
Shaun
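
P.S. To make the manual fallback concrete, here is a rough sketch of the "split in code and AND the parts" idea. It is illustration only: the function name `build_and_query` and the field name `award_id` are made up for the example, and the regex split is only an approximation of what WordDelimiterGraphFilterFactory does. It also says nothing about whether each part would actually match the indexed n-grams.

```python
import re


def build_and_query(raw_id, field="award_id"):
    """Split an id into letter runs and digit runs, then AND them together.

    `field` is a hypothetical field name used only for this sketch.
    """
    # Runs of letters or runs of digits; separators such as '-' and '/' are dropped.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", raw_id)
    # Lowercase each part, mirroring the LowerCaseFilterFactory in the analyzer.
    return " AND ".join(f"{field}:{part.lower()}" for part in parts)


print(build_and_query("DRF-2018-11"))
# award_id:drf AND award_id:2018 AND award_id:11
```

The resulting string would then be sent as the q parameter, so every part is required rather than any one of them.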