Hi

I'm trying to build an autosuggest field for project ids, using ngrams
and WordDelimiterGraphFilterFactory to split on word/number
boundaries.

The ids come in various formats, for example nihr123456, 12/34/567 and
DRF-2018-11-ST2-062.

What I'm trying to do is let the user enter the numeric parts, the
alphabetic parts, or both, and only match ids that contain all of the parts
entered. The basic autosuggestion is working, but the query matches ids that
contain some, but not all, of the component parts. For example:

I enter DRF-2018-11 and it matches:

DRF-2018-11-ST2-062
PB-PG-0909-20188
CS-2018-18-ST2-005


The first one is correct because it matches the DRF, the 2018 and the 11.
I don't want the second and third ones because neither id contains DRF or
11. Is there any way to solve this in the Solr configuration, or do I have
to split the id manually in code and construct a query of the form
id is DRF AND id is 2018 AND id is 11?
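
For reference, the manual approach I have in mind would look roughly like
the sketch below. It is only an illustration: build_and_query is a made-up
helper name, "id" stands in for the real field name, and the regular
expression is my own approximation of how the analyzer separates letters
from digits, not something taken from Solr.

```python
import re


def build_and_query(field: str, user_input: str) -> str:
    """Split an id into letter runs and digit runs and AND them together.

    Hypothetical helper: the field name and the splitting rule are
    assumptions for illustration, not part of the Solr configuration.
    """
    # Treat each run of letters and each run of digits as a separate part,
    # roughly mirroring splitOnNumerics="1" in the field type.
    parts = re.findall(r"[A-Za-z]+|[0-9]+", user_input)
    # Lower-case each part, as LowerCaseFilterFactory does at query time.
    clauses = [f"{field}:{part.lower()}" for part in parts]
    # Require every part to match by joining the clauses with AND.
    return " AND ".join(clauses)


print(build_and_query("id", "DRF-2018-11"))
# prints: id:drf AND id:2018 AND id:11
```

Whether every clause can actually match still depends on the index-time
analysis (for example the minGramSize setting), so this only changes how
the parts are combined.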

Here is my field type configuration:

<fieldType name="ngram_award_id" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0" splitOnNumerics="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
            maxGramSize="7"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="0" splitOnNumerics="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thanks
Shaun
