Searching and WordDelimiterGraphFilterFactory

Shaun Campbell Tue, 09 Mar 2021 03:06:58 -0800

Hi

I'm trying to build an autosuggest field for project ids, using n-grams and
WordDelimiterGraphFilterFactory to split the ids on word/number
boundaries.


The ids come in several formats, for example nihr123456, 12/34/567 and
DRF-2018-11-ST2-062.

What I'm trying to do is let the user enter the number parts, the
alphabetic parts, or both, and return only the ids that match all of the
entered parts. The basic autosuggestion works, but the query also returns
ids that match only some of the entered parts. For example:

I enter DRF-2018-11 and it matches:

DRF-2018-11-ST2-062
PB-PG-0909-20188
CS-2018-18-ST2-005


The first one is correct because it contains DRF, 2018 and 11. I don't want
the second and third ones, because neither contains DRF or 11. Is there a
way to solve this in the Solr configuration, or do I have to split the input
manually in code and build a query where id matches DRF AND 2018 AND 11?

Here is my field type configuration:

<fieldType name="ngram_award_id" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0" splitOnNumerics="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
            maxGramSize="7"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0" splitOnNumerics="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
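To see which terms the index analyzer above produces for the three example ids, here is a rough Python simulation. It approximates StandardTokenizer plus WordDelimiterGraphFilterFactory with a regex (runs of letters or runs of digits), lowercases, and then emits edge n-grams of length 3 to 7, the way EdgeNGramFilterFactory behaves with its default preserveOriginal="false". It is an approximation that happens to fit these ids, not Lucene's actual code, and the helper names are hypothetical. It then compares "any query part matches" (OR) with "every query part matches" (AND) for the input DRF-2018-11.

```python
import re
from typing import List, Set


def index_terms(award_id: str, min_gram: int = 3, max_gram: int = 7) -> Set[str]:
    """Approximate the index analyzer: split, lowercase, edge n-grams."""
    terms: Set[str] = set()
    for part in re.findall(r"[A-Za-z]+|[0-9]+", award_id):
        part = part.lower()
        # Parts shorter than min_gram (e.g. "11", "2") yield no grams at all,
        # mirroring EdgeNGramFilterFactory without preserveOriginal.
        for size in range(min_gram, min(max_gram, len(part)) + 1):
            terms.add(part[:size])
    return terms


def query_terms(user_input: str) -> List[str]:
    """Approximate the query analyzer: split and lowercase, no n-grams."""
    return [p.lower() for p in re.findall(r"[A-Za-z]+|[0-9]+", user_input)]


ids = ["DRF-2018-11-ST2-062", "PB-PG-0909-20188", "CS-2018-18-ST2-005"]
query = query_terms("DRF-2018-11")

for award_id in ids:
    terms = index_terms(award_id)
    matched = [q for q in query if q in terms]
    print(award_id, "matched:", matched,
          "OR:", bool(matched), "AND:", len(matched) == len(query))
```

Under these assumptions, every id shares at least one indexed term with the query (2018 is an edge n-gram of both 2018 and 20188), so a query that accepts any matching part returns all three. The simulation also suggests that 11 is never indexed, because it is shorter than minGramSize, so a query requiring every part would exclude DRF-2018-11-ST2-062 as well.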

Thanks
Shaun
