Hi Nawab,

You do not get the shingle because, after the tokenizer, you have three tokens: 'abc', '-' and 'def'. The '-' token leaves an empty position behind, so the two tokens you are interested in are not next to each other and cannot form a shingle. What you can do is apply a char filter before tokenization to remove the '-', something like:
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s*-\s*" replacement=" "/>

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 3 Jan 2018, at 21:04, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:
>
> Hi,
>
> So, I have a string for indexing:
>
> abc - def (notice the space on either side of hyphen)
>
> which is being processed with this filter-list:-
>
>
> <fieldType name="shingle" class="solr.TextField"
> positionIncrementGap="100">
> <analyzer type="index">
> <charFilter
> class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
> name="nfkc" mode="compose"/>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" preserveOriginal="0"
> splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
> <filter class="solr.FlattenGraphFilterFactory"/>
> <filter class="solr.PatternReplaceFilterFactory"
> pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
> outputUnigrams="false" fillerToken=""/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> <filter class="solr.LimitTokenCountFilterFactory"
> maxTokenCount="10000" consumeAllTokens="false"/>
> <filter class="solr.LengthFilterFactory" min="1" max="255"/>
> </analyzer>
>
>
> I get two shingle tokens at the end:
>
> "abc" "def"
>
> I want to get "abc def" . What can I tweak to get this?
>
>
> Thanks
> Nawab
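
For reference, here is a minimal sketch of where the suggested char filter could sit in the index analyzer quoted above. It assumes the rest of the chain stays exactly as posted and shows only the part before the token filters. The pattern is the one from the reply. It also rewrites hyphens inside words such as 'abc-def', so narrowing it to \s+-\s+ is an untested alternative if only free-standing hyphens should be dropped.

```xml
<analyzer type="index">
  <!-- Existing char filter from the original message -->
  <charFilter class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
              name="nfkc" mode="compose"/>
  <!-- Suggested addition: replace a hyphen and any surrounding whitespace with one space -->
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="\s*-\s*" replacement=" "/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- The token filters from the original message follow here, unchanged -->
</analyzer>
```

Char filters run on the raw text before the tokenizer, so the whitespace tokenizer should then see 'abc def' and emit two adjacent tokens that the shingle filter can join. The Analysis screen in the Solr admin UI is a convenient way to confirm the resulting token positions.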