WordDelimiterGraphFilterFactory is a new implementation so it's also quite possible that the behavior just changed.
I just took a look and indeed it does. WordDelimiterFilterFactory (run on
"p / n whatever") produces

token:    p  n  whatever
position: 1  2  3

whereas WordDelimiterGraphFilterFactory produces

token:    p  n  whatever
position: 1  3  4

Arguably the Graph version is the correct behavior. What if you use
phrases to search for this instead?

Best,
Erick

On Wed, Jan 3, 2018 at 12:56 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:
> Thanks Emir, Erick.
>
> What I want to do is remove empty tokens after WordDelimiterGraphFilter.
> Is there any option in WordDelimiterGraphFilter to not generate empty
> tokens?
>
> This index field is intended for strange strings, e.g. part numbers:
> P/N HSC0424PP
> The benefit of removing the empty tokens is that if someone unintentionally
> puts a space around the '/' (in the above example), this field is still
> able to match.
>
> In the previous Solr version, ShingleFilter used to work fine with empty
> positions and made shingles across the empty space. Although it is
> possible that I have learned to rely on a bug.
>
> On Wed, Jan 3, 2018 at 12:23 PM, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Nawab,
>> The reason you do not get a shingle is that there is an empty token:
>> after the tokenizer you have 3 tokens, ‘abc’, ‘-’ and ‘def’, so the tokens
>> that you are interested in are not next to each other and cannot form a
>> shingle.
>> What you can do is apply a char filter before tokenization to remove ‘-’,
>> something like:
>>
>> <charFilter class="solr.PatternReplaceCharFilterFactory"
>>             pattern="\s*-\s*" replacement=" "/>
>>
>> Regards,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>> > On 3 Jan 2018, at 21:04, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > So, I have a string for indexing:
>> >
>> > abc - def (notice the space on either side of the hyphen)
>> >
>> > which is being processed with this filter list:
>> >
>> > <fieldType name="shingle" class="solr.TextField"
>> >            positionIncrementGap="100">
>> >   <analyzer type="index">
>> >     <charFilter
>> >         class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
>> >         name="nfkc" mode="compose"/>
>> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >     <filter class="solr.WordDelimiterGraphFilterFactory"
>> >             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> >             catenateNumbers="0" catenateAll="0" preserveOriginal="0"
>> >             splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
>> >     <filter class="solr.FlattenGraphFilterFactory"/>
>> >     <filter class="solr.PatternReplaceFilterFactory"
>> >             pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.ASCIIFoldingFilterFactory"/>
>> >     <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>> >             outputUnigrams="false" fillerToken=""/>
>> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >     <filter class="solr.LimitTokenCountFilterFactory"
>> >             maxTokenCount="10000" consumeAllTokens="false"/>
>> >     <filter class="solr.LengthFilterFactory" min="1" max="255"/>
>> >   </analyzer>
>> >
>> > I get two shingle tokens at the end:
>> >
>> > "abc" "def"
>> >
>> > I want to get "abc def". What can I tweak to get this?
>> >
>> > Thanks
>> > Nawab
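
For anyone who wants to sanity-check Emir's char-filter suggestion without reloading a core, here is a rough Python sketch of the idea. It is only a toy model, not Solr's analysis chain: `char_filter`, `whitespace_tokenize` and `bigram_shingles` are hypothetical helpers made up for this illustration, and the sketch ignores WordDelimiterGraphFilter, position increments and filler tokens entirely. The only thing taken from the thread is the pattern `\s*-\s*` and the replacement `" "`.

```python
import re


def char_filter(text, pattern=r"\s*-\s*", replacement=" "):
    """Toy stand-in for the suggested char filter: rewrite the raw text
    before tokenization so a hyphen (with optional spaces) becomes a space."""
    return re.sub(pattern, replacement, text)


def whitespace_tokenize(text):
    """Toy stand-in for a whitespace tokenizer: split on runs of whitespace."""
    return text.split()


def bigram_shingles(tokens):
    """Join each pair of adjacent tokens, like shingles of size 2
    with unigrams switched off."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def analyze(text):
    """Char filter first, then tokenize, then shingle."""
    return bigram_shingles(whitespace_tokenize(char_filter(text)))


if __name__ == "__main__":
    for raw in ["abc - def", "abc-def", "abc -def"]:
        print(repr(raw), "->", analyze(raw))
```

Because the hyphen is gone before the tokenizer ever sees the text, there is no separate '-' token left to sit between 'abc' and 'def', so the two words end up adjacent and can be shingled. Whether the real chain behaves the same way still needs to be confirmed on the Solr analysis screen.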