Hi all,

Currently, Solr's existing RemoveDuplicatesTokenFilter (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.RemoveDuplicatesTokenFilterFactory) removes duplicate tokens that have the same text and are logically at the same position.
In my case, if the same term appears several times in a row, I need to remove the duplicates and keep only a single occurrence of the term (even if the position increment == 1).

For example, if the input stream is:

    quick brown brown brown fox jumps jumps over the little little lazy brown dog

then the output should be:

    quick brown fox jumps over the little lazy brown dog

To achieve this, I implemented my own version of RemoveDuplicatesTokenFilter with an overridden process() method:

    protected Token process(Token t) throws IOException {
        // Look ahead at the next token without consuming it.
        Token nextTok = peek(1);
        if (t != null && nextTok != null) {
            // Drop the current token if the next one has the same text
            // (case-insensitive); the last token of a run is kept.
            if (t.termText().equalsIgnoreCase(nextTok.termText())) {
                return null;
            }
        }
        return t;
    }

The above implementation works as desired, and the consecutive duplicates are removed :)

Any advice/feedback on the above implementation? :)

Regards,
Pravesh
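
For anyone who wants to check the look-ahead logic without a Solr setup, here is a minimal standalone sketch of the same idea on a plain list of strings. It is only an illustration: the class name ConsecutiveDedup and the method dedupe are made up for this example and are not part of Solr or Lucene, and reading the next list element stands in for peek(1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Solr/Lucene class: mimics the process() logic on strings.
public class ConsecutiveDedup {

    // Drops a term when the term right after it is the same (case-insensitive),
    // so the last term of each run is the one that survives.
    static List<String> dedupe(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            String current = terms.get(i);
            // Stands in for peek(1): the next term, or null at the end of the stream.
            String next = (i + 1 < terms.size()) ? terms.get(i + 1) : null;
            if (next != null && current.equalsIgnoreCase(next)) {
                continue; // equivalent to returning null from process()
            }
            out.add(current);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "quick brown brown brown fox jumps jumps over the little little lazy brown dog"
                .split(" "));
        // prints: quick brown fox jumps over the little lazy brown dog
        System.out.println(String.join(" ", dedupe(input)));
    }
}
```

Note that the non-adjacent "brown" near the end is kept, since only consecutive repeats are dropped.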