Hi all, I’ve got some ancient Lucene tokenizer code from 2006 that I’m trying to avoid forward-porting, but I don’t think there’s an equivalent in Solr 5/6.
Specifically, it's applying shingles to the output of something like the WordDelimiterFilter. E.g. MySuperSink gets split into "My", "Super", "Sink", and then shingled (with a shingle size of 2) into "My", "MySuper", "Super", "SuperSink", "Sink".

I can't just follow the WDF with a ShingleFilter, because shingles must not be created across terms coming into the WDF - only within the pieces generated by the WDF from a single term.

Or is there actually a way to make this work with Solr 5/6?

Thanks,

— Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
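PS: to make the desired token stream concrete, here's a self-contained Java sketch. It's plain Java, not the actual Lucene filter classes, and the helper names (`splitCamelCase`, `shingle`) are made up for illustration - it just shows the splitting and the intra-token-only shingling I'm describing:

```java
import java.util.ArrayList;
import java.util.List;

public class IntraTokenShingles {

    // Split one token at lower-to-upper case transitions,
    // roughly what WordDelimiterFilter does for camel case:
    // "MySuperSink" -> [My, Super, Sink]
    static List<String> splitCamelCase(String token) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < token.length(); i++) {
            if (Character.isUpperCase(token.charAt(i))) {
                parts.add(token.substring(start, i));
                start = i;
            }
        }
        parts.add(token.substring(start));
        return parts;
    }

    // Emit shingles of length 1..maxShingle built only from the parts
    // of a single original token, so shingles never span across the
    // terms that came into the splitter.
    // [My, Super, Sink] with maxShingle=2
    //   -> [My, MySuper, Super, SuperSink, Sink]
    static List<String> shingle(List<String> parts, int maxShingle) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < parts.size(); i++) {
            StringBuilder sb = new StringBuilder();
            for (int n = 0; n < maxShingle && i + n < parts.size(); n++) {
                sb.append(parts.get(i + n));
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingle(splitCamelCase("MySuperSink"), 2));
    }
}
```

Running it on "MySuperSink" prints [My, MySuper, Super, SuperSink, Sink], which is the stream I want out of the analysis chain.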