Hi all,

I’ve got some ancient Lucene tokenizer code from 2006 that I’d rather not 
forward-port, but I don’t think there’s an equivalent in Solr 5/6.

Specifically, it applies shingles to the output of something like the 
WordDelimiterFilter - e.g. MySuperSink gets split into “My”, “Super”, “Sink”, and 
those pieces are then shingled (with a shingle size of 2) to produce “My”, 
“MySuper”, “Super”, “SuperSink”, “Sink”.
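
To spell that out in code form, here’s a tiny plain-Java sketch (no Lucene 
involved, class/method names invented for illustration) of the per-term 
shingling I mean:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PerTermShingles {
    // pieces = the WDF output for a single incoming term, e.g. ["My", "Super", "Sink"]
    static List<String> shingle(List<String> pieces, int maxShingleSize) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < pieces.size(); i++) {
            StringBuilder sb = new StringBuilder(pieces.get(i));
            out.add(sb.toString());                                  // unigram, e.g. "Super"
            for (int n = 1; n < maxShingleSize && i + n < pieces.size(); n++) {
                sb.append(pieces.get(i + n));                        // e.g. "Super" + "Sink"
                out.add(sb.toString());                              // shingle "SuperSink"
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "MySuperSink" -> WDF pieces ["My", "Super", "Sink"]
        System.out.println(shingle(Arrays.asList("My", "Super", "Sink"), 2));
        // prints: [My, MySuper, Super, SuperSink, Sink]
    }
}

The key point is that the shingling never crosses from the pieces of one 
incoming term into the pieces of the next.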

I can’t just follow the WDF with a ShingleFilter, because shingles shouldn’t be 
created across the terms coming into the WDF - they should only be built from 
the pieces the WDF generates within a single term.
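
For concreteness, the “just follow the WDF with a ShingleFilter” chain I’m 
referring to would look roughly like this, using the Lucene 5/6-era classes 
behind the Solr factories (flag choices, class names and the test input are 
just illustrative):

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class NaiveWdfShingleChain {
    public static void main(String[] args) throws IOException {
        Analyzer naive = new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new WhitespaceTokenizer();
                // Roughly what WordDelimiterFilterFactory does: split word parts on case changes.
                TokenStream wdf = new WordDelimiterFilter(source,
                        WordDelimiterFilter.GENERATE_WORD_PARTS
                                | WordDelimiterFilter.SPLIT_ON_CASE_CHANGE,
                        null);
                // Bigram shingles over the whole stream.
                ShingleFilter shingles = new ShingleFilter(wdf, 2);
                shingles.setTokenSeparator("");  // so shingles come out as "MySuper", not "My Super"
                return new TokenStreamComponents(source, shingles);
            }
        };

        try (TokenStream ts = naive.tokenStream("f", "Big MySuperSink")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term.toString());
            }
            ts.end();
        }
    }
}

If I feed that “Big MySuperSink”, as far as I can tell I’d also get a “BigMy” 
shingle spanning the two incoming terms, which is exactly what the old code 
avoids.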

Or is there actually a way to make this work with Solr 5/6?

Thanks,

— Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr