On 20-Nov-08, at 6:20 AM, Daniel Rosher wrote:
Hi,
I'm trying to index some content that has things like 'java/J2EE', but
with solr.WordDelimiterFilterFactory and parameters
[generateWordParts="1" generateNumberParts="0" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"] this ends up
tokenized as 'java','j','2','EE'.
Does anyone know a way of having this tokenized as 'java','j2ee'?
Perhaps this filter needs something like a protected list of tokens
not to tokenize, like EnglishPorterFilter has?
That's a possibility. Another is to add code that keeps short tokens
from being generated, and to set catenateAll="1".
-Mike
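To make the second suggestion concrete, here is a rough sketch in Python. It is a toy model of the idea, not Solr's or Lucene's actual code: the function names split_parts and analyze, the min_len cutoff of 3, and the assumption that 'java/J2EE' has already been split at the '/' before the word-delimiter step are all made up for illustration. Each token is split at letter/digit boundaries, the parts are joined back together (the catenateAll idea), and anything shorter than min_len is dropped.

```python
import re


def split_parts(token):
    """Toy model of delimiter splitting: break a token into runs of
    letters and runs of digits (so 'J2EE' becomes 'J', '2', 'EE')."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


def analyze(text, min_len=3):
    """Hypothetical analysis chain, for illustration only.

    1. Split the input on whitespace and '/' (assumed to happen upstream).
    2. For each token, produce its parts plus the catenation of all parts.
    3. Lowercase everything and drop tokens shorter than min_len.
    """
    out = []
    for token in re.split(r"[\s/]+", text):
        if not token:
            continue
        parts = split_parts(token)
        # Only add the catenated form when the token was actually split.
        candidates = (parts + ["".join(parts)]) if len(parts) > 1 else parts
        for candidate in candidates:
            candidate = candidate.lower()
            if len(candidate) >= min_len:
                out.append(candidate)
    return out


if __name__ == "__main__":
    print(analyze("java/J2EE"))
```

With these assumptions the fragments 'j', '2' and 'ee' fall under the length cutoff and only 'java' and 'j2ee' survive. The obvious cost is that any genuine token shorter than the cutoff is lost as well, so the cutoff would need to be chosen with the real content in mind.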