On 20-Nov-08, at 6:20 AM, Daniel Rosher wrote:

Hi,

I'm trying to index some content that has things like 'java/J2EE' but with
solr.WordDelimiterFilterFactory and parameters [generateWordParts="1"
generateNumberParts="0" catenateWords="0" catenateNumbers="0"
catenateAll="0" splitOnCaseChange="0"] this ends up tokenized as
'java','j','2','EE'.

Does anyone know a way of having this tokenized as 'java','j2ee'?

Perhaps this filter needs something like a protected list of tokens that
should not be split, like the one EnglishPorterFilter has?

That's a possibility. Another is to add code that filters out short tokens so they are never generated, and to use catenateAll="1".
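
To make the protected-list idea concrete, here is a rough sketch in plain Python. It is not Solr code and not how WordDelimiterFilter is implemented; the PROTECTED set and the function names are made up for illustration, and the splitting rule only approximates the filter's behaviour with splitOnCaseChange="0".

```python
import re

# Hypothetical protected list (illustration only, not a Solr API).
PROTECTED = {"j2ee"}

def split_parts(token):
    """Split a token at letter/digit boundaries, e.g. 'J2EE' -> J, 2, EE."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)

def tokenize(text, protected=PROTECTED):
    """Lowercased tokens; chunks found in `protected` are kept whole."""
    tokens = []
    # Break on delimiters such as '/' first, keeping alphanumeric runs intact.
    for chunk in re.findall(r"[A-Za-z0-9]+", text):
        if chunk.lower() in protected:
            tokens.append(chunk.lower())
        else:
            tokens.extend(part.lower() for part in split_parts(chunk))
    return tokens

print(tokenize("java/J2EE"))         # ['java', 'j2ee']
print(tokenize("java/J2EE", set()))  # ['java', 'j', '2', 'ee']
```

The second call shows the unprotected behaviour from the question; the first shows what a protected entry for 'j2ee' would change.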

-Mike
