Maybe try ([^a-z0-9]+)
On 17 Jun 2011, at 20:26, Adam Estrada estrada.adam.gro...@gmail.com wrote:
All,
I am having trouble getting my regex pattern to work properly. I have tried
PatternReplaceFilterFactory after the standard tokenizer
<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])"
        replacement="" replace="all"/>
and PatternReplaceCharFilterFactory before it.
<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="([^a-zA-Z0-9])" replacement="" replace="all"/>
It looks like this should work to remove everything except letters and
numbers.
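As a sanity check on the pattern itself, here is a minimal sketch in plain Java, since Solr hands these patterns to java.util.regex. It compares the pattern from the config with the `+` variant suggested in the reply. The class name, helper method, and sample strings are made up for illustration (the first two samples are simply the facet labels treated as raw input), and it exercises only the regex, not the Solr analysis chain.

```java
import java.util.regex.Pattern;

public class StripNonAlnum {
    // Same character class as in the PatternReplaceFilterFactory config.
    static final Pattern SINGLE = Pattern.compile("([^a-z0-9])");
    // Variant with a + quantifier: matches a whole run of unwanted characters at once.
    static final Pattern RUN = Pattern.compile("([^a-z0-9]+)");

    // Hypothetical helper: replace every match with the empty string,
    // which is what replacement="" with replace="all" asks for.
    static String strip(Pattern p, String input) {
        return p.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Assumed sample inputs, not taken from a real index.
        String[] samples = {"_ view", "view _", "foo-bar_42"};
        for (String s : samples) {
            System.out.println("[" + s + "] -> [" + strip(SINGLE, s)
                    + "] / [" + strip(RUN, s) + "]");
        }
    }
}
```

With an empty replacement, both patterns give the same result on these inputs, and both remove the underscore and the space. That suggests the regex is probably not the culprit, and that the leftover "_" in the facets may come from somewhere else in the setup.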
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="stopwords_en.txt"
        enablePositionIncrements="true"
        />
<filter class="solr.LengthFilterFactory" min="2" max="999"/>
<filter class="solr.PatternReplaceFilterFactory"
        pattern="([^a-z0-9])" replacement="" replace="all"/>
I am left with quite a few facet items like this
<int name="_ view">1443</int>
<int name="view _">1599</int>
Can anyone suggest what may be going on here? I have verified that my regex
works properly here http://www.fileformat.info/tool/regex.htm
Adam