REGEX Proper Usage?

2011-06-17 Thread Adam Estrada
All,

I am having trouble getting my regex pattern to work properly. I have tried
PatternReplaceFilterFactory after the standard tokenizer

<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])"
        replacement=" " replace="all"/>

and PatternReplaceCharFilterFactory before it.

<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="([^a-zA-Z0-9])" replacement=" " replace="all"/>

It looks like this should work to remove everything except letters and
numbers.
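
To see what the pattern does in isolation, here is a minimal sketch in Python. It assumes Python's re module treats this simple character class the same way as the Java regex engine that Solr uses; the helper name replace_all and the sample tokens are made up for illustration and are not from my index.

```python
import re

# Same character class as in the filter: anything that is not a-z or 0-9.
PATTERN = re.compile(r"([^a-z0-9])")

def replace_all(token: str, replacement: str) -> str:
    """Replace every match in the token, like replace="all"."""
    return PATTERN.sub(replacement, token)

# Hypothetical sample tokens, not real index data.
samples = ["view_", "_view", "e-mail", "abc123"]

for token in samples:
    with_space = replace_all(token, " ")  # replacement is a single space
    removed = replace_all(token, "")      # replacement is the empty string
    print(repr(token), "->", repr(with_space), "|", repr(removed))
```

Matched characters are only removed when the replacement is the empty string. With a single space as the replacement, "view_" becomes "view " and the token keeps a space where the underscore was.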

<charFilter class="solr.HTMLStripCharFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>

<filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="stopwords_en.txt"
        enablePositionIncrements="true"
/>
<filter class="solr.LengthFilterFactory" min="2" max="999"/>
<filter class="solr.PatternReplaceFilterFactory"
        pattern="([^a-z0-9])" replacement=" " replace="all"/>

I am left with quite a few facet items like this:

<int name="_ view">1443</int>
<int name="view _">1599</int>

Can anyone suggest what may be going on here? I have verified that my regex
works properly at http://www.fileformat.info/tool/regex.htm

Adam


Re: REGEX Proper Usage?

2011-06-17 Thread Dave Searle
Maybe try ([^a-z0-9]+)
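
A rough sketch of the difference, in Python rather than Java, on the assumption that both regex engines handle this character class the same way. The function names and the sample token are invented for the example.

```python
import re

# Original pattern: matches one non-alphanumeric character at a time.
SINGLE = re.compile(r"([^a-z0-9])")
# Suggested pattern: matches a whole run of such characters at once.
RUN = re.compile(r"([^a-z0-9]+)")

def replace_each(token: str, replacement: str) -> str:
    """One replacement per offending character."""
    return SINGLE.sub(replacement, token)

def replace_runs(token: str, replacement: str) -> str:
    """One replacement per run of offending characters."""
    return RUN.sub(replacement, token)

token = "view__--1"  # hypothetical token with a run of four symbols

print(repr(replace_each(token, " ")))  # four spaces between "view" and "1"
print(repr(replace_runs(token, " ")))  # a single space between "view" and "1"
print(repr(replace_each(token, "")), repr(replace_runs(token, "")))
```

The two patterns only differ when the replacement is non-empty. With an empty replacement both return "view1" for this token.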

Sent by CarrierPigeon