Maybe try ([^a-z0-9]+)
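
For what it's worth, here is a rough sketch of how the two patterns differ. It uses Python's `re` module as a stand-in, on the assumption that Solr's Java regex engine treats these simple character classes the same way. The sample string and the `replace_all` helper are made up for illustration, and this only shows the regex behaviour, not what the analyzer chain does with the resulting tokens.

```python
import re

# Pattern from the original config: matches ONE character that is not a-z or 0-9.
SINGLE = re.compile(r"[^a-z0-9]")
# Suggested pattern: matches a RUN of one or more such characters.
RUN = re.compile(r"[^a-z0-9]+")


def replace_all(pattern, text, replacement=" "):
    """Hypothetical helper mimicking replace="all": substitute every match."""
    return pattern.sub(replacement, text)


# Made-up sample input, not taken from the real index.
sample = "view -- _ page"

# Each of the six non-alphanumeric characters becomes its own space.
single_result = replace_all(SINGLE, sample)
# The whole run of non-alphanumeric characters collapses into one space.
run_result = replace_all(RUN, sample)

print(repr(single_result))
print(repr(run_result))
```

Note that both patterns do match the underscore, so if `_` still shows up in the facets, the filter may not be seeing the text you expect at that point in the chain.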

On 17 Jun 2011, at 20:26, Adam Estrada <estrada.adam.gro...@gmail.com> wrote:

> All,
>
> I am having trouble getting my regex pattern to work properly. I have tried
> PatternReplaceFilterFactory after the standard tokenizer
>
> <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])"
> replacement=" " replace="all"/>
>
> and PatternReplaceCharFilterFactory before it.
>
> <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="([^a-zA-Z0-9])" replacement=" " replace="all"/>
>
> It looks like this should work to remove everything except letters and
> numbers.
>
> <charFilter class="solr.HTMLStripCharFilterFactory"/>
> <filter class="solr.ASCIIFoldingFilterFactory"/>
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
>
> <filter class="solr.StopFilterFactory"
> ignoreCase="true"
> words="stopwords_en.txt"
> enablePositionIncrements="true"
> />
> <filter class="solr.LengthFilterFactory" min="2" max="999"/>
> <filter class="solr.PatternReplaceFilterFactory"
> pattern="([^a-z0-9])" replacement=" " replace="all"/>
>
> I am left with quite a few facet items like this
>
> <int name="_ view">1443</int>
> <int name="view _">1599</int>
>
> Can anyone suggest what may be going on here? I have verified that my regex
> works properly here http://www.fileformat.info/tool/regex.htm
>
> Adam