Maybe try ([^a-z0-9]+) so that each run of non-alphanumeric characters is replaced by a single space.
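
For instance, a rough sketch of your token filter with that pattern (untested, and assuming the rest of the analyzer chain stays as you posted it):

```xml
<!-- only change from the filter in your chain: + added after the character class -->
<filter class="solr.PatternReplaceFilterFactory"
        pattern="([^a-z0-9]+)" replacement=" " replace="all"/>
```

I can't say for certain that this gets rid of the "_ view" entries, so it is worth checking the result on the analysis page in the Solr admin.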

On 17 Jun 2011, at 20:26, Adam Estrada <estrada.adam.gro...@gmail.com> wrote:

> All,
> 
> I am having trouble getting my regex pattern to work properly. I have tried
> PatternReplaceFilterFactory after the standard tokenizer
> 
> <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])"
> replacement=" " replace="all"/>
> 
> and PatternReplaceCharFilterFactory before it.
> 
> <charFilter class="solr.PatternReplaceCharFilterFactory"
> pattern="([^a-zA-Z0-9])" replacement=" " replace="all"/>
> 
> It looks like this should work to remove everything except letters and
> numbers.
> 
>        <charFilter class="solr.HTMLStripCharFilterFactory"/>
>        <filter class="solr.ASCIIFoldingFilterFactory"/>
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
> 
>        <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords_en.txt"
>                enablePositionIncrements="true"
>                />
>        <filter class="solr.LengthFilterFactory" min="2" max="999"/>
>        <filter class="solr.PatternReplaceFilterFactory"
> pattern="([^a-z0-9])" replacement=" " replace="all"/>
> 
> I am left with quite a few facet items like this
> 
> <int name="_ view">1443</int>
> <int name="view _">1599</int>
> 
> Can anyone suggest what may be going on here? I have verified that my regex
> works properly here http://www.fileformat.info/tool/regex.htm
> 
> Adam
