Re: newbie question re solr.PatternReplaceFilterFactory

2017-05-10 Thread Erick Erickson
First, use PatternReplaceCharFilterFactory. The difference is that
PatternReplaceCharFilterFactory works on the entire input, whereas
PatternReplaceFilterFactory works only on the tokens emitted by the
tokenizer. A concrete example using WhitespaceTokenizerFactory would be
this [is some ] text
PatternReplaceFilterFactory would see 5 tokens, "this", "[is", "some",
"]", and "text". So it would be very hard to do what you want.

PatternReplaceCharFilterFactory will see the entire input as one
string and operate on it, _then_ send it through the tokenizer.
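
A rough Python sketch of the two orderings, for illustration only. It
is not Solr code: the function names and the regex are hypothetical,
and str.split() merely stands in for a whitespace tokenizer. Solr
itself uses Java regular expressions, but a pattern this simple should
behave the same way there.

```python
import re

# Hypothetical pattern (an assumption, not taken from the thread):
# one "[", any run of characters that are not "]", then one "]".
BRACKETED = re.compile(r"\[[^\]]*\]")


def char_filter_then_tokenize(text):
    """Mimic a char filter: replace on the whole input, then tokenize."""
    return BRACKETED.sub("", text).split()


def tokenize_then_token_filter(text):
    """Mimic a token filter: tokenize first, then replace per token."""
    return [BRACKETED.sub("", token) for token in text.split()]


text = "this [is some ] text"
print(char_filter_then_tokenize(text))   # bracketed span is removed
print(tokenize_then_token_filter(text))  # no single token holds both brackets
```

In the second function the pattern never matches, because the opening
and closing brackets end up in different tokens.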

And also don't be fooled by the fact that the _stored_ data will still
contain the removed words. So when you get the doc back from Solr
you'll see the original input, brackets and all. In the above example,
if you returned the field you'd still see

this [is some ] text

when the doc matched. This doc would be found when searching for
"this" or "text", but _not_ when searching for "is" or "some".

You want some pattern like
  

Best,
Erick

On Wed, May 10, 2017 at 6:08 PM, Michael Tobias wrote:
> I am sure this is very simple but I cannot get the pattern right.
>
> How can I use solr.PatternReplaceFilterFactory to remove all words in 
> brackets from being indexed?
>
> eg [ignore this]
>
> thanks
>
> Michael
>


newbie question re solr.PatternReplaceFilterFactory

2017-05-10 Thread Michael Tobias
I am sure this is very simple but I cannot get the pattern right.

How can I use solr.PatternReplaceFilterFactory to remove all words in brackets 
from being indexed?

eg [ignore this]

thanks

Michael