Re: DelimitedBoostTokenFilterFactory Issue - Boosting and StandardTokenizerFactory

Erick Erickson Sun, 26 Apr 2020 05:37:40 -0700

This line is kind of hidden in the javadocs in DelimitedBoostTokenFilter.java:

"Note make sure your Tokenizer doesn't split on the delimiter, or this won't work"

So you need to use a different tokenizer. StandardTokenizer already splits
on the | character, as you’ve seen.
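
To make the failure mode concrete, here is a minimal plain-Java sketch of the idea. It does not use the Lucene classes at all: the two split helpers and applyBoostDelimiter are hypothetical stand-ins I made up for illustration, and the default boost of 1.0 for a token with no delimiter is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only: none of these helpers are Lucene or Solr APIs.
class DelimiterSplitDemo {

    // Stand-in for a tokenizer that splits on whitespace only.
    static List<String> splitOnWhitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Stand-in for a tokenizer that also treats '|' as a token boundary.
    static List<String> splitOnWhitespaceAndPipe(String text) {
        return Arrays.asList(text.trim().split("[\\s|]+"));
    }

    // Mimics the idea of the boost filter: "term|1.5" becomes "term^1.5".
    // A token without the delimiter keeps an assumed default boost of 1.0.
    static List<String> applyBoostDelimiter(List<String> tokens, char delimiter) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            int i = token.indexOf(delimiter);
            if (i < 0) {
                out.add(token + "^" + 1.0f);
            } else {
                float boost = Float.parseFloat(token.substring(i + 1));
                out.add(token.substring(0, i) + "^" + boost);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "solr|2.0 lucene|0.5";
        // Delimiter survives tokenization, so the boosts are picked up.
        System.out.println(applyBoostDelimiter(splitOnWhitespace(input), '|'));
        // Delimiter is consumed by the tokenizer, so every token gets the default.
        System.out.println(applyBoostDelimiter(splitOnWhitespaceAndPipe(input), '|'));
    }
}
```

With whitespace-only splitting this prints [solr^2.0, lucene^0.5]. Once the | is treated as a token boundary it prints [solr^1.0, 2.0^1.0, lucene^1.0, 0.5^1.0]: the boost values have turned into ordinary tokens and the filter never sees a delimiter.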

WhitespaceTokenizer is the most intuitive choice, but beware that you’ll have
to do your own punctuation stripping: it’ll include, say, the period at the
end of a sentence in the last token.
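
A small plain-Java sketch of the punctuation issue, again without any Lucene classes. The helper names are hypothetical, and the regex is just one assumed way to strip leading and trailing punctuation while leaving an embedded | and boost value alone; it is not what any particular Lucene filter does.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only: none of these helpers are Lucene or Solr APIs.
class TrailingPunctuationDemo {

    // Stand-in for a tokenizer that splits on whitespace only.
    static List<String> splitOnWhitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Removes punctuation at the start and end of a token only, so an
    // embedded delimiter such as the '|' in "solr|2.0" is left untouched.
    static String stripEdgePunctuation(String token) {
        return token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
    }

    public static void main(String[] args) {
        List<String> tokens = splitOnWhitespace("Boost the last term.");
        String last = tokens.get(tokens.size() - 1);
        System.out.println(last);                        // period still attached
        System.out.println(stripEdgePunctuation(last));  // period removed
    }
}
```

The last token comes out as "term." until the edge punctuation is stripped, which is the cleanup StandardTokenizer was doing for you.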

I realize you’re constructing your analysis chains programmatically, but
defining a similar analyzer in a schema and using the
admin UI >> select core >> analysis page is a great way to see exactly what
each step in an analysis chain does.

And of course you’ll have to reindex...

Best,
Erick

> On Apr 26, 2020, at 6:20 AM, Ivana Spasojevic 
> <ivanaspasojevic87...@gmail.com> wrote:
> 
> DelimitedBoostTokenFilterFactory
