Re: Tokenization: How to Allow Multiple Strategies?

Tavi Nathanson Tue, 08 Feb 2011 15:24:30 -0800

Thanks for the suggestions! Using a new field makes sense, except it would
double the size of the index. I'd like to add additional terms, at my
discretion, only when there's ambiguity.

More specifically, do you know of any way to put multiple *token sets* at
the same position of the same field?

If I can tokenize "123-4567 apple" as:

[Token(123), Token(-), Token(4567), Token(apple)]
or
[Token(123-4567), Token(apple)]

...might there be a way to put [Token(123), Token(-), Token(4567)] *and*
[Token(123-4567)] in the index in such a way that the PhraseQuery
"Token(123-4567) Token(apple)" would match the above string, *and* the
PhraseQuery "Token(123) Token(-) Token(4567) Token(apple)" would also match
it?
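
To make the question concrete, here is a minimal sketch of the idea, assuming
the index stores one position per term occurrence and that a token with a
position increment of 0 is stacked on the position of the token before it
(Lucene tokens carry such a position increment). This is a plain-Python model,
not Lucene or Solr code; build_index and phrase_matches are hypothetical names
made up for the illustration.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each term to the set of positions it occupies.

    tokens is a list of (term, position_increment) pairs. An increment
    of 0 stacks the term on the same position as the previous token.
    """
    index = defaultdict(set)
    position = -1
    for term, increment in tokens:
        position += increment
        index[term].add(position)
    return index


def phrase_matches(index, terms):
    """Exact phrase match (no slop): term i must sit at start + i."""
    if not terms:
        return False
    for start in index.get(terms[0], ()):
        if all(start + i in index.get(t, ()) for i, t in enumerate(terms)):
            return True
    return False


# "123-4567 apple": the split tokens take positions 0, 1, 2, and the
# combined token is stacked on the LAST split token (increment 0).
tokens = [("123", 1), ("-", 1), ("4567", 1), ("123-4567", 0), ("apple", 1)]
index = build_index(tokens)

print(phrase_matches(index, ["123-4567", "apple"]))          # combined form
print(phrase_matches(index, ["123", "-", "4567", "apple"]))  # split form
print(phrase_matches(index, ["123", "apple"]))               # not adjacent
```

In this model both phrases from the question match, because the combined token
shares a position with the last split token and so stays adjacent to "apple".
The trade-off is on the other side: a word *before* the number is adjacent to
Token(123) but not to Token(123-4567), so an exact phrase that has a term in
front of the combined token would not match without some slop.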

Thanks!
Tavi

On Tue, Feb 8, 2011 at 10:34 AM, Em <mailformailingli...@yahoo.de> wrote:

>
> Hi Tavi,
>
> if you want to use multiple tokenization strategies (different tokenizers,
> so to speak) you have to use different fieldTypes.
>
> Maybe you have to create your own tokenizer for doing what you want or a
> PatternTokenizer might help you.
>
> However, your examples for the different positions of specific terms
> remind me of the WordDelimiterFilter (see
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
> ).
>
> It does almost everything you wrote and is close to what you want, I think.
> Have a look at it.
>
> Regards
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tokenization-How-to-Allow-Multiple-Strategies-tp2452505p2453215.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
