Re: Good example of multiple tokenizers for a single field

Robert Muir Mon, 29 Nov 2010 14:40:43 -0800

On Mon, Nov 29, 2010 at 5:35 PM, Jacob Elder <jel...@locamoda.com> wrote:
> StandardTokenizer doesn't handle some of the tokens we need, like
> @twitteruser, and as far as I can tell, doesn't handle Chinese, Japanese or
> Korean. Am I wrong about that?


StandardTokenizer does handle CJK ideographs: it uses the unigram method,
emitting each ideograph as its own token. CJKTokenizer just uses the
bigram method instead; it's simply an alternative approach.
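To make the difference concrete, here is a minimal plain-Java sketch. It is not Lucene code, and the class and method names (`CjkGrams`, `unigrams`, `bigrams`) are hypothetical. It splits one run of ideographs both ways. Unigrams give one token per character. Bigrams give overlapping pairs, which can match more precisely but also produce pairs that are not real words (here "京大").

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (not Lucene's implementation) of the unigram
 * and bigram methods for a run of CJK ideographs.
 */
public class CjkGrams {

    /** Unigram method: every ideograph becomes its own token. */
    static List<String> unigrams(String run) {
        List<String> tokens = new ArrayList<>();
        // Walk by code point so characters outside the BMP stay intact.
        run.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    /** Bigram method: overlapping pairs of adjacent ideographs. */
    static List<String> bigrams(String run) {
        int[] cps = run.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            // A lone ideograph has no partner; this sketch emits it by itself.
            tokens.add(new String(cps, 0, 1));
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String run = "北京大学";
        System.out.println(unigrams(run)); // [北, 京, 大, 学]
        System.out.println(bigrams(run));  // [北京, 京大, 大学]
    }
}
```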

WhitespaceTokenizer doesn't work at all for CJK, though, so give up on that!
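The reason is that CJK text is usually written without spaces between words. A rough plain-Java approximation of whitespace splitting (the `WhitespaceSplit` class is hypothetical, not Lucene's tokenizer) keeps something like `@twitteruser` intact but returns an entire CJK sentence as a single token:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical approximation of whitespace-only tokenization. */
public class WhitespaceSplit {

    /** Split on runs of whitespace; nothing else is treated as a boundary. */
    static List<String> split(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(split("hello @twitteruser world")); // [hello, @twitteruser, world]
        System.out.println(split("我爱北京天安门"));            // [我爱北京天安门]
    }
}
```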
