Write a tokenizer that first identifies the language of the input and then
delegates to the tokenizer for that language.  Then record the detected
language in the language id field.
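
A minimal sketch of that idea, in plain Python rather than any particular
search library's API. Everything named here is hypothetical and only for
illustration: detect_language is a toy check for CJK characters (a real
system would use a proper language identifier), the two tokenizers are
stand-ins for real per-language analyzers, and "language_id" is an assumed
field name.

```python
import re


def tokenize_whitespace(text):
    """Lowercase and split on non-word characters (space-delimited languages)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def tokenize_cjk_bigrams(text):
    """Emit overlapping character bigrams, a simple fallback for unsegmented text."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def detect_language(text):
    """Toy language ID (assumption): 'zh' if any CJK ideograph appears, else 'en'."""
    for c in text:
        if "\u4e00" <= c <= "\u9fff":
            return "zh"
    return "en"


# Hypothetical mapping from detected language to the tokenizer to delegate to.
TOKENIZERS = {"en": tokenize_whitespace, "zh": tokenize_cjk_bigrams}


def analyze(text):
    """Detect the language, delegate to its tokenizer, and record the language."""
    lang = detect_language(text)
    tokens = TOKENIZERS[lang](text)
    return {"language_id": lang, "tokens": tokens}
```

Calling analyze on each document at index time gives both the tokens and the
value to store in the language id field, so the caller never has to choose an
analyzer per file by hand.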

What is there to elaborate?

On Fri, Jan 20, 2012 at 1:58 AM, nibing <nibing_...@hotmail.com> wrote:

> But then there occurs a problem of using analyzer in indexing. I assume
> files encoded in different language should be handled using different
> analyzers, i.e. different tokenizers and filters. Can you elaborate a
> little bit on the design that you propose, especially in how files encoded
> in different languages can be handled during the indexing?
