Re: Multi-language documents indexing

2007-02-23 Thread Ivan Vasilev
Thanks Erick. Here I describe my research on this problem; it might be helpful for someone :) I will divide the problem of multi-language documents into several subproblems: 1. Determining the language of text documents. 1.1. Determining the language of a document when the whole text is in on
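For subproblem 1.1, one very simple family of techniques is stopword counting: tokenize the text and pick the language whose common function words occur most often. The sketch below assumes the text is mostly in one language and uses tiny, hypothetical stopword samples; treating any Han character as Chinese is a rough shortcut (Japanese kanji would also match). It is an illustration only, not what Ivan or Lucene actually uses; a real detector would rely on n-gram statistics and much larger word lists.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Rough stopword-counting language guesser (illustrative sketch, Java 9+). */
public class StopwordLanguageGuesser {

    // Hypothetical, deliberately small stopword samples, keyed by language code.
    private static final Map<String, Set<String>> STOPWORDS = Map.of(
        "en", Set.of("the", "and", "of", "to", "is", "in", "that", "it"),
        "pt", Set.of("o", "e", "de", "que", "da", "em", "um", "uma", "para", "com"));

    /** Returns "zh", a stopword-list key, or "unknown" if nothing matched. */
    public static String guess(String text) {
        // Any Han ideograph is taken as a hint for Chinese (a simplification).
        boolean hasHan = text.codePoints()
            .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
        if (hasHan) {
            return "zh";
        }
        // Split on runs of non-letters; \p{L} keeps accented letters inside tokens.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "unknown";
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> entry : STOPWORDS.entrySet()) {
            int score = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    score++;
                }
            }
            // Strictly greater: a language must beat the current best to win.
            if (score > bestScore) {
                best = entry.getKey();
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(guess("The index is built in the background"));
        System.out.println(guess("O documento e indexado em segundo plano"));
    }
}
```

Ties between languages are resolved by map iteration order, which `Map.of` does not fix, so a real implementation would need an explicit tie-breaking rule.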

Re: Multi-language documents indexing

2007-02-22 Thread Erick Erickson
I know this has been discussed several times, but I don't remember the answers. Search the mail archive for "multiple languages" and you'll find some good suggestions. As I remember, though, it's not a trivial issue. Still, I don't see why the "three different documents" approach wouldn't work. You c
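The "three different documents" idea can be pictured as turning one source document into one indexable unit per language, each carrying the original document's id so hits can be mapped back. The sketch below assumes the text has already been segmented and labelled by language; the `Part` and `SubDocument` types and the `parentId#lang` id scheme are hypothetical. In a Lucene index, each `SubDocument` would presumably become one Lucene document analyzed with a language-appropriate analyzer, but that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Splits one multi-language document into one sub-document per language (Java 16+). */
public class PerLanguageSplitter {

    /** A piece of the source document whose language is already known. */
    public record Part(String lang, String text) {}

    /** One indexable unit: all text in one language, plus a link back to the source. */
    public record SubDocument(String id, String parentId, String lang, String text) {}

    public static List<SubDocument> split(String parentId, List<Part> parts) {
        // Group parts by language; LinkedHashMap keeps first-seen order, so output is deterministic.
        Map<String, StringBuilder> byLang = new LinkedHashMap<>();
        for (Part part : parts) {
            StringBuilder sb = byLang.computeIfAbsent(part.lang(), k -> new StringBuilder());
            if (sb.length() > 0) {
                sb.append('\n'); // separate non-adjacent parts in the same language
            }
            sb.append(part.text());
        }
        List<SubDocument> result = new ArrayList<>();
        for (Map.Entry<String, StringBuilder> entry : byLang.entrySet()) {
            String lang = entry.getKey();
            // Hypothetical id scheme: parent id plus language code, e.g. "doc42#en".
            result.add(new SubDocument(parentId + "#" + lang, parentId, lang, entry.getValue().toString()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<SubDocument> docs = split("doc42", List.of(
            new Part("en", "Quarterly report"),
            new Part("zh", "季度报告"),
            new Part("en", "Summary of results"),
            new Part("pt", "Relatório trimestral")));
        docs.forEach(System.out::println);
    }
}
```

At search time, several sub-documents with the same `parentId` may match one query, so results would typically be collapsed back to one hit per source document.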

Multi-language documents indexing

2007-02-22 Thread Ivan Vasilev
Hi all, Our application, which uses Lucene for indexing, will be used to index documents, each of which contains parts written in different languages. For example, a document could contain English, Chinese and Brazilian Portuguese text. So how should such a document be indexed? Is there some best practice to do
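Before a mixed document can be indexed per language, it has to be cut into parts. One cheap first step, sketched below, is to split the text into runs of the same Unicode script using the JDK's `Character.UnicodeScript`. This separates Chinese (Han) from Latin-script text, but it cannot tell English from Portuguese, since both use the Latin script; a statistical or stopword-based guesser would still be needed for each Latin run. The `Run` type and `segment` method are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits text into runs of a single Unicode script (illustrative sketch, Java 16+). */
public class ScriptSegmenter {

    public record Run(Character.UnicodeScript script, String text) {}

    public static List<Run> segment(String text) {
        List<Run> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Character.UnicodeScript currentScript = null;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            // Spaces, digits and punctuation (COMMON) and combining marks (INHERITED)
            // do not start a new run; they stay with the current one.
            boolean neutral = script == Character.UnicodeScript.COMMON
                || script == Character.UnicodeScript.INHERITED;
            if (!neutral && currentScript != null && script != currentScript) {
                runs.add(new Run(currentScript, current.toString().trim()));
                current.setLength(0);
            }
            if (!neutral) {
                currentScript = script;
            }
            current.appendCodePoint(cp);
            i += Character.charCount(cp); // advance past surrogate pairs correctly
        }
        // Text made only of neutral characters produces no runs at all.
        if (currentScript != null) {
            runs.add(new Run(currentScript, current.toString().trim()));
        }
        return runs;
    }

    public static void main(String[] args) {
        for (Run run : segment("Hello world 你好世界 Olá mundo")) {
            System.out.println(run.script() + ": " + run.text());
        }
    }
}
```

Each resulting run could then be labelled with a language and indexed with an analyzer suited to it.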