Hi,
There is already a language detection plugin that uses n-grams to guess the language. If you have a corpus, you can train the plugin with it and create a 'model file'; we would be happy if you could contribute it.
You can find more information in the wiki and in the mail archive.
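
To give a rough idea of what n-gram based guessing means, here is a minimal sketch. It is not the plugin's code and does not use its model file format. It is a small Python illustration of comparing ranked character n-gram profiles (the "out-of-place" measure). The function names, the profile size of 300 and the two toy training sentences are assumptions made up for this example; a real model needs a much larger corpus.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Map the top_k most frequent character n-grams (length 1..n_max) to their rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top_k)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile, penalty=300):
    """Sum of rank differences; n-grams unknown to the language cost a fixed penalty."""
    total = 0
    for gram, rank in doc_profile.items():
        if gram in lang_profile:
            total += abs(rank - lang_profile[gram])
        else:
            total += penalty
    return total


def guess_language(text, models):
    """Return the key of the model whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(models, key=lambda lang: out_of_place(doc, models[lang]))


# Toy "training" step: one profile per language, built from a tiny sample (assumption).
models = {
    "en": ngram_profile("this is a short english text about the language of the web"),
    "mr": ngram_profile("मी मराठी बोलतो आणि माझे नाव समीर आहे"),
}
guess = guess_language("मराठी भाषा", models)
```

The training step is only the profile building: for a new language such as Marathi, the corpus is turned into a ranked n-gram list once, and that list plays the role of the model.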

Stefan

On 08.01.2006 at 11:39, Sameer Tamsekar wrote:

Hello,

 I am working on building a custom analyzer and language detector
for my native language ("Marathi"). Does anybody have an idea how to
extend Nutch to support this language?

Regards,

Sameer

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net

