Hi,
There is already a language detection plugin that uses n-grams to
guess the language of a document.
If you have a corpus, you can train the plugin on it and create a
'model file'; we would be happy if you could contribute it.
You can find more information in the wiki and in the mailing list archive.
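The general idea behind n-gram language guessing can be sketched as below. This is NOT the plugin's code and does not produce its model file format. It is a minimal illustration of one well-known technique: character n-gram rank profiles compared with the "out-of-place" distance of Cavnar and Trenkle. The function names (`ngram_profile`, `out_of_place`, `detect`) and the parameters `n_max=3` and `top_k=300` are hypothetical choices made for this sketch. "Training" a language here just means building a ranked profile from a corpus in that language, for example Marathi text.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top_k=300):
    """Build a ranked character n-gram profile (n = 1..n_max) from text.

    Hypothetical helper: returns {ngram: rank}, where rank 0 is the most frequent.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries so prefixes/suffixes become n-grams
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top_k)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile, max_penalty=300):
    """Sum of rank differences; n-grams missing from the language get max_penalty."""
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def detect(text, models):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(models, key=lambda lang: out_of_place(doc, models[lang]))


if __name__ == "__main__":
    # Toy "corpora"; a real model would be trained on far more text.
    models = {
        "en": ngram_profile("the quick brown fox jumps over the lazy dog and the cat"),
        "mr": ngram_profile("मराठी ही महाराष्ट्राची राजभाषा आहे आणि ती भारतात बोलली जाते"),
    }
    print(detect("the dog and the fox", models))
    print(detect("महाराष्ट्र मराठी भाषा", models))
```

Python's `str` works on Unicode code points, so Devanagari vowel signs are counted as separate characters. With a corpus this small that is harmless, but it is one reason why a larger training corpus matters.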
Stefan
On 08.01.2006, at 11:39, Sameer Tamsekar wrote:
Hello,
I am working on building a custom analyzer and language detector
for my native language (Marathi). Does anybody have an idea how to extend
Nutch to support this language?
Regards,
Sameer
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net