> Any plan to implement this? I mean, move the LanguageIdentifier class
> into Nutch core.
As I already suggested on this list, I would really like to move the LanguageIdentifier class (and its profiles) to an independent Lucene sub-project (and the MimeType repository too). I don't remember why, but there were some objections to this...

Here is a short status of what I have in mind for the next improvements to LanguageIdentifier / multi-language support:

* Enhance the LanguageIdentifier API so that guessing a language returns something like an ordered LangDetail[] array (each LangDetail should contain the language code and its score). I have a prototype of this on my disk, but I haven't taken the time to finalize it.
* I encountered identification problems with some specific sites (Blogger, for instance), and I plan to investigate this.
* Another pending task: the analysis (and coding) of multilingual querying support.

Regards

Jérôme

-- 
http://motrech.free.fr/
http://www.frutch.org/
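
For illustration only, the ordered LangDetail[] result mentioned above could look roughly like the sketch below. This is not the prototype referred to in the message: apart from the LangDetail name, everything here (the field names, the LangRanking helper, the rank method, and the convention that a higher score means a better match) is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;

// Hypothetical value type: one candidate language and its score.
final class LangDetail {
    final String lang;   // language code, e.g. "fr"
    final double score;  // assumed convention: higher means a better match

    LangDetail(String lang, double score) {
        this.lang = lang;
        this.score = score;
    }

    @Override
    public String toString() {
        return lang + "=" + score;
    }
}

// Hypothetical helper: turns per-language scores into an ordered array.
final class LangRanking {
    private LangRanking() {}

    // Returns one LangDetail per map entry, sorted with the best score first.
    static LangDetail[] rank(Map<String, Double> scores) {
        LangDetail[] details = scores.entrySet().stream()
            .map(e -> new LangDetail(e.getKey(), e.getValue()))
            .toArray(LangDetail[]::new);
        Arrays.sort(details,
            Comparator.comparingDouble((LangDetail d) -> d.score).reversed());
        return details;
    }
}
```

A caller that only wants the single best guess would read element 0 of the returned array, while a caller dealing with ambiguous pages could inspect the runners-up and their scores.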
