The LanguageIdentifier plugin finds a page's language in two ways:

1. HTML tags (detect)
2. Statistical language identification (identify)

When the plugin looks in the HTML tags for a language declaration, it uses
the mappings in
http://svn.apache.org/viewvc/nutch/trunk/src/plugin/language-identifier/src/java/org/apache/nutch/analysis/lang/langmappings.properties?view=markup

If the plugin cannot find a language declaration in the HTML tags, it falls
back to statistical language identification (Tika's LanguageIdentifier) to
determine the page language. See:
http://svn.apache.org/viewvc/tika/trunk/tika-core/src/main/resources/org/apache/tika/language/tika.language.properties?revision=1181278&view=markup
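
To make the order of the two steps concrete, here is a rough sketch in plain Java. It is not the plugin's actual code: the class name, the tiny MAPPINGS table (standing in for langmappings.properties), the regex, and the identify function (standing in for Tika's statistical identifier) are all made up for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LanguageFallbackSketch {

    // Hypothetical stand-in for langmappings.properties: alias -> language code.
    static final Map<String, String> MAPPINGS = Map.of(
            "en", "en", "eng", "en", "english", "en",
            "tr", "tr", "tur", "tr", "turkish", "tr");

    // Simplified assumption: only the lang attribute on the <html> tag is checked.
    static final Pattern HTML_LANG = Pattern.compile(
            "<html[^>]*\\blang\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Step 1 (detect): read the declared language and map it to a known code.
    static Optional<String> detect(String html) {
        Matcher m = HTML_LANG.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        String declared = m.group(1).toLowerCase(Locale.ROOT);
        String primary = declared.split("[-_]")[0]; // "en-us" -> "en"
        return Optional.ofNullable(MAPPINGS.get(primary));
    }

    // Step 2 (identify) runs only when step 1 yields nothing.
    static String findLanguage(String html, Function<String, String> identify) {
        return detect(html).orElseGet(() -> identify.apply(html));
    }

    public static void main(String[] args) {
        // Placeholder for a statistical identifier; a real one would analyse the text.
        Function<String, String> identify = text -> "unknown";
        System.out.println(findLanguage("<html lang=\"en-US\"><body>Hi</body></html>", identify));
        System.out.println(findLanguage("<html><body>Merhaba</body></html>", identify));
    }
}
```

With the placeholder identifier, the first call is resolved from the HTML tag and the second falls through to the identify step.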


On Mon, Jun 3, 2013 at 5:26 PM, Tejas Patil <[email protected]> wrote:

>
> http://svn.apache.org/viewvc/nutch/trunk/src/plugin/language-identifier/src/java/org/apache/nutch/analysis/lang/langmappings.properties?view=markup
>
>
> On Mon, Jun 3, 2013 at 6:35 AM, H. Coskun Gunduz
> <[email protected]> wrote:
>
> > Hi,
> >
> > I'm looking for the list of implemented languages in the Language
> > Identifier plugin.
> >
> > There's a list on the wiki page [1], but the page was last edited almost
> > four years ago. I'm not sure whether the list there is up to date.
> >
> > Any help will be appreciated.
> >
> > Thanks.
> >
> > coskun...
> >
> > [1] http://wiki.apache.org/nutch/LanguageIdentifierPlugin
> >
>
