[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791996#action_12791996 ]

Dennis Kubes commented on NUTCH-666:
------------------------------------

I don't remember exactly what the difference was, but I do remember a subtle 
difference between the algorithms that was only noticed after creating the new 
tools.  I think it had something to do with how the n-grams were handled, or 
with whether spaces were taken into account.  Try running the two identifiers 
side by side; you will see there is a considerable difference.
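
As a rough illustration of how word-boundary handling alone can change the 
n-grams an identifier sees, here is a minimal sketch. It is not the code of 
either identifier: the function name char_ngrams, the pad_with_spaces flag, 
and the underscore boundary marker are all assumptions made for the example.

```python
def char_ngrams(text, n, pad_with_spaces):
    """Return the character n-grams of each word in text.

    Hypothetical helper for illustration only; not taken from Nutch.
    """
    grams = []
    for word in text.lower().split():
        # Assumption: word boundaries are marked with "_" when padding is on.
        token = f"_{word}_" if pad_with_spaces else word
        # Slide a window of width n across the (possibly padded) token.
        grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


plain = char_ngrams("the cat", 3, pad_with_spaces=False)
padded = char_ngrams("the cat", 3, pad_with_spaces=True)
print(plain)
print(padded)
```

With padding enabled, the same input yields extra boundary n-grams, so two 
language profiles built under different conventions may score the same text 
quite differently.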

> Analysis plugins for multiple language and new Language Identifier Tool
> -----------------------------------------------------------------------
>
>                 Key: NUTCH-666
>                 URL: https://issues.apache.org/jira/browse/NUTCH-666
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.1
>         Environment: All
>            Reporter: Dennis Kubes
>            Assignee: Dennis Kubes
>             Fix For: 1.1
>
>         Attachments: NUTCH-666-1-20081126.patch, NUTCH-666-2-20091217-nf.patch
>
>
> Add analysis plugins for Czech, Greek, Japanese, Chinese, Korean, Dutch, 
> Russian, and Thai.  Also includes a new Language Identifier tool that uses 
> the new indexing framework in NUTCH-646.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
