[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791996#action_12791996 ]
Dennis Kubes commented on NUTCH-666:
------------------------------------

I don't remember exactly what the difference was, but I do remember that there was a subtle difference in the algorithms that was only noticed after creating the new tools. I think it had something to do with how the n-grams were being handled, or with whether spaces were being taken into account. Try running the identifiers side by side and you will see there is a considerable difference.

> Analysis plugins for multiple language and new Language Identifier Tool
> -----------------------------------------------------------------------
>
>                 Key: NUTCH-666
>                 URL: https://issues.apache.org/jira/browse/NUTCH-666
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.1
>         Environment: All
>            Reporter: Dennis Kubes
>            Assignee: Dennis Kubes
>             Fix For: 1.1
>
>         Attachments: NUTCH-666-1-20081126.patch, NUTCH-666-2-20091217-nf.patch
>
>
> Add analysis plugins for Czech, Greek, Japanese, Chinese, Korean, Dutch,
> Russian, and Thai. Also includes a new Language Identifier tool that uses
> the new indexing framework in NUTCH-646.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.