[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791979#action_12791979 ]
Andrzej Bialecki commented on NUTCH-666:
----------------------------------------
Do you think it was related to the quality of language models that you built
(presumably the ones in the patch?) versus the ones in the Nutch plugin, or due
to a different classification algorithm? I'm trying to understand the source of
such a big difference, because AFAIK the algorithm in textcat is essentially
the same as the one we use.

Analysis plugins for multiple language and new Language Identifier Tool
-----------------------------------------------------------------------
Key: NUTCH-666
URL: https://issues.apache.org/jira/browse/NUTCH-666
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.1
Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Fix For: 1.1
Attachments: NUTCH-666-1-20081126.patch, NUTCH-666-2-20091217-nf.patch
Add analysis plugins for Czech, Greek, Japanese, Chinese, Korean, Dutch,
Russian, and Thai. Also includes a new Language Identifier tool that uses
the new indexing framework in NUTCH-646.