All,
  Joern Kottman is working with us on [1], but I thought I'd do the
proper community thing and raise this here as well.  On Apache Tika,
we're considering switching over to OpenNLP for language detection for
tika-eval.
  We know that dumping a 100k chunk of text into OpenNLP for language
detection is stupid (thanks to Joern). However, we found that if you
do that, the detector's accuracy degrades dramatically, and it
returns "che" for many languages.
  Is this expected?  Is this a bug?
  Many, many thanks for all of your work and for making your slice of
the Leipzig corpus so readily accessible!

          Cheers,

                    Tim


[1] https://issues.apache.org/jira/browse/TIKA-2790

especially:
https://issues.apache.org/jira/browse/TIKA-2790?focusedCommentId=16839443&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16839443

and
https://issues.apache.org/jira/browse/TIKA-2790?focusedCommentId=16839413&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16839413
