On 10 October 2013 12:46, Ted Dunning <[email protected]> wrote:
> For language detection, you are going to have a hard time doing better than
> one of the standard packages for the purpose.  See here:
>
> http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html
>

Thanks for the pointer, Ted. I'm a big fan of the Tika project; we
already use it for content extraction. For various reasons, though, we
have rolled our own language detector, mainly because neither of these
packages covers all of the languages we need to identify
(language-detection doesn't do Catalan, and Tika doesn't do Welsh).

Dean.
