Cool. Sounds like you are ahead of the game.
On Oct 10, 2013, at 13:15, Dean Jones <[email protected]> wrote:

> On 10 October 2013 12:46, Ted Dunning <[email protected]> wrote:
>> For language detection, you are going to have a hard time doing better than
>> one of the standard packages for the purpose. See here:
>>
>> http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html
>
> Thanks for the pointer Ted. I'm a big fan of the Tika project, we use
> it for content extraction already. For various reasons though, we have
> rolled our own language detector (mainly, neither of these packages
> cover all of the languages we need to identify - language-detection
> doesn't do Catalan, Tika doesn't do Welsh).
>
> Dean.
