In sections 7.2 (p. 115) ... of "Tika in Action", the authors discuss language
detection in very general terms and mention that Tika currently uses n-grams
but may change the underlying algorithm in the future.

Could you (the committers or the authors) share a little more about Tika's
language detection internals and/or your likely future decisions and plans?

Thanks,
lbrtchx
