Re: French Language Detection with Tika

Claude Garceau Fri, 12 May 2017 10:59:41 -0700

Thank you Ken, really useful reply... I guess that a high false negative rate
(silence) does much more harm than a high false positive rate (noise).


I would say that more than 90% of the targeted documents are in French. Some of
them may contain a few paragraphs in English, but none of them are half French,
half English. Most of them are also longer than 2 pages, so I guess (correct me
if I'm wrong) they contain enough characters for the detector to work with
reasonable precision?

Another question... can we enable Tika's French detector and its English
detector at the same time, so that a document is parsed with both detectors on?
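A rough sketch of the idea behind this question, under stated assumptions. As far as I understand the LanguageDetector API in Tika 1.13 and later, you don't switch on separate French and English detectors: one detector scores every language whose model is loaded, `loadModels(Set<String>)` can restrict the candidates to, say, `fr` and `en`, and `detectAll(String)` returns a ranked list (please check the Javadoc for your Tika version). Since a mostly French document with a few English paragraphs gets a single overall verdict, classifying each paragraph separately is one way to see the mix. The Java below is self-contained and does NOT use Tika: `MixedLanguageSketch`, `classify` and `charsPerLanguage` are hypothetical names, and the stopword scorer is a toy stand-in for Tika's statistical models.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class MixedLanguageSketch {

    // Toy stand-in for a real detector: a few frequent function words per
    // candidate language (fr, en). Tika's detectors use statistical models,
    // not stopword lists; these lists are illustrative assumptions only.
    static final Map<String, Set<String>> STOPWORDS = new LinkedHashMap<>();
    static {
        STOPWORDS.put("fr", new HashSet<>(Arrays.asList(
            "le", "la", "les", "des", "du", "et", "est", "une", "dans", "pour", "que", "sur")));
        STOPWORDS.put("en", new HashSet<>(Arrays.asList(
            "the", "and", "is", "of", "to", "in", "that", "for", "with", "this", "are", "was")));
    }

    // Score one chunk of text against ALL candidate languages in one call
    // and return the best-scoring code, or "unknown" if no stopword matched.
    static String classify(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> e : STOPWORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (e.getValue().contains(token)) {
                    hits++;
                }
            }
            if (hits > bestHits) { // ties keep the earlier candidate
                best = e.getKey();
                bestHits = hits;
            }
        }
        return best;
    }

    // Classify each paragraph separately and sum the characters assigned to
    // each language. Treating blank lines as paragraph breaks is an assumption
    // about how the extracted text is laid out.
    static Map<String, Integer> charsPerLanguage(String document) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String paragraph : document.split("\\n\\s*\\n")) {
            String trimmed = paragraph.trim();
            if (!trimmed.isEmpty()) {
                totals.merge(classify(trimmed), trimmed.length(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        String doc = "Le rapport annuel est disponible dans la section des publications.\n\n"
                + "The appendix is written in English for the partners of the project.\n\n"
                + "Les résultats sont présentés dans le tableau et la discussion suit.";
        Map<String, Integer> totals = charsPerLanguage(doc);
        int all = 0;
        for (int n : totals.values()) {
            all += n;
        }
        // Share of the document's characters whose paragraph was judged French.
        double frenchShare = totals.getOrDefault("fr", 0) / (double) all;
        System.out.println(totals + " French share: " + frenchShare);
    }
}
```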
