Thank you Ken, really useful reply... I guess that a high false negative rate 
(silence) does much more harm than a high false positive rate (noise).

I would say that more than 90% of the targeted documents are in French. They 
might have some paragraphs in English, but they are not split half-half 
between French and English within the same document. Most of them are longer 
than 2 pages, so I guess (tell me if I am wrong) they have enough characters 
for the detector to operate with fair enough precision?

Another question... can we apply Tika's French detector and English detector 
at the same time to the same document, so that it is parsed with both 
detectors on?
