Thank you, Ken, that was a really useful reply. I guess that a high false-negative rate (silence) does much more harm than a high false-positive rate (noise).
I would say that more than 90% of the targeted documents are in French. Some of them might contain a few paragraphs in English, but none is split half-and-half between French and English. Most of them are also longer than two pages. So I assume (correct me if I'm wrong) that they contain enough characters for the detector to work with reasonable precision?

Another question: can we enable Tika's French detector and its English detector at the same time, so that a document is parsed with both detectors active?
