Hi,
On Tue, Jul 26, 2016 at 02:17:13AM +0000, Allison, Timothy B. wrote:
> Exactly what code are you using? How are you doing detection?
I see that you already have something working on TIKA-2041.
But for completeness' sake:
Our code is a bit convoluted, but it boils down to running the
following method in multiple threads in parallel:
private String getCharset(final byte[] raw) {
    CharsetDetector detector = new CharsetDetector();
    detector.setText(raw);
    CharsetMatch match = detector.detect();
    if (match == null) {
        return null;
    }
    return match.getName();
}
`raw` is isolated per thread, so the byte array is never modified
underneath the CharsetDetector while detection is running.
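For illustration only, here is a rough sketch of that kind of parallel setup. It is not our actual code. The class name, the fixed thread pool, and `placeholderDetect` are assumptions made up for this example; `placeholderDetect` is a trivial BOM check that stands in for `getCharset` so the sketch compiles without any detector library on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelDetectSketch {

    // Runs `detect` once per task; every task gets its own copy of the
    // input, so no two threads ever share a byte array.
    static List<String> runInParallel(byte[] input, int threads,
                                      Function<byte[], String> detect) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                final byte[] raw = input.clone(); // private copy per task
                Callable<String> task = () -> detect.apply(raw);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // rethrows any failure from the task
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical stand-in for getCharset(): recognizes only a UTF-8 BOM
    // and returns null otherwise.
    static String placeholderDetect(byte[] raw) {
        boolean bom = raw.length >= 3
                && (raw[0] & 0xFF) == 0xEF
                && (raw[1] & 0xFF) == 0xBB
                && (raw[2] & 0xFF) == 0xBF;
        return bom ? "UTF-8" : null;
    }

    public static void main(String[] args) throws Exception {
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(runInParallel(sample, 4, ParallelDetectSketch::placeholderDetect));
    }
}
```

To try it against the real detector, pass a method like `getCharset` above in place of `placeholderDetect`.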
Best regards,
Christian