Hi,

I am working on a project where Tika is used in a heavily multi-threaded environment. Lately, we have seen issues where character set detection gives plausible results when run in isolation, but results that are way off when run in parallel.
The root cause has not been found yet, but within the team there has been quite a bit of finger-pointing at Tika's thread-safety and a lot of FUD, especially around org.apache.tika.parser.txt.CharsetDetector. As far as I can tell, no one on our team has filed a bug report or asked on the mailing list.

So, just to get rid of the FUD: is org.apache.tika.parser.txt.CharsetDetector considered thread-safe? (Some bug reports suggest that Tika cares about thread-safety, but I could not find anything about it in the javadoc for CharsetDetector.)

Thanks and best regards,
Christian

P.S.: We create a new CharsetDetector for each byte array whose character set encoding should be detected, and only the thread that created a CharsetDetector uses it.

P.P.S.: We're still using Tika 1.9.
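
To help separate a detector problem from a problem elsewhere in the pipeline, here is a minimal sketch of a harness that runs the same detection once sequentially and once on a thread pool, then reports the inputs whose results differ. It uses only the JDK. The class name `ParallelDetectionCheck`, the method `findMismatches`, and the stand-in detector are hypothetical and not part of Tika; the detector is passed in as a plain function so that the real call can be substituted.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelDetectionCheck {

    /**
     * Runs the detector once per input on the calling thread (baseline),
     * then once per input on a thread pool, and returns the indices of the
     * inputs for which the two results differ.
     */
    public static List<Integer> findMismatches(List<byte[]> inputs,
                                               Function<byte[], String> detector,
                                               int threads) throws Exception {
        // Baseline: detection in isolation, one input at a time.
        List<String> expected = new ArrayList<>();
        for (byte[] input : inputs) {
            expected.add(detector.apply(input));
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (byte[] input : inputs) {
                tasks.add(() -> detector.apply(input));
            }
            // invokeAll returns the futures in the same order as the tasks.
            List<Future<String>> futures = pool.invokeAll(tasks);

            List<Integer> mismatches = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                if (!Objects.equals(expected.get(i), futures.get(i).get())) {
                    mismatches.add(i);
                }
            }
            return mismatches;
        } finally {
            pool.shutdown();
        }
    }

    /** Hypothetical stand-in detector: stateless, so it cannot race. */
    public static String standInDetector(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {            // high bit set, so not plain ASCII
                return "UTF-8";
            }
        }
        return "US-ASCII";
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> inputs = Arrays.asList(
                "plain ascii".getBytes(StandardCharsets.UTF_8),
                "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8));
        System.out.println("Mismatching inputs: "
                + findMismatches(inputs, ParallelDetectionCheck::standInDetector, 4));
    }
}
```

To exercise Tika itself, the stand-in could be replaced with something along the lines of `bytes -> new CharsetDetector().setText(bytes).detect().getName()`, creating one fresh detector per call as described in the P.S. This assumes that Tika's CharsetDetector keeps the `setText`/`detect` API of the ICU class it is derived from; in ICU, `detect()` can return null when nothing matches, so a real harness would likely need to guard against that. If the harness reports no mismatches, the byte arrays handed to the detector in the real application (for example, buffers shared or reused between threads) would be the next thing to check.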
