Christian,
Could you open an account on JIRA? That would make it easier to discuss
this issue. Thank you, again.
Best,
Tim
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Monday, July 25, 2016 6:01 PM
To: [email protected]
Subject: Is Tika (especially CharsetDetector) considered thread-safe?
Hi,
I am working on a project where Tika is used in a heavily multi-threaded
environment. Lately, we have seen cases where character set detection gives
plausible results when run in isolation, but results that are way off when
run in parallel.
The root cause has not yet been found, but within the team there has been
quite a bit of finger-pointing at Tika's thread-safety and a lot of FUD,
especially around org.apache.tika.parser.txt.CharsetDetector.
But it seems no one on our team has filed a bug report or asked on the
mailing list.
So just to get rid of the FUD: Is
org.apache.tika.parser.txt.CharsetDetector considered to be thread-safe?
(Some bug reports suggest that Tika cares about thread-safety, but I could
not find anything about it in the Javadoc for CharsetDetector.)
Thanks and Best regards,
Christian
P.S.: We're building a fresh CharsetDetector for each byte array whose
character set encoding should be detected, and only the thread that created
a CharsetDetector uses it.
P.P.S.: We're still using Tika 1.9.
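
To help narrow down whether detection itself behaves differently under
concurrency, here is a minimal sketch of a harness that runs a detector once
sequentially and once on a thread pool, then reports the inputs whose results
differ. The class ParallelDetectionCheck and the methods findMismatches and
naiveDetect are hypothetical names made up for this illustration. naiveDetect
is only a JDK-based stand-in so the sketch runs without Tika on the classpath;
it is not Tika's algorithm. The commented-out lambda shows how the pattern from
the P.S. (one fresh CharsetDetector per byte array, used only by the calling
thread) might be plugged in instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelDetectionCheck {

    // Hypothetical stand-in detector (NOT Tika's algorithm): reports "UTF-8"
    // if the bytes decode strictly as UTF-8, otherwise "ISO-8859-1".
    //
    // With Tika on the classpath, the pattern from the P.S. could be passed
    // to findMismatches instead, for example:
    //   bytes -> {
    //       CharsetDetector d = new CharsetDetector(); // fresh per byte array
    //       d.setText(bytes);
    //       return d.detect().getName(); // assumes a match was found
    //   }
    public static String naiveDetect(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return "UTF-8";
        } catch (CharacterCodingException e) {
            return "ISO-8859-1";
        }
    }

    // Returns the indices of inputs whose parallel result differs from the
    // sequential baseline computed with the same detector function.
    public static List<Integer> findMismatches(List<byte[]> inputs,
                                               Function<byte[], String> detect,
                                               int threads) throws Exception {
        // Baseline: run every input on the calling thread, one at a time.
        List<String> expected = new ArrayList<>();
        for (byte[] input : inputs) {
            expected.add(detect.apply(input));
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per input so detections can overlap in time.
            List<Future<String>> futures = new ArrayList<>();
            for (byte[] input : inputs) {
                Callable<String> task = () -> detect.apply(input);
                futures.add(pool.submit(task));
            }
            List<Integer> mismatches = new ArrayList<>();
            for (int i = 0; i < inputs.size(); i++) {
                if (!expected.get(i).equals(futures.get(i).get())) {
                    mismatches.add(i);
                }
            }
            return mismatches;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> inputs = new ArrayList<>();
        inputs.add("plain ascii".getBytes(StandardCharsets.UTF_8));
        inputs.add("Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8));
        inputs.add("Gr\u00fc\u00dfe".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println("Mismatching indices: "
                + findMismatches(inputs, ParallelDetectionCheck::naiveDetect, 4));
    }
}
```

If the list of mismatching indices stays empty for the real inputs even with
many threads, that would point away from the detector and towards something
else in the pipeline, such as byte arrays being shared or modified between
threads. A non-empty list would give a concrete reproduction to attach to a
JIRA issue.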