Christian,
  If you could open an account on JIRA, it would be helpful for discussion on 
this issue.  Thank you, again.

        Best,

                Tim
       

-----Original Message-----
From: [email protected] [mailto:[email protected]] 
Sent: Monday, July 25, 2016 6:01 PM
To: [email protected]
Subject: Is Tika (especially CharsetDetector) considered thread-safe?

Hi,

I am working on a project where Tika is used in a heavily multi-threaded 
environment. Lately, we have seen issues where character set detection run in 
isolation gives plausible results, while running it in parallel gives results 
that are way off.

The root cause has not yet been found, but within the team there has been quite 
a bit of finger-pointing at Tika's thread-safety and lots of FUD, especially 
around org.apache.tika.parser.txt.CharsetDetector.

But it seems no one on our team has filed a bug report or asked on the 
mailing list.

So just to get rid of the FUD: Is
org.apache.tika.parser.txt.CharsetDetector considered to be thread-safe?
(Some bugs suggest that Tika cares about thread-safety, but I could not find 
anything in the javadoc for CharsetDetector)

Thanks and Best regards,
Christian


P.S.: We're building a fresh, new CharsetDetector for each byte array whose 
character encoding needs to be detected, and only the thread that created a 
given CharsetDetector uses it.


P.P.S.: We're still using Tika 1.9.
