I am aware that the AutoDetectParser object is thread safe, but I wish to ask the opposite question.

Why not simply create a new AutoDetectParser each time my (server) code gets a request to parse a file?

If there is some overhead, what is it?

Is it just the overhead of checking whether a definition (-D or environment variable) points to a tika-config.xml and possibly parsing it? I don't actually have a config file. Is there other waste or overhead? Comments?
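
To make the comparison concrete, here is a rough sketch of the two patterns I have in mind. Note that StandInParser is a hypothetical placeholder I made up for illustration, not the real AutoDetectParser, and the counter only shows how often the constructor runs; it says nothing about what the real constructor actually costs.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for AutoDetectParser; this is NOT the Tika class.
final class StandInParser {
    // Counts constructor calls so the two patterns can be compared.
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    StandInParser() {
        // Any set-up work the real constructor does (for example a config
        // lookup) would be paid here, once per instance.
        CONSTRUCTED.incrementAndGet();
    }

    // Keeps no per-call state, so one instance can serve many threads.
    String parse(String input) {
        return input.trim();
    }
}

final class ParserReuseSketch {
    // Option A: one shared instance reused for every request.
    private static final StandInParser SHARED = new StandInParser();

    static String handleWithShared(String request) {
        return SHARED.parse(request);
    }

    // Option B: a fresh instance for each request.
    static String handleWithFresh(String request) {
        return new StandInParser().parse(request);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            handleWithShared(" a ");
        }
        System.out.println("constructions after shared calls: "
                + StandInParser.CONSTRUCTED.get());
        for (int i = 0; i < 3; i++) {
            handleWithFresh(" b ");
        }
        System.out.println("constructions after fresh calls: "
                + StandInParser.CONSTRUCTED.get());
    }
}
```

With option A the constructor cost is paid once; with option B it is paid on every request, which is the overhead I am asking about.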

(I noticed the discussion from Jan 2010 [ https://issues.apache.org/jira/browse/TIKA-374 ] about fixing a bug in AutoDetectParser to make it even more thread safe. I checked the code, and although the fix comment mentions only the one issue (a new SAXParser instance), in the current code TikaConfig also no longer holds the mime-types in a static member.)

-Paul
