On Mon, 6 Mar 2023, Chris Bamford via user wrote:
From both a performance and a thread-safety point of view, what is the best approach to the use / reuse of the following objects:

Tika
ParseContext
Parser
Metadata

The Tika and/or TikaConfig objects should be created only once and then re-used. The same goes for any Parser or Detector instances.

ParseContext ought to be fine to re-use, but it's such a lightweight object that I normally create a fresh one.

Metadata is normally created from scratch each time. Re-using one would mean removing all of its entries first, and creating a new one is typically a lot less work than removing everything.
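A minimal sketch of that pattern, assuming `tika-core` plus a parser module such as `tika-parsers-standard-package` on the classpath (with `tika-core` alone, detection still runs but no text is extracted). One `AutoDetectParser` is built once and shared by all worker threads, while each document gets its own `Metadata`, `ParseContext` and `BodyContentHandler`. `TikaReuseSketch` and `parseToText` are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class TikaReuseSketch {

    // Expensive to build (loads the config, detectors and parsers), so it is
    // created once and shared by every thread instead of once per document.
    private static final Parser PARSER = new AutoDetectParser();

    /** Extracts plain text from one document (hypothetical helper). */
    public static String parseToText(byte[] document)
            throws IOException, SAXException, TikaException {
        Metadata metadata = new Metadata();        // fresh per document
        ParseContext context = new ParseContext(); // cheap, so also fresh
        BodyContentHandler handler = new BodyContentHandler(-1); // -1: no write limit
        try (InputStream in = new ByteArrayInputStream(document)) {
            PARSER.parse(in, handler, metadata, context);
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < 8; i++) {
                byte[] doc = ("document number " + i).getBytes(StandardCharsets.UTF_8);
                // Every task uses the same shared PARSER concurrently.
                results.add(pool.submit(() -> parseToText(doc)));
            }
            for (Future<String> result : results) {
                System.out.println(result.get().trim());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Only the per-document objects are allocated inside `parseToText`; nothing mutable is shared between threads apart from the parser itself.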


Depending on how untrustworthy or variable your input is, and on the impact of an OOM or similar failure, you might want to look into using the Tika Server, Fork Mode, or Batch Mode.
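For Fork Mode, a rough sketch, assuming `org.apache.tika.fork.ForkParser` from `tika-core` and its `ForkParser(ClassLoader, Parser)` constructor. Parsing runs in a child JVM, so an OOM or crash on a bad file hits that child process rather than the calling application. The `ForkParser` manages its child processes, so, like any other parser, it is built once, re-used, and closed when finished. `ForkModeSketch` and `parseToText` are hypothetical names; pool size, timeouts and JVM options are left at their defaults.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.fork.ForkParser;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class ForkModeSketch {

    /** Parses one document in a child JVM via the shared ForkParser. */
    public static String parseToText(ForkParser parser, byte[] document) throws Exception {
        BodyContentHandler handler = new BodyContentHandler(-1); // -1: no write limit
        try (InputStream in = new ByteArrayInputStream(document)) {
            // Fresh Metadata and ParseContext per document, as in-process.
            parser.parse(in, handler, new Metadata(), new ParseContext());
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Built once and re-used for every document.
        ForkParser parser = new ForkParser(
                ForkModeSketch.class.getClassLoader(), new AutoDetectParser());
        try {
            byte[] doc = "possibly untrusted input".getBytes(StandardCharsets.UTF_8);
            System.out.println(parseToText(parser, doc).trim());
        } finally {
            parser.close(); // shuts down the child JVMs
        }
    }
}
```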

Nick
