On Mon, 6 Mar 2023, Chris Bamford via user wrote:
> From both performance and thread safety points of view, what is the best
> approach for the use / reuse of the following objects:
> Tika
> ParseContext
> Parser
> Metadata

The Tika object and/or TikaConfig object should only be created once and
then re-used. The same goes for any Parser or Detector instances.

ParseContext ought to be fine to re-use, but it's such a lightweight
thing that I normally create a fresh one each time.

Metadata is normally created from scratch each time. To re-use one, all the
entries would need to be removed first, and recreating it is typically a lot
less work than removing everything.
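
As a rough sketch of the above (my own illustration, not lifted from the
Tika docs): the class name TikaReuse and the helper extractText are made up,
and it assumes tika-core plus a parsers module are on the classpath. It keeps
a single parser around and builds the per-document objects fresh each time:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class, named for illustration only.
public class TikaReuse {

    // Created once and re-used for every document.
    private static final Parser PARSER = new AutoDetectParser();

    // Hypothetical helper: extracts plain text from one in-memory document.
    public static String extractText(byte[] data) throws Exception {
        // Per-document objects: cheap to build, so create them fresh.
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();
        // -1 disables the handler's write limit; pick a limit for untrusted input.
        BodyContentHandler handler = new BodyContentHandler(-1);

        try (InputStream in = new ByteArrayInputStream(data)) {
            PARSER.parse(in, handler, metadata, context);
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Two calls share PARSER but get their own Metadata and ParseContext.
        System.out.println(extractText("first document".getBytes("UTF-8")));
        System.out.println(extractText("second document".getBytes("UTF-8")));
    }
}
```

If you go through the Tika facade instead, the same idea applies: hold on to
one Tika instance and call it repeatedly rather than constructing a new one
per file.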

Depending on how untrustworthy / variable the input you're receiving is,
and the impact of an OOM or similar, you might want to look into using the
Tika Server, Fork Mode or Batch Mode.

Nick