On Tue, 20 Jun 2023, Neha Kamat via user wrote:
I am currently working on an application in which I would like to whitelist the file types supported by Tika and discard the rest of the files, to avoid unknown behaviour/memory leaks. I am currently referring to https://cwiki.apache.org/confluence/display/TIKA/File+Types+and+Dependencies.

You may be better off using the Tika App or Tika Server options, which will let you see which mime types each parser claims, which parsers you have available, and how the mime types relate to each other (more info is available via the Java API too).

That way you can check exactly what mime types your install supports, how they relate to each other, the impact of disabling parsers via the config file, etc.
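
Once you have the list of types your install reports, the whitelist check itself can be quite small. What follows is only a rough sketch, not anything shipped with Tika: the MimeAllowlist class and its methods are hypothetical names made up for illustration, and the sample types are placeholders for whatever your own install lists. It assumes you feed it the supported types as plain strings and pass in the type detected for each incoming file.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not part of Tika): accepts only MIME types on an explicit allowlist. */
final class MimeAllowlist {
    private final Set<String> allowed = new HashSet<>();

    /** supportedTypes is assumed to be the list of types reported by your own install. */
    MimeAllowlist(Collection<String> supportedTypes) {
        for (String type : supportedTypes) {
            allowed.add(normalize(type));
        }
    }

    /** Strips parameters such as "; charset=UTF-8" and lower-cases the base type. */
    static String normalize(String mimeType) {
        int semicolon = mimeType.indexOf(';');
        String base = semicolon >= 0 ? mimeType.substring(0, semicolon) : mimeType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns true only if the detected type's base form is on the allowlist. */
    boolean isAllowed(String detectedType) {
        return detectedType != null && allowed.contains(normalize(detectedType));
    }

    public static void main(String[] args) {
        // Sample values only; replace with the types your install actually lists.
        MimeAllowlist allowlist = new MimeAllowlist(List.of("application/pdf", "text/plain"));
        System.out.println(allowlist.isAllowed("text/plain; charset=UTF-8"));
        System.out.println(allowlist.isAllowed("application/x-msdownload"));
    }
}
```

Note that this compares exact type names only. It does not know about aliases or parent/child relationships between mime types, so you would still want to check how your types relate to each other before relying on it.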

Nick
