Hi team

Is there documentation for Apache Tika that clearly lists the file extensions 
supported by a particular Tika version? I can see the file formats supported 
by Tika at https://tika.apache.org/2.8.0/formats.html, but that page does not 
make clear which extensions are covered under each file format.
Based on the list of supported extensions, we plan to implement filters in our 
application so that only files with supported extensions are sent to Tika for 
extraction, and files with unsupported extensions are never sent to Tika for 
processing. I am also looking for documentation that captures performance 
statistics and recommendations for the different parsers currently supported 
by Tika, e.g. that <x> parser is resource intensive or that <y> parser is time 
consuming, backed by published statistics.
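
For context, the extension filter mentioned above would look roughly like the 
sketch below. The class name, method names and the three extensions in the set 
are placeholders of ours, not an official Tika list; the real set is exactly 
what we are hoping the documentation can provide.

```java
import java.util.Locale;
import java.util.Set;

public class ExtensionFilter {
    // Placeholder entries for illustration only (an assumption on our side);
    // this is NOT an authoritative list of what any Tika version supports.
    private static final Set<String> SUPPORTED = Set.of("pdf", "docx", "txt");

    // Returns the lower-cased extension of a bare file name (no directory part),
    // or an empty string when there is none (e.g. "README" or ".bashrc").
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // True when the file should be forwarded to Tika for extraction.
    static boolean shouldSendToTika(String fileName) {
        return SUPPORTED.contains(extensionOf(fileName));
    }
}
```

Files for which shouldSendToTika returns false would be skipped by our 
application and never handed to Tika.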

Is there a common, shared test data location (something similar to govdocs, or 
test data maintained by Tika) against which parser testing is done?




