Hi team,
Is there documentation for Apache Tika that clearly lists the file extensions supported by a particular Tika version? I can see the file formats supported by Tika at https://tika.apache.org/2.8.0/formats.html, but that page does not make clear which extensions are covered by each file format. Based on a list of supported extensions, we plan to implement filters in our application so that only files with supported extensions are sent to Tika for extraction, and files with unsupported extensions are never sent to Tika at all.
I am also looking for documentation that captures performance statistics and recommendations for the different parsers currently supported by Tika, e.g. <x> parser is resource intensive, <y> parser is time consuming, and so on, ideally with the supporting statistics published.
Also, is there a common shared test data location (something similar to govdocs, or a test corpus maintained by the Tika project) against which the parsers are tested?
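To make the filtering requirement concrete, here is a minimal sketch of the kind of check we have in mind on our side. It uses only the Java standard library and does not call Tika. The class name `ExtensionFilter`, its methods, and the three extensions in the allow-list are hypothetical placeholders, not Tika's actual supported list; the real set is exactly what we are hoping the documentation can give us.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical application-side filter; not part of Apache Tika.
final class ExtensionFilter {

    // Placeholder allow-list for illustration only.
    // These values are assumptions, NOT Tika's actual supported extensions.
    private static final Set<String> SUPPORTED = Set.of("pdf", "docx", "txt");

    // Returns the lower-cased extension without the dot, or "" if there is none.
    // Expects a bare file name, not a full path.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // No dot, a leading dot only (e.g. ".bashrc"), or a trailing dot:
        // treat all of these as "no extension".
        if (dot <= 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // True if the file should be forwarded for extraction.
    static boolean isSupported(String fileName) {
        return SUPPORTED.contains(extensionOf(fileName));
    }
}
```

With this in place, a file such as `report.PDF` would be forwarded and `archive.xyz` would be skipped. The check is purely name-based, so it is only as good as the extension list behind it.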
