On Fri, 27 Jan 2012, Public Network Services wrote:
More specifically, for the 3 basic MS-Office formats, I am now getting:
- For *docx*: application/vnd.openxmlformats-officedocument.wordprocessingml.document
- For *xlsx*: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- For *pptx*: application/vnd.openxmlformats-officedocument.presentationml.presentation
- For *doc*: application/msword
- For *xls*: application/vnd.ms-excel
- For *ppt*: application/vnd.ms-powerpoint
Those are all correct.
Is that the only answer possible, or could there be another type
returned for, say, Word?
If memory serves, some of the office templates can have different
mimetypes to the normal file itself, and the macro enabled forms usually
have different mimetypes to the non-macro version.
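To make that concrete for Word, here is a rough sketch of the variants as a plain Java lookup table. The class and method names (WordMimeVariants, isWordFamily) are made up for illustration and are not part of Tika. The type strings are the commonly registered ones, and Tika's own definitions may differ in letter case, so treat tika-mimetypes.xml as the authoritative list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Tika class: maps Word-family extensions
// to the media type usually associated with each one.
class WordMimeVariants {
    static final Map<String, String> TYPES = new LinkedHashMap<>();
    static {
        // Legacy binary format
        TYPES.put("doc", "application/msword");
        // OOXML document and its template form
        TYPES.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        TYPES.put("dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template");
        // Macro-enabled document and template
        TYPES.put("docm", "application/vnd.ms-word.document.macroEnabled.12");
        TYPES.put("dotm", "application/vnd.ms-word.template.macroEnabled.12");
    }

    // True if the given type is one of the Word variants above.
    // Media type names are case-insensitive, so compare accordingly.
    static boolean isWordFamily(String mimeType) {
        for (String known : TYPES.values()) {
            if (known.equalsIgnoreCase(mimeType)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TYPES.forEach((ext, type) -> System.out.println(ext + " -> " + type));
    }
}
```

So if you need to recognise "a Word file", checking against a set like this is safer than comparing with a single string.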
Where do I find all the Tika type declarations and names?
The base set comes from org/apache/tika/mime/tika-mimetypes.xml
The latest version of that can be seen in SVN at:
https://svn.apache.org/repos/asf/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
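If you would rather list the declared types programmatically than read the XML by eye, something like the following may help. It is only a sketch using the JDK's XML parser, and it assumes the file keeps its mime-info / mime-type / glob layout. MimeInfoLister and the embedded SAMPLE are invented for this example; SAMPLE is a hand-written fragment in the same shape, not a copy of the real file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika: lists each declared type
// together with its filename glob patterns.
class MimeInfoLister {
    // Hand-written fragment in the mime-info layout, for demonstration only.
    static final String SAMPLE =
        "<mime-info>"
        + "<mime-type type=\"application/msword\">"
        + "<glob pattern=\"*.doc\"/><glob pattern=\"*.dot\"/>"
        + "</mime-type>"
        + "<mime-type type=\"application/vnd.ms-excel\">"
        + "<glob pattern=\"*.xls\"/>"
        + "</mime-type>"
        + "</mime-info>";

    // Returns type name -> glob patterns, in document order.
    static Map<String, List<String>> listTypes(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(xml);
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList types = doc.getElementsByTagName("mime-type");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            List<String> globs = new ArrayList<>();
            NodeList globNodes = type.getElementsByTagName("glob");
            for (int j = 0; j < globNodes.getLength(); j++) {
                globs.add(((Element) globNodes.item(j)).getAttribute("pattern"));
            }
            result.put(type.getAttribute("type"), globs);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Pass the path to a local copy of the XML file, or run with no
        // arguments to use the embedded sample.
        try (InputStream in = args.length > 0
                ? Files.newInputStream(Paths.get(args[0]))
                : new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8))) {
            listTypes(in).forEach((type, globs) ->
                System.out.println(type + " " + globs));
        }
    }
}
```

Run it against a downloaded copy of tika-mimetypes.xml to get one line per type; searching that output for "word" is a quick way to see every Word-related entry.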
Additionally, you can add extra custom mimetypes if you want, details are
at http://tika.apache.org/1.0/parser_guide.html#Add_your_MIME-Type
Nick