On Fri, 27 Jan 2012, Public Network Services wrote:
More specifically, for the 3 basic MS-Office formats, I am now getting:

  - For *docx*: application/vnd.openxmlformats-officedocument.wordprocessingml.document
  - For *xlsx*: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  - For *pptx*: application/vnd.openxmlformats-officedocument.presentationml.presentation
  - For *doc*: application/msword
  - For *xls*: application/vnd.ms-excel
  - For *ppt*: application/vnd.ms-powerpoint

Those are all correct.

Is that the only answer possible, or could there be another type returned for, say, Word?

If memory serves, some of the Office templates have different mimetypes from the normal file itself, and the macro-enabled forms usually have different mimetypes from the non-macro versions.
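
To make the template and macro point concrete for Word, here is a minimal sketch of an extension-to-type table. It is a hand-written lookup, not Tika's detector: the class name OfficeMimeTypes and the method forFileName are made up for illustration, and the type strings are written as I believe they are registered with IANA (Tika may well report them in lower case).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Tika: maps Word file extensions to media types.
class OfficeMimeTypes {
    private static final String UNKNOWN = "application/octet-stream";
    private static final Map<String, String> BY_EXTENSION = new LinkedHashMap<>();

    static {
        // Normal documents, old binary and new OOXML formats
        BY_EXTENSION.put("doc", "application/msword");
        BY_EXTENSION.put("docx",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        // Template: a different type from the normal document
        BY_EXTENSION.put("dotx",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.template");
        // Macro-enabled forms: different again
        BY_EXTENSION.put("docm", "application/vnd.ms-word.document.macroEnabled.12");
        BY_EXTENSION.put("dotm", "application/vnd.ms-word.template.macroEnabled.12");
    }

    // Looks up the type by file extension only; the file content is never inspected.
    static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return UNKNOWN; // no extension to go on
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, UNKNOWN);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"a.doc", "a.docx", "a.dotx", "a.docm", "a.dotm"}) {
            System.out.println(name + " -> " + forFileName(name));
        }
    }
}
```

So code that only checks for the plain docx type would miss four of the five Word variants listed here.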


Where do I find all the Tika type declarations and names?

The base set comes from org/apache/tika/mime/tika-mimetypes.xml

The latest version of that can be seen in SVN at:
https://svn.apache.org/repos/asf/tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
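
That file is plain XML, with mime-type elements carrying a type attribute and glob children carrying a pattern attribute, so you can list the declared names with nothing but the JDK. A rough sketch, assuming that layout: the class MimeTypeListing and the SAMPLE snippet are hand-written for illustration and are not copied from the real file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists type names and glob patterns from mime-info style XML.
class MimeTypeListing {
    // Illustrative stand-in for the real file's contents
    static final String SAMPLE =
        "<mime-info>"
        + "<mime-type type='application/msword'>"
        + "<glob pattern='*.doc'/><glob pattern='*.dot'/>"
        + "</mime-type>"
        + "<mime-type type='application/vnd.ms-excel'>"
        + "<glob pattern='*.xls'/>"
        + "</mime-type>"
        + "</mime-info>";

    // Returns each declared type with its glob patterns, in document order.
    static Map<String, List<String>> globsByType(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList types = doc.getElementsByTagName("mime-type");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                List<String> globs = new ArrayList<>();
                NodeList globNodes = type.getElementsByTagName("glob");
                for (int j = 0; j < globNodes.getLength(); j++) {
                    globs.add(((Element) globNodes.item(j)).getAttribute("pattern"));
                }
                result.put(type.getAttribute("type"), globs);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse mime-info XML", e);
        }
    }

    public static void main(String[] args) {
        globsByType(SAMPLE).forEach((type, globs) -> System.out.println(type + " " + globs));
    }
}
```

To run it against the real declarations, read the contents of tika-mimetypes.xml into a String and pass that in place of SAMPLE.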

Additionally, you can add extra custom mimetypes if you want; details are at http://tika.apache.org/1.0/parser_guide.html#Add_your_MIME-Type

Nick
