On Fri, 27 Jan 2012, Public Network Services wrote:
I had a look at the MIME types list and there are 50 different "Office" formats, including many for Microsoft Word/Excel/Powerpoint!

Yup, there are quite a few different formats (with and without macros, regular documents and templates, etc.), and they generally each have their own mimetype.

Is there any recommended strategy for reliably detecting the correct media type of such files

I might be missing something, but just use Tika and it'll tell you the mimetype for the file.
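A minimal sketch of asking Tika for the type, assuming the Tika jars are on the classpath. With tika-core alone, detection relies on the leading bytes plus the file name. The full Tika distribution (tika-parsers) adds container-aware detectors that look inside OLE2/OOXML packages to tell Word, Excel and PowerPoint apart. The class name DetectOfficeType is hypothetical.

```java
import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;

/** Hypothetical helper: prints the media type Tika detects for each file given. */
public class DetectOfficeType {

    // The Tika facade wraps the default detector configured on the classpath.
    private static final Tika TIKA = new Tika();

    /** Returns the detected media type, e.g. the OOXML type for a .docx file. */
    public static String detect(File file) throws IOException {
        // Uses the file's magic bytes and its name; with tika-parsers present,
        // the contents of Office containers are inspected as well.
        return TIKA.detect(file);
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            System.out.println(path + " -> " + detect(new File(path)));
        }
    }
}
```

The returned string can then be used to decide which POI class (HWPF/XWPF, HSSF/XSSF, HSLF/XSLF) to hand the file to.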

in order to use POI afterwards, for content extraction?

Assuming you don't have very specific needs, it might be simpler to let Tika call the appropriate POI code for you for content extraction.
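A minimal sketch of letting Tika drive the extraction, assuming tika-parsers (which bundles the POI-based parsers) is on the classpath. The class name ExtractOfficeText is hypothetical.

```java
import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

/** Hypothetical helper: extracts plain text from any file Tika can parse. */
public class ExtractOfficeText {

    private static final Tika TIKA = new Tika();

    /** Detects the type, picks a parser (POI-based for Office files) and returns the text. */
    public static String extract(File file) throws IOException, TikaException {
        // parseToString() caps the output length by default; raise it with
        // TIKA.setMaxStringLength(...) if large documents get truncated.
        return TIKA.parseToString(file);
    }

    public static void main(String[] args) throws IOException, TikaException {
        for (String path : args) {
            System.out.println("=== " + path);
            System.out.println(extract(new File(path)));
        }
    }
}
```

If you later need something Tika doesn't expose (formatting, cell formulas, etc.), you can still open the same file with POI directly once you know its type.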

Nick
