To illustrate what I mean:

This is what I do:

MediaType mediaType = MediaType.parse(tika.detect(inputStream));
String mimeType = mediaType.getSubtype();

FAILED: getsCorrectContentType("application/vnd.ms-excel", docs/xls/en.xls)
java.lang.AssertionError: expected:<application/vnd.ms-excel> but
was:<x-tika-msoffice>

FAILED:
getsCorrectContentType("vnd.openxmlformats-officedocument.spreadsheetml.sheet",
docs/xlsx/en.xlsx)
java.lang.AssertionError:
expected:<vnd.openxmlformats-officedocument.spreadsheetml.sheet> but
was:<zip>

FAILED: getsCorrectContentType("application/msword", doc/en.doc)
java.lang.AssertionError: expected:<application/msword> but
was:<x-tika-msoffice>

FAILED:
getsCorrectContentType("application/vnd.openxmlformats-officedocument.wordprocessingml.document",
docs/docx/en.docx)
java.lang.AssertionError:
expected:<application/vnd.openxmlformats-officedocument.wordprocessingml.document>
but was:<zip>

FAILED: getsCorrectContentType("vnd.ms-powerpoint", docs/ppt/en.ppt)
java.lang.AssertionError: expected:<vnd.ms-powerpoint> but
was:<x-tika-msoffice>


Is there any way to get the actual subtype as defined in mimetypes.xml,
instead of x-tika-msoffice or application/zip?
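
One direction that may be worth trying, sketched here as a simplified model and not as Tika's actual code: detection from a bare stream appears to see only the container signature (OLE2 or zip), so a file-name hint is what would let the more specific type win. The Tika facade does have a detect(InputStream, String name) overload for passing such a hint. In the sketch below, the class MimeRegistrySketch and all of its methods are hypothetical. Only the application/msword entry comes from the mimetypes.xml snippet quoted below; the other registry entries are assumptions about what mimetypes.xml contains.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified model of a mime-type registry; NOT Tika's implementation.
final class MimeRegistrySketch {
    private final Map<String, String> parentOf = new HashMap<>();         // type -> sub-class-of
    private final Map<String, String> aliasToCanonical = new HashMap<>(); // alias -> canonical type
    private final Map<String, String> extensionToType = new HashMap<>();  // glob "*.ext" -> type

    void addType(String type, String parent, String... extensions) {
        if (parent != null) {
            parentOf.put(type, parent);
        }
        for (String ext : extensions) {
            extensionToType.put(ext, type);
        }
    }

    void addAlias(String alias, String canonical) {
        aliasToCanonical.put(alias, canonical);
    }

    // An alias is just another name for the same type; map it back to the canonical one.
    String canonical(String name) {
        return aliasToCanonical.getOrDefault(name, name);
    }

    // True if 'type' reaches 'ancestor' by following sub-class-of links.
    boolean isSpecializationOf(String type, String ancestor) {
        for (String t = parentOf.get(type); t != null; t = parentOf.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Prefer the name-based type only when it specializes the content-based (magic) type.
    String refine(String magicType, String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return magicType;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String byName = extensionToType.get(ext);
        return (byName != null && isSpecializationOf(byName, magicType)) ? byName : magicType;
    }

    static MimeRegistrySketch sample() {
        MimeRegistrySketch r = new MimeRegistrySketch();
        r.addType("application/x-tika-msoffice", null);
        // From the quoted mimetypes.xml snippet:
        r.addType("application/msword", "application/x-tika-msoffice", "doc", "dot");
        r.addAlias("application/vnd.ms-word", "application/msword");
        // Assumed entries, for illustration only:
        r.addType("application/vnd.ms-excel", "application/x-tika-msoffice", "xls");
        r.addType("application/zip", null, "zip");
        r.addType("application/x-tika-ooxml", "application/zip");
        r.addType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
                "application/x-tika-ooxml", "xlsx");
        return r;
    }

    public static void main(String[] args) {
        MimeRegistrySketch r = sample();
        System.out.println(r.refine("application/x-tika-msoffice", "en.xls"));
        System.out.println(r.refine("application/zip", "en.xlsx"));
        System.out.println(r.canonical("application/vnd.ms-word"));
    }
}
```

With real Tika the equivalent would presumably be to pass the file name along with the stream, e.g. tika.detect(inputStream, "en.xls"). Note also that getSubtype() returns only the part after the slash, so an expected value such as "application/vnd.ms-excel" would need to be compared against the full type string instead.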



On Sun, Aug 21, 2011 at 12:55 AM, Jakub Liska <[email protected]> wrote:

> Hey,
>
> I need to get the iana.org MediaType rather than application/zip
> or application/x-tika-msoffice for documents like odt, ppt, pptx, xlsx, etc.
>
> when doing:
>
> MediaType mediaType = MediaType.parse(tika.detect(is));
>
>
> If you look at mimetypes.xml there are mimeType elements composed of the
> iana.org mime-type, alias and "sub-class-of"
>
>   <mime-type type="application/msword">
>     <alias type="application/vnd.ms-word"/>
>     ............................
>     <glob pattern="*.doc"/>
>     <glob pattern="*.dot"/>
>     <sub-class-of type="application/x-tika-msoffice"/>
>   </mime-type>
>
>
> What is the alias for? And how do I get the iana.org mime-type name
> instead of the sub-class-of type name?
>
>
> Best regards, Jakub
>
