It seems there were some problems with the Tika mails (at least I did not
receive them for some time, and then recently got them all in a batch).

Just re-raising this to get some response.
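
To make the question below concrete, here is a toy model in Java of the two hints involved: leading magic bytes and filename globs. It is not Tika's code. The class ToyDetector, its methods and its decision order are hypothetical. A JAR written the usual way starts with the ZIP local-file-header signature PK\003\004, and its name matches *.jar but neither *.htm nor *.html, so in this model nothing points at text/html.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Toy model (NOT Tika's implementation) of magic-byte plus glob detection.
 * The decision order in detect() is a hypothetical simplification.
 */
final class ToyDetector {

    // ZIP local file header signature "PK\003\004"; a normally written JAR starts with it.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Globs quoted in the mail; only simple "*.ext" globs are handled here.
    private static final List<String> HTML_GLOBS = Arrays.asList("*.htm", "*.html");
    private static final List<String> JAR_GLOBS = Arrays.asList("*.jar");

    static boolean hasZipMagic(byte[] prefix) {
        if (prefix.length < ZIP_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(prefix, ZIP_MAGIC.length), ZIP_MAGIC);
    }

    static boolean matchesAny(String name, List<String> globs) {
        String lower = name.toLowerCase(Locale.ROOT);
        for (String glob : globs) {
            // "*.jar" -> ".jar": a suffix test suffices for these simple globs
            if (lower.endsWith(glob.substring(1))) {
                return true;
            }
        }
        return false;
    }

    static String detect(byte[] prefix, String name) {
        if (hasZipMagic(prefix)) {
            // Magic says ZIP; in this toy the glob only refines it to a ZIP subtype.
            return matchesAny(name, JAR_GLOBS) ? "application/java-archive" : "application/zip";
        }
        if (matchesAny(name, HTML_GLOBS)) {
            return "text/html";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] jarPrefix = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        System.out.println(detect(jarPrefix, "example.jar"));
        System.out.println(detect("<html></html>".getBytes(StandardCharsets.US_ASCII), "index.html"));
    }
}
```

Running main prints application/java-archive and then text/html. The sketch only shows that neither signal on its own suggests text/html for such a JAR; how Tika itself weighs magic against globs, and why having tika-parsers (with its ZipContainerDetector) on the classpath changes the result, is exactly what the quoted question asks.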


On Wed, May 7, 2014 at 12:57 PM, Tamás Cservenák <[email protected]> wrote:

> Hi all,
>
> I just created an issue
> https://issues.apache.org/jira/browse/TIKA-1292
>
> In short: it's about Tika's Detector detecting a JAR file (a correct ZIP
> file, with proper magic bytes, etc.) as "text/html" instead of the expected
> "application/java-archive".
>
> The reason is clear to me (we already created a PR in the Nexus project for
> that), but what bothers me is this: _why_ does the Detector behave
> correctly with tika-parsers on the classpath?
>
> How does the presence of tika-parsers affect MIME magic detection, and,
> most interestingly, why does it affect it at all? (I am aware of the
> added org.apache.tika.parser.pkg.ZipContainerDetector.)
>
> Isn't MIME magic detection based on the bundled tika-mimetypes.xml? Even
> though the globs defined for text/html (*.htm and *.html) do not match the
> JAR file above (*.jar), Tika still selects the HTML MIME type...
>
>
> Thanks,
> ~t~
>
