It seems there were some problems with Tika mails (at least I did not receive them for some time, and these days I got them all in one batch).
Just re-raising this to get some response.

On Wed, May 7, 2014 at 12:57 PM, Tamás Cservenák <[email protected]> wrote:
> Hi all,
>
> I just created an issue
> https://issues.apache.org/jira/browse/TIKA-1292
>
> In short: it's about the Tika Detector detecting a JAR file (a correct ZIP
> file, with proper magic bytes, etc.) as "text/html" instead of the expected
> "application/java-archive".
>
> The reason is clear to me (we already created a PR in the Nexus project for
> that), but the interesting thing that bothers me is _why_ the Detector
> behaves correctly with tika-parsers on the classpath.
>
> How does the presence of tika-parsers affect the MIME magic detection and,
> most interestingly, why does it affect it? (I am aware of the
> added org.apache.tika.parser.pkg.ZipContainerDetector.)
>
> Isn't MIME magic detection based on the bundled tika-mimetypes.xml? Even
> the globs defined there for text/html (*.htm and *.html) do not match the
> JAR file above (*.jar), and still Tika selects the HTML MIME type...
>
> Thanks,
> ~t~
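
For anyone who wants to double-check the "proper magic bytes" claim on their own JAR independently of Tika, here is a minimal sketch using only the JDK. It is not Tika code and does not reproduce Tika's detection logic. The class name ZipMagicCheck and its methods are made up for illustration, and it only tests for the ZIP local-file-header signature "PK\003\004" that a regular, non-empty archive starts with.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, NOT part of Tika: it only inspects the first four bytes.
final class ZipMagicCheck {

    // "PK\003\004": signature at the start of a regular, non-empty ZIP (and thus JAR) file.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Returns true if the given leading bytes begin with the ZIP signature.
    static boolean looksLikeZip(byte[] head) {
        if (head == null || head.length < ZIP_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < ZIP_MAGIC.length; i++) {
            if (head[i] != ZIP_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first four bytes of the file and checks them against the signature.
    static boolean looksLikeZip(Path file) throws IOException {
        byte[] head = new byte[ZIP_MAGIC.length];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
        }
        return filled == head.length && looksLikeZip(head);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZipMagicCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + " starts with ZIP magic: " + looksLikeZip(file));
    }
}
```

Running it against the JAR in question only tells you whether the file begins with the ZIP signature; it says nothing about which MIME type Tika will pick.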
