[ https://issues.apache.org/jira/browse/TIKA-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760023#action_12760023 ]
Ken Krugler commented on TIKA-285:
----------------------------------

The "file" command line utility also has a pretty good set of magic byte settings - we'd looked at it when working on Krugle. FWIR, it also has a slightly more sophisticated method for processing magic bytes than what Nutch (and I guess now Tika) has.

One of the issues we'd run into was the need to be able to use a regex against the header bytes to determine true file type, versus fixed offsets/values.

> Update media type registry to the latest httpd mime type database
> -----------------------------------------------------------------
>
>                 Key: TIKA-285
>                 URL: https://issues.apache.org/jira/browse/TIKA-285
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>            Reporter: Jukka Zitting
>
> The MIME type database included in the Apache HTTP Server is one of the more
> complete and accurate media type and file extension resources out there.
> Their magic byte settings don't seem to be as complete as the ones in Tika,
> but it would be good to check also those settings for extra information.
> ... and we should contribute any of the recent Tika settings back to httpd
> where they don't already know of those details.
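
To illustrate the distinction the comment draws between fixed offset/value matching and regex matching against header bytes, here is a minimal sketch in plain Java. It is not Tika's, Nutch's, or file(1)'s implementation: the class name MagicSketch, the methods matchesFixed and matchesRegex, and the HTML pattern are all hypothetical and chosen only for illustration. The sample input is an HTML file that starts with whitespace and a comment, so a rule pinned to offset 0 misses it while a regex over the header still finds the root tag.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from Tika or file(1).
final class MagicSketch {

    // Fixed offset/value rule: the bytes starting at `offset` must equal `magic`.
    static boolean matchesFixed(byte[] header, int offset, byte[] magic) {
        if (offset < 0 || header.length - offset < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (header[offset + i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Regex rule: decode the header as ISO-8859-1 (one char per byte, so no
    // byte is lost or merged) and search it with the given pattern.
    static boolean matchesRegex(byte[] header, Pattern pattern) {
        String text = new String(header, StandardCharsets.ISO_8859_1);
        return pattern.matcher(text).find();
    }

    // Assumed pattern: optional whitespace and comments, then a doctype or <html> tag.
    static final Pattern HTML_START = Pattern.compile(
            "(?is)^\\s*(?:<!--.*?-->\\s*)*<(?:!doctype\\s+html|html)\\b");

    public static void main(String[] args) {
        byte[] header = "  \n<!-- generated -->\n<HTML><head>"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] htmlMagic = "<html".getBytes(StandardCharsets.ISO_8859_1);

        // The fixed rule misses because the tag is not at offset 0 (and differs in case).
        System.out.println("fixed: " + matchesFixed(header, 0, htmlMagic));
        // The regex rule tolerates the leading whitespace, the comment, and the case.
        System.out.println("regex: " + matchesRegex(header, HTML_START));
    }
}
```

A fixed rule is cheaper and is enough for formats such as PDF, where the signature sits at a known offset. The regex form costs more per check but covers text-based formats whose identifying token can move around within the header.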