Increase buffer size for mime type sniffing
-------------------------------------------
Key: TIKA-366
URL: https://issues.apache.org/jira/browse/TIKA-366
Project: Tika
Issue Type: Improvement
Components: mime
Affects Versions: 0.5
Environment: My local MacBook Pro laptop.
Reporter: Chris A. Mattmann
Assignee: Chris A. Mattmann
Fix For: 0.6

While working on TIKA-357, which addresses a similar problem for charset detection, I found that mime type identification suffers from the same general problem. Tika currently looks at only the first MimeTypes#getMinLength() bytes of the input when matching magic headers to sniff the mime type. The example file Ken Krugler attached makes it clear that the current minimum length of 4 * 1024 bytes (4K) isn't enough. Extending it to 8K (8 * 1024 bytes) fixes this issue and seems to make more files detectable at little extra overhead: any magic pattern that starts after the first 4K but within the first 8K can now be matched.
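The issue itself contains no code. The following is a minimal, hypothetical Java sketch of why the window size matters. It assumes a mark/reset-capable stream and uses a plain substring scan in place of Tika's configured magic patterns; the class and method names (`MagicSniffSketch`, `peek`, `containsMagic`, `detectableWithin`, `markerAfterPadding`) are illustrative only and are not Tika's API. The input is synthetic, not the attached file: a marker placed at byte offset 5000 is invisible to a 4K window but visible to an 8K one.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch (not Tika's API) of head-of-stream magic sniffing. */
public class MagicSniffSketch {

    // Window sizes discussed in TIKA-366: the old 4K limit vs. the proposed 8K.
    static final int OLD_MIN_LENGTH = 4 * 1024;
    static final int NEW_MIN_LENGTH = 8 * 1024;

    /** Reads up to {@code limit} bytes from the head of the stream, then rewinds it. */
    static byte[] peek(InputStream in, int limit) throws IOException {
        in.mark(limit); // the stream must support mark/reset (e.g. BufferedInputStream)
        byte[] buf = new byte[limit];
        int total = 0;
        try {
            while (total < limit) {
                int n = in.read(buf, total, limit - total);
                if (n == -1) break; // input shorter than the window
                total += n;
            }
        } finally {
            in.reset(); // rewind so a parser can still read the whole input
        }
        return Arrays.copyOf(buf, total);
    }

    /** True if {@code magic} occurs anywhere inside {@code head} (simple substring scan). */
    static boolean containsMagic(byte[] head, byte[] magic) {
        outer:
        for (int i = 0; i + magic.length <= head.length; i++) {
            for (int j = 0; j < magic.length; j++) {
                if (head[i + j] != magic[j]) continue outer;
            }
            return true;
        }
        return false;
    }

    /** Sniffs {@code doc} using only its first {@code limit} bytes. */
    static boolean detectableWithin(byte[] doc, byte[] magic, int limit) {
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(doc))) {
            return containsMagic(peek(in, limit), magic);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds synthetic input: {@code padding} spaces followed by {@code marker}. */
    static byte[] markerAfterPadding(int padding, byte[] marker) {
        byte[] doc = new byte[padding + marker.length];
        Arrays.fill(doc, 0, padding, (byte) ' ');
        System.arraycopy(marker, 0, doc, padding, marker.length);
        return doc;
    }

    public static void main(String[] args) {
        byte[] marker = "<html".getBytes(StandardCharsets.US_ASCII);
        // The marker starts at offset 5000: past the 4K boundary, inside 8K.
        byte[] doc = markerAfterPadding(5000, marker);
        System.out.println("4K window sees marker: " + detectableWithin(doc, marker, OLD_MIN_LENGTH));
        System.out.println("8K window sees marker: " + detectableWithin(doc, marker, NEW_MIN_LENGTH));
    }
}
```

Running `main` should report that only the 8K window finds the marker. The mark/reset step reflects the trade-off the issue mentions: a larger window means buffering more bytes up front, but the stream is rewound afterwards so nothing is lost for parsing.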