[ https://issues.apache.org/jira/browse/TIKA-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534160 ]
Keith R. Bennett commented on TIKA-56:
--------------------------------------
Chris -
I don't know of any such cases, but then we've reached the limits of my
knowledge of MIME types. ;)
However, if we have a utility that determines the MIME type from an extension,
my sense is that it is reasonable to make the extension comparisons case
insensitive. Especially in the Windows world, there are huge numbers of files
out there with upper case extensions. To me, it makes sense for the default to
be to treat "PDF" as equal to "pdf"; otherwise, we will get lots of "bugs"
reported. ;)
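
To make that concrete, here is a minimal sketch of the kind of lookup I have
in mind. It is not Tika's actual code: the class name ExtensionMimeLookup and
its methods are hypothetical, and the only idea it illustrates is folding the
extension to lower case both when registering and when looking up.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Tika: maps file extensions to MIME types
// and ignores the case of the extension.
final class ExtensionMimeLookup {
    private final Map<String, String> byExtension = new HashMap<String, String>();

    // Store the extension in a canonical (lower case) form.
    void register(String extension, String mimeType) {
        byExtension.put(normalize(extension), mimeType);
    }

    // Returns the registered MIME type, or null if the name has no
    // extension or the extension is unknown.
    String lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return byExtension.get(normalize(fileName.substring(dot + 1)));
    }

    // Locale.ROOT avoids locale-specific case rules (for example the
    // Turkish dotless i) changing the result.
    private static String normalize(String extension) {
        return extension.toLowerCase(Locale.ROOT);
    }
}
```

With "pdf" registered as "application/pdf", looking up "x.pdf", "x.PDF" and
"x.Pdf" would all give the same answer.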
If there are any obscure cases where case matters, I think it may be reasonable
to require the user to determine the MIME type by other means (have the user
specify it explicitly, or use "magic" detection?).
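
By "magic" I mean looking at the first few bytes of the content rather than
the file name. A rough sketch follows, again with hypothetical names
(MagicSniffer, sniff) rather than anything that exists in Tika, and covering
only two well-known signatures: PDF files begin with "%PDF-" and ZIP archives
begin with the bytes 'P' 'K' 0x03 0x04.

```java
// Hypothetical helper, not part of Tika: guesses a MIME type from the
// leading bytes of a document.
final class MagicSniffer {
    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};
    private static final byte[] ZIP = {'P', 'K', 3, 4};

    // Returns a MIME type for a recognized signature, or null otherwise.
    static String sniff(byte[] head) {
        if (startsWith(head, PDF)) {
            return "application/pdf";
        }
        if (startsWith(head, ZIP)) {
            return "application/zip";
        }
        return null;
    }

    // True if data begins with every byte of prefix, in order.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Since this never looks at the file name, the case of the extension does not
come into it at all.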
- Keith
> Mime type detection fails with upper case file extensions such as "PDF".
> ------------------------------------------------------------------------
>
> Key: TIKA-56
> URL: https://issues.apache.org/jira/browse/TIKA-56
> Project: Tika
> Issue Type: Bug
> Components: general
> Affects Versions: 0.1-incubator
> Reporter: Keith R. Bennett
> Priority: Critical
> Fix For: 0.1-incubator
>
>
> Mime type detection only seems to work when the file extension is lower case.
> Both PDF and DOC extensions failed.
> To test this, add the following method to TestParsers:
>     public void testGetParsers() throws TikaException, MalformedURLException {
>         assertNotNull(ParseUtils.getParser(new URL("file:x.pdf"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.PDF"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.doc"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.DOC"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.txt"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.TXT"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.html"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.HTML"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.HtMl"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.htm"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.HTM"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.ppt"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.PPT"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.xls"), tc));
>         assertNotNull(ParseUtils.getParser(new URL("file:x.XLS"), tc));
>         // more?
>     }