Hello,

I have a .ppt file that I renamed to .doc, changing only the extension. If I use the Tika GUI or the command line to extract the file's metadata, Tika correctly identifies the content type as a PowerPoint file. However, if I use the command-line -d option to detect the content type, the application returns "application/msword", which matches the renamed extension but not the actual content.

From the source code, the correct type appears to come from a call to a parser's parse method, while the less accurate result comes from a call to a detector's detect method.

I'm not sure whether this is a feature or a bug, and I didn't see anything similar when browsing through JIRA. Is the project team already aware that the detector and the parser can disagree on the content type? I wanted to ask before I or someone else files a bug report or feature request in JIRA.
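In case it helps frame the question, here is a minimal sketch of what I assume might be going on. It is not Tika's code and does not use any Tika classes; `DetectSketch` and its `detect` method are hypothetical names of my own. The one fact it relies on is that .doc and .ppt files are both OLE2 compound files and start with the same eight-byte signature, so a check that only looks at the header and the file name cannot tell them apart. The generic fallback label is the one I believe Tika uses for OLE2 containers.

```java
import java.util.Arrays;
import java.util.Locale;

public class DetectSketch {
    // Signature shared by OLE2 compound files (.doc, .ppt, .xls, ...).
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the header starts with the OLE2 signature.
    static boolean isOle2(byte[] header) {
        return header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Hypothetical detector: magic bytes first, then refined by extension only.
    // It never looks inside the container, so a renamed file fools it.
    static String detect(byte[] header, String fileName) {
        if (!isOle2(header)) {
            return "application/octet-stream";
        }
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".doc")) return "application/msword";
        if (name.endsWith(".ppt")) return "application/vnd.ms-powerpoint";
        return "application/x-tika-msoffice"; // assumed generic OLE2 label
    }

    public static void main(String[] args) {
        byte[] pptHeader = OLE2_MAGIC.clone(); // same bytes for .ppt and .doc
        System.out.println(detect(pptHeader, "slides.ppt"));
        System.out.println(detect(pptHeader, "slides.doc"));
    }
}
```

With identical header bytes, the sketch reports a PowerPoint type for slides.ppt and "application/msword" for slides.doc, which is the same mismatch I see from -d. A parser that opens the container can presumably see which streams are inside, which would explain why it gets the type right.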
Thanks, John Mastarone
