On Sun, 20 Nov 2011, John M wrote:
I have a .ppt file that I've renamed to .doc (changing only its extension). If I use the Tika GUI, or the command line, to extract the file's metadata, Tika correctly identifies the content type as a PowerPoint file. However, if I use the command line -d option to detect its content type, the application returns "application/msword", which is of course only superficially correct.
Which version of Tika are you using? If it isn't 1.0, I'd suggest you upgrade and re-test. (Fairly recently we made detectors pluggable, like parsers, which changed how the container-aware detectors are made available and used.)
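
To illustrate why a name-only check and a container-aware check can disagree on a renamed file, here is a rough, self-contained sketch. It is not Tika's code: the class DetectSketch and all of its methods are hypothetical, and the byte scan for a stream name is only a crude stand-in for actually parsing the OLE2 directory. It assumes that PowerPoint files carry a "PowerPoint Document" stream and Word files a "WordDocument" stream, with names stored as UTF-16LE.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Illustrative only: NOT Tika's implementation. All names here are hypothetical.
final class DetectSketch {
    // First 8 bytes shared by every OLE2 compound document (.doc, .ppt, .xls).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Name-only detection: trusts the extension, so a renamed file is misreported.
    static String detectByName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".doc")) return "application/msword";
        if (lower.endsWith(".ppt")) return "application/vnd.ms-powerpoint";
        return "application/octet-stream";
    }

    // Content-based detection: checks the OLE2 magic, then looks for a
    // well-known stream name. A real detector would parse the directory
    // entries instead of scanning raw bytes.
    static String detectByContent(byte[] data) {
        if (!startsWith(data, OLE2_MAGIC)) return "application/octet-stream";
        if (contains(data, utf16("PowerPoint Document"))) return "application/vnd.ms-powerpoint";
        if (contains(data, utf16("WordDocument"))) return "application/msword";
        return "application/x-tika-msoffice"; // generic OLE2 container type
    }

    // Builds a tiny fake "PowerPoint" payload: magic, padding, stream name.
    static byte[] syntheticPpt() {
        byte[] name = utf16("PowerPoint Document");
        byte[] data = new byte[512 + name.length];
        System.arraycopy(OLE2_MAGIC, 0, data, 0, OLE2_MAGIC.length);
        System.arraycopy(name, 0, data, 512, name.length);
        return data;
    }

    private static byte[] utf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    private static boolean contains(byte[] data, byte[] needle) {
        for (int i = 0; i + needle.length <= data.length; i++) {
            int j = 0;
            while (j < needle.length && data[i + j] == needle[j]) j++;
            if (j == needle.length) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // The same renamed file, seen by name and by content.
        System.out.println(detectByName("slides.doc"));
        System.out.println(detectByContent(syntheticPpt()));
    }
}
```

In this sketch the name-based path reports Word for "slides.doc" while the content-based path reports PowerPoint for the same bytes, which is the kind of mismatch you'd see if the container-aware detector isn't being picked up.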
Nick
