On Sun, 20 Nov 2011, John M wrote:
With genuine .doc, .xls, or .ppt files, I'm not having a problem.  I
was wondering how easily Tika could be fooled by misnamed files, so I
took a .ppt and just changed the extension to .doc to see what would
happen.  Using the -m option turns out to be better than -d in this
case.

Please take another look at my example. I took a .doc, renamed it, and Tika detected it just fine for me. Hence my wondering why it is different for you.

Nick

On Sun, Nov 20, 2011 at 4:14 PM, Nick Burch <[email protected]> wrote:
On Sun, 20 Nov 2011, John M wrote:

I'm using a build from the 1.1 source.

That's odd - with 1.1, TikaCLI will use DefaultDetector, which loads all
available detectors, including the container-aware ones.

However, I'm not able to reproduce your problem:

cd /tmp
cp ~/test.doc C1.doc
cp ~/test.doc C1.xls
cp ~/test.doc C1.ppt
cd ~/java/apache-tika/tika-app/target
for i in /tmp/C1*; do echo ""; echo $i; java -jar tika-app-1.1-SNAPSHOT.jar --detect $i; done

/tmp/C1.doc
application/msword

/tmp/C1.ppt
application/vnd.ms-powerpoint

/tmp/C1.xls
application/vnd.ms-excel


So I do get the container-aware detection working properly. Not sure what's
not working for you...
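
As a rough illustration of why the extension is not what decides the result above: .doc, .xls and .ppt files all start with the same 8-byte OLE2 container signature (D0 CF 11 E0 A1 B1 1A E1), so a detector has to look inside the container to tell them apart. The sketch below only checks that signature. The helper name is_ole2 is made up for this example, and this is not how Tika implements its detection.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tika): succeed if the file begins
# with the 8-byte OLE2 compound document signature.
is_ole2() {
    # Read the first 8 bytes and render them as lowercase hex, no spaces.
    sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "d0cf11e0a1b11ae1" ]
}

# Check the renamed copies from the test above, whatever their extension.
for f in /tmp/C1*; do
    [ -f "$f" ] || continue   # skip if the glob matched nothing
    if is_ole2 "$f"; then
        echo "$f: OLE2 container"
    else
        echo "$f: not OLE2"
    fi
done
```

Since all three copies come from the same .doc, each should be reported as an OLE2 container. Telling Word, Excel and PowerPoint apart needs the extra step of reading the entries inside the container, which is the part the container-aware detectors handle.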

Nick

