On Mon, 21 Nov 2011, Nick Burch wrote:
Ah, oops. More coffee needed! You're right, I wasn't seeing what I was
expecting - the file should come back as a .doc no matter the filename,
on the grounds of the content trumping the name.
With the fix in, I can confirm that my earlier test now behaves as
you'd expect:
cp test.doc /tmp/C1.doc
cp test.doc /tmp/C1.ppt
cp test.doc /tmp/C1.xls
for i in /tmp/C1*; do echo ""; echo "$i"; java -jar \
tika-app-1.1-SNAPSHOT.jar --detect "$i"; done
/tmp/C1.doc
application/msword
/tmp/C1.ppt
application/msword
/tmp/C1.xls
application/msword
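
For anyone wanting to see why the filename can't matter here, below is a rough sketch of a content-only check. It is not what Tika does internally: it only tests for the 8-byte OLE2 container signature (D0 CF 11 E0 A1 B1 1A E1) at the start of the file, which .doc, .ppt and .xls files all share, so it can't tell the three apart the way the detector does. The is_ole2 helper is a made-up name for illustration.

```shell
# Hypothetical helper, not part of Tika: succeeds if the file starts
# with the OLE2 compound document signature D0 CF 11 E0 A1 B1 1A E1.
is_ole2() {
    # Dump the first 8 bytes as hex and strip od's spacing and newline
    sig=$(head -c 8 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "d0cf11e0a1b11ae1" ]
}

# Same three copies as in the test above; only the bytes are examined,
# so the extension plays no part in the answer.
for f in /tmp/C1.doc /tmp/C1.ppt /tmp/C1.xls; do
    if is_ole2 "$f"; then
        echo "$f: OLE2 container"
    else
        echo "$f: not OLE2 (or missing)"
    fi
done
```

Since all three files are byte-for-byte copies of test.doc, a check like this has to give the same answer for each of them, whatever they are called.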
Nick