With genuine .doc, .xls, or .ppt files, I'm not having a problem. I was wondering how well Tika copes with misnamed files, so I took a .ppt and simply changed its extension to .doc to see what would happen. Using the -m option turns out to work better than -d in this case.

John

On Sun, Nov 20, 2011 at 4:14 PM, Nick Burch <[email protected]> wrote:
> On Sun, 20 Nov 2011, John M wrote:
>>
>> I'm using a build from the 1.1 source.
>
> That's odd - with 1.1 TikaCLI will use DefaultDetector, which loads all
> available detectors including the container aware ones
>
> However, I'm not able to reproduce your problem:
>
> cd /tmp
> cp ~/test.doc C1.doc
> cp ~/test.doc C1.xls
> cp ~/test.doc C1.ppt
> cd ~/java/apache-tika/tika-app/target
> for i in /tmp/C1*; do echo ""; echo $i; java -jar tika-app-1.1-SNAPSHOT.jar
> --detect $i; done
>
> /tmp/C1.doc
> application/msword
>
> /tmp/C1.ppt
> application/vnd.ms-powerpoint
>
> /tmp/C1.xls
> application/vnd.ms-excel
>
>
> So I do get the container aware detection working properly. Not sure what's
> not working for you....
>
> Nick
>
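
For context on why a renamed file is a hard case: legacy .doc, .xls, and .ppt files all start with the same 8-byte OLE2 compound-document signature, so neither the extension nor the leading bytes can tell them apart, and a detector has to look inside the container. The following is only a rough sketch of that point, not how Tika does it. The helper names (first8_hex, is_ole2) and the stand-in files are made up for illustration, and the stand-ins contain just the signature rather than a real document.

```shell
#!/bin/sh
# OLE2 compound-document signature (first 8 bytes), as lowercase hex.
OLE2_MAGIC="d0cf11e0a1b11ae1"

# Hypothetical helper: print the first 8 bytes of a file as one hex string.
first8_hex() {
    head -c 8 "$1" | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: succeed (exit 0) if the file starts with the OLE2 signature.
is_ole2() {
    [ "$(first8_hex "$1")" = "$OLE2_MAGIC" ]
}

# Stand-in files holding only the signature bytes (written as octal escapes),
# saved under two different extensions.
demo_dir=$(mktemp -d)
printf '\320\317\021\340\241\261\032\341' > "$demo_dir/C1.ppt"
cp "$demo_dir/C1.ppt" "$demo_dir/C1.doc"

for f in "$demo_dir"/C1.*; do
    if is_ole2 "$f"; then
        echo "$f: OLE2 container (the extension alone does not settle the type)"
    else
        echo "$f: not an OLE2 container"
    fi
done
```

Both stand-in files pass the signature check whatever they are called, which is why telling a .ppt from a .doc takes container-aware detection rather than a header or extension match.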
