With genuine .doc, .xls, or .ppt files, I don't have a problem.  I
wanted to see how easily Tika could be fooled by misnamed files, so I
took a .ppt and changed its extension to .doc to see what would
happen.  In this case the -m option turned out to give a better
result than -d.
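
To illustrate why a renamed file can trip up detection, here is a
rough, standalone sketch that contrasts a name-based guess with a
content-based one.  It is not how Tika implements its detectors; the
function names and the byte-scanning shortcut are assumptions made for
illustration.  It relies only on two well-known facts: OLE2 files
start with a fixed 8-byte signature, and legacy Office files carry a
characteristic top-level stream name stored as UTF-16LE.

```python
import os

# Signature found at the start of every OLE2 compound file.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Characteristic top-level stream names in legacy Office files.
STREAM_TYPES = {
    "WordDocument": "application/msword",
    "PowerPoint Document": "application/vnd.ms-powerpoint",
    "Workbook": "application/vnd.ms-excel",
}

# What a purely extension-based lookup would answer.
EXTENSION_TYPES = {
    ".doc": "application/msword",
    ".ppt": "application/vnd.ms-powerpoint",
    ".xls": "application/vnd.ms-excel",
}


def guess_by_name(path):
    """Hypothetical helper: trust the file extension only."""
    ext = os.path.splitext(path)[1].lower()
    return EXTENSION_TYPES.get(ext, "application/octet-stream")


def guess_by_content(path):
    """Hypothetical helper: ignore the name and look inside the file.

    Crude shortcut: scans the raw bytes for a stream name instead of
    parsing the OLE2 directory, so it can be wrong for documents with
    embedded objects of another type.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(OLE2_MAGIC):
        return "application/octet-stream"
    for name, mime in STREAM_TYPES.items():
        if name.encode("utf-16-le") in data:
            return mime
    # OLE2 container, but no recognised Office stream name.
    return "application/x-tika-msoffice"
```

For a .ppt renamed to .doc, guess_by_name() would report
application/msword while guess_by_content() would report
application/vnd.ms-powerpoint, which is the kind of disagreement the
test above was meant to provoke.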

John

On Sun, Nov 20, 2011 at 4:14 PM, Nick Burch <[email protected]> wrote:
> On Sun, 20 Nov 2011, John M wrote:
>>
>> I'm using a build from the 1.1 source.
>
> That's odd - with 1.1 TikaCLI will use DefaultDetector, which loads all
> available detectors, including the container-aware ones.
>
> However, I'm not able to reproduce your problem:
>
> cd /tmp
> cp ~/test.doc C1.doc
> cp ~/test.doc C1.xls
> cp ~/test.doc C1.ppt
> cd ~/java/apache-tika/tika-app/target
> for i in /tmp/C1*; do echo ""; echo $i; java -jar tika-app-1.1-SNAPSHOT.jar --detect $i; done
>
> /tmp/C1.doc
> application/msword
>
> /tmp/C1.ppt
> application/vnd.ms-powerpoint
>
> /tmp/C1.xls
> application/vnd.ms-excel
>
>
> So I do get the container aware detection working properly. Not sure what's
> not working for you....
>
> Nick
>
