I replaced:
<detector class="org.apache.tika.parser.microsoft.POIFSContainerDetector"/>
<detector class="org.apache.tika.mime.MimeTypes"/>
with:
<detector class="org.apache.tika.detect.DefaultDetector">
  <detector-exclude class="org.apache.tika.parser.pkg.ZipContainerDetector"/>
</detector>
...and this works. So I think the problem is that a CompositeDetector does
not behave like a DefaultDetector, even with the same set of detectors.
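
For anyone trying the same workaround, here is a minimal sketch of where that
detector element would sit in a complete tika-config.xml. The <properties>,
<parsers> and <detectors> wrappers and the DefaultParser entry are assumptions
based on the usual Tika config layout, not a copy of the config discussed in
this thread:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <!-- Assumed parser section; close it before the detectors start -->
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser"/>
  </parsers>
  <!-- DefaultDetector with one detector excluded, as in the snippet above -->
  <detectors>
    <detector class="org.apache.tika.detect.DefaultDetector">
      <detector-exclude class="org.apache.tika.parser.pkg.ZipContainerDetector"/>
    </detector>
  </detectors>
</properties>
```

With this layout, getDetector() should hand back a DefaultDetector rather than
a CompositeDetector built from an explicit list of detectors.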
On Wednesday, August 12, 2015 8:47 AM, Justin <[email protected]> wrote:
More information: I stepped through with the debugger and found a difference.
When I use TikaConfig.getDefaultConfig(), getDetector() returns a
DefaultDetector whose MimeTypes successfully detects the PST. When I use my
configuration file, getDetector() instead returns a CompositeDetector
whose MimeTypes fails to detect the PST.
On Wednesday, August 12, 2015 3:15 AM, Justin <[email protected]> wrote:
Sorry, that was just a copy/paste omission. I have the closing tag, and my
config works for XLS but not for PST. Because the default config works, I know
I have all the dependencies.
On Aug 12, 2015, 2:10:27 AM, Nick Burch wrote:
On 12/08/15 02:07, Justin wrote:
> ---tika-config.xml---
>
> I do not get anything back from BodyContentHandler when parsing a PST
> file whereas I do when I use TikaConfig.getDefaultConfig() instead. Am I
> missing something?
Your config file looks invalid - you need to close the tag
with a before you move onto the detectors
I'd also suggest you try some of the things listed in the
Troubleshooting page, to ensure you really have the parsers you expected:
http://wiki.apache.org/tika/Troubleshooting%20Tika
Nick