I replaced:
    <detector class="org.apache.tika.parser.microsoft.POIFSContainerDetector"/>
    <detector class="org.apache.tika.mime.MimeTypes"/>

with:
    <detector class="org.apache.tika.detect.DefaultDetector">
      <detector-exclude class="org.apache.tika.parser.pkg.ZipContainerDetector"/>
    </detector>

...and this works. So I think the problem is that a CompositeDetector does not
behave like a DefaultDetector configured with the same set of detectors.


On Wednesday, August 12, 2015 8:47 AM, Justin <[email protected]> wrote:

More information: I stepped through with the debugger and found a difference.
When I use TikaConfig.getDefaultConfig(), getDetector() returns a
DefaultDetector whose MimeTypes successfully detects the PST. When I use my
configuration file, getDetector() instead returns a CompositeDetector whose
MimeTypes fails to detect the PST.
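Since MimeTypes detects by magic bytes, one quick sanity check, independent of Tika, is to confirm the sample really starts with the PST header signature. Below is a minimal JDK-only sketch. The `!BDN` value comes from Microsoft's [MS-PST] file format specification; I'm assuming it is the same magic Tika's MIME registry matches for PST files, and `PstMagic` is a hypothetical helper, not part of Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: checks for the PST/OST header magic "!BDN" at offset 0. */
final class PstMagic {
    // [MS-PST] HEADER.dwMagic: the bytes 0x21 0x42 0x44 0x4E, i.e. "!BDN".
    private static final byte[] MAGIC = "!BDN".getBytes(StandardCharsets.US_ASCII);

    private PstMagic() {}

    /** Returns true if the first four bytes of {@code header} are "!BDN". */
    static boolean hasPstMagic(byte[] header) {
        return header.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }

    /** Reads only the first four bytes, so even a large PST is cheap to check. */
    static boolean hasPstMagic(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasPstMagic(in.readNBytes(MAGIC.length));
        }
    }
}
```

For example, `PstMagic.hasPstMagic(Path.of("sample.pst"))` (the path is a placeholder). If this returns true while the configured detector still reports application/octet-stream, the problem is more likely in how the detector was built than in the file itself.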


On Wednesday, August 12, 2015 3:15 AM, Justin <[email protected]> wrote:

Sorry, that was just a copy/paste omission. I do have the closing tag, and my
config works for XLS, just not for PST. Because the default config works, I know
I have all the dependencies.




On Aug 12, 2015, 2:10:27 AM, Nick Burch wrote:
On 12/08/15 02:07, Justin wrote:
> ---tika-config.xml---
> 
> I do not get anything back from BodyContentHandler when parsing a PST
> file whereas I do when I use TikaConfig.getDefaultConfig() instead. Am I
> missing something?

Your config file looks invalid - you need to close the tag 
with a before you move onto the detectors
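Tika reads the config file as XML, so an unclosed element like this can be caught before the file ever reaches TikaConfig. A minimal sketch using only the JDK's DOM parser; `TikaConfigCheck` is a hypothetical name, and it only checks well-formedness and lists the `<detector>` classes. It does not validate against the structure Tika expects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical pre-flight check for a tika-config.xml, using only the JDK. */
final class TikaConfigCheck {
    private TikaConfigCheck() {}

    /** Parses the XML; throws IllegalArgumentException if it is not well-formed. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors and rethrows fatal
            // ones, which also stops the JDK parser printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("config is not well-formed XML: " + e.getMessage(), e);
        }
    }

    /** Returns true if the XML parses cleanly. */
    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** Class attribute of every <detector> element, in document order. */
    static List<String> detectorClasses(String xml) {
        // <detector-exclude> has a different tag name, so exclusions are not listed.
        NodeList nodes = parse(xml).getElementsByTagName("detector");
        List<String> classes = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            classes.add(((Element) nodes.item(i)).getAttribute("class"));
        }
        return classes;
    }
}
```

Passing the file's contents to `isWellFormed` would flag a missing closing tag as false, and `detectorClasses` shows whether the config ends up with a flat list of detectors or a single DefaultDetector.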

I'd also suggest you try some of the things listed in the 
Troubleshooting page, to ensure you really have the parsers you expected:
http://wiki.apache.org/tika/Troubleshooting%20Tika
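One way to confirm that the PST parser and its dependency are really on the classpath is to try loading them by name. A JDK-only sketch; `ClasspathProbe` is hypothetical, and the class names in the usage line below (`org.apache.tika.parser.mbox.OutlookPSTParser` and java-libpst's `com.pff.PSTFile`) are my assumption for Tika 1.x, so check them against the version in use.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical classpath probe: reports which of the given classes can be loaded. */
final class ClasspathProbe {
    private ClasspathProbe() {}

    /** Maps each class name to whether it loads; keeps the caller's order. */
    static Map<String, Boolean> probe(List<String> classNames) {
        Map<String, Boolean> present = new LinkedHashMap<>();
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: check presence without running static initializers.
                Class.forName(name, false, loader);
                present.put(name, true);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers a class that exists but whose dependencies are missing.
                present.put(name, false);
            }
        }
        return present;
    }
}
```

For example, `ClasspathProbe.probe(List.of("org.apache.tika.parser.mbox.OutlookPSTParser", "com.pff.PSTFile"))` should map both names to true if PST support is present. Note that a loadable class only shows the jar is there, not that the parser is enabled in a given config.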

Nick

