On Wed, 14 Jan 2015, Anton Shokhrin wrote:
I’ve set up my SOLR instance to index Outlook PST files with OutlookPSTParser (via SOLR’s TikaEntityProcessor). I can see that SOLR is receiving and indexing metadata for each email message, such as the unique message ID and subject, but the body of the message, along with recipients and senders, is nowhere to be found. From what I can tell, OutlookPSTParser never even attempts to parse the email message. I am also certain that the issue is with my setup, because running

java -jar tika-app-1.6.jar test_fie.pst

dumps the entire content of the pst file, message body and all.

It sounds like you haven't got recursion turned on when you're processing it through SOLR. The TikaEntityProcessor code is maintained by the Apache SOLR community, not the Tika one, so you'll need to ask there for help on which settings to change to enable recursion.

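(If you were using Tika directly, you'd do it by registering something like an AutoDetectParser on the ParseContext under Parser.class, so that embedded documents such as the individual messages get parsed too, or by using something like a RecursiveParserWrapper if you needed more control.)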
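
For the direct-Tika case, a minimal sketch of what that might look like is below. It assumes the tika-parsers jar (and its dependencies) are on the classpath. The class name RecursiveTikaExample and the helper methods recursiveContext and extractText are made up for illustration; only the Tika classes themselves come from the library.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical example class, not part of Tika or SOLR.
public class RecursiveTikaExample {

    // Build a ParseContext that tells Tika which parser to use for
    // embedded documents (e.g. the messages inside a PST file).
    static ParseContext recursiveContext(Parser parser) {
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser);
        return context;
    }

    // Parse the file and return the extracted text, including the text
    // of any embedded documents the registered parser can handle.
    static String extractText(Path file) throws Exception {
        Parser parser = new AutoDetectParser();
        // -1 disables the default limit on the amount of text collected.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        try (InputStream stream = Files.newInputStream(file)) {
            parser.parse(stream, handler, metadata, recursiveContext(parser));
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java RecursiveTikaExample /path/to/file.pst
        System.out.println(extractText(Paths.get(args[0])));
    }
}
```

Without the context.set(Parser.class, parser) call, the container is still parsed but the embedded items are typically skipped, which would match the "metadata only" symptom described above.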

Nick
