On Wed, 14 Jan 2015, Anton Shokhrin wrote:
> I've set up my SOLR instance to index Outlook PST files with
> OutlookPSTParser (via SOLR's TikaEntityProcessor). I can see that
> SOLR is receiving and indexing each email message's metadata, such as
> the unique message id and subject, but the body of the message, along
> with the recipients and senders, is nowhere to be found. From what I can
> tell, OutlookPSTParser never even attempts to parse the email message. I
> am also certain that the issue is with my setup, because running
>   java -jar tika-app-1.6.jar test_fie.pst
> dumps the entire content of the PST file, message body and all.

It sounds like you haven't got recursion turned on when you're processing
it through SOLR. The TikaEntityProcessor code is maintained by the Apache
SOLR community, not the Tika one, so you'll need to ask there for help on
what settings to change to enable recursion.
(If you were using Tika directly, you'd do it by setting a parser such as
AutoDetectParser on the ParseContext, so that embedded documents get
handed back to it, or by using something like RecursiveParserWrapper if
you needed more control.)
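
For the direct-Tika case, a minimal sketch of the ParseContext approach might look like the following. It assumes the full Tika parsers (for example the tika-app jar) are on the classpath; the class name RecursivePstExtract and the helper extractText are made up for illustration and are not part of Tika or SOLR.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical example class, not part of Tika.
public class RecursivePstExtract {

    // Parses the stream and returns the extracted text, including text
    // from embedded documents (e.g. the messages inside a PST file).
    static String extractText(InputStream in) throws Exception {
        Parser parser = new AutoDetectParser();

        // Registering a Parser on the ParseContext is what turns on
        // recursion: embedded documents are passed to this parser.
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser);

        // -1 disables the default limit on the amount of text collected.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();

        parser.parse(in, handler, metadata, context);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the file to parse, e.g. a .pst file.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(extractText(in));
        }
    }
}
```

Without the context.set(...) line, a container parser generally has nothing to hand embedded items to, which would match the symptom of seeing only top-level metadata. This only illustrates the Tika side; it does not show how TikaEntityProcessor builds its ParseContext.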
Nick