Hi Nick,

Looks like I misdiagnosed the problem. After instrumenting the OutlookPSTParser I
can see that the code responsible for parsing the email body etc. is being called
and that the information is passed on to the ParsingEmbeddedDocumentExtractor for
further processing. If that gives you any hints on what else I can try, please
let me know.

Thank you for all the help.


Anton

> On Jan 15, 2015, at 2:51 AM, Nick Burch <[email protected]> wrote:
> 
> On Wed, 14 Jan 2015, Anton Shokhrin wrote:
>> I’ve set up my SOLR instance to index Outlook PST files with OutlookPSTParser
>> (via SOLR’s TikaEntityProcessor). I can see that SOLR is receiving and
>> indexing message-related metadata such as the message's unique id and
>> subject, but the body of the message, along with recipients and senders, is
>> nowhere to be found. From what I can tell OutlookPSTParser never even
>> attempts to parse the email message. I am also certain that the issue is
>> with my setup because running
>> 
>> java -jar tika-app-1.6.jar test_fie.pst
>> 
>> dumps the entire content of the pst file, message body and all.
> 
> It sounds like you haven't got recursion turned on when you're processing it 
> through SOLR. The TikaEntityProcessor code is maintained by the Apache SOLR 
> community, not the Tika one, so you'll need to ask there for help on what 
> settings to change to enable recursion.
> 
> (If you were using Tika directly, you'd do it by popping something like 
> AutoDetectParser onto the ParseContext, or using something like a 
> RecursiveParserWrapper if you needed more control)
> 
> Nick
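
For anyone following the thread, here is a minimal sketch of the recursion
mechanism Nick describes. It is a toy model, not Tika code: Doc, Context,
Parser, EMPTY and CONTAINER are hypothetical stand-ins invented for
illustration, and whether Tika's own extractor behaves exactly this way is an
assumption. The idea is that a container parser hands each embedded document
to whatever parser is registered in the context, and falls back to a no-op
when none is registered, so only the container's own data comes out. In Tika
itself the registration step is context.set(Parser.class, parser) on a
ParseContext.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these types are hypothetical stand-ins, not Tika classes.
class RecursionSketch {

    /** A document with its own text plus embedded children (e.g. messages in a PST). */
    static final class Doc {
        final String text;
        final List<Doc> children = new ArrayList<>();

        Doc(String text, Doc... kids) {
            this.text = text;
            for (Doc kid : kids) {
                children.add(kid);
            }
        }
    }

    interface Parser {
        void parse(Doc doc, Context ctx, List<String> out);
    }

    /** Type-keyed bag of settings, loosely analogous to a parse context. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key, T fallback) {
            Object value = entries.get(key);
            return value == null ? fallback : key.cast(value);
        }
    }

    /** Fallback used when no parser is registered: emits nothing. */
    static final Parser EMPTY = (doc, ctx, out) -> { };

    /** Emits the document's own text, then delegates each child to the parser found in the context. */
    static final Parser CONTAINER = (doc, ctx, out) -> {
        out.add(doc.text);
        Parser embedded = ctx.get(Parser.class, EMPTY);
        for (Doc child : doc.children) {
            embedded.parse(child, ctx, out);
        }
    };

    static List<String> run(Doc root, Context ctx) {
        List<String> out = new ArrayList<>();
        CONTAINER.parse(root, ctx, out);
        return out;
    }

    public static void main(String[] args) {
        Doc pst = new Doc("pst metadata", new Doc("message body"));

        // No parser registered: embedded documents are silently skipped.
        System.out.println(run(pst, new Context()));   // prints [pst metadata]

        // Parser registered: embedded documents are parsed recursively.
        Context recursive = new Context();
        recursive.set(Parser.class, CONTAINER);
        System.out.println(run(pst, recursive));       // prints [pst metadata, message body]
    }
}
```

In this model the first run mirrors the symptom reported above (metadata but
no bodies), and the second shows the effect of registering a parser for
embedded documents.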
