On Wed, 28 Sep 2011, Swapna Vuppala wrote:
I am new to Solr and Tika, and I am trying to index .msg files (Outlook mails) into Solr. For this, I need a list of the metadata that Tika extracts from emails. Which fields from a .msg file does Tika's OutlookExtractor extract?

Your best bet is probably just to look at the code:
http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
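
As a complement to reading the source, one can also run a sample .msg file through Tika and print whatever metadata comes back. The following is only a sketch, assuming a Tika release where AutoDetectParser has a four-argument parse method; the class name MetadataDump and the helper describe are made up for illustration, and the fields printed will depend on the Tika version and on the message itself.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Arrays;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: parses one file and lists every metadata field Tika set.
public class MetadataDump {

    // Formats all metadata entries as "name = [values]", one per line, sorted by name.
    static String describe(Metadata metadata) {
        String[] names = metadata.names();
        Arrays.sort(names);
        StringBuilder out = new StringBuilder();
        for (String name : names) {
            out.append(name)
               .append(" = ")
               .append(Arrays.toString(metadata.getValues(name)))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Metadata metadata = new Metadata();
        InputStream in = new FileInputStream(args[0]); // path to a sample .msg file
        try {
            // DefaultHandler discards the text content; only the metadata is of interest here.
            new AutoDetectParser().parse(in, new DefaultHandler(), metadata, new ParseContext());
        } finally {
            in.close();
        }
        System.out.print(describe(metadata));
    }
}
```

Running this against a few representative mails should give a practical list of field names to map onto Solr fields.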

How can I customize the existing parser to get more metadata from emails (such as the number of attachments, or the counts of embedded and non-embedded items)?

If you want to know about attachments, you'll need to register a recursing Parser on the ParseContext. It will then be called once per attachment, and you can do whatever you want with the information at that point.
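
A minimal sketch of that approach, assuming a Tika release that provides AbstractParser and the four-argument parse method. The class name AttachmentCounter and its accessors are made up for illustration, and the literal "resourceName" is assumed to be the metadata key under which Tika passes the attachment's file name (the value behind its resource-name key constant).

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical parser that records each embedded document it is handed.
public class AttachmentCounter extends AbstractParser {
    private static final long serialVersionUID = 1L;

    private final List<String> names = new ArrayList<String>();

    // This parser is never selected by type detection; it is only looked up
    // from the ParseContext, so it declares no supported types.
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return Collections.emptySet();
    }

    // Called once per attachment. Here we only note the name; the stream is not read.
    public void parse(InputStream stream, ContentHandler handler,
                      Metadata metadata, ParseContext context)
            throws IOException, SAXException, TikaException {
        String name = metadata.get("resourceName"); // assumed key for the file name
        names.add(name != null ? name : "(unnamed)");
    }

    public int getCount() {
        return names.size();
    }

    public List<String> getNames() {
        return names;
    }

    public static void main(String[] args) throws Exception {
        AttachmentCounter counter = new AttachmentCounter();
        ParseContext context = new ParseContext();
        context.set(Parser.class, counter); // register the per-attachment parser

        InputStream in = new FileInputStream(args[0]); // path to a .msg file
        try {
            new AutoDetectParser().parse(in, new DefaultHandler(), new Metadata(), context);
        } finally {
            in.close();
        }
        System.out.println(counter.getCount() + " attachment(s): " + counter.getNames());
    }
}
```

Because this parser does not hand the stream on to another parser, it only sees the top-level attachments; a message attached to a message would be counted once and not opened. To go deeper, the parse method would have to delegate to a real parser (for example an AutoDetectParser) with the same context. The resulting count could then be added to the Solr document as an extra field.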

Nick
