Thanks for the info Nick, I'll have a look at that.

Best Regards,
Swapna.

-----Original Message-----
From: Nick Burch [mailto:[email protected]] 
Sent: Wednesday, September 28, 2011 4:29 PM
To: [email protected]
Subject: Re: Metadata extracted by OutlookExtractor

On Wed, 28 Sep 2011, Swapna Vuppala wrote:
> I am new to using Solr and Tika, and am trying to index .msg files (Outlook 
> mails) into Solr. For this, I need a list of the metadata extracted by Tika 
> from emails. I would like to know which fields of a .msg file are 
> extracted by Tika's OutlookExtractor.

Your best bet is probably just to look at the code:
http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
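
If reading the source is not convenient, another way to see the fields is to 
parse a sample .msg file and print whatever ends up in the Metadata object. 
What follows is only a minimal sketch, assuming tika-core and tika-parsers are 
on the classpath; the class name DumpMsgMetadata and the helper extract() are 
made up for illustration, and the file path is taken from the first 
command-line argument.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class: prints every metadata field Tika reports for one file.
class DumpMsgMetadata {

    // Parses the stream with Tika's auto-detecting parser and returns the metadata.
    static Metadata extract(InputStream in) throws Exception {
        Metadata metadata = new Metadata();
        // -1 disables the write limit on the extracted body text.
        new AutoDetectParser().parse(in, new BodyContentHandler(-1), metadata, new ParseContext());
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Metadata metadata = extract(in);
            String[] names = metadata.names();
            Arrays.sort(names);
            for (String name : names) {
                // A field can hold several values, so print all of them.
                System.out.println(name + " = " + String.join(", ", metadata.getValues(name)));
            }
        }
    }
}
```

The exact field names depend on the Tika version in use, so the output for one 
of your own mails is the thing to map onto the Solr schema.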

> How can I customize the existing parser to get more metadata (like the number of 
> attachments, the count of embedded and non-embedded ones, etc.) from emails?

If you want to know about attachments, you'll need to register a recursing 
Parser on the ParseContext. It will then be called once per attachment, 
and you can do whatever you want with the information at that point.
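
A rough sketch of that idea, assuming tika-core and tika-parsers are on the 
classpath: a small Parser that only counts what it is handed, registered on 
the context under Parser.class. The class name AttachmentCounter and its 
getters are made up for illustration, and the literal "resourceName" key is an 
assumption about where the attachment file name is stored (the constant that 
holds it has moved between Tika versions).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Hypothetical parser that records each embedded document instead of parsing it.
class AttachmentCounter extends AbstractParser {
    private int count = 0;
    private final List<String> names = new ArrayList<>();

    @Override
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        // Only ever invoked through the ParseContext, so no types are advertised.
        return Collections.emptySet();
    }

    @Override
    public void parse(InputStream stream, ContentHandler handler,
                      Metadata metadata, ParseContext context)
            throws IOException, SAXException, TikaException {
        // Called once per embedded document; the metadata describes that document.
        count++;
        names.add(metadata.get("resourceName")); // assumed key; may be null
    }

    int getCount() { return count; }
    List<String> getNames() { return names; }

    public static void main(String[] args) throws Exception {
        AttachmentCounter counter = new AttachmentCounter();
        ParseContext context = new ParseContext();
        // Register the counter as the parser to use for embedded documents.
        context.set(Parser.class, counter);

        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            new AutoDetectParser().parse(in, new BodyContentHandler(-1), new Metadata(), context);
        }
        System.out.println("attachments: " + counter.getCount());
        System.out.println("names: " + counter.getNames());
    }
}
```

Because the counter does not delegate any further, the attachment contents 
themselves are not parsed. If you want their text as well, the counter could 
hand the stream on to an AutoDetectParser after recording it.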

Nick