woot!

----
Chris Mattmann
[email protected]






-----Original Message-----
From: Mark Kerzner <[email protected]>
Reply-To: <[email protected]>
Date: Thursday, June 4, 2015 at 9:42 PM
To: Tika User <[email protected]>
Subject: Re: Tika parsing of emails

>Thank you, Konstantin. That is a wealth of information that will last me
>for both my current project and the next two :)
>Mark
>
>
>On Thu, Jun 4, 2015 at 3:44 AM, Konstantin Gribov <[email protected]>
>wrote:
>
>Hi, Mark.
>
>If you use the Tika facade, all text content, including attachments, is
>sent to the ContentHandler passed to parse(...). You can use
>XHTMLContentHandler to receive each part of the email in its own <div
>class="email-entry">. Tika usually parses content recursively and emits
>everything to the ContentHandler.
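For illustration, a minimal sketch of the recursive parse described above, assuming Tika 1.x with tika-parsers on the classpath. The class and method names (EmlTextExtractor, extractText) are made up for this example, and it uses BodyContentHandler to collect plain text rather than the XHTML output mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Tika.
class EmlTextExtractor {

    // Returns the text of the message, with body parts and attachments appended.
    static String extractText(InputStream eml) {
        AutoDetectParser parser = new AutoDetectParser();

        // Registering the parser in the context lets Tika use it for
        // embedded documents (message parts, attachments) as well.
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser);

        // -1 disables the default write limit on the collected text.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        try {
            parser.parse(eml, handler, metadata, context);
        } catch (IOException | SAXException | TikaException e) {
            throw new IllegalStateException("Could not parse message", e);
        }
        return handler.toString();
    }
}
```

All text ends up in one string here, so the boundaries between parts are lost; that is the trade-off of the simplest approach.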
>If you need more fine-grained control, take a look at
>RecursiveParserWrapper
>(http://tika.apache.org/1.8/api/org/apache/tika/parser/RecursiveParserWrapper.html).
>It returns a metadata object for each parsed document and its
>children, with the content stored in that metadata object. It isn't
>thread-safe (so create a new object for each thread), and you have to
>reset it after each parse call. Also, this method is not suitable for
>large files, since their content will be stored in memory.
>
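A rough sketch of the RecursiveParserWrapper usage described above, assuming the Tika 1.8 API from the linked Javadoc (later Tika releases changed this class, so treat the calls as version-specific). RecursiveEmlParser and parseAll are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, written against the Tika 1.8 API.
class RecursiveEmlParser {

    // Returns one Metadata object per document: the message itself first,
    // then each embedded part or attachment.
    static List<Metadata> parseAll(InputStream eml) {
        // A new wrapper per call, since the wrapper is not thread-safe.
        RecursiveParserWrapper wrapper = new RecursiveParserWrapper(
                new AutoDetectParser(),
                new BasicContentHandlerFactory(
                        BasicContentHandlerFactory.HANDLER_TYPE.TEXT, -1));
        try {
            // The handler passed here is ignored; content is stored in the
            // metadata objects under RecursiveParserWrapper.TIKA_CONTENT.
            wrapper.parse(eml, new DefaultHandler(), new Metadata(), new ParseContext());
            // Copy the list before reset() clears the wrapper's state.
            return new ArrayList<>(wrapper.getMetadata());
        } catch (IOException | SAXException | TikaException e) {
            throw new IllegalStateException("Could not parse message", e);
        } finally {
            // Only needed if the wrapper is reused; shown for completeness.
            wrapper.reset();
        }
    }
}
```

Each returned object can then be read with metadata.get(RecursiveParserWrapper.TIKA_CONTENT) for the text, alongside the usual metadata keys.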
>If you need even more fine-grained control, use Apache James Mime4j
>(which Tika itself uses to parse emails). If your application is
>email-centric and you don't need the metadata normalization that Tika
>provides for email messages, it can be the right way. Also, each multipart
>message body can be parsed by Tika. I recommend setting at least the
>content type in the metadata object, taken from the MIME Content-Type
>header of the corresponding multipart/* part, before parsing it with
>Tika. You'll get metadata and content for each message part and can
>stream the content if it's quite large.
>
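A small sketch of the last recommendation above: passing the part's MIME Content-Type to Tika as a hint before parsing one message part. It assumes the part body and its Content-Type value have already been obtained elsewhere (for example from Mime4j, not shown here); PartParser and parsePart are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Tika or Mime4j.
class PartParser {

    // Parses a single message part. The caller supplies the Metadata object
    // so it can inspect what Tika recorded for the part afterwards.
    static String parsePart(InputStream partBody, String mimeContentType, Metadata metadata) {
        // Hint for type detection, taken from the part's Content-Type header.
        metadata.set(Metadata.CONTENT_TYPE, mimeContentType);

        AutoDetectParser parser = new AutoDetectParser();
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser);

        // -1 disables the default write limit on the collected text.
        BodyContentHandler handler = new BodyContentHandler(-1);
        try {
            parser.parse(partBody, handler, metadata, context);
        } catch (IOException | SAXException | TikaException e) {
            throw new IllegalStateException("Could not parse message part", e);
        }
        return handler.toString();
    }
}
```

For large parts, BodyContentHandler can instead be constructed around a java.io.Writer, so the text is streamed out rather than held in memory.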
>-- Best regards,
>Konstantin Gribov
>
>
>
>On Thu, Jun 4, 2015 at 8:07, Mark Kerzner <[email protected]> wrote:
>
>
>Hi,
>Usually I just do new Tika().parse(myfile...), and Tika does all the work.
>
>Is there anything special about *.eml files? How does Tika treat
>attachments? What would be a reference for me to read?
>
>Thank you
>
>
>-- 
>Mark Kerzner, Managing Partner, Elephant Scale <http://elephantscale.com/>
>Mobile: 713-724-2534 <tel:713-724-2534>, Skype: mark.kerzner1
>https://www.linkedin.com/in/markkerzner
>
>To schedule a meeting with me: http://www.meetme.so/markkerzner
>
>-- 
>Mark Kerzner, President & CEO, SHMsoft <http://shmsoft.com/>,
>To schedule a meeting with me: http://www.meetme.so/markkerzner
>
>Mobile: 713-724-2534
>Skype: mark.kerzner1
>Office: One Riverway Suite 1700
>Houston, TX 77056
>
>Privileged and Confidential
> <http://shmsoft.com/>
>
>
>
>

