Thank you, Konstantin. That is a wealth of information that will last me
for both my current project and the next two :)

Mark

On Thu, Jun 4, 2015 at 3:44 AM, Konstantin Gribov <[email protected]> wrote:

> Hi, Mark.
>
> If you use the Tika facade, all text content, including attachments, is
> sent to the ContentHandler passed to parse(...). You can use
> XHTMLContentHandler to receive each part of the email in its own <div
> class="email-entry">. Tika usually parses content recursively and emits
> everything to the ContentHandler.
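
To make the recursive parsing described above concrete, here is a minimal sketch. It assumes tika-core and tika-parsers are on the classpath. The class name EmlToXhtml is hypothetical, and using ToXMLContentHandler to capture the XHTML as a string is a choice made for this sketch, not something stated in the message.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.ToXMLContentHandler;

// Hypothetical helper class; the name is illustrative only.
class EmlToXhtml {

    // Parses one message and returns the XHTML that Tika emits for it.
    static String toXhtml(InputStream eml) throws Exception {
        Parser parser = new AutoDetectParser();

        // Registering the parser in the context lets embedded documents
        // (attachments) be parsed recursively with the same parser.
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser);

        // Serializes the SAX events to an XML string held in memory.
        ToXMLContentHandler handler = new ToXMLContentHandler();
        parser.parse(eml, handler, new Metadata(), context);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to an .eml file.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(toXhtml(in));
        }
    }
}
```

Because the handler keeps the whole output in memory, this form suits small messages. For large ones, a handler that writes to a stream would be the safer choice.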
>
> If you need more fine-grained control, take a look at
> RecursiveParserWrapper (
> http://tika.apache.org/1.8/api/org/apache/tika/parser/RecursiveParserWrapper.html).
> It returns a metadata object for each parsed document and each of its
> children, with the content stored in that metadata object. It isn't thread
> safe (so create a new object for each thread), and you have to reset it
> after each parse call. Also, this method is not suitable for large files,
> since their content will be stored in memory.
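
A rough sketch of the RecursiveParserWrapper approach, written against the Tika 1.8 Javadoc linked above. Later Tika releases changed this API, so the constructor and getMetadata() calls should be read as tied to that version. The class name EmlParts and the unlimited write limit (-1) are illustrative assumptions.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the name is illustrative only.
class EmlParts {

    // Returns one Metadata object per document: the message itself first,
    // then each embedded part or attachment.
    static List<Metadata> parseParts(InputStream eml) throws Exception {
        // Not thread safe: this sketch creates a fresh wrapper per call.
        RecursiveParserWrapper wrapper = new RecursiveParserWrapper(
            new AutoDetectParser(),
            new BasicContentHandlerFactory(
                BasicContentHandlerFactory.HANDLER_TYPE.TEXT, -1));
        try {
            // The wrapper collects content itself, so a no-op handler is passed.
            wrapper.parse(eml, new DefaultHandler(), new Metadata(), new ParseContext());
            // Copy the results before reset() clears the wrapper's state.
            return new ArrayList<>(wrapper.getMetadata());
        } finally {
            // Required between parse calls if the wrapper were reused.
            wrapper.reset();
        }
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (Metadata m : parseParts(in)) {
                // Extracted text is stored under the TIKA_CONTENT key.
                String content = m.get(RecursiveParserWrapper.TIKA_CONTENT);
                System.out.println(m.get(Metadata.CONTENT_TYPE) + ": "
                    + (content == null ? 0 : content.length()) + " chars");
            }
        }
    }
}
```

All extracted text lives in the returned list, which is why this route does not fit large files.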
>
> If you need even more fine-grained control, use Apache James Mime4j
> (which Tika itself uses to parse emails). If your application is
> email-centric and you don't need the metadata normalization that Tika
> provides for email messages, it can be the right way. Also, each multipart
> message body can be parsed by Tika. I recommend setting at least the
> content type in the metadata object, taken from the MIME Content-Type
> header of the appropriate multipart/* part, before parsing it with Tika.
> You'll get metadata and content for each message part and can stream the
> content if it's quite large.
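
The Mime4j route could look roughly like the sketch below. It assumes the Mime4j 0.7-style package layout (BodyDescriptor in org.apache.james.mime4j.stream) plus tika-core and tika-parsers on the classpath. The class name Mime4jWithTika is hypothetical, and collecting each part's text in a list is a simplification made for this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.james.mime4j.MimeException;
import org.apache.james.mime4j.parser.AbstractContentHandler;
import org.apache.james.mime4j.parser.MimeStreamParser;
import org.apache.james.mime4j.stream.BodyDescriptor;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class; the name is illustrative only.
class Mime4jWithTika {

    // Walks the MIME structure with Mime4j and hands each body to Tika.
    static List<String> extractPartTexts(InputStream eml)
            throws MimeException, IOException {
        final Parser tika = new AutoDetectParser();
        final List<String> texts = new ArrayList<>();

        MimeStreamParser mime = new MimeStreamParser();
        // Decode base64 / quoted-printable before the body reaches the handler.
        mime.setContentDecoding(true);
        mime.setContentHandler(new AbstractContentHandler() {
            @Override
            public void body(BodyDescriptor bd, InputStream is)
                    throws MimeException, IOException {
                // Pass the part's MIME type to Tika as a detection hint.
                Metadata md = new Metadata();
                md.set(Metadata.CONTENT_TYPE, bd.getMimeType());

                ParseContext ctx = new ParseContext();
                ctx.set(Parser.class, tika);

                // -1 disables the write limit; text is buffered in memory here.
                BodyContentHandler handler = new BodyContentHandler(-1);
                try {
                    tika.parse(is, handler, md, ctx);
                } catch (SAXException | TikaException e) {
                    throw new MimeException(e);
                }
                texts.add(handler.toString());
            }
        });
        mime.parse(eml);
        return texts;
    }
}
```

For large parts, BodyContentHandler also accepts a Writer, so the extracted text can be streamed out instead of being held in a string.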
>
> --
> Best regards,
> Konstantin Gribov
>
> Thu, 4 June 2015 at 8:07, Mark Kerzner <[email protected]> wrote:
>
>> Hi,
>>
>> usually I just do new Tika().parse(myfile...), and Tika does all the work.
>>
>> Is there anything special about *.eml files? How does Tika treat
>> attachments? What would be a reference for me to read?
>>
>> Thank you
>>
>> --
>> Mark Kerzner, Managing Partner, Elephant Scale
>> <http://elephantscale.com/>
>> Mobile: 713-724-2534, Skype: mark.kerzner1
>> https://www.linkedin.com/in/markkerzner
>> To schedule a meeting with me: http://www.meetme.so/markkerzner
>>
>>


-- 
Mark Kerzner, President & CEO, SHMsoft <http://shmsoft.com/>,
To schedule a meeting with me: http://www.meetme.so/markkerzner

Mobile: 713-724-2534
Skype: mark.kerzner1
Office: One Riverway Suite 1700
Houston, TX 77056

*Privileged and Confidential *
<http://shmsoft.com/>
