Mike,

See the issue [TIKA-801], which you referenced below. It was easy to reproduce. I have attached an MSG file to the issue which blows chunks when you drop it onto Tika-app 1.0. The example is a one-line e-mail forwarded to myself, then saved as an MSG file outside of Outlook. I suspect that a simple e-mail with an attachment is complex (compound) enough to cause the same problem, and that it is not specific to compound Outlook e-mails nested inside e-mails, because I believe I saw it on a flat e-mail with an attachment.

Let me know if I can be of further assistance.

-Paul

On 12/7/2011 11:15 AM, Michael McCandless wrote:
This looks just like:

     https://issues.apache.org/jira/browse/TIKA-801

Likely Tika's parser is (incorrectly) producing invalid XHTML tags for
your document... when you open the Jira issue can you attach the
problematic document?  Thanks.

Mike McCandless

