Mike,
See the issue [TIKA-801] which you referenced below. It was easy to
reproduce. I have attached a MSG file to the issue which blows chunks
when you drop it onto Tika-app 1.0. The example is a one-line e-mail
forwarded to myself, then saved as an MSG file outside of Outlook. I
suspect that a simple e-mail with an attachment is complex (compound)
enough to cause the same problem, and that it is not specific to
Outlook e-mails embedded inside other e-mails, because I believe I saw it
on a flat e-mail with an attachment.
Let me know if I can be of further assistance.
-Paul
On 12/7/2011 11:15 AM, Michael McCandless wrote:
This looks just like:
https://issues.apache.org/jira/browse/TIKA-801
Likely Tika's parser is (incorrectly) producing invalid XHTML tags for
your document... when you open the Jira issue can you attach the
problematic document? Thanks.
Mike McCandless