On 12/7/2011 11:15 AM, Michael McCandless wrote:
This looks just like: https://issues.apache.org/jira/browse/TIKA-801 Likely Tika's parser is (incorrectly) producing invalid XHTML tags for your document... when you open the Jira issue can you attach the problematic document? Thanks.
Maybe. Ironically enough, the document is an actual e-mail exchange between the CTO of our company and an alpha-test customer about a non-disclosure agreement. :-(
If I could binary-edit it to drop the names referenced and drop the attached NDA document itself, I might be able to generate an example that still fails. I will experiment with things like forwarding it without the attachment and then hacking some bytes. If it still fails, I'll send it your way.
-Paul
