On 12/6/2011 6:50 PM, Nick Burch wrote:
On Tue, 6 Dec 2011, P. Hill wrote:
    at org.apache.tika.gui.ParsingTransferHandler.importFiles(ParsingTransferHandler.java:94)
    at org.apache.tika.gui.ParsingTransferHandler.importData(ParsingTransferHandler.java:77)
[rest of stack trace removed]

Alas, you've snipped the interesting bit, which is what the parser broke on.

A further note on the type of message: it was a many-level nested reply chain generated, I believe, by Outlook for all correspondents. The attached PDF itself parses in all versions of tika-app.

Wow, really? You wanted to see the AWT call? Probably not, but here is the trace down to Swing, followed by the cause.

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@6337bb9c
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:320)
    at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:279)
    at org.apache.tika.gui.ParsingTransferHandler.importFiles(ParsingTransferHandler.java:94)
    at org.apache.tika.gui.ParsingTransferHandler.importData(ParsingTransferHandler.java:77)
    at javax.swing.TransferHandler.importData(Unknown Source)

Oops, sorry, I didn't see the cause way down there :-)

Caused by: java.lang.NullPointerException
    at com.sun.org.apache.xml.internal.serializer.ToHTMLStream.endElement(Unknown Source)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endElement(Unknown Source)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.gui.TikaGUI$2.endElement(TikaGUI.java:519)
    at org.apache.tika.sax.TeeContentHandler.endElement(TeeContentHandler.java:94)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
    at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
    at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:213)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:178)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
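
Since the useful part of a wrapped exception is usually the innermost "Caused by", a small helper can pull it out and capture the complete trace as text for a bug report. The following is only a minimal sketch using the standard library: the class name RootCause and its two methods are hypothetical, and a plain Exception stands in for the TikaException wrapper seen above.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class RootCause {

    /** Follows getCause() links down to the innermost exception. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain; also guard against a self-referential cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** Renders the whole trace, including every "Caused by" section, as a String. */
    public static String fullTrace(Throwable t) {
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        t.printStackTrace(writer);
        writer.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the wrapper exception; not a real Tika call.
        Exception wrapped = new Exception(
                "Unexpected RuntimeException from parser",
                new NullPointerException());

        System.out.println("Root cause: " + rootCause(wrapped));
        System.out.println(fullTrace(wrapped));
    }
}
```

Logging the output of fullTrace rather than only the top-level message keeps the nested cause from being snipped when the trace is pasted into a mail or an issue.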

Try with a recent svn nightly build, and see if that fixes it. If not, please post a problem file and the full stack trace to a new issue in JIRA.

I will try to find time to check into that.
-Paul
