On 12/6/2011 6:50 PM, Nick Burch wrote:
On Tue, 6 Dec 2011, P. Hill wrote:
at org.apache.tika.gui.ParsingTransferHandler.importFiles(ParsingTransferHandler.java:94)
at org.apache.tika.gui.ParsingTransferHandler.importData(ParsingTransferHandler.java:77)
[rest of stack trace removed]
You've alas snipped the interesting bit, which is what the parser
broke on.
A further note on the type of message: it was a many-level nested reply
chain, generated (I believe) by Outlook for all correspondents. The
attached PDF itself parses in all versions of tika-app.
Wow, really? You wanted to see the AWT call? Probably not, but here is
the trace down to Swing, followed by the cause.
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@6337bb9c
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.gui.TikaGUI.handleStream(TikaGUI.java:320)
at org.apache.tika.gui.TikaGUI.openFile(TikaGUI.java:279)
at org.apache.tika.gui.ParsingTransferHandler.importFiles(ParsingTransferHandler.java:94)
at org.apache.tika.gui.ParsingTransferHandler.importData(ParsingTransferHandler.java:77)
at javax.swing.TransferHandler.importData(Unknown Source)
OOPS Sorry I didn't see the cause way down there: :-)
Caused by: java.lang.NullPointerException
at com.sun.org.apache.xml.internal.serializer.ToHTMLStream.endElement(Unknown Source)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endElement(Unknown Source)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at org.apache.tika.gui.TikaGUI$2.endElement(TikaGUI.java:519)
at org.apache.tika.sax.TeeContentHandler.endElement(TeeContentHandler.java:94)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:213)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:178)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
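Since the cause was easy to miss at the bottom of a long trace, here is a rough sketch of a helper that walks the getCause() chain and returns the innermost exception. For the trace above it would presumably surface the NullPointerException rather than the TikaException wrapper. The class name RootCause and its method are made up for illustration and are not part of Tika.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper, not part of Tika: finds the innermost cause of an exception.
final class RootCause {

    private RootCause() {
    }

    // Follows getCause() links until none are left; t must not be null.
    // The identity set guards against a cause chain that loops back on itself.
    static Throwable of(Throwable t) {
        Set<Throwable> seen =
                Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for a wrapped parser failure; the exception types are only examples.
        Throwable inner = new NullPointerException();
        Throwable wrapped = new RuntimeException("Unexpected RuntimeException", inner);
        System.out.println("Root cause: " + RootCause.of(wrapped));
    }
}
```

Calling RootCause.of(e) in a catch block and logging the result first should put the interesting frame at the top of the report instead of the bottom.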
Try with a recent svn nightly build, and see if that fixes it. If not,
please post a problem file and the full stacktrace to a new issue in JIRA.
I will try to find time to check into that.
-Paul