Hello,
I just unpacked Tika 0.8 on my Mac OS X 10.6.5 system, hoping to extract
data from an xls spreadsheet. Unfortunately, I got the error shown below.
I was excited to find Tika (I had been using Python's xlrd before), and I hope
I was just unlucky to hit an error on my first parsing attempt. I would be
grateful for any help.
-- Shaun
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.officepar...@61128f5a
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:231)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:81)
Caused by: java.lang.NullPointerException
    at com.sun.org.apache.xml.internal.serializer.ToStream.writeAttrString(ToStream.java:1962)
    at com.sun.org.apache.xml.internal.serializer.ToStream.processAttributes(ToStream.java:1942)
    at com.sun.org.apache.xml.internal.serializer.ToStream.closeStartTag(ToStream.java:2425)
    at com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1377)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.characters(TransformerHandlerImpl.java:168)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:261)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:287)
    at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
    at org.apache.tika.parser.microsoft.CellDecorator.render(CellDecorator.java:34)
    at org.apache.tika.parser.microsoft.LinkedCell.render(LinkedCell.java:36)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processExtraText(ExcelExtractor.java:423)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:522)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:346)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:297)
    at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:82)
    at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:112)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:147)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:106)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:276)
    at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:136)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:196)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    ... 4 more