Hello,

I just unpacked Tika 0.8 on my Mac OS X 10.6.5 system, hoping to extract
data from an xls spreadsheet. Unfortunately, I got the error below.

I was excited to find Tika (I have been using Python's xlrd until now), and
I hope I was just unlucky to hit an error on my first parsing attempt. I
would be grateful for any help.

-- Shaun

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.officepar...@61128f5a
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
        at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:231)
        at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:81)
Caused by: java.lang.NullPointerException
        at com.sun.org.apache.xml.internal.serializer.ToStream.writeAttrString(ToStream.java:1962)
        at com.sun.org.apache.xml.internal.serializer.ToStream.processAttributes(ToStream.java:1942)
        at com.sun.org.apache.xml.internal.serializer.ToStream.closeStartTag(ToStream.java:2425)
        at com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1377)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.characters(TransformerHandlerImpl.java:168)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:261)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:287)
        at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
        at org.apache.tika.parser.microsoft.CellDecorator.render(CellDecorator.java:34)
        at org.apache.tika.parser.microsoft.LinkedCell.render(LinkedCell.java:36)
        at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processExtraText(ExcelExtractor.java:423)
        at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:522)
        at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:346)
        at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:297)
        at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:82)
        at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:112)
        at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:147)
        at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:106)
        at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:276)
        at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:136)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:196)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
        ... 4 more
