Anyone?

Kevin
On Fri, Dec 2, 2011 at 11:34 AM, Kevin Krouse <[email protected]> wrote:
>
> Hello Tikas,
>
> We are getting XML parse exceptions when Tika tries to index Mac hidden
> metadata files that start with a "._" prefix. I don't know much about
> these hidden files, but they are binary files and won't parse as XML.
> Should we be filtering these out before Tika tries to process them or
> is it a bug in the AutoDetectParser?
>
> org.labkey.search.model.LuceneSearchServiceImpl$PreProcessingException:/Users/kevink/data/._somefile.xml
>     at org.labkey.search.model.LuceneSearchServiceImpl.logAsPreProcessingException(LuceneSearchServiceImpl.java:701)
>     at org.labkey.search.model.LuceneSearchServiceImpl.preprocess(LuceneSearchServiceImpl.java:499)
>     at org.labkey.search.model.AbstractSearchService.preprocess(AbstractSearchService.java:883)
>     at org.labkey.search.model.AbstractSearchService.getPreprocessedItem(AbstractSearchService.java:967)
>     at org.labkey.search.model.AbstractSearchService$7.run(AbstractSearchService.java:1003)
>     at java.lang.Thread.run(Thread.java:680)
> org.apache.tika.exception.TikaException: XML parse error
>     at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:145)
>     at org.labkey.search.model.LuceneSearchServiceImpl.parse(LuceneSearchServiceImpl.java:575)
>     at org.labkey.search.model.LuceneSearchServiceImpl.preprocess(LuceneSearchServiceImpl.java:339)
>     ... 4 more
> Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
>     at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:196)
>     at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:175)
>     at org.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:394)
>     at org.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:322)
>     at org.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:281)
>     at org.apache.xerces.impl.XMLScanner.reportFatalError(XMLScanner.java:1459)
>     at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(XMLDocumentScannerImpl.java:870)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:324)
>     at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:845)
>     at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:768)
>     at org.apache.xerces.parsers.XMLParser.parse(XMLParser.java:108)
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1196)
>     at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:555)
>     at org.apache.xerces.jaxp.SAXParserImpl.parse(SAXParserImpl.java:289)
>     at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
>     at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:65)
>     ... 10 more
>
> Kevin
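If the answer is to filter these files out before Tika sees them, one possible pre-filter is sketched here in Java. It is only a sketch under stated assumptions, not LabKey's or Tika's own code: the class and method names (AppleDoubleFilter, shouldSkip, etc.) are hypothetical, and it assumes the "._" files are AppleDouble sidecar files, which macOS writes to hold resource forks and extended attributes on volumes that can't store them natively. An AppleDouble file starts with the 4-byte big-endian magic number 0x00051607 (RFC 1740), so the sketch skips a file only when it has both the "._" name prefix and that header.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical pre-filter that skips macOS AppleDouble ("._") sidecar files
 * before they are handed to a Tika parser. Not part of Tika or LabKey.
 */
public class AppleDoubleFilter {

    // AppleDouble header magic number from RFC 1740, stored big-endian.
    private static final int APPLE_DOUBLE_MAGIC = 0x00051607;

    /** True if the file name carries the "._" prefix macOS uses for sidecar files. */
    static boolean hasAppleDoubleName(Path file) {
        Path name = file.getFileName();
        return name != null && name.toString().startsWith("._");
    }

    /** True if the first four bytes match the AppleDouble magic number. */
    static boolean hasAppleDoubleMagic(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            // readInt() reads four bytes as a big-endian int.
            return in.readInt() == APPLE_DOUBLE_MAGIC;
        } catch (EOFException e) {
            // Fewer than four bytes: cannot be an AppleDouble file.
            return false;
        }
    }

    /** Skip only when both the name and the header indicate AppleDouble. */
    static boolean shouldSkip(Path file) throws IOException {
        return hasAppleDoubleName(file) && hasAppleDoubleMagic(file);
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Paths.get(arg);
            System.out.println(p + (shouldSkip(p) ? " -> skip" : " -> index"));
        }
    }
}
```

Checking the header as well as the name avoids silently dropping a real document whose name happens to start with "._"; a name-only check is simpler if that risk doesn't matter. Either way, the check would go wherever files are queued for indexing, before AutoDetectParser.parse is called.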
