Hello Tikas,

We are getting XML parse exceptions when Tika tries to index Mac hidden metadata files that start with a "._" prefix. I don't know much about these hidden files, but they are binary files and won't parse as XML, even when the name ends in ".xml". Should we be filtering these out before Tika tries to process them, or is this a bug in the AutoDetectParser?
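
In case it helps frame the question, here is a minimal sketch of the kind of pre-filter we could add on our side before handing a file to Tika. The class and method names are hypothetical (they are not part of Tika or of our code), and the sketch assumes that these "._" files are AppleDouble files, which begin with the magic bytes 00 05 16 07. It checks both the name and the first few bytes, so that a real XML file which merely happens to be named "._something.xml" would still be indexed.

```java
// Hypothetical helper, not part of Tika: decides whether a file looks like
// a Mac AppleDouble sidecar file and should be skipped before parsing.
final class AppleDoubleFilter {

    // Assumption: AppleDouble files start with the 4-byte magic 0x00051607.
    private static final byte[] APPLE_DOUBLE_MAGIC = {0x00, 0x05, 0x16, 0x07};

    private AppleDoubleFilter() {
    }

    // True if the file name (not the full path) carries the "._" prefix.
    public static boolean hasAppleDoubleName(String fileName) {
        return fileName != null && fileName.startsWith("._");
    }

    // True if the given leading bytes of the file match the AppleDouble magic.
    public static boolean hasAppleDoubleMagic(byte[] header) {
        if (header == null || header.length < APPLE_DOUBLE_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < APPLE_DOUBLE_MAGIC.length; i++) {
            if (header[i] != APPLE_DOUBLE_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Skip only when both the name and the content agree.
    public static boolean shouldSkip(String fileName, byte[] header) {
        return hasAppleDoubleName(fileName) && hasAppleDoubleMagic(header);
    }
}
```

The caller would read the first four bytes of the file into `header` and pass the bare file name, for example `AppleDoubleFilter.shouldSkip("._somefile.xml", header)`, and only invoke the parser when the result is false.
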

org.labkey.search.model.LuceneSearchServiceImpl$PreProcessingException: /Users/kevink/data/._somefile.xml
    at org.labkey.search.model.LuceneSearchServiceImpl.logAsPreProcessingException(LuceneSearchServiceImpl.java:701)
    at org.labkey.search.model.LuceneSearchServiceImpl.preprocess(LuceneSearchServiceImpl.java:499)
    at org.labkey.search.model.AbstractSearchService.preprocess(AbstractSearchService.java:883)
    at org.labkey.search.model.AbstractSearchService.getPreprocessedItem(AbstractSearchService.java:967)
    at org.labkey.search.model.AbstractSearchService$7.run(AbstractSearchService.java:1003)
    at java.lang.Thread.run(Thread.java:680)
org.apache.tika.exception.TikaException: XML parse error
    at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:145)
    at org.labkey.search.model.LuceneSearchServiceImpl.parse(LuceneSearchServiceImpl.java:575)
    at org.labkey.search.model.LuceneSearchServiceImpl.preprocess(LuceneSearchServiceImpl.java:339)
    ... 4 more
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:196)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:175)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:394)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:322)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:281)
    at org.apache.xerces.impl.XMLScanner.reportFatalError(XMLScanner.java:1459)
    at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(XMLDocumentScannerImpl.java:870)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:324)
    at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:845)
    at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:768)
    at org.apache.xerces.parsers.XMLParser.parse(XMLParser.java:108)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1196)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:555)
    at org.apache.xerces.jaxp.SAXParserImpl.parse(SAXParserImpl.java:289)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
    at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:65)
    ... 10 more

Kevin
