Hello Tikas,
We are getting XML parse exceptions when Tika tries to index Mac hidden
metadata files that start with a "._" prefix.  I don't know much about
these hidden files, but they are binary files and won't
parse as XML.
Should we be filtering these out before Tika tries to process them, or
is this a bug in the AutoDetectParser?
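
For the filtering option, a minimal sketch of a pre-check that could run before a file is handed to Tika. It assumes the "._" files are macOS AppleDouble sidecar files, whose header starts with the magic number 0x00051607. The class and method names (AppleDoubleFilter, shouldSkip) are hypothetical and are not part of Tika or of our search code.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: decides whether a file looks like a macOS "._" sidecar.
final class AppleDoubleFilter {
    // Assumption: the hidden files are AppleDouble, which begins with this magic number.
    private static final int APPLE_DOUBLE_MAGIC = 0x00051607;

    // True when the file name itself starts with "._".
    static boolean hasAppleDoublePrefix(Path file) {
        Path name = file.getFileName();
        return name != null && name.toString().startsWith("._");
    }

    // True when the first four bytes (big-endian) match the AppleDouble magic number.
    static boolean hasAppleDoubleMagic(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            return in.readInt() == APPLE_DOUBLE_MAGIC;
        } catch (EOFException e) {
            // Fewer than four bytes: cannot be an AppleDouble header.
            return false;
        }
    }

    // Skip only when both the name and the content agree, so a genuine
    // XML document that merely happens to be named "._foo.xml" is still indexed.
    static boolean shouldSkip(Path file) throws IOException {
        return hasAppleDoublePrefix(file) && hasAppleDoubleMagic(file);
    }
}
```

The indexer would call AppleDoubleFilter.shouldSkip(path) before parsing and move on to the next file when it returns true. Checking the name alone would be cheaper, but it would also drop any real document whose name starts with "._".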

org.labkey.search.model.LuceneSearchServiceImpl$PreProcessingException: /Users/kevink/data/._somefile.xml
    at org.labkey.search.model.LuceneSearchServiceImpl.logAsPreProcessingException(LuceneSearchServiceImpl.java:701)
    at org.labkey.search.model.LuceneSearchServiceImpl.preprocess(LuceneSearchServiceImpl.java:499)
    at org.labkey.search.model.AbstractSearchService.preprocess(AbstractSearchService.java:883)
    at org.labkey.search.model.AbstractSearchService.getPreprocessedItem(AbstractSearchService.java:967)
    at org.labkey.search.model.AbstractSearchService$7.run(AbstractSearchService.java:1003)
    at java.lang.Thread.run(Thread.java:680)
org.apache.tika.exception.TikaException: XML parse error
    at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:145)
    at org.labkey.search.model.LuceneSearchServiceImpl.parse(LuceneSearchServiceImpl.java:575)
    at org.labkey.search.model.LuceneSearchServiceImpl.preprocess(LuceneSearchServiceImpl.java:339)
    ... 4 more
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:196)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:175)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:394)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:322)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:281)
    at org.apache.xerces.impl.XMLScanner.reportFatalError(XMLScanner.java:1459)
    at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(XMLDocumentScannerImpl.java:870)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:324)
    at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:845)
    at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:768)
    at org.apache.xerces.parsers.XMLParser.parse(XMLParser.java:108)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1196)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:555)
    at org.apache.xerces.jaxp.SAXParserImpl.parse(SAXParserImpl.java:289)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
    at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:65)
    ... 10 more

Kevin
