Anyone?

Kevin


On Fri, Dec 2, 2011 at 11:34 AM, Kevin Krouse <[email protected]> wrote:
>
> Hello Tikas,
> We are getting XML parse exceptions when Tika tries to index Mac hidden
> metadata files that start with a "._" prefix.  I don't know much about
> these hidden files, but they are binary files and won't
> parse as XML.
> Should we be filtering these out before Tika tries to process them or
> is it a bug in the AutoDetectParser?
>
> org.labkey.search.model.LuceneSearchServiceImpl$PreProcessingException:/Users/kevink/data/._somefile.xml
>    at org.labkey.search.model.LuceneSearchServiceImpl.logAsPreProcessingException(LuceneSearchServiceImpl.java:701)
>    at org.labkey.search.model.LuceneSearchServiceImpl.preprocess(LuceneSearchServiceImpl.java:499)
>    at org.labkey.search.model.AbstractSearchService.preprocess(AbstractSearchService.java:883)
>    at org.labkey.search.model.AbstractSearchService.getPreprocessedItem(AbstractSearchService.java:967)
>    at org.labkey.search.model.AbstractSearchService$7.run(AbstractSearchService.java:1003)
>    at java.lang.Thread.run(Thread.java:680)
> org.apache.tika.exception.TikaException: XML parse error
>    at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)
>    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
>    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:145)
>    at org.labkey.search.model.LuceneSearchServiceImpl.parse(LuceneSearchServiceImpl.java:575)
>    at org.labkey.search.model.LuceneSearchServiceImpl.preprocess(LuceneSearchServiceImpl.java:339)
>    ... 4 more
> Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
>    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:196)
>    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:175)
>    at org.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:394)
>    at org.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:322)
>    at org.apache.xerces.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:281)
>    at org.apache.xerces.impl.XMLScanner.reportFatalError(XMLScanner.java:1459)
>    at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(XMLDocumentScannerImpl.java:870)
>    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:324)
>    at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:845)
>    at org.apache.xerces.parsers.XML11Configuration.parse(XML11Configuration.java:768)
>    at org.apache.xerces.parsers.XMLParser.parse(XMLParser.java:108)
>    at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1196)
>    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:555)
>    at org.apache.xerces.jaxp.SAXParserImpl.parse(SAXParserImpl.java:289)
>    at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
>    at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:65)
>    ... 10 more
>
> Kevin
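
In case it helps anyone hitting the same thing, here is a minimal sketch of the "filter before Tika" option from the question above. It is only an illustration, not something Tika or LabKey provides: the class name AppleDoubleFilter and its methods are hypothetical. It assumes the "._" files are AppleDouble files, which start with the 4-byte magic number 0x00051607, and it skips a file only when both the name prefix and the magic number match, so a real XML file that merely happens to be named "._something.xml" would still be indexed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Tika or LabKey.
final class AppleDoubleFilter {
    // Assumption: the "._" files are AppleDouble files, whose first
    // four bytes are the magic number 0x00051607.
    private static final int APPLE_DOUBLE_MAGIC = 0x00051607;

    private AppleDoubleFilter() {}

    // True if the last path element starts with the "._" prefix.
    static boolean hasAppleDoubleName(Path file) {
        Path name = file.getFileName();
        return name != null && name.toString().startsWith("._");
    }

    // True if the first four bytes match the AppleDouble magic number.
    static boolean hasAppleDoubleMagic(byte[] header) {
        if (header == null || header.length < 4) {
            return false;
        }
        int magic = ((header[0] & 0xFF) << 24)
                  | ((header[1] & 0xFF) << 16)
                  | ((header[2] & 0xFF) << 8)
                  |  (header[3] & 0xFF);
        return magic == APPLE_DOUBLE_MAGIC;
    }

    // Skip a file only when both the name and the content look like AppleDouble.
    static boolean shouldSkip(Path file) throws IOException {
        if (!hasAppleDoubleName(file)) {
            return false;
        }
        byte[] header = new byte[4];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until
            // the header is full or the stream ends.
            while (filled < header.length) {
                int n = in.read(header, filled, header.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
        }
        return filled == header.length && hasAppleDoubleMagic(header);
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Paths.get(arg);
            System.out.println(p + " -> skip=" + shouldSkip(p));
        }
    }
}
```

The indexing code would call AppleDoubleFilter.shouldSkip(path) before handing the stream to AutoDetectParser and simply move on to the next file when it returns true. Whether Tika's detection should instead recognize these files itself is still the open question.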
