Re: Solr xml img parsing exception
Hi Jack, we analyzed the issue and found duplicate Tika JARs on the Tomcat classpath. After removing the duplicate library, the search engine now works as expected.

Thanks for the support,
Marcello

On 11/14/2013 05:24 PM, Jack Krupansky wrote:
[...]
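Since the fix turned out to be duplicate Tika JARs, here is a minimal sketch (not from the thread) of one way to check for that. `ClassLoader.getResources` returns every classpath location that provides a given class file, so more than one hit for a Tika class suggests duplicate JARs. The class name `ClasspathDuplicates` is hypothetical and `org.apache.tika.parser.AutoDetectParser` is only an example default. Run it with the same classpath the webapp uses (or call `findCopies` from code running inside the webapp), because Tomcat's class loaders may see different JARs than a standalone JVM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ClasspathDuplicates {

    /** Converts "org.apache.tika.Tika" to the resource name "org/apache/tika/Tika.class". */
    static String resourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every location (JAR or directory) from which the loader can see the class file. */
    static List<URL> findCopies(ClassLoader loader, String className) {
        try {
            return Collections.list(loader.getResources(resourceName(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Example default; pass any fully qualified class name as the first argument.
        String className = args.length > 0 ? args[0] : "org.apache.tika.parser.AutoDetectParser";
        List<URL> copies = findCopies(ClasspathDuplicates.class.getClassLoader(), className);
        System.out.println(className + " found in " + copies.size() + " location(s):");
        for (URL url : copies) {
            System.out.println("  " + url);
        }
        if (copies.size() > 1) {
            System.out.println("WARNING: more than one copy is visible; check for duplicate JARs");
        }
    }
}
```

A class that appears both in Tomcat's shared `lib` directory and in the webapp's `WEB-INF/lib` would show up twice here, since the webapp loader's `getResources` also consults its parent loader.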
Solr xml img parsing exception
Hi, I have installed a Solr 4.3 instance and configured ManifoldCF to pass web content to the shard collection, but during crawling we have noticed many occurrences of this exception:

    ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error
        at com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:150)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:107)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:76)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:934)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:90)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:515)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1012)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:642)
        at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1555)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
    Caused by: org.apache.tika.exception.TikaException: XML parse error
        at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:147)
        ... 24 more
    Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber: 105; The element type "img" must be terminated by the matching end-tag "</img>".
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
        at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1753)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2951)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:846)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:775)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
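The innermost cause can be reproduced outside Solr with the JDK's built-in SAX parser. The sketch below (class and method names are illustrative, not from the thread) parses an HTML-style `<img>` with no end tag, which is legal HTML but not well-formed XML, and an XHTML-style self-closing `<img/>`. Only the first should fail, and `SAXParseException.getLineNumber()` / `getColumnNumber()` give the same kind of position reported in the log above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ImgParseDemo {

    /**
     * Parses the text as XML with the JDK's SAX parser.
     * Returns null if it is well-formed, otherwise "line:column message".
     */
    static String checkWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // DefaultHandler ignores content; fatal errors are thrown as SAXParseException.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems are unexpected for an in-memory string.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Legal HTML, but not well-formed XML: <img> is never closed before </p>.
        String html = "<html><body>\n<p><img src=\"logo.png\"></p>\n</body></html>";
        // XHTML-style self-closing tag: well-formed XML.
        String xhtml = "<html><body>\n<p><img src=\"logo.png\"/></p>\n</body></html>";
        System.out.println("html:  " + checkWellFormed(html));
        System.out.println("xhtml: " + checkWellFormed(xhtml));
    }
}
```

In the trace above, the page went through Tika's `XMLParser` (reached via `AutoDetectParser`), which is why HTML markup was held to XML well-formedness rules.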
Re: Solr xml img parsing exception
It looks like bad data. The XML you're sending to Solr looks malformed, so I suspect this is completely outside of Solr's purview.

Best,
Erick

On Thu, Nov 14, 2013 at 9:26 AM, Marcello Lorenzi mlore...@sorint.it wrote:
[...]
Re: Solr xml img parsing exception
Also, there's a custom loader here that is the culprit: com.lsegroup.solr.handler.CwsExtractingDocumentLoader

On Nov 14, 2013, at 10:20, Erick Erickson erickerick...@gmail.com wrote:
[...]
Re: Solr xml img parsing exception
Hi Erik, but in that case, does the custom loader receive an HTTP 500 error from Solr?

Thanks,
Marcello

On 11/14/2013 04:29 PM, Erik Hatcher wrote:
[...]
Re: Solr xml img parsing exception
The actual error appears to be:

    Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber: 105; The element type "img" must be terminated by the matching end-tag "</img>".

So, check the input document at line 91, column 105. There should be an <img> tag at or before that position, but SAX is complaining that there is no matching </img>.

-- Jack Krupansky

-----Original Message-----
From: Marcello Lorenzi
Sent: Thursday, November 14, 2013 9:26 AM
To: solr-user@lucene.apache.org
Subject: Solr xml img parsing exception

[...]
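Following Jack's advice, a small helper can print the reported line with a caret under the reported column. This is a sketch under stated assumptions: `ErrorContext` and `context` are hypothetical names, and the file is assumed to be UTF-8 and byte-for-byte what Tika received. For this particular error the parser typically reports the position of the mismatched end tag (the trace passes through `scanEndElement`), so the unclosed `<img>` may sit earlier on that line or on a previous one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ErrorContext {

    /**
     * Returns the given 1-based line of the document followed by a caret under the
     * 1-based column, matching the lineNumber/columnNumber of a SAXParseException.
     */
    static String context(String document, int lineNumber, int columnNumber) {
        // Treat CRLF, CR and LF as line ends, as an XML parser does.
        String[] lines = document.split("\r\n|\r|\n", -1);
        if (lineNumber < 1 || lineNumber > lines.length) {
            return "line " + lineNumber + " is outside the document (" + lines.length + " lines)";
        }
        String line = lines[lineNumber - 1];
        // Clamp the caret so it never points more than one character past the end of the line.
        int caret = Math.max(1, Math.min(columnNumber, line.length() + 1));
        StringBuilder marker = new StringBuilder();
        for (int i = 1; i < caret; i++) {
            marker.append(' ');
        }
        return line + "\n" + marker + "^";
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ErrorContext <file> <line> <column>, e.g. page.html 91 105
        String document = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        int line = Integer.parseInt(args[1]);
        int column = Integer.parseInt(args[2]);
        System.out.println(context(document, line, column));
    }
}
```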