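It looks like bad data. The XML you're sending to Solr looks malformed. The innermost exception is a SAXParseException at line 91, column 105 saying that an "img" element was never terminated by a matching "</img>". That is perfectly normal in HTML, but it is not well-formed XML, and Tika's AutoDetectParser routed the content to its XML parser, which rejects it. So I suspect this is completely outside of Solr's purview.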
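To see why this is a data problem rather than a Solr one, the innermost exception can be reproduced with nothing but the JDK's own SAX parser. This is only an illustrative sketch, not code from Solr, Tika or ManifoldCF: the class name WellFormedCheck, the helper firstXmlError and the sample markup are all made up. It parses two strings, one with an HTML-style unclosed <img> and one with the XHTML-style self-closed <img/>.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks whether a string is well-formed XML using the
// JDK's built-in SAX parser (the same Xerces code that appears in the trace).
class WellFormedCheck {

    /**
     * Returns null if the content is well-formed XML, otherwise the position
     * and message of the first fatal parse error.
     */
    static String firstXmlError(String content) {
        try {
            // DefaultHandler ignores all content events; its fatalError()
            // rethrows, so a well-formedness error surfaces as an exception.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(content)), new DefaultHandler());
            return null; // parsed to the end: well-formed
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            // Parser setup or I/O failure, not a problem with the markup itself.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML-style void element: valid HTML, but not well-formed XML.
        String html = "<html><body><img src=\"logo.png\"></body></html>";
        // XHTML-style self-closed element: well-formed XML.
        String xhtml = "<html><body><img src=\"logo.png\"/></body></html>";
        System.out.println("html:  " + firstXmlError(html));
        System.out.println("xhtml: " + firstXmlError(xhtml));
    }
}
```

The first string should fail on line 1 with a message along the lines of the one in your trace, while the second should print null. Running a check like this against the page ManifoldCF actually fetched (line 91, column 105) would point at the offending markup.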
Best,
Erick

On Thu, Nov 14, 2013 at 9:26 AM, Marcello Lorenzi <mlore...@sorint.it> wrote:

> Hi,
>
> I have installed a Solr 4.3 instance and configured ManifoldCF to pass web
> content to the shard collection, but during crawling we have noticed this
> exception many times:
>
> ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error
>     at com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:150)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:107)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:76)
>     at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:934)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:90)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:515)
>     at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1012)
>     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:642)
>     at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223)
>     at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597)
>     at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1555)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:724)
> Caused by: org.apache.tika.exception.TikaException: XML parse error
>     at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:147)
>     ... 24 more
> Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber: 105; The element type "img" must be terminated by the matching end-tag "</img>".
>     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
>     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
>     at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
>     at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
>     at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1753)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2951)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
>     at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:846)
>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:775)
>     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:628)
>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:332)
>     at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
>     at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:72)
>     ... 28 more
>
> Could the Solr collection be configured incorrectly?
>
> Thanks,
> Marcello