It looks like bad data. The XML you're sending to Solr is malformed (the
parser reports an "img" element with no matching "</img>" end tag), so I
suspect this is completely outside of Solr's purview.
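The innermost cause in the trace is a JAXP SAX parser (javax.xml.parsers.SAXParser, called from Tika's XMLParser) rejecting the document. A minimal sketch, assuming the page content is available as a String before it is sent to Solr, uses the same JDK API to check well-formedness up front. The class name, method name and sample pages are hypothetical, for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: not part of Solr, Tika or ManifoldCF.
class WellFormedCheck {

    // Returns null if the input parses as well-formed XML,
    // otherwise the parser's position and error message.
    static String checkWellFormed(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            // DefaultHandler ignores all content; its fatalError() rethrows,
            // so a malformed document surfaces as a SAXParseException.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample pages: the first leaves <img> open, as HTML allows.
        String htmlStyle = "<html><body><img src=\"logo.png\"></body></html>";
        String xhtmlStyle = "<html><body><img src=\"logo.png\"/></body></html>";
        System.out.println("HTML-style:  " + checkWellFormed(htmlStyle));
        System.out.println("XHTML-style: " + checkWellFormed(xhtmlStyle));
    }
}
```

HTML treats "img" as a void element that needs no end tag, so ordinary HTML pages are often not well-formed XML; if such pages are being handed to an XML parser, a failure like the one in the trace would be the expected result.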

Best,
Erick


On Thu, Nov 14, 2013 at 9:26 AM, Marcello Lorenzi <mlore...@sorint.it> wrote:

> Hi,
> I have installed a Solr 4.3 instance and configured ManifoldCF to pass web
> content to the shard collection, but during crawling we have noticed many
> occurrences of this exception:
>
> ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error
>         at com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:150)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>         at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:107)
>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:76)
>         at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:934)
>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:90)
>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:515)
>         at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1012)
>         at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:642)
>         at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223)
>         at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597)
>         at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1555)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> Caused by: org.apache.tika.exception.TikaException: XML parse error
>         at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>         at com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:147)
>         ... 24 more
> Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber: 105; The element type "img" must be terminated by the matching end-tag "</img>".
>         at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
>         at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
>         at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
>         at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
>         at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1753)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2951)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
>         at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
>         at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
>         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:846)
>         at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:775)
>         at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
>         at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
>         at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:628)
>         at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:332)
>         at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
>         at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:72)
>         ... 28 more
>
> Could the Solr collection be misconfigured?
>
> Thanks,
> Marcello
>
