Hi Erik,
But in this case, does the custom loader receive an HTTP 500 error from Solr?

Thanks,
Marcello
On 11/14/2013 04:29 PM, Erik Hatcher wrote:
Also there's a custom loader here that is the culprit:  
com.lsegroup.solr.handler.CwsExtractingDocumentLoader

On Nov 14, 2013, at 10:20, Erick Erickson <erickerick...@gmail.com> wrote:

It looks like bad data. The XML you're sending to Solr looks malformed, so I
suspect this is completely outside of Solr's purview.
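One way to check this outside Solr is the minimal sketch below. It assumes the crawled page is HTML with an unclosed <img> element that ends up in a strict XML parser, as the XMLParser frames in the trace suggest. The class and method names (WellFormedCheck, firstXmlError) are made up for illustration; only standard JDK SAX classes are used.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Hypothetical helper: returns null when the text is well-formed XML,
    // otherwise the position and message of the first fatal parse error.
    static String firstXmlError(String text) throws Exception {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // HTML-style void element: legal in HTML, not well-formed XML.
        System.out.println(firstXmlError("<p><img src=\"a.png\"></p>"));
        // Self-closed element: well-formed XML.
        System.out.println(firstXmlError("<p><img src=\"a.png\"/></p>"));
    }
}
```

If the first call reports an error and the second returns null, the content is most likely HTML being handed to an XML parser, which points at the data or its declared content type rather than at Solr itself.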

Best,
Erick


On Thu, Nov 14, 2013 at 9:26 AM, Marcello Lorenzi <mlore...@sorint.it> wrote:

Hi,
I have installed a Solr 4.3 instance and configured ManifoldCF to pass web
content to the shard collection, but during crawling we have seen many
exceptions like this one:

ERROR - 2013-11-14 15:13:57.954; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: XML parse error
        at com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:150)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:221)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:107)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:155)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:76)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:934)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:90)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:515)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1012)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:642)
        at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:223)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1597)
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1555)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.tika.exception.TikaException: XML parse error
        at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:78)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at com.lsegroup.solr.handler.CwsExtractingDocumentLoader.load(CwsExtractingDocumentLoader.java:147)
        ... 24 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 91; columnNumber: 105; The element type "img" must be terminated by the matching end-tag "</img>".
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
        at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
        at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
        at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1388)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1753)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2951)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:846)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:775)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:628)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:332)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
        at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:72)
        ... 28 more

Could it be that the Solr collection is not configured correctly?

Thanks,
Marcello

