Hello,

After looking at http://issues.apache.org/jira/browse/SOLR-964, where it seems this issue has been addressed, I had another go at indexing documents containing DOCTYPE. It failed as follows.
This was using the nightly build from 21 Jan 2009. The comments section in JIRA suggested my initial message had been replied to twice; I somehow missed them in my inbox!

Regards Fergus.

Jan 21, 2009 12:15:21 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Jan 21, 2009 12:15:21 PM org.apache.solr.core.SolrCore execute
INFO: [jdocs] webapp=/solr path=/dataimport params={command=show-config} status=0 QTime=0
Jan 21, 2009 12:15:22 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
SEVERE: Exception while processing: jc document : null
org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:/Volumes/spare/ts/j/dtd/jxml/data/news/f/f2008/frp70450.xmlrows processed :0 Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:252)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:177)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
    at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:180)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1325)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:613)
Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: (was java.io.FileNotFoundException) /../config/jml-delivery-norm-2.1.dtd (No such file or directory)
 at [row,col {unknown-source}]: [3,81]
    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:242)
    ... 27 more
Caused by: com.ctc.wstx.exc.WstxParsingException: (was java.io.FileNotFoundException) /../config/jml-delivery-norm-2.1.dtd (No such file or directory)
 at [row,col {unknown-source}]: [3,81]
    at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630)
    at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461)
    at com.ctc.wstx.sr.ValidatingStreamReader.findDtdExtSubset(ValidatingStreamReader.java:475)
    at com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:358)
    at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3351)
    at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:1988)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
    at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:141)
    at org.apache.solr.handler.dataimport.XPathRecordReader$Node.access$000(XPathRecordReader.java:89)
    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:82)
    ... 28 more
Jan 21, 2009 12:15:22 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed

>Hello all, as the subject says:
> DIH XPathEntityProcessor fails with docs containing <!DOCTYPE>
>
>This is using a solr nightly build from monday.
>
>INFO: Server startup in 3623 ms
>Jan 16, 2009 9:54:12 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
>INFO: Read dataimport.properties
>Jan 16, 2009 9:54:12 AM org.apache.solr.core.SolrCore execute
>INFO: [jdocs] webapp=/solr path=/walkj params={command=full-import} status=0 QTime=13
>Jan 16, 2009 9:54:12 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>INFO: Starting Full Import
>Jan 16, 2009 9:54:12 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
>INFO: [jdocs] REMOVING ALL DOCUMENTS FROM INDEX
>Jan 16, 2009 9:54:12 AM org.apache.solr.core.SolrDeletionPolicy onInit
>INFO: SolrDeletionPolicy.onInit: commits:num=2
>    commit{dir=/Volumes/spare/ts/solrnightlyj/data/index,segFN=segments_c,version=1232026423291,generation=12,filenames=[segments_c, _4.fnm, _4.frq, _4.prx, _4.tis, _4.tii, _4.nrm, _4.fdx, _4.fdt]
>    commit{dir=/Volumes/spare/ts/solrnightlyj/data/index,segFN=segments_d,version=1232026423292,generation=13,filenames=[segments_d]
>Jan 16, 2009 9:54:12 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
>INFO: last commit = 1232026423292
>Jan 16, 2009 9:54:13 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
>SEVERE: Exception while processing: jcurrent document : null
>org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:/j/dtd/jxml/data/news/2008/frp70450.xmlrows processed :0 Processing Document # 1
>    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
>    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:252)
>    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:177)
>    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
>    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
>    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
>    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
>    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
>    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
>    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
>Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: (was java.io.FileNotFoundException) /../config/jml-delivery-norm-2.1.dtd (No such file or directory)
> at [row,col {unknown-source}]: [3,81]
>    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
>    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:242)
>    ... 9 more
>Caused by: com.ctc.wstx.exc.WstxParsingException: (was java.io.FileNotFoundException) /../config/jml-delivery-norm-2.1.dtd (No such file or directory)
> at [row,col {unknown-source}]: [3,81]
>    at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630)
>    at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:461)
>    at com.ctc.wstx.sr.ValidatingStreamReader.findDtdExtSubset(ValidatingStreamReader.java:475)
>    at com.ctc.wstx.sr.ValidatingStreamReader.finishDTD(ValidatingStreamReader.java:358)
>    at com.ctc.wstx.sr.BasicStreamReader.skipToken(BasicStreamReader.java:3351)
>    at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:1988)
>    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
>    at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:141)
>    at org.apache.solr.handler.dataimport.XPathRecordReader$Node.access$000(XPathRecordReader.java:89)
>    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:82)
>    ... 10 more
>Jan 16, 2009 9:54:13 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>SEVERE: Full Import failed
>
>A fragment from the top of the failing document is
>
><?xml version="1.0" encoding="ISO-8859-1"?>
><?xml-stylesheet type="text/xsl" href="../../../../config/support/j-deliver.xsl"?>
><!DOCTYPE j:record SYSTEM "../../../../config/jml-delivery-norm-2.1.dtd">
><j:record xmlns:j="http://dtd.j.com/2002/Content/" id="frp70450" urname="record">
>  <j:metadata xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="" urname="metadata" xlink:type="simple">
>    <dc:date xmlns:dc="http://purl.org/dc/elements/1.1/" qualifier="pdate">20080131</dc:date>
>
>The DTD does exist at the specified location. Removing the DOCTYPE directive
>fixes everything. I know that use of DOCTYPE is out of fashion, and it does
>not exist in our newer documents, however there are lots of older XML docs
>about!

--
===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
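
A note on the traces above: the {unknown-source} location and the resolved path /../config/jml-delivery-norm-2.1.dtd suggest that the parser had no base location against which to resolve the relative SYSTEM identifier. That is only a reading of the log, not something confirmed here. As a minimal sketch of one possible workaround, assuming the DTD is not actually needed for indexing, the standard StAX property XMLInputFactory.SUPPORT_DTD can be switched off so that the DOCTYPE is skipped rather than fetched. The class and method names below (DoctypeSkipDemo, rootElementName) are made up for illustration, the code is not taken from SOLR-964 or from DIH, and behaviour may differ between StAX implementations such as Woodstox and the JDK's built-in parser.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; not part of Solr or DIH.
public class DoctypeSkipDemo {

    // Parses the given XML with DTD support disabled and returns the
    // local name of the first (root) element, or null if there is none.
    static String rootElementName(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Standard StAX properties: do not process the DTD and do not
        // resolve external entities, so no DTD file has to be found.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);

        // A Reader carries no system ID, so there is no base location
        // for relative references (assumed to mirror the failing case).
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Cut-down version of the failing document's prolog.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE j:record SYSTEM \"../../../../config/jml-delivery-norm-2.1.dtd\">\n"
                + "<j:record xmlns:j=\"http://dtd.j.com/2002/Content/\" id=\"frp70450\"/>";
        System.out.println(rootElementName(xml));
    }
}
```

The trade-off is that any entities declared in the DTD would no longer be available, so this only suits documents that do not rely on them.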