Hmm, one last option would be to upgrade your Solr instance. 4.x is really old, and upgrading might do the trick.
Markus

-----Original message-----
> From: Richardson, Jacquelyn F. <[email protected]>
> Sent: Wednesday 21st September 2016 15:54
> To: [email protected]
> Subject: RE: Error while attempting to add documents to Solr
>
> Hi Markus,
>
> Thanks very much for your response.
>
> I did what you suggested but did not see anything missing in the first few
> bytes.
>
> Because I have the same setup on my local machine, I was curious to see what
> would happen if I copied over the directory containing the segments created
> by crawling a seed file on my local machine. Once it was copied, I issued the
> following commands on the server to index into Solr:
>
> 1. Double-click the Cygwin.bat file to open a command window.
> 2. cd to the Nutch home directory.
> 3. Issue the command: export JAVA_HOME=F:/Programs/jdk1.7.0_80
> 4. Issue the command:
>    bin/nutch solrindex http://fegddd.enther.rlco.gov/solr/collection1_tst crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb crawls/crawlsitemap/segments/*
>
> I received a slightly different error this time.
> In hadoop.log I received:
> WARN mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Bad Request
>
> Bad Request
>
> request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
> at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
> at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2016-09-21 08:39:55,499 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
> at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
>
> And in solr.log I received:
> ERROR - 2016-09-21 08:39:54.509; org.apache.solr.common.SolrException;
> org.apache.solr.common.SolrException: Unexpected EOF in prolog
> at [row,col {unknown-source}]: [1,0]
> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
> at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
> at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
> at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
> at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
> at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
> at [row,col {unknown-source}]: [1,0]
> at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
> at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
> at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
> at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
> at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
> ... 23 more
>
> Now I am totally at a loss. I thought it might be my setup, but when I
> compared the two machines, the setups were the same.
>
> Any light you may be able to shed on what is wrong will be greatly
> appreciated.
>
> Thanks,
> Jackie
>
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]]
> Sent: Friday, August 12, 2016 3:00 PM
> To: [email protected]
> Subject: RE: Error while attempting to add documents to Solr
>
> Hello Jacquelyn,
>
> This is very odd:
>
> > Unexpected EOF in prolog
> > at [row,col {unknown-source}]: [1,0]
>
> We fixed this problem a long time ago. It was a problem of non-Unicode
> codepoints in the data sent to Solr.
> The Solr indexing plugin strips them all, and to my knowledge, there are no
> other non-Unicode codepoints to strip.
>
> To analyze the problem, you can enable debug or even trace logging so you can
> see the exact XML Nutch is sending over the wire, and then use a hex editor
> to check position [1,0], that is, the first few bytes.
>
> Markus
>
> -----Original message-----
> > From: Richardson, Jacquelyn F. <[email protected]>
> > Sent: Friday 12th August 2016 19:37
> > To: [email protected]
> > Subject: Error while attempting to add documents to Solr
> >
> > Hi All,
> >
> > Some background information that may be of some help: I have Cygwin64,
> > Solr 4.7, the Apache Nutch 1.9 source, and Tomcat configured in a Windows 7
> > environment. This setup works well on my local machine. I can crawl the
> > specified web page(s), and Nutch can successfully index the content to Solr.
> >
> > I moved this setup to one of our servers (except Tomcat 8, which was
> > already on the server; the OS is Windows Server 2008). I executed a crawl
> > of a seed file using the individual Nutch commands. Everything worked fine
> > until I ran the command to index the content to Solr.
> > I issued the following command:
> > bin/nutch solrindex http://fegddd.enther.rlco.gov/solr/collection1_tst crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb crawls/crawlsitemap/segments/*
> >
> > I received the following error in hadoop.log:
> > WARN mapred.LocalJobRunner - job_local_0001
> > org.apache.solr.common.SolrException: Bad Request
> >
> > Bad Request
> >
> > request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
> >
> > solr.log reports this error:
> > INFO - 2016-08-12 07:18:27.656; org.apache.solr.update.processor.LogUpdateProcessor; [collection1_tst] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 62
> > ERROR - 2016-08-12 07:18:27.656; org.apache.solr.common.SolrException;
> > org.apache.solr.common.SolrException: Unexpected EOF in prolog
> > at [row,col {unknown-source}]: [1,0]
> > at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
> > at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> > at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> > at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
> > at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
> > at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
> > at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
> > at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
> > at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
> > at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
> > at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
> > at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
> > at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
> > at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
> > at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
> > at [row,col {unknown-source}]: [1,0]
> > at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
> > at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
> > at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
> > at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
> > at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
> > at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
> >
> > I have compared the setup on my local machine with the setup on the server
> > machine, and I cannot see a difference.
> > I thought perhaps it had something to do with the solrindex-mapping.xml
> > file, but what is on the server agrees with what I have on my local machine.
> >
> > Any help you can provide will be most appreciated.
> >
> > Thanks,
> > Jackie
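Markus's suggestion of checking the first few bytes of the XML that Nutch sends can be scripted in the same Cygwin shell used for the `bin/nutch` commands. The sketch below assumes the update body has already been saved to a file (for example, copied out of a debug or trace log); the function name `inspect_payload` and any file names are hypothetical. Because Woodstox reports the EOF at [1,0], which suggests the parser hit end of input before the first character, the function prints the byte count first and flags an empty body before dumping the leading bytes in hex.

```shell
#!/bin/sh
# Hypothetical helper: show the size and first bytes of a captured Solr
# update body so they can be checked the way a hex editor would.
#   $1 = file holding the request body (assumed to be saved already)
#   $2 = number of leading bytes to dump (default 32)
inspect_payload() {
  file="$1"
  n="${2:-32}"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }

  # wc pads its output on some platforms, so strip the spaces.
  size=$(wc -c < "$file" | tr -d ' ')
  echo "size: $size bytes"

  if [ "$size" -eq 0 ]; then
    # Nothing for the XML parser to read: EOF before the first character.
    echo "EMPTY: the body contains no bytes at all"
    return 1
  fi

  # Hex dump with decimal offsets. Look for 3c ('<') or a UTF-8 BOM
  # (ef bb bf) at the start, and for stray control bytes below 20.
  head -c "$n" "$file" | od -A d -t x1
}
```

For example, `inspect_payload update-body.xml 64` (hypothetical file name). An XML update body would normally begin with `3c` (`<`), either as `<add>` or as an XML declaration `<?xml` (`3c 3f 78 6d 6c`).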

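Markus also mentions that the Solr indexing plugin strips problematic codepoints before the data reaches Solr. As a rough, hypothetical illustration of that kind of cleanup (not the plugin's actual code), the functions below count and remove the single-byte control characters that XML 1.0 never allows: everything below 0x20 except tab, line feed, and carriage return. This is only a byte-level check; it does not catch other disallowed characters such as U+FFFE, U+FFFF, or unpaired surrogates.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the kind of cleanup described in the
# thread. XML 1.0 allows only tab (0x09), LF (0x0A) and CR (0x0D) below
# 0x20, so the set below lists every other C0 control byte in octal.
INVALID_XML_BYTES='\000-\010\013\014\016-\037'

# Print how many disallowed control bytes a file contains.
count_invalid_xml_bytes() {
  # -c complements the set and -d deletes, so only offending bytes remain.
  LC_ALL=C tr -cd "$INVALID_XML_BYTES" < "$1" | wc -c | tr -d ' '
}

# Copy stdin to stdout with the disallowed control bytes removed.
# Multi-byte UTF-8 sequences are untouched because all their bytes are >= 0x80.
strip_invalid_xml_bytes() {
  LC_ALL=C tr -d "$INVALID_XML_BYTES"
}
```

For example, `count_invalid_xml_bytes update-body.xml` reports whether any such bytes are present, and `strip_invalid_xml_bytes < update-body.xml > cleaned.xml` writes a filtered copy (both file names are hypothetical). Forcing `LC_ALL=C` makes `tr` work on raw bytes regardless of the Cygwin locale.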
