Hello Jackie,

I'd certainly try that, just to be sure.

Markus

-----Original message-----
> From: Richardson, Jacquelyn F. <[email protected]>
> Sent: Wednesday 5th October 2016 20:36
> To: [email protected]
> Subject: RE: Error while attempting to add documents to Solr
>
> Hi Markus,
>
> If I should upgrade to the latest version of Solr (6.2.1) is it advisable to
> upgrade my current version (1.9) of Nutch? If so, should I upgrade to the
> latest version of Nutch (1.12)?
>
> Jackie
>
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]]
> Sent: Wednesday, September 21, 2016 1:30 PM
> To: [email protected]
> Subject: RE: Error while attempting to add documents to Solr
>
> Hmm, a last option would be to upgrade your Solr instance. 4x is really old
> and it might do the trick.
>
> Markus
>
> > -----Original message-----
> > From: Richardson, Jacquelyn F. <[email protected]>
> > Sent: Wednesday 21st September 2016 15:54
> > To: [email protected]
> > Subject: RE: Error while attempting to add documents to Solr
> >
> > Hi Markus,
> >
> > Thanks very much for your response.
> >
> > I did what you suggested but did not see anything missing in the first
> > few bytes.
> >
> > Because I have the same setup on my local machine I was curious to see
> > what would happen if I copied the directory containing the segments created
> > from the crawl (on my local machine) of a seed file. Once copied I issued
> > the following commands to index into Solr. To do so on the server, I did:
> >
> > 1. Double-click Cygwin.bat file to open command window.
> > 2. CD to nutch home directory.
> > 3. Issue command: export JAVA_HOME=F:/Programs/jdk1.7.0_80
> > 4. Issue command: bin/nutch solrindex
> >    http://fegddd.enther.rlco.gov/solr/collection1_tst
> >    crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb
> >    crawls/crawlsitemap/segments/*
> >
> > I received a slightly different error this time.
> > In the hadoop.log I received:
> > WARN mapred.LocalJobRunner - job_local_0001
> > org.apache.solr.common.SolrException: Bad Request
> >
> > Bad Request
> >
> > request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
> >   at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
> >   at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> >   at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> >   at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
> >   at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
> >   at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
> >   at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> >   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> >   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> > 2016-09-21 08:39:55,499 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
> >   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> >   at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
> >   at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
> >   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >   at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
> >
> > And in solr.log I received:
> > ERROR - 2016-09-21 08:39:54.509; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
> >   at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
> >   at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> >   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
> >   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
> >   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
> >   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
> >   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
> >   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
> >   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
> >   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
> >   at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
> >   at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
> >   at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
> >   at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >   at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> >   at java.lang.Thread.run(Thread.java:745)
> > Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
> >   at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
> >   at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
> >   at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
> >   at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
> >   at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
> >   at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
> >   ... 23 more
> >
> > Now I am totally at a loss. I thought it might be my setup, but when
> > I compared them they are the same.
> >
> > Any light you may be able to shed on what is wrong will be greatly
> > appreciated.
> >
> > Thanks,
> > Jackie
> >
> > -----Original Message-----
> > From: Markus Jelsma [mailto:[email protected]]
> > Sent: Friday, August 12, 2016 3:00 PM
> > To: [email protected]
> > Subject: RE: Error while attempting to add documents to Solr
> >
> > Hello Jacquelyn,
> >
> > This is very odd:
> >
> > > Unexpected EOF in prolog
> > > at [row,col {unknown-source}]: [1,0]
> >
> > We've fixed this problem a long time ago. It was a problem of non-unicode
> > codepoints in the data sent to Solr. The Solr indexing plugin strips them
> > all, and to my knowledge, there are no other non-unicode codepoints to
> > strip.
> >
> > What you can do to analyze the problem is to use debug or even trace
> > logging, so you can see the exact XML Nutch is sending on the wire, and use
> > a hexeditor to check for position 1,0, well, the first few bytes.
> >
> > Markus
> >
> > > -----Original message-----
> > > From: Richardson, Jacquelyn F. <[email protected]>
> > > Sent: Friday 12th August 2016 19:37
> > > To: [email protected]
> > > Subject: Error while attempting to add documents to Solr
> > >
> > > Hi All,
> > >
> > > Some background information that may be of some help. I have Cygwin64,
> > > Solr 4.7, apache Nutch 1.9 source and tomcat configured in a Windows 7
> > > environment. This setup works well on my local machine. I can crawl the
> > > specified web page(s) and Nutch can successfully index the content to
> > > Solr.
> > >
> > > I moved this setup to one of our servers (except tomcat 8; it was already
> > > on the server and the OS is Windows Server 2008). I executed a crawl of
> > > a seed file using the individual Nutch commands. Everything worked fine
> > > until I ran the command to index the content to Solr. I issued the
> > > following command:
> > > bin/nutch solrindex
> > > http://fegddd.enther.rlco.gov/solr/collection1_tst
> > > crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb
> > > crawls/crawlsitemap/segments/*
> > >
> > > I received the following error in hadoop.log:
> > > WARN mapred.LocalJobRunner - job_local_0001
> > > org.apache.solr.common.SolrException: Bad Request
> > >
> > > Bad Request
> > >
> > > request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
> > >
> > > Solr.log reports this error:
> > > INFO - 2016-08-12 07:18:27.656; org.apache.solr.update.processor.LogUpdateProcessor; [collection1_tst] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 62
> > > ERROR - 2016-08-12 07:18:27.656; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
> > >   at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
> > >   at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> > >   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> > >   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> > >   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
> > >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> > >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
> > >   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
> > >   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > >   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
> > >   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
> > >   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
> > >   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
> > >   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
> > >   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
> > >   at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
> > >   at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
> > >   at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
> > >   at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
> > >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >   at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> > >   at java.lang.Thread.run(Thread.java:745)
> > > Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
> > >   at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
> > >   at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
> > >   at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
> > >   at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
> > >   at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
> > >   at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
> > >
> > > I have compared the setup on my local machine with the setup on the
> > > server machine and I cannot see a difference. I thought perhaps it had
> > > something to do with the solrindex-mapping.xml file but what is on the
> > > server agrees with what I have on my local machine.
> > >
> > > Any help you can provide will be most appreciated.
> > >
> > > Thanks,
> > > Jackie
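
For anyone following the hex-editor suggestion from the 12 August reply, the check on "the first few bytes" could be sketched roughly as below. This is only an illustration, not part of Nutch or Solr: the helper names `hex_preview` and `describe_start` are made up here, and it assumes the request body has already been captured to a file by some other means (debug or trace logging, for example).

```python
def hex_preview(data: bytes, count: int = 32) -> str:
    """Return the first `count` bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data[:count])


def describe_start(data: bytes) -> str:
    """Give a rough label for how a captured request body begins.

    The labels are illustrative only; they are not Solr or Nutch terminology.
    """
    if not data:
        return "empty body"
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 byte order mark"
    if data.lstrip(b" \t\r\n").startswith(b"<"):
        return "looks like XML"
    return "unexpected leading bytes"
```

To use it, read the captured payload in binary mode, for example `body = open("payload.bin", "rb").read()` (the file name is hypothetical), then print `hex_preview(body)` and `describe_start(body)`. Reading in binary mode matters, because a text-mode read could hide exactly the bytes being looked for.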

