Hi Markus,

If I upgrade to the latest version of Solr (6.2.1), is it advisable to also 
upgrade my current version of Nutch (1.9)? If so, should I upgrade to the 
latest version of Nutch (1.12)?

Jackie

-----Original Message-----
From: Markus Jelsma [mailto:[email protected]] 
Sent: Wednesday, September 21, 2016 1:30 PM
To: [email protected]
Subject: RE: Error while attempting to add documents to Solr

Hmm, a last resort would be to upgrade your Solr instance. 4.x is really old, 
and upgrading might do the trick.

Markus

 
 
-----Original message-----
> From:Richardson, Jacquelyn F. <[email protected]>
> Sent: Wednesday 21st September 2016 15:54
> To: [email protected]
> Subject: RE: Error while attempting to add documents to Solr
> 
> Hi Markus,
> 
> Thanks very much for your response.
> 
> I did what you suggested but did not see anything missing in the first 
> few bytes.
> 
> Because I have the same setup on my local machine, I was curious to see 
> what would happen if I copied the directory containing the segments created 
> by a crawl of a seed file on my local machine to the server. Once it was 
> copied, I issued the following commands on the server to index into Solr:
> 
>  
> 
> 1. Double-click the Cygwin.bat file to open a command window.
> 2. cd to the Nutch home directory.
> 3. Issue the command: export JAVA_HOME=F:/Programs/jdk1.7.0_80
> 4. Issue the command:
>    bin/nutch solrindex http://fegddd.enther.rlco.gov/solr/collection1_tst crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb crawls/crawlsitemap/segments/*
> 
> I received a slightly different error this time. In hadoop.log I 
> received:
> WARN  mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Bad Request
> 
> Bad Request
> 
> request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>     at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
>     at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
>     at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2016-09-21 08:39:55,499 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>     at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
>     at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
> 
> And in solr.log I received: 
> ERROR - 2016-09-21 08:39:54.509; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
>     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
>     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
>     at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
>     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
>     at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
>     at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
>     at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
>     at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
>     at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
>     at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
>     at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
>     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
>     ... 23 more
> 
> Now I am totally at a loss. I thought it might be my setup, but when 
> I compared the two setups, they were the same.
> 
> Any light you may be able to shed on what is wrong will be greatly 
> appreciated.
> 
> Thanks,
> Jackie
> 
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]]
> Sent: Friday, August 12, 2016 3:00 PM
> To: [email protected]
> Subject: RE: Error while attempting to add documents to Solr
> 
> Hello Jacquelyn,
> 
> This is very odd:
> 
> > Unexpected EOF in prolog
> > at [row,col {unknown-source}]: [1,0]
> 
> We fixed this problem a long time ago. It was caused by non-Unicode code 
> points in the data sent to Solr. The Solr indexing plugin strips them all, 
> and to my knowledge there are no other non-Unicode code points to strip.
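As an illustration of what stripping such code points involves, here is a minimal Python sketch. It assumes the characters in question are those excluded by the XML 1.0 `Char` production (everything except #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF). The helper name `strip_invalid_xml_chars` is hypothetical, and this is not the Nutch Solr indexing plugin's actual code.

```python
import re

# Assumption: "non-Unicode code points" means characters outside the
# XML 1.0 Char production. This class matches every character that is
# NOT allowed, including C0 controls, lone surrogates, U+FFFE and U+FFFF.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Hypothetical helper: remove characters that XML 1.0 does not allow."""
    return _INVALID_XML_CHARS.sub("", text)
```

For example, `strip_invalid_xml_chars("Hello\x00 World")` returns `'Hello World'`. A single precompiled negated character class keeps the cleanup to one regex pass per field value.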
> 
> To analyze the problem, you can enable debug or even trace logging so you 
> can see the exact XML Nutch is sending over the wire, and then use a hex 
> editor to check position [1,0], i.e. the first few bytes.
> 
> Markus
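The "check the first few bytes" step can also be sketched in Python, assuming the request body Nutch sends has been saved to a file (for example, copied out of the debug or trace log). The function name `describe_prefix` and the file name `request_body.xml` are hypothetical. The sketch prints the leading bytes in hex next to a printable-ASCII view and reports an empty body explicitly, since a body with no bytes at all is one way to end up with "Unexpected EOF in prolog" at [1,0].

```python
def describe_prefix(data: bytes, count: int = 16) -> str:
    """Summarize the first `count` bytes of a captured request body.

    Returns the bytes as space-separated hex followed by a printable-ASCII
    view in which non-printable bytes are shown as '.'. An empty body is
    reported explicitly because there are no bytes to show.
    """
    if not data:
        return "empty body (0 bytes)"
    head = data[:count]
    # Iterating over a bytes object yields ints, one per byte.
    hex_part = " ".join(f"{b:02x}" for b in head)
    text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in head)
    return f"{hex_part}  |{text_part}|"
```

For example, `print(describe_prefix(open("request_body.xml", "rb").read()))`. A well-formed XML body normally begins with `<` (hex `3c`), so anything else in the first position, or no bytes at all, is worth a closer look.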
> 
>  
>  
> -----Original message-----
> > From:Richardson, Jacquelyn F. <[email protected]>
> > Sent: Friday 12th August 2016 19:37
> > To: [email protected]
> > Subject: Error while attempting to add documents to Solr
> > 
> > Hi All,
> > 
> > Some background information that may be of some help: I have Cygwin64, 
> > Solr 4.7, Apache Nutch 1.9 (source) and Tomcat configured in a Windows 7 
> > environment. This setup works well on my local machine: I can crawl the 
> > specified web page(s), and Nutch successfully indexes the content into Solr.
> > 
> > I moved this setup to one of our servers (except Tomcat 8, which was 
> > already on the server; the OS there is Windows Server 2008). I crawled a 
> > seed file using the individual Nutch commands. Everything worked fine 
> > until I ran the command to index the content into Solr. I issued the 
> > following command:
> > bin/nutch solrindex http://fegddd.enther.rlco.gov/solr/collection1_tst crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb crawls/crawlsitemap/segments/*
> > 
> > I received the following error in hadoop.log:
> > WARN  mapred.LocalJobRunner - job_local_0001
> > org.apache.solr.common.SolrException: Bad Request
> > 
> > Bad Request
> > 
> > request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
> > 
> > Solr.log reports this error:
> > INFO  - 2016-08-12 07:18:27.656; org.apache.solr.update.processor.LogUpdateProcessor; [collection1_tst] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 62
> > ERROR - 2016-08-12 07:18:27.656; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
> >     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
> >     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
> >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
> >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
> >     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
> >     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
> >     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
> >     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
> >     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
> >     at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
> >     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
> >     at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
> >     at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> >     at java.lang.Thread.run(Thread.java:745)
> > Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
> >     at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
> >     at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
> >     at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
> >     at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
> >     at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
> >     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
> > 
> > I have compared the setup on my local machine with the setup on the 
> > server and cannot see a difference. I thought perhaps it had something 
> > to do with the solrindex-mapping.xml file, but the copy on the server 
> > matches the one on my local machine.
> > 
> > Any help you can provide will be most appreciated.
> > 
> > Thanks,
> > Jackie
> > 
> > 
> 
> 
