Hello Jackie,

I'd certainly try it, just to be sure.

Markus

 
 
-----Original message-----
> From:Richardson, Jacquelyn F. <[email protected]>
> Sent: Wednesday 5th October 2016 20:36
> To: [email protected]
> Subject: RE: Error while attempting to add documents to Solr
> 
> Hi Markus,
> 
> If I upgrade to the latest version of Solr (6.2.1), is it advisable to 
> upgrade my current version of Nutch (1.9) as well? If so, should I upgrade 
> to the latest version of Nutch (1.12)?
> 
> Jackie
> 
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]] 
> Sent: Wednesday, September 21, 2016 1:30 PM
> To: [email protected]
> Subject: RE: Error while attempting to add documents to Solr
> 
> Hmm, a last resort would be to upgrade your Solr instance. 4.x is really 
> old, and upgrading might do the trick.
> 
> Markus
> 
>  
>  
> -----Original message-----
> > From:Richardson, Jacquelyn F. <[email protected]>
> > Sent: Wednesday 21st September 2016 15:54
> > To: [email protected]
> > Subject: RE: Error while attempting to add documents to Solr
> > 
> > Hi Markus,
> > 
> > Thanks very much for your response.
> > 
> > I did what you suggested but did not see anything missing in the first 
> > few bytes.
> > 
> > Because I have the same setup on my local machine, I was curious to see 
> > what would happen if I copied over the directory containing the segments 
> > created by crawling a seed file on my local machine. Once it was copied, I 
> > issued the following commands on the server to index the content into Solr:
> > 
> >  
> > 
> > 1. Double-click the Cygwin.bat file to open a command window.
> > 2. cd to the Nutch home directory.
> > 3. Issue the command: export JAVA_HOME=F:/Programs/jdk1.7.0_80
> > 4. Issue the command:
> >    bin/nutch solrindex http://fegddd.enther.rlco.gov/solr/collection1_tst
> >    crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb
> >    crawls/crawlsitemap/segments/*
> > 
> > I received a slightly different error this time. In hadoop.log I got:
> > WARN  mapred.LocalJobRunner - job_local_0001
> > org.apache.solr.common.SolrException: Bad Request
> > 
> > Bad Request
> > 
> > request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
> >     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
> >     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> >     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> >     at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
> >     at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
> >     at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
> >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> > 2016-09-21 08:39:55,499 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> >     at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
> >     at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
> > 
> > And in solr.log I received: 
> > ERROR - 2016-09-21 08:39:54.509; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
> >     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
> >     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
> >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
> >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
> >     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
> >     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
> >     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
> >     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
> >     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
> >     at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
> >     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
> >     at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
> >     at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> >     at java.lang.Thread.run(Thread.java:745)
> > Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
> >     at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
> >     at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
> >     at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
> >     at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
> >     at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
> >     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
> >     ... 23 more
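The position [1,0] in these messages suggests the XML parser reached end-of-input at row 1, column 0, before it saw any markup, so one possibility worth ruling out is that the update request reached Solr with an empty body. The sketch below is only an analogue: it uses Python's standard-library parser (expat), not the Woodstox parser Solr uses, so the error wording differs, but an empty payload fails at the same line/column while a minimal well-formed payload parses. The helper name `parse_error_position` and the `<add><doc/></add>` payload are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_error_position(payload: bytes):
    """Return the (line, column) where the stdlib XML parser fails, or None if it parses."""
    try:
        ET.fromstring(payload)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple
        return err.position
    return None


if __name__ == "__main__":
    # An empty request body: the parser hits end-of-input before any markup.
    print(parse_error_position(b""))
    # A minimal well-formed update message parses without error.
    print(parse_error_position(b"<add><doc/></add>"))
```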
> > 
> > Now I am totally at a loss. I thought it might be my setup, but when I 
> > compared the two machines, the setups were the same.
> > 
> > Any light you may be able to shed on what is wrong will be greatly 
> > appreciated.
> > 
> > Thanks,
> > Jackie
> > 
> > -----Original Message-----
> > From: Markus Jelsma [mailto:[email protected]]
> > Sent: Friday, August 12, 2016 3:00 PM
> > To: [email protected]
> > Subject: RE: Error while attempting to add documents to Solr
> > 
> > Hello Jacquelyn,
> > 
> > This is very odd:
> > 
> > > Unexpected EOF in prolog
> > > at [row,col {unknown-source}]: [1,0]
> > 
> > We fixed this problem a long time ago. It was caused by non-Unicode 
> > code points in the data sent to Solr. The Solr indexing plugin strips them 
> > all, and to my knowledge, there are no other non-Unicode code points to 
> > strip.
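A minimal sketch of that kind of stripping, for illustration only: it assumes that "non-Unicode codepoints" means characters outside the XML 1.0 `Char` production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF), and it is written in Python rather than being the Nutch Solr indexing plugin's actual Java code. The function name `strip_invalid_xml_chars` is hypothetical.

```python
import re

# Anything NOT in the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove every character that may not appear in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", text)
```

For example, a NUL, a vertical tab and U+FFFE are removed, while tab, CR, LF and characters outside the Basic Multilingual Plane (such as emoji) are kept.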
> > 
> > To analyze the problem, you can enable debug or even trace logging so you 
> > can see the exact XML Nutch is sending over the wire, and then use a hex 
> > editor to inspect position [1,0], that is, the first few bytes.
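If you would rather script that check than open a hex editor, something like the following Python sketch prints the first bytes of a captured request body in hex and flags a few things worth a second look at position [1,0]: an empty body, a UTF-8 byte order mark, or a first non-whitespace byte that is not `<`. It assumes the body has already been saved to a file (for example, copied out of the debug/trace log); `describe_payload_start` is a hypothetical helper name, and a flagged BOM is not necessarily an error.

```python
import sys


def describe_payload_start(payload: bytes, n: int = 16) -> str:
    """Hex-dump the first n bytes of an update payload and flag common oddities."""
    hex_dump = " ".join(f"{b:02x}" for b in payload[:n])
    if not payload:
        verdict = "empty body: nothing for the XML parser to read"
    elif payload.startswith(b"\xef\xbb\xbf"):
        verdict = "UTF-8 byte order mark before the XML"
    elif payload.lstrip()[:1] != b"<":
        verdict = "first non-whitespace byte is not '<'"
    else:
        verdict = "starts like XML"
    return f"[{hex_dump}] {verdict}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path of a file holding a captured request body.
    with open(sys.argv[1], "rb") as fh:
        print(describe_payload_start(fh.read()))
```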
> > 
> > Markus
> > 
> >  
> >  
> > -----Original message-----
> > > From:Richardson, Jacquelyn F. <[email protected]>
> > > Sent: Friday 12th August 2016 19:37
> > > To: [email protected]
> > > Subject: Error while attempting to add documents to Solr
> > > 
> > > Hi All,
> > > 
> > > Some background information that may be of some help: I have Cygwin64, 
> > > Solr 4.7, the Apache Nutch 1.9 source, and Tomcat configured in a Windows 7 
> > > environment. This setup works well on my local machine: I can crawl the 
> > > specified web page(s), and Nutch successfully indexes the content into 
> > > Solr.
> > > 
> > > I moved this setup to one of our servers (except Tomcat 8, which was 
> > > already on the server; the OS there is Windows Server 2008). I crawled a 
> > > seed file using the individual Nutch commands. Everything worked fine 
> > > until I ran the command to index the content into Solr. I issued the 
> > > following command:
> > > bin/nutch solrindex 
> > > http://fegddd.enther.rlco.gov/solr/collection1_tst
> > > crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb
> > > crawls/crawlsitemap/segments/*
> > > 
> > > I received the following error in hadoop.log:
> > > WARN  mapred.LocalJobRunner - job_local_0001
> > > org.apache.solr.common.SolrException: Bad Request
> > > 
> > > Bad Request
> > > 
> > > request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
> > > 
> > > Solr.log reports this error:
> > > INFO  - 2016-08-12 07:18:27.656; org.apache.solr.update.processor.LogUpdateProcessor; [collection1_tst] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 62
> > > ERROR - 2016-08-12 07:18:27.656; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
> > >     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
> > >     at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> > >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> > >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> > >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
> > >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> > >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
> > >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
> > >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> > >     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
> > >     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
> > >     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
> > >     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
> > >     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
> > >     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
> > >     at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
> > >     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
> > >     at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
> > >     at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >     at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> > >     at java.lang.Thread.run(Thread.java:745)
> > > Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog at [row,col {unknown-source}]: [1,0]
> > >     at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
> > >     at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
> > >     at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
> > >     at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
> > >     at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
> > >     at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
> > > 
> > > I have compared the setup on my local machine with the setup on the 
> > > server and cannot see a difference. I thought it might have something to 
> > > do with the solrindex-mapping.xml file, but the copy on the server matches 
> > > the one on my local machine.
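Comparing configuration files by eye can miss differences such as line endings, encodings or stray invisible characters. A minimal Python sketch, assuming copies of the Nutch `conf` directory from both machines are available side by side (the paths and the `file_digests`/`diff_trees` helper names are hypothetical), hashes every file and lists the ones that are missing on one side or differ:

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its raw bytes."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(local: Path, server: Path) -> list:
    """List relative paths that exist on only one side or whose contents differ."""
    a, b = file_digests(local), file_digests(server)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

Because it compares raw bytes, a file saved with CRLF line endings on one machine and LF on the other is reported as different even when the two look identical in an editor.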
> > > 
> > > Any help you can provide will be most appreciated.
> > > 
> > > Thanks,
> > > Jackie
> > > 
> > > 
> > 
> > 
> 
> 
