Hmm, a last option would be to upgrade your Solr instance. 4.x is really old, and 
upgrading might do the trick.

Markus

 
 
-----Original message-----
> From:Richardson, Jacquelyn F. <fluke...@ornl.gov>
> Sent: Wednesday 21st September 2016 15:54
> To: user@nutch.apache.org
> Subject: RE: Error while attempting to add documents to Solr
> 
> Hi Markus,
> 
> Thanks very much for your response.  
> 
> I did what you suggested but did not see anything missing in the first few 
> bytes.  
> 
> Because I have the same setup on my local machine, I was curious to see what 
> would happen if I copied the directory containing the segments created from 
> a crawl of a seed file on my local machine over to the server.  Once it was 
> copied, I indexed it into Solr from the server using the following steps:
> 
>  
> 
> 1. Double-click the Cygwin.bat file to open a command window. 
> 2. cd to the Nutch home directory. 
> 3. Issue command: export JAVA_HOME=F:/Programs/jdk1.7.0_80 
> 4. Issue command: bin/nutch solrindex 
> http://fegddd.enther.rlco.gov/solr/collection1_tst 
> crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb 
> crawls/crawlsitemap/segments/*
> 
> I received a slightly different error this time.  In hadoop.log I 
> received: 
> WARN  mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Bad Request
> 
> Bad Request
> 
> request: 
> http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
>  
> at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
>  
> at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>  
> at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>  
> at 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
>  
> at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118) 
> at 
> org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
>  
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474) 
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411) 
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2016-09-21 08:39:55,499 ERROR indexer.IndexingJob - Indexer: 
> java.io.IOException: Job failed! 
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252) 
> at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114) 
> at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176) 
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) 
> at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
> 
> And in solr.log I received: 
> ERROR - 2016-09-21 08:39:54.509; org.apache.solr.common.SolrException; 
> org.apache.solr.common.SolrException: Unexpected EOF in prolog
>  at [row,col {unknown-source}]: [1,0] 
> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176) 
> at 
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>  
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>  
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>  
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916) 
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
>  
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
>  
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
>  
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
>  
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>  
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
>  
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
>  
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141) 
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79) 
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
>  
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521) 
> at 
> org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
>  
> at 
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
>  
> at 
> org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
>  
> at 
> org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  
> at 
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>  
> at java.lang.Thread.run(Thread.java:745)
> Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
>  at [row,col {unknown-source}]: [1,0] 
> at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686) 
> at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134) 
> at 
> com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040) 
> at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069) 
> at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213) 
> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174) 
> ... 23 more
> 
> Now I am totally at a loss.  I thought it might be my setup, but when I 
> compared the two setups, they were the same.  
> 
> Any light you may be able to shed on what is wrong will be greatly 
> appreciated.
> 
> Thanks,
> Jackie
> 
> -----Original Message-----
> From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
> Sent: Friday, August 12, 2016 3:00 PM
> To: user@nutch.apache.org
> Subject: RE: Error while attempting to add documents to Solr
> 
> Hello Jacquelyn,
> 
> This is very odd:
> 
> > Unexpected EOF in prolog
> > at [row,col {unknown-source}]: [1,0]
> 
> We fixed this problem a long time ago. It was caused by non-Unicode 
> code points in the data sent to Solr. The Solr indexing plugin strips them 
> all, and to my knowledge, there are no other non-Unicode code points to strip.
> 
> What you can do to analyze the problem is to use debug or even trace logging, 
> so you can see the exact XML Nutch is sending on the wire, and use a 
> hex editor to check position [1,0], that is, the first few bytes.
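
[Editor's sketch] The byte-level check described above could also be scripted instead of done by eye in a hex editor. The Python below is only a sketch under assumptions: it presumes the request body has already been captured as raw bytes (for example from trace logging or a network capture), and the helper names `hex_preview` and `describe_start` are hypothetical, not part of Nutch or Solr. An empty body would be consistent with a parser reporting EOF at [1,0], but that is an interpretation rather than a confirmed diagnosis.

```python
# Byte-order marks that may legitimately precede an XML document.
BOMS = {
    b"\xef\xbb\xbf": "UTF-8 BOM",
    b"\xff\xfe": "UTF-16 LE BOM",
    b"\xfe\xff": "UTF-16 BE BOM",
}


def hex_preview(data: bytes, count: int = 16) -> str:
    """Return the first `count` bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data[:count])


def describe_start(data: bytes) -> str:
    """Classify how a captured request body begins (hypothetical helper)."""
    if not data:
        # Nothing to parse at all: the reader would hit end of input immediately.
        return "empty body"
    for bom, name in BOMS.items():
        if data.startswith(bom):
            return f"{name}, then: {hex_preview(data[len(bom):])}"
    if data.startswith(b"<"):
        return "looks like XML"
    # Anything else is shown in hex so stray bytes are visible.
    return f"unexpected leading bytes: {hex_preview(data)}"
```

To use it, read the first bytes of the captured body in binary mode and pass them in, for example `describe_start(open(path, "rb").read(64))`, where `path` points at wherever the capture was saved.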
> 
> Markus
> 
>  
>  
> -----Original message-----
> > From:Richardson, Jacquelyn F. <fluke...@ornl.gov>
> > Sent: Friday 12th August 2016 19:37
> > To: user@nutch.apache.org
> > Subject: Error while attempting to add documents to Solr
> > 
> > Hi All,
> > 
> > Some background information that may be of some help.  I have Cygwin64, Solr 
> > 4.7, Apache Nutch 1.9 source and Tomcat configured in a Windows 7 
> > environment.  This setup works well on my local machine.  I can crawl the 
> > specified web page(s) and Nutch can successfully index the content to Solr.
> > 
> > I moved this setup to one of our servers (except Tomcat 8; it was already 
> > on the server and the OS is Windows Server 2008).  I executed a crawl of a 
> > seed file using the individual Nutch commands.  Everything worked fine 
> > until I ran the command to index the content to Solr.  I issued the 
> > following command:
> > bin/nutch solrindex http://fegddd.enther.rlco.gov/solr/collection1_tst 
> > crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb 
> > crawls/crawlsitemap/segments/*
> > 
> > I received the following error in hadoop.log:
> >                 WARN  mapred.LocalJobRunner - job_local_0001 
> > org.apache.solr.common.SolrException: Bad Request
> > 
> > Bad Request
> > 
> > request: 
> > http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
> > 
> > Solr.log reports this error:
> >                 INFO  - 2016-08-12 07:18:27.656; 
> > org.apache.solr.update.processor.LogUpdateProcessor; [collection1_tst] 
> > webapp=/solr path=/update params={wt=javabin&version=2} {} 0 62
> >                 ERROR - 2016-08-12 07:18:27.656; 
> > org.apache.solr.common.SolrException; 
> > org.apache.solr.common.SolrException: Unexpected EOF in prolog at [row,col 
> > {unknown-source}]: [1,0]
> >                 at 
> >org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
> >                 at 
> >org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >                 at 
> >org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >                 at 
> >org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >                 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
> >                 at 
> >org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
> >                 at 
> >org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> >                 at 
> >org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
> >                 at 
> >org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
> >                 at 
> >org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >                 at 
> >org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
> >                 at 
> >org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
> >                 at 
> >org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
> >                 at 
> >org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
> >                 at 
> >org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
> >                 at 
> >org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
> >                 at 
> >org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
> >                 at 
> >org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
> >                 at 
> >org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
> >                 at 
> >org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
> >                 at 
> >java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >                 at 
> >java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >                 at 
> >org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
> >                 at java.lang.Thread.run(Thread.java:745)
> > Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog 
> > at [row,col {unknown-source}]: [1,0]
> >                 at 
> >com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
> >                 at 
> >com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
> >                 at 
> >com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
> >                 at 
> >com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
> >                 at 
> >org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
> >                 at 
> > org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
> > 
> > I have compared the setup on my local machine with the setup on the server 
> > machine and I cannot see a difference.  I thought perhaps it had something 
> > to do with the solrindex-mapping.xml file but what is on the server agrees 
> > with what I have on my local machine.
> > 
> > Any help you can provide will be most appreciated.
> > 
> > Thanks,
> > Jackie
> > 
> > 
> 
> 
