Hi Markus,

Thanks very much for your response.  

I did what you suggested, but I did not see anything missing in the first few 
bytes.

Because I have the same setup on my local machine, I was curious to see what 
would happen if I copied the directory containing the segments created from a 
crawl of a seed file on my local machine over to the server.  Once it was 
copied, I issued the following commands on the server to index the content 
into Solr:

        1. Double-click the Cygwin.bat file to open a command window.
        2. cd to the Nutch home directory.
        3. Issue the command: export JAVA_HOME=F:/Programs/jdk1.7.0_80
        4. Issue the command: bin/nutch solrindex http://fegddd.enther.rlco.gov/solr/collection1_tst crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb crawls/crawlsitemap/segments/*

I received a slightly different error this time.  In hadoop.log I received:
        WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Bad Request

Bad Request

request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
        at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
2016-09-21 08:39:55,499 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

And in solr.log I received:
        ERROR - 2016-09-21 08:39:54.509; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]
        at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
        at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
        at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
        at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]
        at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
        at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
        at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
        at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
        at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
        at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
        ... 23 more

Now I am totally at a loss.  I thought it might be my setup, but when I 
compared the two setups I could not find any difference.

Any light you may be able to shed on what is wrong will be greatly appreciated.

Thanks,
Jackie

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Friday, August 12, 2016 3:00 PM
To: user@nutch.apache.org
Subject: RE: Error while attempting to add documents to Solr

Hello Jacquelyn,

This is very odd:

> Unexpected EOF in prolog
> at [row,col {unknown-source}]: [1,0]

We fixed this problem a long time ago. It was a problem of non-unicode 
codepoints in the data sent to Solr. The Solr indexing plugin strips them all, 
and to my knowledge there are no other non-unicode codepoints left to strip.
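
Roughly, that stripping amounts to something like the sketch below. This is an 
illustration only, written from memory: the class name, method name and exact 
codepoint checks are my assumptions, not the plugin's actual code.

    // Sketch only: drop Unicode non-character codepoints from a field value
    // before the document is handed to SolrJ.
    public final class NonCharStripper {
      public static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
          char ch = input.charAt(i);
          // Skip the designated non-characters: U+FDD0..U+FDEF and any
          // code unit ending in FFFE or FFFF.
          if (ch >= 0xFDD0 && ch <= 0xFDEF) continue;
          if ((ch & 0xFFFE) == 0xFFFE) continue;
          out.append(ch);
        }
        return out.toString();
      }
    }

In that sketch, strip() would be applied to every field value before the update 
request is built.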

What you can do to analyze the problem is to enable debug or even trace 
logging, so you can see the exact XML Nutch is sending on the wire, and then 
use a hex editor to check position 1,0, that is, the first few bytes.
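
If it helps, the lines below show the kind of log4j settings I mean. They 
assume the stock Nutch conf/log4j.properties with its cmdstdout console 
appender; the exact logger names that matter for your build are an assumption 
on my part, so adjust them to whatever your setup actually uses.

    # Show what the Solr indexing plugin is doing:
    log4j.logger.org.apache.nutch.indexwriter.solr=DEBUG,cmdstdout
    # Commons HttpClient (used by CommonsHttpSolrServer) can log the raw
    # request headers and body as they go over the wire:
    log4j.logger.httpclient.wire.header=DEBUG,cmdstdout
    log4j.logger.httpclient.wire.content=DEBUG,cmdstdout

With the wire log you can see the exact bytes sent to /update, so you can check 
what the first few bytes actually contain.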

Markus

 
 
-----Original message-----
> From:Richardson, Jacquelyn F. <fluke...@ornl.gov>
> Sent: Friday 12th August 2016 19:37
> To: user@nutch.apache.org
> Subject: Error while attempting to add documents to Solr
> 
> Hi All,
> 
> Some background information that may be of some help.  I have Cygwin64, Solr 
> 4.7, Apache Nutch 1.9 (built from source) and Tomcat configured in a Windows 7 
> environment.  This setup works well on my local machine.  I can crawl the 
> specified web page(s) and Nutch can successfully index the content to Solr.
> 
> I moved this setup to one of our servers (except for Tomcat; the server 
> already had Tomcat 8 installed, and its OS is Windows Server 2008).  I 
> executed a crawl of a seed file using the individual Nutch commands.  
> Everything worked fine until I ran the command to index the content to Solr.  
> I issued the following command:
> bin/nutch solrindex http://fegddd.enther.rlco.gov/solr/collection1_tst crawls/crawlsitemap/crawldb -linkdb crawls/crawlsitemap/linkdb crawls/crawlsitemap/segments/*
> 
> I received the following error in hadoop.log:
>                 WARN  mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Bad Request
> 
> Bad Request
> 
> request: http://fegddd.enther.rlco.gov/solr/collection1_tst/update?wt=javabin&version=2
> 
> Solr.log reports this error:
>                 INFO  - 2016-08-12 07:18:27.656; org.apache.solr.update.processor.LogUpdateProcessor; [collection1_tst] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 62
>                 ERROR - 2016-08-12 07:18:27.656; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected EOF in prolog
> at [row,col {unknown-source}]: [1,0]
>                 at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
>                 at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>                 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>                 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>                 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
>                 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
>                 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
>                 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
>                 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
>                 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>                 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
>                 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
>                 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
>                 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
>                 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
>                 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
>                 at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:850)
>                 at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
>                 at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2500)
>                 at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2489)
>                 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>                 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>                 at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>                 at java.lang.Thread.run(Thread.java:745)
> Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
> at [row,col {unknown-source}]: [1,0]
>                 at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
>                 at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
>                 at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
>                 at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
>                 at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
>                 at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
> 
> I have compared the setup on my local machine with the setup on the server 
> machine and I cannot see a difference.  I thought perhaps it had something to 
> do with the solrindex-mapping.xml file but what is on the server agrees with 
> what I have on my local machine.
> 
> Any help you can provide will be most appreciated.
> 
> Thanks,
> Jackie
> 
> 
