1. Yes, Solr's schema file needs to be replaced with the one that ships
with Nutch. If you use Solr 4.x, use the file called schema-solr4.xml
from Nutch's conf directory in place of Solr's schema.xml.
2. You can run the crawl step by step, which is useful for testing:
* bin/nutch inject ...
* bin/nutch generate ...
* bin/nutch fetch ...
* bin/nutch parse ...
* bin/nutch index ...
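
The steps above could be strung together roughly as follows. This is only a sketch, not a tested recipe: the function name crawl_one_round and the urls/ seed directory are made up for illustration, the crawl/ paths and the Solr URL are taken from the quoted message, and the updatedb and invertlinks steps are added on the assumption that you follow the Nutch 1.x tutorial layout. It uses the solrindex command from the quoted message for the final indexing step.

```shell
# Assumed locations; adjust to your installation.
NUTCH="${NUTCH:-bin/nutch}"
CRAWL_DIR="${CRAWL_DIR:-crawl}"
SEED_DIR="${SEED_DIR:-urls}"          # hypothetical directory holding the seed URL list
SOLR_URL="${SOLR_URL:-http://127.0.0.1:8983/solr/}"

# Run one crawl round, stopping at the first step that fails.
crawl_one_round() {
  # Seed the crawl database with the start URLs.
  "$NUTCH" inject "$CRAWL_DIR/crawldb" "$SEED_DIR" || return 1
  # Create a new segment containing the URLs to fetch.
  "$NUTCH" generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" || return 1
  # Segment directories are named by timestamp, so the last one is the newest.
  segment=$(ls -d "$CRAWL_DIR"/segments/2* | tail -1)
  "$NUTCH" fetch "$segment" || return 1
  "$NUTCH" parse "$segment" || return 1
  # Feed the fetch results back into the crawl database.
  "$NUTCH" updatedb "$CRAWL_DIR/crawldb" "$segment" || return 1
  # Build the link database used at indexing time.
  "$NUTCH" invertlinks "$CRAWL_DIR/linkdb" -dir "$CRAWL_DIR/segments" || return 1
  # Send the parsed documents to Solr.
  "$NUTCH" solrindex "$SOLR_URL" "$CRAWL_DIR/crawldb" \
    -linkdb "$CRAWL_DIR/linkdb" "$CRAWL_DIR"/segments/* || return 1
}
```

Source this from the Nutch runtime directory and call crawl_one_round. Running the commands one at a time by hand instead makes it easier to see which step fails.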
On Mon, Dec 16, 2013 at 3:19 PM, Junqiang Zhang <[email protected]> wrote:
> Hi,
>
> I am new to Nutch. I followed the Nutch 1.x tutorial to install the
> 1.7 version. During my installation of Nutch 1.7, I had two problems
> with the integration of Solr with Nutch.
>
>
> (1) Section 6 (Integrate Solr with Nutch) of Nutch 1.x tutorial
> basically replaces the original schema.xml file in Solr with the
> schema.xml in Nutch, and does some modification. Is it necessary to
> replace the file? After the file was replaced, I was not able to start
> Solr with the command “java -jar start.jar”.
>
>
>
> (2) If I do not replace the schema.xml, I can run “java -jar
> start.jar”. However, some exception happens after I run the following
> Solr Index command at Section 6.6 of the tutorial.
>
> bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb
> crawl/linkdb crawl/segments/*
>
> The exception is:
> Indexer: java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob.
>
> I think this exception is related to Hadoop. How to fix it?
>
>
> I hope somebody could kindly help me with the above two problems, or
> point out where I can find the answers. Thanks in advance.
>
> Regards,
> Junqiang
>
--
Don't Grow Old, Grow Up... :-)