No output to solr, no running error, with my install and config of nutch

veryblues_cn Tue, 31 Jul 2012 06:45:18 -0700

My environment is Windows 7, Tomcat 6.0, Cygwin, Nutch 1.5.1, Solr 3.6.0.
I downloaded both the nutch-1.5.1 src and bin zip packages, plus Hadoop 0.20.0.


1. Configure the environment (Tomcat and Cygwin).

2. Extract nutch-1.5.1 (bin) and Solr 3.6.0 in cygwin/home/.

3. Configure Solr 3.6 in Tomcat (after putting solr.war in Tomcat and
restarting Tomcat), with solr/home set to:
<env-entry-value>C:\cygwin\home\apache-solr-3.6.0\example\solr</env-entry-value>

4. Copy the nutch command script file from nutch-1.5.1(src)/src/bin to
nutch-1.5.1(bin)/bin.

5. Create a new folder named 'urls' in cygwin/home/nutch-1.5.1/bin, and a
text file named 'nutch.txt' in it, so I can put the crawl target URL
(www.google.com) in it.

6. Modify regex-urlfilter.txt by adding:
+^http://([a-z0-9]*\.)*www.google.com/
+^http://\S*/
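
To see what the two rules in step 6 accept, here is a small sketch in Python. It is only an approximation: Nutch evaluates regex-urlfilter.txt with Java regular expressions, and the `first_match` helper and the sample URLs are made up for illustration. The leading `+` is the include marker, not part of the pattern, so it is dropped here.

```python
import re

# Patterns from step 6, without the leading "+" include marker.
RULES = [
    r"^http://([a-z0-9]*\.)*www.google.com/",
    r"^http://\S*/",
]

def first_match(url):
    """Return the index of the first rule that matches url, or None."""
    for index, pattern in enumerate(RULES):
        if re.search(pattern, url):
            return index
    return None

# Hypothetical sample URLs, including the bare host name from step 5.
for url in ["http://www.google.com/", "http://example.org/page/", "www.google.com"]:
    print(url, "->", first_match(url))
```

In this sketch the bare `www.google.com` matches neither rule because both patterns require the `http://` prefix, and the second rule is broad enough to accept most http URLs, which makes the first one largely redundant.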

7. Configure nutch-site.xml by adding:
<configuration>
        <name>http.agent.name</name>
        <value>My Nutch Spider</value>
</configuration>
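
The nutch-default.xml entries wrap each name/value pair in a `<property>` element, while the nutch-site.xml fragment in step 7 does not. The sketch below is a rough Python check of that difference, not Nutch's own configuration loader; `read_properties` is a hypothetical helper and `WRAPPED` is only my guess at the intended form.

```python
import xml.etree.ElementTree as ET

def read_properties(xml_text):
    """Collect name/value pairs that sit inside <property> elements."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props

# The fragment exactly as written in step 7.
AS_POSTED = """<configuration>
        <name>http.agent.name</name>
        <value>My Nutch Spider</value>
</configuration>"""

# Assumed intended form, following the <property> layout of nutch-default.xml.
WRAPPED = """<configuration>
    <property>
        <name>http.agent.name</name>
        <value>My Nutch Spider</value>
    </property>
</configuration>"""

print(read_properties(AS_POSTED))
print(read_properties(WRAPPED))
```

With this check the posted fragment yields no properties at all, while the wrapped form yields `http.agent.name`.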

8. Add the value of http.agent.name in nutch-default.xml like below:



<property>
  <name>http.agent.name</name>
  <value>My Nutch Spider</value>
</property>

9. The Hadoop bundled with nutch-1.5.1 is 1.0.3, and it would throw some
IOException (like some stuff about ). So I replaced it with
hadoop-core-0.20.0.jar, and I also renamed 'hadoop-core-0.20.0.jar' to
'hadoop-core-1.0.3.jar'.

Above is what I did to configure Nutch, and I am wondering whether I have
made any mistake. The crawl runs without any error, but nothing shows up in
Solr. If you know what the matter is, can you please tell me?
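
One way to confirm whether anything reached Solr is to ask the index for its document count. The following is a minimal Python sketch under assumptions: the address in `SOLR_URL` is a guess (Tomcat on its default port 8080 with solr.war deployed as /solr), and the function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address: Tomcat's default port 8080 with solr.war deployed as /solr.
SOLR_URL = "http://localhost:8080/solr"

def count_query_url(base_url=SOLR_URL):
    """Build a select URL that matches every document but returns no rows."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return base_url.rstrip("/") + "/select?" + params

def num_found(response_text):
    """Extract the total hit count from a Solr JSON response."""
    return json.loads(response_text)["response"]["numFound"]

def indexed_documents(base_url=SOLR_URL):
    """Ask Solr how many documents the index currently holds."""
    with urlopen(count_query_url(base_url)) as reply:
        return num_found(reply.read().decode("utf-8"))
```

Calling `indexed_documents()` after a crawl should return a number greater than zero if documents were indexed; a result of 0 would suggest the indexing step never sent anything to this Solr instance.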


Thanks all ^_^



--
View this message in context: 
http://lucene.472066.n3.nabble.com/No-output-to-solr-no-running-error-with-my-install-and-config-of-nutch-tp3998290.html
Sent from the Nutch - User mailing list archive at Nabble.com.
