Paul, I think your problem is related to the 'http.agent.name' property. 
Please set this property in your configuration file, as described in the 
tutorial: 



Good! You are almost ready to crawl. You need to give your crawler a name. This 
is required. 

    1. Open up $NUTCH_HOME/conf/nutch-default.xml file 
    2. Search for http.agent.name , and give it value 'YOURNAME Spider' 
    3. Optionally you may also set http.agent.url and http.agent.email properties. 

and try again. 
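
For reference, here is a minimal sketch of what those entries might look like once edited. The property names come from the tutorial text above; the values (including the example.com URL and e-mail address) are placeholders I made up, so substitute your own. As far as I know, Nutch also reads conf/nutch-site.xml, whose values override nutch-default.xml, so the same entries could go there instead of editing the defaults file.

```xml
<!-- Sketch only: these entries go inside the existing <configuration> element. -->
<!-- All values are placeholders, not settings from a real installation. -->
<property>
  <name>http.agent.name</name>
  <!-- Required: the fetcher refuses to start while this is empty. -->
  <value>YOURNAME Spider</value>
</property>
<property>
  <name>http.agent.url</name>
  <!-- Optional: a page describing your crawler (hypothetical URL). -->
  <value>http://example.com/spider</value>
</property>
<property>
  <name>http.agent.email</name>
  <!-- Optional: a contact address for site owners (hypothetical address). -->
  <value>spider@example.com</value>
</property>
```

With a non-empty http.agent.name the "No agents listed" check in the fetcher should no longer fail.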

Greetings 

----- Original message ----- 
From: "Paul van Hoven" <[email protected]> 
To: [email protected] 
Sent: Sunday, July 10, 2011 7:42:47 GMT -08:00 Tijuana / Baja California 
Subject: Problems with tutorial 

I'm completely new to Nutch, so I downloaded version 1.3 and worked 
through the beginners tutorial at 
http://wiki.apache.org/nutch/NutchTutorial. The first problem was that I 
did not find the file "conf/crawl-urlfilter.txt", so I omitted that step and 
continued with launching Nutch. I then created a plain text file 
in "/Users/toom/Downloads/nutch-1.3/crawled" called "urls.txt" which 
contains the following text: 

tom:crawled toom$ cat urls.txt 
http://nutch.apache.org/ 

So after that I invoked nutch by calling 
tom:bin toom$ ./nutch crawl /Users/toom/Downloads/nutch-1.3/crawled -dir /Users/toom/Downloads/nutch-1.3/sites -depth 3 -topN 50 
solrUrl is not set, indexing will be skipped... 
crawl started in: /Users/toom/Downloads/nutch-1.3/sites 
rootUrlDir = /Users/toom/Downloads/nutch-1.3/crawled 
threads = 10 
depth = 3 
solrUrl=null 
topN = 50 
Injector: starting at 2011-07-07 14:02:31 
Injector: crawlDb: /Users/toom/Downloads/nutch-1.3/sites/crawldb 
Injector: urlDir: /Users/toom/Downloads/nutch-1.3/crawled 
Injector: Converting injected urls to crawl db entries. 
Injector: Merging injected urls into crawl db. 
Injector: finished at 2011-07-07 14:02:35, elapsed: 00:00:03 
Generator: starting at 2011-07-07 14:02:35 
Generator: Selecting best-scoring urls due for fetch. 
Generator: filtering: true 
Generator: normalizing: true 
Generator: topN: 50 
Generator: jobtracker is 'local', generating exactly one partition. 
Generator: Partitioning selected urls for politeness. 
Generator: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238 
Generator: finished at 2011-07-07 14:02:39, elapsed: 00:00:04 
Fetcher: No agents listed in 'http.agent.name' property. 
Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property. 
at org.apache.nutch.fetcher.Fetcher.checkConfiguration(Fetcher.java:1166) 
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1068) 
at org.apache.nutch.crawl.Crawl.run(Crawl.java:135) 
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) 
at org.apache.nutch.crawl.Crawl.main(Crawl.java:54) 


I do not understand what happened here. Maybe one of you can help me? 



-- 



--------------------------------------------------------------------------------------------
 
Ing. Yusniel Hidalgo Delgado 
Take part in COMPUMAT 2011 http://www.mfc.uclv.edu.cu/scmc 
Take part in INFO 2012 http://www.congreso-info.cu 
Universidad de las Ciencias Informáticas 
--------------------------------------------------------------------------------------------
 
