Paul, I think your problem is related to the 'http.agent.name' property.
Please set this property in your configuration file, as the tutorial
describes:
Good! You are almost ready to crawl. You need to give your crawler a name. This
is required.
1. Open up $NUTCH_HOME/conf/nutch-default.xml file
2. Search for http.agent.name, and give it the value 'YOURNAME Spider'
3. Optionally you may also set the http.agent.url and http.agent.email properties.
Then try the crawl again.
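For reference, a property entry in a Nutch configuration file looks roughly like the sketch below. The value 'YOURNAME Spider' is only the placeholder from the tutorial, so replace it with your own crawler name. As far as I know, the usual practice is to put overrides like this in conf/nutch-site.xml rather than editing nutch-default.xml directly, but either file should make the Fetcher error go away.

```xml
<!-- Sketch only: the value is a placeholder, use your own crawler name -->
<property>
  <name>http.agent.name</name>
  <value>YOURNAME Spider</value>
</property>
```

The entry goes inside the top-level <configuration> element of the file.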
Greetings
----- Original Message -----
From: "Paul van Hoven" <[email protected]>
To: [email protected]
Sent: Sunday, July 10, 2011 7:42:47 GMT -08:00 Tijuana / Baja California
Subject: Problems with tutorial
I'm completely new to Nutch, so I downloaded version 1.3 and worked
through the beginners' tutorial at
http://wiki.apache.org/nutch/NutchTutorial. The first problem was that I
did not find the file "conf/crawl-urlfilter.txt", so I skipped that step and
continued with launching Nutch. For that I created a plain text file
in "/Users/toom/Downloads/nutch-1.3/crawled" called "urls.txt" which
contains the following text:
tom:crawled toom$ cat urls.txt
http://nutch.apache.org/
So after that I invoked nutch by calling
tom:bin toom$ ./nutch crawl /Users/toom/Downloads/nutch-1.3/crawled -dir /Users/toom/Downloads/nutch-1.3/sites -depth 3 -topN 50
solrUrl is not set, indexing will be skipped...
crawl started in: /Users/toom/Downloads/nutch-1.3/sites
rootUrlDir = /Users/toom/Downloads/nutch-1.3/crawled
threads = 10
depth = 3
solrUrl=null
topN = 50
Injector: starting at 2011-07-07 14:02:31
Injector: crawlDb: /Users/toom/Downloads/nutch-1.3/sites/crawldb
Injector: urlDir: /Users/toom/Downloads/nutch-1.3/crawled
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-07-07 14:02:35, elapsed: 00:00:03
Generator: starting at 2011-07-07 14:02:35
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238
Generator: finished at 2011-07-07 14:02:39, elapsed: 00:00:04
Fetcher: No agents listed in 'http.agent.name' property.
Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property.
at org.apache.nutch.fetcher.Fetcher.checkConfiguration(Fetcher.java:1166)
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1068)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:135)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:54)
I do not understand what happened here. Maybe one of you can help me?
--
--------------------------------------------------------------------------------------------
Ing. Yusniel Hidalgo Delgado
Take part in COMPUMAT 2011 http://www.mfc.uclv.edu.cu/scmc
Take part in INFO 2012 http://www.congreso-info.cu
Universidad de las Ciencias Informáticas
--------------------------------------------------------------------------------------------