Hi
You didn't follow bullet two of this part of the tutorial:

-----
Good! You are almost ready to crawl. You need to give your crawler a name. This is required.
Open up the $NUTCH_HOME/conf/nutch-default.xml file, search for http.agent.name, and give it the value 'YOURNAME Spider'.
Optionally, you may also set the http.agent.url and http.agent.email properties.
-----

Actually, we recommend not modifying nutch-default.xml; copy the properties you need to nutch-site.xml instead.

On Thursday 07 July 2011 14:17:25 Paul van Hoven wrote:
> I'm completely new to Nutch, so I downloaded version 1.3 and worked
> through the beginners' tutorial at
> http://wiki.apache.org/nutch/NutchTutorial. The first problem was that I
> did not find the file "conf/crawl-urlfilter.txt", so I omitted that and
> continued with launching Nutch. To do so, I created a plain text file
> in "/Users/toom/Downloads/nutch-1.3/crawled" called "urls.txt", which
> contains the following text:
>
> tom:crawled toom$ cat urls.txt
> http://nutch.apache.org/
>
> After that I invoked Nutch by calling:
>
> tom:bin toom$ ./nutch crawl /Users/toom/Downloads/nutch-1.3/crawled -dir /Users/toom/Downloads/nutch-1.3/sites -depth 3 -topN 50
> solrUrl is not set, indexing will be skipped...
> crawl started in: /Users/toom/Downloads/nutch-1.3/sites
> rootUrlDir = /Users/toom/Downloads/nutch-1.3/crawled
> threads = 10
> depth = 3
> solrUrl=null
> topN = 50
> Injector: starting at 2011-07-07 14:02:31
> Injector: crawlDb: /Users/toom/Downloads/nutch-1.3/sites/crawldb
> Injector: urlDir: /Users/toom/Downloads/nutch-1.3/crawled
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-07-07 14:02:35, elapsed: 00:00:03
> Generator: starting at 2011-07-07 14:02:35
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 50
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: Partitioning selected urls for politeness.
> Generator: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238
> Generator: finished at 2011-07-07 14:02:39, elapsed: 00:00:04
> Fetcher: No agents listed in 'http.agent.name' property.
> Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property.
>     at org.apache.nutch.fetcher.Fetcher.checkConfiguration(Fetcher.java:1166)
>     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1068)
>     at org.apache.nutch.crawl.Crawl.run(Crawl.java:135)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:54)
>
> I do not understand what happened here. Maybe one of you can help me?

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
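For reference, a minimal nutch-site.xml that sets only the crawler's agent name, following the advice to leave nutch-default.xml untouched, might look roughly like the sketch below. It assumes the Hadoop-style <configuration>/<property> layout that Nutch 1.x configuration files use, and that the file sits next to nutch-default.xml in the conf directory. 'YOURNAME Spider' is the tutorial's placeholder, not a real value. Properties defined in nutch-site.xml override the defaults from nutch-default.xml, so the Fetcher should then find a non-empty http.agent.name.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Sketch of conf/nutch-site.xml: only the properties you want to override go here. -->
<configuration>
  <property>
    <!-- Required by the Fetcher; 'YOURNAME Spider' is the tutorial's placeholder. -->
    <name>http.agent.name</name>
    <value>YOURNAME Spider</value>
  </property>
  <!-- Optionally add http.agent.url and http.agent.email properties in the same way. -->
</configuration>
```

After saving the file, rerunning the same crawl command should get past the Fetcher's configuration check.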

