Hi Paul,

Please see this tutorial for working with Nutch 1.3 [1].
If I remember correctly, the tutorial you were using is for Nutch 1.2.

[1] http://wiki.apache.org/nutch/RunningNutchAndSolr

Thank you

On Thu, Jul 7, 2011 at 1:17 PM, Paul van Hoven <[email protected]> wrote:
> I'm completely new to Nutch, so I downloaded version 1.3 and worked through
> the beginner's tutorial at http://wiki.apache.org/nutch/NutchTutorial.
> The first problem was that I could not find the file
> "conf/crawl-urlfilter.txt", so I skipped that step and went on to launching
> Nutch. To do so, I created a plain-text file called "urls.txt" in
> "/Users/toom/Downloads/nutch-1.3/crawled" that contains the following text:
>
> tom:crawled toom$ cat urls.txt
> http://nutch.apache.org/
>
> After that I invoked Nutch by calling
> tom:bin toom$ ./nutch crawl /Users/toom/Downloads/nutch-1.3/crawled -dir /Users/toom/Downloads/nutch-1.3/sites -depth 3 -topN 50
> solrUrl is not set, indexing will be skipped...
> crawl started in: /Users/toom/Downloads/nutch-1.3/sites
> rootUrlDir = /Users/toom/Downloads/nutch-1.3/crawled
> threads = 10
> depth = 3
> solrUrl=null
> topN = 50
> Injector: starting at 2011-07-07 14:02:31
> Injector: crawlDb: /Users/toom/Downloads/nutch-1.3/sites/crawldb
> Injector: urlDir: /Users/toom/Downloads/nutch-1.3/crawled
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-07-07 14:02:35, elapsed: 00:00:03
> Generator: starting at 2011-07-07 14:02:35
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 50
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: Partitioning selected urls for politeness.
> Generator: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238
> Generator: finished at 2011-07-07 14:02:39, elapsed: 00:00:04
> Fetcher: No agents listed in 'http.agent.name' property.
> Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property.
>     at org.apache.nutch.fetcher.Fetcher.checkConfiguration(Fetcher.java:1166)
>     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1068)
>     at org.apache.nutch.crawl.Crawl.run(Crawl.java:135)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:54)
>
>
> I do not understand what happened here. Maybe one of you can help me?
>
--
Lewis
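The exception in the quoted log is raised by Fetcher.checkConfiguration, which refuses to fetch until the "http.agent.name" property names your crawler. A minimal sketch of setting it follows. It assumes you run Nutch from the directory that holds both "bin/" and "conf/" (in the 1.3 binary release this is typically "runtime/local"), and that you have no other custom settings in "conf/nutch-site.xml" yet, because the script overwrites that file. The agent name "MyTestCrawler" is a hypothetical placeholder.

```shell
#!/bin/sh
# Sketch only: write a minimal conf/nutch-site.xml that sets http.agent.name.
# Assumption: the current directory contains bin/ and conf/ (e.g. runtime/local).
# Warning: this OVERWRITES conf/nutch-site.xml; merge by hand if you already
# have custom properties there.
# "MyTestCrawler" is a hypothetical placeholder; pick a name that identifies you.
mkdir -p conf
cat > conf/nutch-site.xml <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>MyTestCrawler</value>
  </property>
</configuration>
EOF
```

With the property in place, re-running the same ./nutch crawl command should get past the Fetcher step; the RunningNutchAndSolr tutorial [1] covers the rest of the 1.3 setup.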

