I have just updated the tutorial. As of 1.3, the files should be changed in $NUTCH_HOME/runtime/local/conf/ unless you rebuild with Ant.
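
For illustration, here is a minimal sketch of what the change could look like. It assumes the value is overridden in a nutch-site.xml file under runtime/local/conf/ rather than edited directly in nutch-default.xml. The file name is an assumption, since the thread itself only mentions nutch-default.xml. The agent name is the one Paul used:

```xml
<?xml version="1.0"?>
<!-- Assumed location: $NUTCH_HOME/runtime/local/conf/nutch-site.xml -->
<configuration>
  <property>
    <!-- Must not be empty, otherwise the Fetcher refuses to start -->
    <name>http.agent.name</name>
    <value>MyFirstNutchCrawler</value>
  </property>
</configuration>
```

If the file is picked up, re-running the same crawl command should get past the "No agents listed in 'http.agent.name' property" check.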

On 12 July 2011 10:43, Paul van Hoven <[email protected]> wrote:

> Thanks for the answers. I'm not sure if the 'http.agent.name' is the
> problem since I set it:
>
> This is the configuration I'm using from nutch-1.3/conf/nutch-default.xml:
>
> <!-- HTTP properties -->
>
> <property>
>   <name>http.agent.name</name>
>   <value>MyFirstNutchCrawler</value>
>   <description>HTTP 'User-Agent' request header. MUST NOT be empty -
>   please set this to a single word uniquely related to your organization.
>
>   NOTE: You should also check other related properties:
>
>     http.robots.agents
>     http.agent.description
>     http.agent.url
>     http.agent.email
>     http.agent.version
>
>   and set their values appropriately.
>
>   </description>
> </property>
>
> As I understand the tutorial this should be correct:
> tutorial citation "Search for http.agent.name , and give it value
> 'YOURNAME Spider'"
>
> I already had that set this way in my first email.
>
> 2011/7/10 Ing. Yusniel Hidalgo Delgado <[email protected]>:
> > Paul, I think that your problem is related to the 'http.agent.name'
> > property. Please change this property in your configuration file, as
> > described in the tutorial:
> >
> > Good! You are almost ready to crawl. You need to give your crawler a
> > name. This is required.
> >
> > 1. Open up $NUTCH_HOME/conf/nutch-default.xml file
> > 2. Search for http.agent.name , and give it value 'YOURNAME Spider'
> > 3. Optionally you may also set http.agent.url and http.agent.email
> >    properties.
> >
> > and try again.
> >
> > Greetings
> >
> > ----- Original message -----
> > From: "Paul van Hoven" <[email protected]>
> > To: [email protected]
> > Sent: Sunday, 10 July 2011 7:42:47 GMT -08:00 Tijuana / Baja California
> > Subject: Problems with tutorial
> >
> > I'm completely new to nutch so I downloaded version 1.3 and worked
> > through the beginners tutorial at
> > http://wiki.apache.org/nutch/NutchTutorial. The first problem was that I
> > did not find the file "conf/crawl-urlfilter.txt" so I omitted that and
> > continued with launching nutch. Therefore I created a plain text file
> > in "/Users/toom/Downloads/nutch-1.3/crawled" called "urls.txt" which
> > contains the following text:
> >
> > tom:crawled toom$ cat urls.txt
> > http://nutch.apache.org/
> >
> > So after that I invoked nutch by calling
> > tom:bin toom$ ./nutch crawl /Users/toom/Downloads/nutch-1.3/crawled -dir /Users/toom/Downloads/nutch-1.3/sites -depth 3 -topN 50
> > solrUrl is not set, indexing will be skipped...
> > crawl started in: /Users/toom/Downloads/nutch-1.3/sites
> > rootUrlDir = /Users/toom/Downloads/nutch-1.3/crawled
> > threads = 10
> > depth = 3
> > solrUrl=null
> > topN = 50
> > Injector: starting at 2011-07-07 14:02:31
> > Injector: crawlDb: /Users/toom/Downloads/nutch-1.3/sites/crawldb
> > Injector: urlDir: /Users/toom/Downloads/nutch-1.3/crawled
> > Injector: Converting injected urls to crawl db entries.
> > Injector: Merging injected urls into crawl db.
> > Injector: finished at 2011-07-07 14:02:35, elapsed: 00:00:03
> > Generator: starting at 2011-07-07 14:02:35
> > Generator: Selecting best-scoring urls due for fetch.
> > Generator: filtering: true
> > Generator: normalizing: true
> > Generator: topN: 50
> > Generator: jobtracker is 'local', generating exactly one partition.
> > Generator: Partitioning selected urls for politeness.
> > Generator: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238
> > Generator: finished at 2011-07-07 14:02:39, elapsed: 00:00:04
> > Fetcher: No agents listed in 'http.agent.name' property.
> > Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property.
> >         at org.apache.nutch.fetcher.Fetcher.checkConfiguration(Fetcher.java:1166)
> >         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1068)
> >         at org.apache.nutch.crawl.Crawl.run(Crawl.java:135)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:54)
> >
> > I do not understand what happened here, maybe one of you can help me?
> >
> > --
> > Ing. Yusniel Hidalgo Delgado
> > Participate in COMPUMAT 2011 http://www.mfc.uclv.edu.cu/scmc
> > Participate in INFO 2012 http://www.congreso-info.cu
> > Universidad de las Ciencias Informáticas

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

