I have just updated the tutorial. As of 1.3, the configuration files should be
changed in $NUTCH_HOME/runtime/local/conf/ unless you rebuild with Ant.
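
For illustration, here is a minimal sketch of what the setting could look like.
It assumes you put the property in nutch-site.xml (whose values take precedence
over nutch-default.xml) inside $NUTCH_HOME/runtime/local/conf/, rather than
editing nutch-default.xml directly. The agent name is simply the value from
your mail:

```xml
<?xml version="1.0"?>
<!-- Assumed location: $NUTCH_HOME/runtime/local/conf/nutch-site.xml -->
<configuration>
  <!-- Overrides the empty default so the Fetcher's configuration check passes -->
  <property>
    <name>http.agent.name</name>
    <value>MyFirstNutchCrawler</value>
  </property>
</configuration>
```

If you edit the copy under $NUTCH_HOME/conf/ instead, the change should only
reach the runtime directory after a rebuild with Ant.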


On 12 July 2011 10:43, Paul van Hoven <[email protected]> wrote:

> Thanks for the answers. I'm not sure that 'http.agent.name' is the
> problem, since I have set it:
>
> This is the configuration I'm using from nutch-1.3/conf/nutch-default.xml:
>
> <!-- HTTP properties -->
>
> <property>
>  <name>http.agent.name</name>
>  <value>MyFirstNutchCrawler</value>
>  <description>HTTP 'User-Agent' request header. MUST NOT be empty -
>  please set this to a single word uniquely related to your organization.
>
>  NOTE: You should also check other related properties:
>
>        http.robots.agents
>        http.agent.description
>        http.agent.url
>        http.agent.email
>        http.agent.version
>
>  and set their values appropriately.
>
>  </description>
> </property>
>
> As I understand the tutorial this should be correct:
> tutorial citation: "Search for http.agent.name , and give it value
> 'YOURNAME Spider'"
>
>
> I already had that set this way in my first email.
>
>
>
> 2011/7/10 Ing. Yusniel Hidalgo Delgado <[email protected]>:
> > Paul, I think that your problem is related to the 'http.agent.name'
> > property. Please change this property in your configuration file, as
> > described in the tutorial:
> >
> >
> >
> > Good! You are almost ready to crawl. You need to give your crawler a
> > name. This is required.
> >
> >    1. Open up $NUTCH_HOME/conf/nutch-default.xml file
> >    2. Search for http.agent.name , and give it value 'YOURNAME Spider'
> >    3. Optionally you may also set http.agent.url and http.agent.email
> >       properties.
> >
> > and try again.
> >
> > Greetings
> >
> > ----- Original Message -----
> > From: "Paul van Hoven" <[email protected]>
> > To: [email protected]
> > Sent: Sunday, 10 July 2011 7:42:47 GMT -08:00 Tijuana / Baja California
> > Subject: Problems with tutorial
> >
> > I'm completely new to Nutch, so I downloaded version 1.3 and worked
> > through the beginners' tutorial at
> > http://wiki.apache.org/nutch/NutchTutorial. The first problem was that I
> > did not find the file "conf/crawl-urlfilter.txt", so I skipped that step
> > and continued with launching Nutch. To do so, I created a plain text file
> > in "/Users/toom/Downloads/nutch-1.3/crawled" called "urls.txt" which
> > contains the following text:
> >
> > tom:crawled toom$ cat urls.txt
> > http://nutch.apache.org/
> >
> > So after that I invoked nutch by calling
> > tom:bin toom$ ./nutch crawl /Users/toom/Downloads/nutch-1.3/crawled -dir
> > /Users/toom/Downloads/nutch-1.3/sites -depth 3 -topN 50
> > solrUrl is not set, indexing will be skipped...
> > crawl started in: /Users/toom/Downloads/nutch-1.3/sites
> > rootUrlDir = /Users/toom/Downloads/nutch-1.3/crawled
> > threads = 10
> > depth = 3
> > solrUrl=null
> > topN = 50
> > Injector: starting at 2011-07-07 14:02:31
> > Injector: crawlDb: /Users/toom/Downloads/nutch-1.3/sites/crawldb
> > Injector: urlDir: /Users/toom/Downloads/nutch-1.3/crawled
> > Injector: Converting injected urls to crawl db entries.
> > Injector: Merging injected urls into crawl db.
> > Injector: finished at 2011-07-07 14:02:35, elapsed: 00:00:03
> > Generator: starting at 2011-07-07 14:02:35
> > Generator: Selecting best-scoring urls due for fetch.
> > Generator: filtering: true
> > Generator: normalizing: true
> > Generator: topN: 50
> > Generator: jobtracker is 'local', generating exactly one partition.
> > Generator: Partitioning selected urls for politeness.
> > Generator: segment:
> > /Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238
> > Generator: finished at 2011-07-07 14:02:39, elapsed: 00:00:04
> > Fetcher: No agents listed in 'http.agent.name' property.
> > Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property.
> > at org.apache.nutch.fetcher.Fetcher.checkConfiguration(Fetcher.java:1166)
> > at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1068)
> > at org.apache.nutch.crawl.Crawl.run(Crawl.java:135)
> > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > at org.apache.nutch.crawl.Crawl.main(Crawl.java:54)
> >
> >
> > I do not understand what happened here. Maybe one of you can help me?
> >
> >
> >
> > --
> >
> >
> >
> >
> > --------------------------------------------------------------------------------------------
> > Ing. Yusniel Hidalgo Delgado
> > Participate in COMPUMAT 2011 http://www.mfc.uclv.edu.cu/scmc
> > Participate in INFO 2012 http://www.congreso-info.cu
> > Universidad de las Ciencias Informáticas
> >
> > --------------------------------------------------------------------------------------------
> >
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
