Yes, the files are identical. No change in behavior.

Matthew Stevens

On Oct 30, 2010, at 9:44, Khang Ich <[email protected]> wrote:

Have you tried setting that property in conf/nutch-default.xml?

-- Khang

On Fri, Oct 29, 2010 at 2:08 PM, Matthew Stevens <[email protected]> wrote:

Running the following command:

./bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log

generates the following text in crawl.log:

Fetcher: No agents listed in 'http.agent.name' property.

Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property.
    at org.apache.nutch.fetcher.Fetcher.checkConfiguration(Fetcher.java:1166)
    at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1068)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:133)


My questions are:

Is the property being referred to supposed to be the one listed in nutch-site.xml?
If so, then the XML value is:

<property>
<name>http.agent.name</name>
<value>mini3</value>
<description>HTTP 'User-Agent' request header. MUST NOT be empty -
please set this to a single word uniquely related to your organization.

NOTE: You should also check other related properties:

     http.robots.agents
     http.agent.description
     http.agent.url
     http.agent.email
     http.agent.version

and set their values appropriately.

</description>
</property>

Can someone reproduce this error or tell me how to correct it? Note that I have not yet gotten this crawl to run successfully at all.
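One way to narrow this down is to check which value each configuration file actually holds for http.agent.name, since the error means the value the Fetcher received was empty. The following is a minimal sketch in Python, not part of Nutch. It assumes the files use the usual <configuration>/<property> layout shown above and that nutch-site.xml is applied after nutch-default.xml. The helper names read_property and effective_value are made up for this illustration, and the sketch ignores details such as properties marked final.

```python
import os
import sys
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the stripped <value> of the named property, or None if it is absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def effective_value(xml_texts, name):
    """Assumed override order: a later file replaces the value from an earlier one."""
    value = None
    for text in xml_texts:
        found = read_property(text, name)
        if found is not None:
            value = found
    return value


if __name__ == "__main__":
    # Pass the files in load order, e.g. conf/nutch-default.xml conf/nutch-site.xml
    texts = []
    for path in sys.argv[1:]:
        if not os.path.exists(path):
            print(path, "-> file not found")
            continue
        with open(path, "rb") as fh:  # bytes, so the XML declaration sets the encoding
            data = fh.read()
        texts.append(data)
        print(path, "->", repr(read_property(data, "http.agent.name")))
    print("effective:", repr(effective_value(texts, "http.agent.name")))
```

If the script is saved as, say, check_agent.py (a hypothetical name) and run from the Nutch directory with both files as arguments, an empty string or None on the "effective" line would match the error in crawl.log. A value of 'mini3' would suggest that the crawl command is reading a different conf directory than the one being edited.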
