Newbie issue resolved. The filename was mangled, and the file with the
correct name didn't actually contain the correct value.
On , Matthew Stevens <[email protected]> wrote:
Yes, the files are identical. No change in behavior.
Matthew Stevens
On Oct 30, 2010, at 9:44, Khang Ich <[email protected]> wrote:
Have you tried setting that property in conf/nutch-default.xml?
-- Khang
On Fri, Oct 29, 2010 at 2:08 PM, Matthew Stevens <[email protected]> wrote:
Running the following command:
./bin/nutch crawl urls -dir crawl.test -depth 3 >& crawl.log
generates the following text in crawl.log:
Fetcher: No agents listed in 'http.agent.name' property.
Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property.
        at org.apache.nutch.fetcher.Fetcher.checkConfiguration(Fetcher.java:1166)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1068)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:133)
My questions are:
Is the property being referred to supposed to be the one listed in
*nutch-site.xml*?
If so, then the XML value is:
<property>
  <name>http.agent.name</name>
  <value>mini3</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty -
  please set this to a single word uniquely related to your organization.
  NOTE: You should also check other related properties:
    http.robots.agents
    http.agent.description
    http.agent.url
    http.agent.email
    http.agent.version
  and set their values appropriately.
  </description>
</property>
Can someone reproduce this error or tell me how to correct it?
Additionally, it should be noted that I have not yet gotten this to run
successfully.
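
For anyone hitting the same error, one way to rule out a misnamed or incomplete config file is to parse it directly and see what value it holds for http.agent.name. The following is only a minimal sketch, not part of Nutch: the helper names find_property and check_file are made up for illustration, and it assumes the file uses the usual configuration/property/name/value layout shown above.

```python
import xml.etree.ElementTree as ET


def find_property(xml_data, name):
    """Return the stripped <value> text of the named property.

    Returns None when no <property> with that <name> exists, and an
    empty string when the property exists but its <value> is empty.
    """
    root = ET.fromstring(xml_data)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def check_file(path, name="http.agent.name"):
    """Hypothetical helper: read a config file from disk and look up one property."""
    # Read as bytes so the parser honours any encoding declared in the file.
    with open(path, "rb") as fh:
        return find_property(fh.read(), name)


# Stand-in for the contents of conf/nutch-site.xml (an assumption for this sketch).
SAMPLE = """<configuration>
  <property>
    <name>http.agent.name</name>
    <value>mini3</value>
  </property>
  <property>
    <name>http.agent.url</name>
    <value></value>
  </property>
</configuration>
"""

agent = find_property(SAMPLE, "http.agent.name")
if agent:
    print("http.agent.name is set to", agent)
else:
    print("http.agent.name is missing or empty")
```

Calling check_file("conf/nutch-site.xml") from the Nutch directory would run the same lookup against the real file. If the path does not exist the call raises FileNotFoundError, which by itself points to a filename problem like the one described at the top of this thread.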