Hello.

I think I had the same problem a few weeks ago. Try resolving it by
adding the following properties to *nutch-site.xml*:


<property>
  <name>http.agent.name</name>
  <value>YourCrawlerName</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty -
  please set this to a single word uniquely related to your organization.
  </description>
</property>

<property>
  <name>http.robots.agents</name>
  <value>YourCrawlerName,*</value>
  <description>The agent strings we'll look for in robots.txt files,
  comma-separated, in decreasing order of precedence. You should
  put the value of http.agent.name as the first agent name, and keep the
  default * at the end of the list. E.g.: BlurflDev,Blurfl,*
  </description>
</property>



It's mandatory to include your *http.agent.name* value in the
*http.robots.agents* property.
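
For reference, a minimal complete *nutch-site.xml* with both properties
might look like this (YourCrawlerName is just a placeholder - use your
own agent name; the properties go inside the standard Hadoop-style
<configuration> root element):

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- The crawler's User-Agent name; must not be empty. -->
  <property>
    <name>http.agent.name</name>
    <value>YourCrawlerName</value>
  </property>
  <!-- Agent names checked against robots.txt; http.agent.name
       comes first, the default * stays last. -->
  <property>
    <name>http.robots.agents</name>
    <value>YourCrawlerName,*</value>
  </property>
</configuration>
```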


Good luck!
