I thought I understood how to set my user-agent, but after asking a few sites
to add me to their robots.txt, it looks like I'm missing something.

My nutch-site.xml includes:
<property>
  <name>http.agent.name</name>
  <value>PHFAWS Spider</value>
</property>
<property>
  <name>http.robots.agents</name>
  <value>PHFAWS Spider,*</value>
  <description>The agent strings we'll look for in robots.txt files,
  comma-separated, in decreasing order of precedence. You should
  put the value of http.agent.name as the first agent name, and keep the
  default * at the end of the list. E.g.: BlurflDev,Blurfl,*
  </description>
</property>

A friendly site created a robots.txt which includes the following:
User-agent: PHFAWS Spider
Disallow:

User-agent: *
Disallow: /

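If I read the robots exclusion rules correctly, that file should allow my crawler and block everyone else. As a sanity check (hypothetical, using Python's standard-library urllib.robotparser rather than Nutch's own parser, and an example.com URL), it behaves the way I expect:

```python
# Sanity check of the robots.txt above, using Python's stdlib parser
# (not Nutch's): parse the same rules and ask what each agent may fetch.
from urllib import robotparser

robots_txt = """\
User-agent: PHFAWS Spider
Disallow:

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# The named agent is allowed (empty Disallow); everyone else is blocked.
print(rp.can_fetch("PHFAWS Spider", "http://example.com/anything"))
print(rp.can_fetch("SomeOtherBot", "http://example.com/anything"))
```

That prints True for "PHFAWS Spider" and False for the other agent, which is exactly the behavior I expected from Nutch.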
Why doesn't this work?

Thanks,
Chip
