I thought I understood how to set my user-agent, but after asking a few sites to add me to their robots.txt it looks like I'm missing something.
My nutch-site.xml includes:

  <property>
    <name>http.agent.name</name>
    <value>PHFAWS Spider</value>
  </property>
  <property>
    <name>http.robots.agents</name>
    <value>PHFAWS Spider,*</value>
    <description>The agent strings we'll look for in robots.txt files,
    comma-separated, in decreasing order of precedence. You should put the
    value of http.agent.name as the first agent name and keep the default * at
    the end of the list. E.g.: BlurflDev,Blurfl,*</description>
  </property>

A friendly site created a robots.txt which includes the following:

  User-agent: PHFAWS Spider
  Disallow:

  User-agent: *
  Disallow: /

Why doesn't this work?

Thanks,
Chip
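For what it's worth, that robots.txt does behave as the site intended under Python's standard urllib.robotparser, so the file itself looks well-formed. This is only a sketch for sanity-checking the robots.txt (the URL is a placeholder, and Python's matching semantics are not necessarily the same as Nutch's robots parser):

```python
import urllib.robotparser

# The friendly site's robots.txt, verbatim: an empty Disallow for the
# "PHFAWS Spider" group means "allow everything" for that agent, while
# the * group disallows the whole site for everyone else.
ROBOTS_TXT = """\
User-agent: PHFAWS Spider
Disallow:

User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder URL, not the actual site.
print(rp.can_fetch("PHFAWS Spider", "http://example.com/page"))  # True
print(rp.can_fetch("SomeOtherBot", "http://example.com/page"))   # False
```

So at least by the classic robots-exclusion matching rules, a user-agent of "PHFAWS Spider" should be allowed in and everything else kept out; whatever is going wrong is presumably in how Nutch matches its configured agent names against the file.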

