Hello David,

Can you tell us which version of Nutch you are using?

I ran a local test crawl with Nutch 1.5 two weeks ago
and have just looked at the Apache access log. Everything looks correct;
the configured agent name shows up in the User-Agent field:

127.0.0.1 - - [31/May/2012:22:25:46 +0200] "GET /robots.txt HTTP/1.0" 404 462 "-" "sn-test-crawler/Nutch-1.5 (wastldotnagelatgooglemaildotcom)"
127.0.0.1 - - [31/May/2012:22:25:46 +0200] "GET / HTTP/1.0" 200 447 "-" "sn-test-crawler/Nutch-1.5 (wastldotnagelatgooglemaildotcom)"
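If you want to check the same thing on your own server, here is a minimal sketch that extracts the User-Agent field from an access log line. It assumes your Apache server writes the "combined" log format shown above; the class and method names are hypothetical, chosen just for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserAgentFromLog {
    // Apache "combined" format (assumed):
    // host ident user [time] "request" status bytes "referer" "user-agent"
    private static final Pattern COMBINED = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) \"([^\"]*)\" \"([^\"]*)\"$");

    /** Returns the User-Agent field of a combined-format line, or null if the line does not match. */
    static String userAgent(String line) {
        Matcher m = COMBINED.matcher(line.trim());
        return m.matches() ? m.group(9) : null;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [31/May/2012:22:25:46 +0200] \"GET /robots.txt HTTP/1.0\" 404 462 "
            + "\"-\" \"sn-test-crawler/Nutch-1.5 (wastldotnagelatgooglemaildotcom)\"";
        // Prints the agent string the crawler sent
        System.out.println(userAgent(line));
    }
}
```

Running it over your log and looking for your agent name (e.g. MyNameSpider) at the start of the User-Agent field tells you whether Nutch actually sent the name you configured.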


The agent properties from my nutch-site.xml:

<property>
  <name>http.agent.name</name>
  <value>sn-test-crawler</value>
</property>

<property>
  <name>http.robots.agents</name>
  <value>sn-test-crawler,*</value>
</property>

<property>
  <name>http.agent.description</name>
  <value></value>
</property>

<property>
  <name>http.agent.email</name>
  <value>wastldotnagelatgooglemaildotcom</value>
</property>
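Judging from the log, the User-Agent seems to be assembled as name/version followed by the non-empty description, URL and email in parentheses. The following sketch reproduces that format. It is my own illustration, not Nutch's code, and it assumes that http.agent.version is "Nutch-1.5" (as the log suggests) and that http.agent.url is empty:

```java
import java.util.ArrayList;
import java.util.List;

public class AgentString {
    /**
     * Hypothetical helper: builds "name/version (description; url; email)",
     * leaving out empty parts, to mirror the string seen in the log.
     */
    static String build(String name, String version, String description, String url, String email) {
        StringBuilder sb = new StringBuilder(name);
        if (version != null && !version.isEmpty()) {
            sb.append('/').append(version);
        }
        // Collect only the non-empty detail fields
        List<String> details = new ArrayList<>();
        for (String part : new String[] {description, url, email}) {
            if (part != null && !part.isEmpty()) {
                details.add(part);
            }
        }
        if (!details.isEmpty()) {
            sb.append(" (").append(String.join("; ", details)).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Values from the nutch-site.xml above; version and empty URL are assumptions
        System.out.println(build("sn-test-crawler", "Nutch-1.5", "", "", "wastldotnagelatgooglemaildotcom"));
    }
}
```

With these values it prints the same string as in the log, so whatever you set in http.agent.name should be the first token your web server sees.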


Regards,
Sebastian


On 06/12/2012 02:36 PM, david wrote:
> Hello, I have changed the following properties:
> 
> 
>  <name>http.agent.name</name>
>  <value>MyNameSpider</value>
> 
>   <name>http.robots.agents</name>
>   <value>MyNameSpider,*</value> 
> 
> 
> When I look at my website stats, under "Robots / Spiders visitors"
> I always see "Nutch" with a link to http://nutch.apache.org/
> 
> Do you have a solution so that the correct spider name is shown?
> 
> 
> 
> David
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Nutch-name-spyder-tp3989188.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
