Hi 
 
I am a new Nutch user and am using Nutch 8.1 with Hadoop. The domain I am
trying to crawl is http://autos.aol.com, and I am crawling to a depth of 10.
There are certain pages that Nutch could not fetch. One example is
http://autos.aol.com/acura-rl-2006:8060-review.
 
The referring URL for this page is
http://autos.aol.com/acura-rl-2007:8060-review, and that URL was in the
fetch list.
 
When I ran a mini crawl pointing directly at
http://autos.aol.com/acura-rl-2007:8060-review, the page
http://autos.aol.com/acura-rl-2006:8060-review did get fetched.
 
Does anyone have any idea why I am seeing this behavior?
 
 
Thanks
Anita Bidari (X55746)