Hi 
 
I am a new Nutch user, running Nutch 0.8.1 with Hadoop. The domain I am
trying to crawl is http://autos.aol.com, and I am crawling to a depth of 10.
There are certain pages that Nutch does not fetch. One example is
http://autos.aol.com/acura-rl-2006:8060-review.
 
The referring URL for this page is
http://autos.aol.com/acura-rl-2007:8060-review, and that URL was
in the fetch list.
 
When I ran a mini crawl seeded directly with
http://autos.aol.com/acura-rl-2007:8060-review, the page
http://autos.aol.com/acura-rl-2006:8060-review was fetched.
 
Does anyone have any idea why I am seeing this behavior?
 
 
Thanks
Anita Bidari (X55746)




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
