Hi,
I am a new Nutch user, and am using Nutch 0.8.1 with Hadoop. The domain I am
trying to crawl is http://autos.aol.com. I am crawling to a depth of 10.
There are certain pages that Nutch could not fetch. An example is
http://autos.aol.com/acura-rl-2006:8060-review.
The referring URL for this page is
http://autos.aol.com/acura-rl-2007:8060-review. This URL was in the
fetch list.
When I did a mini crawl pointing directly to
http://autos.aol.com/acura-rl-2007:8060-review, the page
http://autos.aol.com/acura-rl-2006:8060-review did get fetched.
Does anyone have any ideas on why I am seeing this behavior?
Thanks
Anita Bidari (X55746)