nutch redirect treatment

abhayd Wed, 17 Aug 2011 07:44:10 -0700

Hi,
I have seen similar posts in this forum, but I am still not able to understand
how redirects are handled.


I'm trying to crawl http://developer.att.com/developer/. After a successful
crawl, I dump the crawldb using readdb and see entries like the ones shown
below. What do they mean? Has Nutch crawled the redirected page, and is it in
the index?

I tried the readseg command on every segment under the crawl/segments
directory, but I could not find the URL
http://developer.att.com/developer/tier1page.jsp?passedItemId=100006&_requestid=35037

Here is my crawl/segments directory listing:
20110817001833  20110817002117  20110817003028  20110817003930  20110817004202
20110817001844  20110817002556  20110817003532  20110817004105

Can anyone help me understand why the redirected page is not crawled?

http://developer.att.com/developer/     Version: 7
Status: 4 (db_redir_temp)
Fetch time: Fri Sep 16 00:18:36 CDT 2011
Modified time: Wed Dec 31 18:00:00 CST 1969
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 1.0
Signature: null
Metadata: _pst_: temp_moved(13), lastModified=0: http://developer.att.com/developer/tier1page.jsp?passedItemId=100006&_requestid=35037

http://developer.att.com/developer/100006       Version: 7
Status: 5 (db_redir_perm)
Fetch time: Fri Sep 16 00:43:33 CDT 2011
Modified time: Wed Dec 31 18:00:00 CST 1969
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 0.0
Signature: null
Metadata: _pst_: moved(12), lastModified=0: http://developer.att.com/developer/forward.jsp?passedItemId=100006
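To check the whole dump for redirects whose targets never got their own crawldb entry, here is a minimal Java sketch. It is an assumption-based helper, not part of Nutch: the class name `RedirectReport` is hypothetical, and the parsing assumes the text format of the entries quoted above (records separated by blank lines, the URL followed by `Version:` on the first line, and the redirect target following `lastModified=N:` in the `_pst_` metadata, possibly wrapped onto the next line).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists redirect entries found in a crawldb text dump. */
public class RedirectReport {

    // First line of a record: "<url>   Version: <n>" (format assumed from the dump above)
    private static final Pattern HEADER = Pattern.compile("^(\\S+)\\s+Version:");
    // Status line, e.g. "Status: 4 (db_redir_temp)"
    private static final Pattern STATUS = Pattern.compile("Status:\\s*\\d+\\s*\\((\\w+)\\)");
    // Assumed: the redirect target follows "lastModified=<n>:" in the _pst_ metadata
    private static final Pattern TARGET = Pattern.compile("lastModified=\\d+:\\s*(\\S+)");

    /** Maps each URL with a db_redir_* status to its target (null if none found). */
    public static Map<String, String> redirects(String dump) {
        Map<String, String> result = new LinkedHashMap<>();
        // Records are assumed to be separated by one or more blank lines.
        for (String record : dump.split("\\n\\s*\\n")) {
            String trimmed = record.trim();
            Matcher header = HEADER.matcher(trimmed);
            Matcher status = STATUS.matcher(trimmed);
            if (!header.find() || !status.find()) continue;
            if (!status.group(1).startsWith("db_redir")) continue;
            // Join wrapped lines so a target URL moved to the next line is still found.
            Matcher target = TARGET.matcher(trimmed.replace('\n', ' '));
            result.put(header.group(1), target.find() ? target.group(1) : null);
        }
        return result;
    }

    /** Returns redirect targets that do not appear as their own record in the dump. */
    public static List<String> missingTargets(String dump) {
        Set<String> known = new HashSet<>();
        for (String record : dump.split("\\n\\s*\\n")) {
            Matcher header = HEADER.matcher(record.trim());
            if (header.find()) known.add(header.group(1));
        }
        List<String> missing = new ArrayList<>();
        for (String target : redirects(dump).values()) {
            if (target != null && !known.contains(target)) missing.add(target);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample built from the two entries quoted in this post (fields trimmed).
        String dump = String.join("\n",
            "http://developer.att.com/developer/     Version: 7",
            "Status: 4 (db_redir_temp)",
            "Metadata: _pst_: temp_moved(13), lastModified=0:",
            "http://developer.att.com/developer/tier1page.jsp?passedItemId=100006&_requestid=35037",
            "",
            "http://developer.att.com/developer/100006       Version: 7",
            "Status: 5 (db_redir_perm)",
            "Metadata: _pst_: moved(12), lastModified=0:",
            "http://developer.att.com/developer/forward.jsp?passedItemId=100006");
        redirects(dump).forEach((from, to) -> System.out.println(from + " -> " + to));
        System.out.println("Targets without their own entry: " + missingTargets(dump));
    }
}
```

For the two-entry sample in `main`, both targets are reported as missing simply because the sample contains no records for them; run against the full text written by `readdb ... -dump`, the result shows whether the crawldb knows about each redirect target at all.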



--
View this message in context: 
http://lucene.472066.n3.nabble.com/nutch-redirect-treatment-tp3261546p3261546.html
Sent from the Nutch - User mailing list archive at Nabble.com.
