Hi, I have seen similar posts in this forum, but I still do not understand how redirects are handled.
I am trying to crawl http://developer.att.com/developer/ . After a successful crawl I dumped the crawldb using readdb, and I see entries like the ones shown at the end of this message. What do they mean? Has Nutch crawled the redirected page, and is it in the index?

I tried the readseg command on all the segments under the crawl/segments directory, but I could not find the URL http://developer.att.com/developer/tier1page.jsp?passedItemId=100006&_requestid=35037 in any of them. Here is my crawl/segments directory listing:

20110817001833 20110817002117 20110817003028 20110817003930 20110817004202
20110817001844 20110817002556 20110817003532 20110817004105

Can anyone explain why the redirected page is not crawled?

The crawldb entries:

http://developer.att.com/developer/ Version: 7
Status: 4 (db_redir_temp)
Fetch time: Fri Sep 16 00:18:36 CDT 2011
Modified time: Wed Dec 31 18:00:00 CST 1969
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 1.0
Signature: null
Metadata: _pst_: temp_moved(13), lastModified=0: http://developer.att.com/developer/tier1page.jsp?passedItemId=100006&_requestid=35037

http://developer.att.com/developer/100006 Version: 7
Status: 5 (db_redir_perm)
Fetch time: Fri Sep 16 00:43:33 CDT 2011
Modified time: Wed Dec 31 18:00:00 CST 1969
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 0.0
Signature: null
Metadata: _pst_: moved(12), lastModified=0: http://developer.att.com/developer/forward.jsp?passedItemId=100006

--
View this message in context: http://lucene.472066.n3.nabble.com/nutch-redirect-treatment-tp3261546p3261546.html
Sent from the Nutch - User mailing list archive at Nabble.com.

