> Thing is my seed url is redirected to a diiferent url .

Yes, it is.

> But problem is the
> content of this redirected url is not fetched by nutch.I have changed
> rediect.max to 5 . But still content is not fetched .

I assume you set "http.redirect.max" to 5. If the property is spelled
correctly, you will find the content under the final target of a
redirect chain (up to five hops).
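To illustrate what the hop limit does, here is a minimal Python sketch of a fetcher that follows a redirect chain immediately. It is not Nutch's Fetcher code: `fetch_following_redirects`, `fake_fetch` and the in-memory `SITE` map are hypothetical stand-ins for real HTTP requests, and only the session-ID URL is taken from this thread. Relative `Location` values (the one in the segment dump is relative) are resolved against the redirecting URL.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# A fetch result is ("redirect", location) or ("content", body).
FetchResult = Tuple[str, str]


def fetch_following_redirects(
    url: str,
    fetch: Callable[[str], FetchResult],
    max_redirects: int,
) -> Optional[Tuple[str, str]]:
    """Follow at most max_redirects hops and return (final_url, body).

    Returns None if the page is still redirecting after the last
    allowed hop.
    """
    for _ in range(max_redirects + 1):
        kind, value = fetch(url)
        if kind == "content":
            return url, value
        # The Location header may be relative, so resolve it against
        # the URL that issued the redirect.
        url = urljoin(url, value)
    return None


# Hypothetical server: the bare URL redirects to a session-ID URL,
# which then serves the page.
SITE = {
    "http://farmer.gov.in/COLD_STROAGE_Link.aspx": (
        "redirect",
        "/(S(knqavhgccae51czzxdb100zr))/COLD_STROAGE_Link.aspx",
    ),
    "http://farmer.gov.in/(S(knqavhgccae51czzxdb100zr))/COLD_STROAGE_Link.aspx": (
        "content",
        "State wise list of Cold-Storage",
    ),
}


def fake_fetch(url: str) -> FetchResult:
    """Stand-in for an HTTP request; looks the URL up in SITE."""
    return SITE[url]


seed = "http://farmer.gov.in/COLD_STROAGE_Link.aspx"
print(fetch_following_redirects(seed, fake_fetch, 5))  # final URL and body
print(fetch_following_redirects(seed, fake_fetch, 0))  # no hop allowed
```

With `max_redirects=0` the sketch simply gives up at the first redirect; according to the description of `http.redirect.max` in nutch-default.xml, Nutch's fetcher instead records the redirect target and fetches it in a later round.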
You can also do this manually:

% bin/nutch parsechecker 'http://farmer.gov.in/COLD_STROAGE_Link.aspx'
fetching: http://farmer.gov.in/COLD_STROAGE_Link.aspx
Fetch failed with protocol status: temp_moved(13), lastModified=0: http://farmer.gov.in/(S(frbcuppdu1rmeu30yisifam5))/COLD_STROAGE_Link.aspx

% bin/nutch parsechecker 'http://farmer.gov.in/(S(ulnaaubb1l0bku22vik2tzjt))/COLD_STROAGE_Link.aspx'
fetching: http://farmer.gov.in/(S(ulnaaubb1l0bku22vik2tzjt))/COLD_STROAGE_Link.aspx
Fetch failed with protocol status: temp_moved(13), lastModified=0: http://farmer.gov.in/(S(knqavhgccae51czzxdb100zr))/COLD_STROAGE_Link.aspx

% bin/nutch parsechecker 'http://farmer.gov.in/(S(knqavhgccae51czzxdb100zr))/COLD_STROAGE_Link.aspx'
fetching: http://farmer.gov.in/(S(knqavhgccae51czzxdb100zr))/COLD_STROAGE_Link.aspx
...
---------
ParseText
---------
State wise list of Cold-Storage ANDHRA PRADESH Warehouse Project Descriptio ...

Now you get the content.

Finally, don't forget to check your URL filters.

Cheers,
Sebastian

On 07/12/2013 10:48 AM, devang pandey wrote:
> Hello,
>
> I am using nutch 1.4 to crawl a url .
> After crawling the content of segment
> is :
> Status: 1 (db_unfetched)
> Fetch time: Fri Jul 12 13:43:43 IST 2013
> Modified time: Thu Jan 01 05:30:00 IST 1970
> Retries since fetch: 0
> Retry interval: 2592000 seconds (30 days)
> Score: 1.0
> Signature: null
> Metadata: _ngt_: 1373616835706
>
> Content::
> Version: -1
> url: http://farmer.gov.in/COLD_STROAGE_Link.aspx
> base: http://farmer.gov.in/COLD_STROAGE_Link.aspx
> contentType: text/html
> metadata: X-AspNet-Version=4.0.30319 Date=Fri, 12 Jul 2013 08:19:30 GMT
> Content-Length=170 nutch.crawl.score=1.0
> Location=/(S(ulnaaubb1l0bku22vik2tzjt))/COLD_STROAGE_Link.aspx _fst_=35
> nutch.segment.name=20130712134358 Content-Type=text/html; charset=utf-8
> Connection=close Server=Microsoft-IIS/7.5
> X-Powered-By=ASP.NET Cache-Control=private
> Content:
> <html><head><title>Object moved</title></head><body>
> <h2>Object moved to <a
> href="/(S(ulnaaubb1l0bku22vik2tzjt))/COLD_STROAGE_Link.aspx">here</a>.</h2>
> </body></html>
>
> CrawlDatum::
> Version: 7
> Status: 35 (fetch_redir_temp)
> Fetch time: Fri Jul 12 13:44:03 IST 2013
> Modified time: Thu Jan 01 05:30:00 IST 1970
> Retries since fetch: 0
> Retry interval: 2592000 seconds (30 days)
> Score: 1.0
> Signature: null
> Metadata: _ngt_: 1373616835706 _pst_: temp_moved(13), lastModified=0:
> http://farmer.gov.in/(S(ulnaaubb1l0bku22vik2tzjt))/COLD_STROAGE_Link.aspx
>
> Thing is my seed url is redirected to a diiferent url . But problem is the
> content of this redirected url is not fetched by nutch. I have changed
> rediect.max to 5 . But still content is not fetched .Please help
>
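On checking the URL filters: the redirect target is a different URL from the seed (it carries a session ID in the path), so it has to get past the filters as well. The Python sketch below mimics the first-match-wins behaviour described in the comments of Nutch's regex-urlfilter.txt: a `+` rule accepts, a `-` rule rejects, and a URL that matches no rule is dropped. The rules in `RULES` and the `accepts` helper are hypothetical examples, not the contents of anyone's actual filter file.

```python
import re
from typing import List, Tuple

# Hypothetical rules in the "+regex" / "-regex" style of regex-urlfilter.txt.
# Order matters: the first rule whose pattern matches decides.
RULES: List[Tuple[str, str]] = [
    ("-", r"^(file|ftp|mailto):"),                     # skip non-HTTP schemes
    ("-", r"[?*!@=]"),                                 # skip query-like URLs
    ("+", r"^http://([a-z0-9]*\.)*farmer\.gov\.in/"),  # accept the target site
]


def accepts(url: str, rules: List[Tuple[str, str]] = RULES) -> bool:
    """Return True if the first matching rule is '+'; reject if none match."""
    for sign, pattern in rules:
        # re.search: the pattern may match anywhere unless anchored with ^
        if re.search(pattern, url):
            return sign == "+"
    return False


for candidate in (
    "http://farmer.gov.in/(S(knqavhgccae51czzxdb100zr))/COLD_STROAGE_Link.aspx",
    "http://farmer.gov.in/COLD_STROAGE_Link.aspx?lang=en",
    "http://example.com/",
):
    print("accept" if accepts(candidate) else "reject", candidate)
```

Running the redirect targets from the parsechecker output through rules like these (or through the real filter file) is a quick way to see whether they would be dropped before they are ever fetched.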

