> Thing is, my seed URL is redirected to a different URL.
Yes, it is.

> But the problem is that the
> content of this redirected URL is not fetched by Nutch. I have changed
> rediect.max to 5, but the content is still not fetched.
I assume you set "http.redirect.max" to 5.
If the property is spelled correctly, you will find the content
under the URL of the final target of the redirect chain (up to five
hops), not under the seed URL.
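
For reference, the override would look roughly like this in
conf/nutch-site.xml. This is only a sketch: the value 5 is taken from
your message, and the description text is my own wording, not the one
shipped in nutch-default.xml.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the default from nutch-default.xml. -->
  <property>
    <name>http.redirect.max</name>
    <value>5</value>
    <!-- Wording below is illustrative, not the official description. -->
    <description>Maximum number of redirects the fetcher follows
    immediately when fetching a page.</description>
  </property>
</configuration>
```

With the default value of 0 the fetcher does not follow a redirect
right away; it only records the target so that it can be fetched in a
later round.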

You can also follow the redirects manually with parsechecker:

% bin/nutch parsechecker 'http://farmer.gov.in/COLD_STROAGE_Link.aspx'
fetching: http://farmer.gov.in/COLD_STROAGE_Link.aspx
Fetch failed with protocol status: temp_moved(13), lastModified=0:
http://farmer.gov.in/(S(frbcuppdu1rmeu30yisifam5))/COLD_STROAGE_Link.aspx

% bin/nutch parsechecker 'http://farmer.gov.in/(S(ulnaaubb1l0bku22vik2tzjt))/COLD_STROAGE_Link.aspx'
fetching: http://farmer.gov.in/(S(ulnaaubb1l0bku22vik2tzjt))/COLD_STROAGE_Link.aspx
Fetch failed with protocol status: temp_moved(13), lastModified=0:
http://farmer.gov.in/(S(knqavhgccae51czzxdb100zr))/COLD_STROAGE_Link.aspx

% bin/nutch parsechecker 'http://farmer.gov.in/(S(knqavhgccae51czzxdb100zr))/COLD_STROAGE_Link.aspx'
fetching: http://farmer.gov.in/(S(knqavhgccae51czzxdb100zr))/COLD_STROAGE_Link.aspx
...
---------
ParseText
---------

State wise list of Cold-Storage ANDHRA PRADESH Warehouse Project Descriptio
...

Now you get content.

Finally, don't forget to check your URL filters: they must accept the
redirect targets as well, not only the seed URL.

Cheers,
Sebastian



On 07/12/2013 10:48 AM, devang pandey wrote:
> Hello,
> 
> I am using Nutch 1.4 to crawl a URL. After crawling, the content of the
> segment is:
> Status: 1 (db_unfetched)
> Fetch time: Fri Jul 12 13:43:43 IST 2013
> Modified time: Thu Jan 01 05:30:00 IST 1970
> Retries since fetch: 0
> Retry interval: 2592000 seconds (30 days)
> Score: 1.0
> Signature: null
> Metadata: _ngt_: 1373616835706
> 
> Content::
> Version: -1
> url: http://farmer.gov.in/COLD_STROAGE_Link.aspx
> base: http://farmer.gov.in/COLD_STROAGE_Link.aspx
> contentType: text/html
> metadata: X-AspNet-Version=4.0.30319 Date=Fri, 12 Jul 2013 08:19:30 GMT
> Content-Length=170 nutch.crawl.score=1.0
> Location=/(S(ulnaaubb1l0bku22vik2tzjt))/COLD_STROAGE_Link.aspx _fst_=35
> nutch.segment.name=20130712134358 Content-Type=text/html; charset=utf-8
> Connection=close Server=Microsoft-IIS/7.5
> X-Powered-By=ASP.NET Cache-Control=private
> Content:
> <html><head><title>Object moved</title></head><body>
> <h2>Object moved to <a
> href="/(S(ulnaaubb1l0bku22vik2tzjt))/COLD_STROAGE_Link.aspx">here</a>.</h2>
> </body></html>
> 
> CrawlDatum::
> Version: 7
> Status: 35 (fetch_redir_temp)
> Fetch time: Fri Jul 12 13:44:03 IST 2013
> Modified time: Thu Jan 01 05:30:00 IST 1970
> Retries since fetch: 0
> Retry interval: 2592000 seconds (30 days)
> Score: 1.0
> Signature: null
> Metadata: _ngt_: 1373616835706_pst_: temp_moved(13), lastModified=0:
> http://farmer.gov.in/(S(ulnaaubb1l0bku22vik2tzjt))/COLD_STROAGE_Link.aspx
> 
> Thing is, my seed URL is redirected to a different URL. But the problem is
> that the content of this redirected URL is not fetched by Nutch. I have
> changed rediect.max to 5, but the content is still not fetched. Please help.
> 
