Hi, if you are able to extract the content via parsechecker, you should also be able to crawl it.
For all _3_ URLs in the redirect chain:

1. Check whether they pass the URL filters and normalizers.
2. Check whether "http.redirect.max" is set appropriately.
3. Run the crawl. Ideally, set the URL to be checked as the seed URL and
   choose small values for depth and topN; that makes the analysis simpler.
   If "http.redirect.max" >= 3 you can even set depth and topN to 1.
4. Check your logs for all _3_ URLs. You should see "fetching ..."
   3 times (once per URL).
5. Then check the CrawlDb for all URLs:
   % bin/nutch readdb .../crawldb -url URL
6. Check the content of the segment(s) for all URLs.

Some example commands for each of these steps are sketched below.
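For step 1, Nutch ships small checker tools that read URLs from stdin.
A sketch, assuming a Nutch 1.x installation (the URL is a placeholder):

  % echo "http://www.example.com/page" | \
      bin/nutch org.apache.nutch.net.URLNormalizerChecker
  % echo "http://www.example.com/page" | \
      bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined

The filter checker should echo the URL with a leading "+" if it is
accepted and a leading "-" if one of the configured filters rejects it.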
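For step 2, "http.redirect.max" is set in conf/nutch-site.xml. With the
default value 0 a redirect is not followed immediately but recorded and
fetched in a later cycle, which is why depth matters. To follow up to 3
redirects within the same fetch, something like:

  <property>
    <name>http.redirect.max</name>
    <value>3</value>
    <description>The maximum number of redirects the fetcher will
    follow when trying to fetch a page.</description>
  </property>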
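For step 3, using the all-in-one crawl command of Nutch 1.x (the seed
and crawl directory names are just examples):

  % mkdir urls
  % echo "http://www.example.com/page" > urls/seed.txt
  % bin/nutch crawl urls -dir crawl -depth 1 -topN 1

If "http.redirect.max" is left at its default of 0, each hop of the
redirect chain is fetched in its own cycle, so use -depth 3 instead.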
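For step 4, in a local Nutch 1.x setup the fetcher messages usually end
up in logs/hadoop.log (the exact path depends on your log4j settings):

  % grep "fetching" logs/hadoop.log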
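For step 5, with the example crawl directory from above the readdb call
would look like this (paths are hypothetical):

  % bin/nutch readdb crawl/crawldb -url "http://www.example.com/page"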
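For step 6, the segment reader can list the segments and print the
records stored for a single URL (a sketch; segment names are
timestamps, so list them first and fill in the placeholder):

  % bin/nutch readseg -list -dir crawl/segments
  % bin/nutch readseg -get crawl/segments/<segment> "http://www.example.com/page"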
Sorry, there is no tool which does all the steps automatically. You have to do it by hand.

Good luck,
Sebastian

On 07/15/2013 06:39 AM, devang pandey wrote:
> Hello Sebastian, thank you for your response. The thing is that my task
> is to crawl this URL, and using the parsechecker command I am able to
> see the content of the page but not able to crawl it. Please help me
> with the crawling aspect as well.