This looks pretty tricky. I am not experienced with using http-client in general, and we could do with establishing a wiki page that documents the redirect policies and scenarios, as there is quite a bit of confusion within the community about what some of the 'states' actually mean and how to crawl and index such pages.
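For what it's worth, the setting I would look at first for redirects is http.redirect.max. This is only a sketch of an override in conf/nutch-site.xml, assuming a stock Nutch 1.x install; the value 3 is an arbitrary example, not a recommendation. As far as I understand it, with the default of 0 the fetcher does not follow a redirect immediately but records the target URL for a later fetch round, which can make it look as though nothing was fetched.

```xml
<!-- conf/nutch-site.xml: illustrative override only, the value 3 is an assumption -->
<property>
  <name>http.redirect.max</name>
  <!-- 0 or negative: record redirect targets for a later round instead of following them now -->
  <value>3</value>
</property>
```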
To address your problem specifically: as you said, your log output suggests that basic authentication passes but that nothing is fetched because of the redirect. How large is the site you are trying to crawl? Does your http.content.limit property accommodate this? Where are you getting the information about the 302 (moved temporarily) redirect? From reading the log, or from dumping the crawldb stats? Surely there must be more information available to narrow down the problem area here.

On Tue, Sep 13, 2011 at 10:41 AM, Anshuman Mor <[email protected]> wrote:

> Hi Lewis,
>
> My Fault, sorry for that..
>
> I had enabled some of the logging for httpclient. Please find attached log
> file.
>
> Please let me know if you need more information on this.
> http://lucene.472066.n3.nabble.com/file/n3332184/hadoop.log hadoop.log
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Not-able-to-index-url-which-is-giving-http-302-tp3329755p3332184.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>

--
*Lewis*

