Hi, I have to crawl a website that requires authentication. While going through the Nutch wiki I found http://wiki.apache.org/nutch/HttpAuthenticationSchemes, which describes how to connect to a site protected by Basic, Digest, or NTLM authentication. I followed all the steps and tried to crawl the website, but it did not help: many of the pages are still redirected to the login page. When I checked the logs for httpclient and httpclient.auth, I found that the following exception had been thrown: *org.apache.commons.httpclient.ConnectionPoolTimeoutException: Timeout waiting for connection* Can someone please explain what is wrong here?
I also found another link, http://wiki.apache.org/nutch/HttpPostAuthentication, which describes the steps to build a crawler that can crawl pages behind HTTP POST authentication. Has there been any new development on this?

Regards,
Sourabh

