Can anyone suggest something on this?
> Hi,
>
> I have to crawl a website on the internet that requires authentication.
> While going through the Nutch wiki, I found this page:
> http://wiki.apache.org/nutch/HttpAuthenticationSchemes
> It describes how to connect to sites that use basic, digest or NTLM
> authentication.
> I followed all the steps and tried to crawl the website, but it did not
> help: many of the pages are still redirected to the login page.
> Further, while checking the logs for httpclient and httpclient.auth, I found
> that it had thrown an exception:
> *org.apache.commons.httpclient.ConnectionPoolTimeoutException: Timeout
> waiting for connection*
> Can someone please explain what is wrong here?
>
> I also found another page,
> http://wiki.apache.org/nutch/HttpPostAuthentication
> which describes the steps to build a crawler that crawls pages protected by
> HTTP POST authentication.
> Is there any new development on this?
>
> Regards,
> Sourabh
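One possible reading of "pages are still redirected to the login page" is that the site uses an HTML login form rather than an HTTP authentication scheme. In that case, the basic/digest/NTLM settings would not apply. As I understand the HttpPostAuthentication page, the idea is to submit the login form once and then reuse the session cookie on later fetches. Below is a rough standalone sketch of that idea in plain Java 11 (`java.net.http`). It is not Nutch's plugin API. The class and method names, the form field names `username`/`password`, and the environment variables `SITE_USER`/`SITE_PASS` are all hypothetical; the real field names have to be taken from the site's login form.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: form-based (HTTP POST) login followed by cookie-authenticated fetches.
public class FormLoginSketch {

    // Encode form fields as application/x-www-form-urlencoded (key=value pairs joined by '&').
    public static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // A client that keeps cookies (e.g. a session id) set by the login response
    // and sends them on later requests made with the same client.
    public static HttpClient newCookieAwareClient() {
        CookieManager cookies = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        return HttpClient.newBuilder()
                .cookieHandler(cookies)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // POST the credentials to the login form's action URL.
    public static HttpResponse<String> login(HttpClient client, String loginUrl,
            Map<String, String> fields) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }

    // GET a protected page; the stored session cookie is sent automatically.
    public static HttpResponse<String> fetch(HttpClient client, String url)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: FormLoginSketch <loginUrl> <protectedUrl>");
            return;
        }
        HttpClient client = newCookieAwareClient();
        // Hypothetical field names and environment variables.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", System.getenv().getOrDefault("SITE_USER", ""));
        fields.put("password", System.getenv().getOrDefault("SITE_PASS", ""));
        HttpResponse<String> loginResponse = login(client, args[0], fields);
        System.out.println("login status: " + loginResponse.statusCode());
        HttpResponse<String> page = fetch(client, args[1]);
        // If redirects still end at the login page, the final URI shows it.
        System.out.println("page status: " + page.statusCode() + ", final URI: " + page.uri());
    }
}
```

Running it with the login form's action URL and one protected URL shows whether the session cookie alone is enough to reach the protected page. If it is, the crawler needs the same two steps: log in first, then carry the cookie on every fetch.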

