Can anyone suggest something on this?

> Hi,
>
> I have to crawl a website that requires authentication.
> While going through the Nutch wiki I found this page:
> http://wiki.apache.org/nutch/HttpAuthenticationSchemes
> It describes how to connect to sites that use Basic, Digest, or NTLM
> authentication.
> I followed all the steps and tried to crawl the website, but it did not
> help: many of the pages are still redirected to the login page.
> Further, while checking the logs for httpclient and httpclient.auth, I found
> that an exception was thrown:
> *org.apache.commons.httpclient.ConnectionPoolTimeoutException: Timeout
> waiting for connection*
> Can someone please explain what is wrong here?
>
> I also found another page,
> http://wiki.apache.org/nutch/HttpPostAuthentication
> which describes the steps to build a crawler that can crawl pages behind
> HTTP POST authentication.
> Is there any new development on this?
>
> Regards,
> Sourabh
>
