Thanks Lewis, but I don't think it's related to NUTCH-769. From what I understand of NUTCH-769, it concerns scenarios in which the hosts are indeed unresponsive and an exception is thrown on the same URL over and over. My problem here is with protocol-httpclient. The URLs and hosts are responsive, but the max-connections-per-host parameter of HttpClient is always set to 2. Thus, if 2 URLs from the same host are being fetched for some time, a third URL from the same host will hit a timeout very soon while it waits for a free connection. Note that the host itself has no problem serving any number of fetches at a time. The only limitation here is protocol-httpclient, which I must use because I need to authenticate. When I didn't have to authenticate, I used protocol-http and didn't encounter this problem.
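To illustrate what I mean, here is a minimal sketch of the behaviour as I understand it. It is a toy model built on a plain Java semaphore, not the actual HttpClient or Nutch code; the class name PerHostPool, the host name and the timeout values are all made up for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model of a per-host connection limit (hypothetical, not HttpClient code). */
public class PerHostPool {
    private final int maxPerHost;
    private final Map<String, Semaphore> slots = new ConcurrentHashMap<>();

    public PerHostPool(int maxPerHost) {
        this.maxPerHost = maxPerHost;
    }

    /** Returns true if a slot for the host became available within timeoutMs. */
    public boolean acquire(String host, long timeoutMs) throws InterruptedException {
        Semaphore s = slots.computeIfAbsent(host, h -> new Semaphore(maxPerHost));
        return s.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Hands a slot back; call only after a successful acquire for that host. */
    public void release(String host) {
        Semaphore s = slots.get(host);
        if (s != null) {
            s.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PerHostPool pool = new PerHostPool(2);               // limit of 2 per host, as in my case
        System.out.println(pool.acquire("example.org", 50)); // true: first slow fetch
        System.out.println(pool.acquire("example.org", 50)); // true: second slow fetch
        System.out.println(pool.acquire("example.org", 50)); // false: third waits, then times out
        pool.release("example.org");                         // one fetch finishes
        System.out.println(pool.acquire("example.org", 50)); // true: a slot is free again
    }
}
```

The host is perfectly healthy in this model; the third request fails only because both slots are held by long-running fetches. That is why I think a higher per-host limit, rather than the NUTCH-769 fix, is what I need.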
Regarding your second comment, I might have another idea. Actually it's not related to this list because it's an OS issue, but maybe someone can help nonetheless. Can I authenticate once in the OS-session scope? I mean, can I run a Linux command once that will authenticate the user to the host, so that when Nutch accesses the URLs it is already authenticated?

lewis john mcgibbney wrote
> Hi,
>
> On Tue, Jun 26, 2012 at 1:39 PM, nutch.buddy@ <nutch.buddy@> wrote:
>> after a while fetcher starts throwing
>> httpclient.connectionPoolTimeoutException: Timeout waiting for connection
>> for almost each url.
>>
>> Any solution for this issue?
>
> This looks like it's related to the fix in NUTCH-769 can you please
> check this out and provide you opinion.
>
>> Another issue is that nutch authenticates for each url, and i think its an
>> overhead.
>> Any way to get nutch to read a cookie or something of this sort?
>
> In short no. Please see here [0]
>
> [0] http://www.mail-archive.com/[email protected]/msg05348.html

--
View this message in context: http://lucene.472066.n3.nabble.com/problem-when-fetching-with-http-client-and-authentication-tp3991342p3991410.html
Sent from the Nutch - User mailing list archive at Nabble.com.

