Thanks Lewis, but I don't think it's related to NUTCH-769.
From what I understand of NUTCH-769, it concerns scenarios in which the
hosts are indeed unresponsive and an exception is thrown on the same URL
over and over.
My problem here is with protocol-httpclient.
The URLs and hosts are responsive, but the max-connections-per-host
parameter of HttpClient is always set to 2. Thus, if 2 URLs from the same
host are being fetched for some time, a third URL from the same host will
very soon time out while waiting for a connection.
Note that the host itself has no problem serving any number of fetches at
the same time. The only limitation here is protocol-httpclient, which I must
use because I need to authenticate.
When I didn't have to authenticate, I used protocol-http and didn't
encounter this problem.
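
To make the failure mode concrete, here is a toy sketch of a per-host
connection limit. It is only a model built on a plain semaphore: the class
and method names (PerHostPoolSketch, acquire, release) are made up for
illustration and are not the HttpClient or Nutch API. It assumes a limit of
2 slots per host and a short wait timeout.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model of a per-host connection limit (hypothetical, not HttpClient code). */
public class PerHostPoolSketch {
    // One permit per connection allowed to the same host.
    private final Semaphore slots;

    PerHostPoolSketch(int maxConnectionsPerHost) {
        this.slots = new Semaphore(maxConnectionsPerHost);
    }

    /** Returns true if a connection slot became available within timeoutMs. */
    boolean acquire(long timeoutMs) {
        try {
            return slots.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    /** Hands a slot back, as a fetch would when it finishes. */
    void release() {
        slots.release();
    }

    public static void main(String[] args) {
        PerHostPoolSketch pool = new PerHostPoolSketch(2); // assumed limit of 2

        boolean first = pool.acquire(100);   // slot 1 taken by a long fetch
        boolean second = pool.acquire(100);  // slot 2 taken by a long fetch
        boolean third = pool.acquire(100);   // no slot left: the wait times out

        System.out.println(first + " " + second + " " + third); // true true false

        pool.release();                       // one of the long fetches finishes
        System.out.println(pool.acquire(100)); // true
    }
}
```

The third request fails even though the host is healthy, simply because both
slots are held by slow fetches. That is the behaviour I see, and it goes away
only when a slot is released or the per-host limit is raised.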

Regarding your second comment, I might have another idea. Actually, it's not
related to this list because it's an OS issue, but maybe someone can help
nonetheless. Can I authenticate once in the OS-session scope? I mean, can I
run a Linux command once that will authenticate the user to the host, so
that when Nutch accesses the URLs, it will already be authenticated?




lewis john mcgibbney wrote
> 
> Hi,
> 
> On Tue, Jun 26, 2012 at 1:39 PM, nutch.buddy@
> <nutch.buddy@> wrote:
>> after a while fetcher starts throwing
>> httpclient.connectionPoolTimeoutException: Timeout waiting for connection
>> for almost each url.
>>
>> Any solution for this issue?
> 
> This looks like it's related to the fix in NUTCH-769 can you please
> check this out and provide you opinion.
> 
>>
>> Another issue is that nutch authenticates for each url, and i think its
>> an overhead.
>> Any way to get nutch to read a cookie or something of this sort?
> 
> In short no. Please see here [0]
> 
> [0] http://www.mail-archive.com/[email protected]/msg05348.html
> 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-when-fetching-with-http-client-and-authentication-tp3991342p3991410.html
Sent from the Nutch - User mailing list archive at Nabble.com.
