[ 
https://issues.apache.org/jira/browse/NUTCH-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798890#action_12798890
 ] 

Ken Krugler commented on NUTCH-751:
-----------------------------------

I agree that this should be in crawler-commons. E.g. I've recently made changes 
to avoid synchronization bottlenecks with HttpClient 4.0, and identified a few 
places in HC where things should be improved.

Though I'm concerned that the level of customization each crawler wants could 
result in a pretty ugly ball of code. For example, in Bixo I'm looking at how 
to use a streaming disk buffer for reads, to avoid OOM errors when many threads 
are each holding a big response in memory. How would that get implemented in a 
way that's friendly to Nutch, Droids & Heritrix?
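
As a rough illustration of the disk-buffer idea (a sketch only, not Bixo's 
actual code; the class name SpillingBuffer, its methods, and the 8 KB chunk 
size are all made up for this example), a response body can be held in memory 
up to a limit and spilled to a temp file beyond it:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical single-use buffer: heap up to memoryLimit, temp file beyond it. */
final class SpillingBuffer implements AutoCloseable {
    private final int memoryLimit;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private Path spillFile;        // stays null unless the limit is exceeded
    private OutputStream spillOut; // open only while spilling
    private long size;

    SpillingBuffer(int memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    /** Drains the stream. Call once per buffer. */
    void readFrom(InputStream in) throws IOException {
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            if (spillOut == null && memory.size() + n > memoryLimit) {
                // Limit exceeded: move what we have to disk and free the heap copy.
                spillFile = Files.createTempFile("fetch-", ".tmp");
                spillOut = Files.newOutputStream(spillFile);
                memory.writeTo(spillOut);
                memory = null;
            }
            if (spillOut != null) {
                spillOut.write(chunk, 0, n);
            } else {
                memory.write(chunk, 0, n);
            }
            size += n;
        }
        if (spillOut != null) {
            spillOut.close();
        }
    }

    boolean spilledToDisk() {
        return spillFile != null;
    }

    long size() {
        return size;
    }

    /** Opens a fresh stream over the buffered content, wherever it lives. */
    InputStream openStream() throws IOException {
        return spillFile != null
                ? Files.newInputStream(spillFile)
                : new ByteArrayInputStream(memory.toByteArray());
    }

    /** Deletes the temp file, if one was created. */
    @Override
    public void close() throws IOException {
        if (spillFile != null) {
            Files.deleteIfExists(spillFile);
        }
    }
}
```

With N fetcher threads, buffered heap use is then bounded by roughly N x 
(memoryLimit + chunk size) rather than N x the largest response.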

If we could define some least-common-denominator API, that would be a good 
starting point. E.g. here is the set of config values, here is the set of 
parameters required when making a request, and here's the format of the 
response from a request.
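
To make that concrete, here is one possible shape for such an API. This is 
only a sketch: every type and field name below (FetchConfig, FetchRequest, 
FetchResponse, Fetcher, FetcherFactory) is hypothetical, and none of them 
exist in crawler-commons, Nutch, Droids, Heritrix or Bixo.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** The set of config values (hypothetical): fixed for the life of a fetcher. */
final class FetchConfig {
    final String userAgent;
    final int maxThreads;
    final int timeoutMs;
    final long maxContentBytes;

    FetchConfig(String userAgent, int maxThreads, int timeoutMs, long maxContentBytes) {
        this.userAgent = userAgent;
        this.maxThreads = maxThreads;
        this.timeoutMs = timeoutMs;
        this.maxContentBytes = maxContentBytes;
    }
}

/** The set of parameters required when making a request (hypothetical). */
final class FetchRequest {
    final String url;
    final Map<String, String> headers;

    FetchRequest(String url, Map<String, String> headers) {
        this.url = url;
        // Defensive copy so callers cannot change the request afterwards.
        this.headers = Collections.unmodifiableMap(new LinkedHashMap<>(headers));
    }
}

/** The format of the response from a request (hypothetical). */
final class FetchResponse {
    final String fetchedUrl;  // may differ from the request URL after redirects
    final int statusCode;
    final Map<String, String> headers;
    final byte[] content;
    final boolean truncated;  // true if content was cut at maxContentBytes

    FetchResponse(String fetchedUrl, int statusCode, Map<String, String> headers,
                  byte[] content, boolean truncated) {
        this.fetchedUrl = fetchedUrl;
        this.statusCode = statusCode;
        this.headers = headers;
        this.content = content;
        this.truncated = truncated;
    }
}

/** What each crawler would code against. */
interface Fetcher {
    FetchResponse fetch(FetchRequest request);
}

/** Each implementation (HttpClient 4.0 based or otherwise) supplies one of these. */
interface FetcherFactory {
    Fetcher create(FetchConfig config);
}
```

A byte[] body is the simplest common format, but it would rule out the 
streaming case above; returning an InputStream instead is the obvious 
alternative, at the cost of pushing resource cleanup onto every caller.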


> Upgrade version of HttpClient 
> ------------------------------
>
>                 Key: NUTCH-751
>                 URL: https://issues.apache.org/jira/browse/NUTCH-751
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>            Reporter: Julien Nioche
>
> The existing version of commons http-client (3.01) should be replaced with 
> the latest version from http://hc.apache.org/.
> Currently the only way of using the https protocol is to enable http-client. 
> The version 3.01 is bugged and causes a lot of issues which have been 
> reported before. Apparently the new version has been redesigned and should 
> fix them. The old v3.01 is too unstable to be used on a large scale.
>  
> I will try to send a patch in the next couple of weeks but would love to hear 
> your thoughts on this.
> J.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
