[ https://issues.apache.org/jira/browse/NUTCH-751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798890#action_12798890 ]
Ken Krugler commented on NUTCH-751:
-----------------------------------

I agree that this should be in crawler-commons. E.g. I've recently made changes to avoid synchronization bottlenecks with HttpClient 4.0, and identified a few places in HC where things should be improved.

Though I'm concerned that the level of customization each crawler wants could result in a pretty ugly ball of code. For example, in Bixo I'm looking at how to use a streaming disk buffer for reads, to avoid OOM errors when many threads are each handling big responses. How would that get implemented in a way that's friendly to Nutch, Droids & Heritrix?

If we could define some least-common-denominator API, that would be a good starting point. E.g. here is the set of config values, here is the set of parameters required when making a request, and here's the format of the response from a request.

> Upgrade version of HttpClient
> ------------------------------
>
>                 Key: NUTCH-751
>                 URL: https://issues.apache.org/jira/browse/NUTCH-751
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>            Reporter: Julien Nioche
>
> The existing version of commons http-client (3.01) should be replaced with
> the latest version from http://hc.apache.org/.
> Currently the only way of using the https protocol is to enable http-client.
> The version 3.01 is bugged and causes a lot of issues which have been
> reported before. Apparently the new version has been redesigned and should
> fix them. The old v3.01 is too unstable to be used on a large scale.
>
> I will try to send a patch in the next couple of weeks but would love to hear
> your thoughts on this.
> J.
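
As a rough illustration of the least-common-denominator API suggested in the comment above, the following Java sketch separates the three pieces it names: config values, per-request parameters, and the response format. All names here (FetchConfig, FetchRequest, FetchResponse, Fetcher, StubFetcher) are hypothetical and are not taken from Nutch, Bixo, Droids, Heritrix or crawler-commons. The response exposes its body as an InputStream rather than a byte array, on the assumption that this would leave each implementation free to back it with memory or with a disk buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class FetcherApiSketch {

    /** Config values shared by every request (hypothetical fields). */
    static final class FetchConfig {
        final String userAgent;
        final int timeoutMillis;
        final long maxContentBytes;

        FetchConfig(String userAgent, int timeoutMillis, long maxContentBytes) {
            this.userAgent = userAgent;
            this.timeoutMillis = timeoutMillis;
            this.maxContentBytes = maxContentBytes;
        }
    }

    /** Parameters required when making a single request. */
    static final class FetchRequest {
        final String url;
        final Map<String, String> headers;

        FetchRequest(String url, Map<String, String> headers) {
            this.url = url;
            // Defensive copy so later changes by the caller do not leak in.
            this.headers = Collections.unmodifiableMap(new LinkedHashMap<>(headers));
        }
    }

    /** Format of the response; the body is a stream, not a byte array. */
    static final class FetchResponse {
        final String url;
        final int statusCode;
        final Map<String, String> headers;
        final InputStream content;

        FetchResponse(String url, int statusCode,
                      Map<String, String> headers, InputStream content) {
            this.url = url;
            this.statusCode = statusCode;
            this.headers = Collections.unmodifiableMap(new LinkedHashMap<>(headers));
            this.content = content;
        }
    }

    /** The single operation every crawler would code against. */
    interface Fetcher {
        FetchResponse fetch(FetchRequest request) throws IOException;
    }

    /** Placeholder that performs no network I/O; it only shows the call shape. */
    static final class StubFetcher implements Fetcher {
        private final FetchConfig config;

        StubFetcher(FetchConfig config) {
            this.config = config;
        }

        @Override
        public FetchResponse fetch(FetchRequest request) {
            byte[] body = ("fetched by " + config.userAgent)
                    .getBytes(StandardCharsets.UTF_8);
            Map<String, String> headers = new LinkedHashMap<>();
            headers.put("Content-Length", String.valueOf(body.length));
            return new FetchResponse(request.url, 200, headers,
                    new ByteArrayInputStream(body));
        }
    }

    public static void main(String[] args) throws IOException {
        FetchConfig config = new FetchConfig("example-bot", 10_000, 1L << 20);
        Fetcher fetcher = new StubFetcher(config);
        FetchResponse response = fetcher.fetch(
                new FetchRequest("http://example.com/", new LinkedHashMap<String, String>()));
        System.out.println(response.statusCode + " " + response.headers);
    }
}
```

A real implementation would replace StubFetcher with one built on whichever HTTP library the crawler uses; crawler-specific behaviour such as disk buffering would then sit behind the Fetcher interface rather than in shared code.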