Andy Hedges wrote:
I've just been looking at Jakarta Commons HttpClient (http://jakarta.apache.org/commons/httpclient/), as I would like to refactor HttpResponse to use it (it will make the isTruncate flag easier to implement for this fetcher).

This is what Heritrix uses too. Perhaps Nutch should switch to it. When I first wrote the fetcher I couldn't find an Http library that was robust enough, i.e., that implemented things like socket connect timeouts and content truncation. So I wrote my own. But if this one does the trick, I don't have a problem using it.


It handles headers really nicely and perhaps we could take a leaf from their book? It basically represents them using three classes: HeadMethod, Header, HeaderElement. More information can be found here http://jakarta.apache.org/commons/httpclient/apidocs/.

I'm not sure that's the best API for generic metadata: it's pretty Http-specific, and it's also not very convenient to access. I think a map that supports multiple values wouldn't lose any information, would be more generic, and would be simpler to use, no?
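To make the multi-valued map idea concrete, here is a minimal sketch. The class name MultiValueMetadata and its methods are hypothetical, not existing Nutch or HttpClient code, and the case-insensitive key handling is an assumption carried over from Http header names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of generic metadata: a map in which one name
// can carry several values. Not part of Nutch or HttpClient.
class MultiValueMetadata {
    private final Map<String, List<String>> values =
        new TreeMap<String, List<String>>();

    // Assumption: names are compared case-insensitively, as Http headers are.
    private static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // Appends a value; earlier values under the same name are kept.
    void add(String name, String value) {
        String key = normalize(name);
        List<String> list = values.get(key);
        if (list == null) {
            list = new ArrayList<String>();
            values.put(key, list);
        }
        list.add(value);
    }

    // Returns the first value added under the name, or null if there is none.
    String getFirst(String name) {
        List<String> list = values.get(normalize(name));
        return list == null ? null : list.get(0);
    }

    // Returns every value added under the name, in insertion order.
    List<String> getAll(String name) {
        List<String> list = values.get(normalize(name));
        if (list == null) {
            return Collections.<String>emptyList();
        }
        return Collections.unmodifiableList(list);
    }
}
```

With this shape the common single-valued case stays a one-line lookup (getFirst), while repeated headers such as Set-Cookie remain available through getAll.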


Firstly, I would be interested in what the consensus is on using external libraries (I know we already use a few), and secondly, whether people think this is a sensible one to use. For me it saves a lot of reinventing the wheel for the http handling.

I don't have a problem using external libraries. This one, in particular, looks very promising. For example, they appear to support connect timeouts:


http://jakarta.apache.org/commons/httpclient/apidocs/org/apache/commons/httpclient/HttpConnection.html#setConnectionTimeout(int)
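For reference, the two robustness features in question (socket connect timeouts and content truncation) can be sketched with plain java.net and java.io classes. This is neither the HttpClient API nor the existing Nutch code; the class TimeoutFetchSketch and its method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch using only the JDK; not HttpClient or Nutch code.
class TimeoutFetchSketch {

    // Opens a socket, failing if the connection is not established within
    // connectTimeoutMs. Later reads that block for longer than
    // readTimeoutMs throw java.net.SocketTimeoutException.
    static Socket open(String host, int port,
                       int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            socket.setSoTimeout(readTimeoutMs);
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the socket on failure
            throw e;
        }
    }

    // Reads at most maxBytes from the stream and returns what was read,
    // so oversized content is cut off instead of being read in full.
    static byte[] readTruncated(InputStream in, int maxBytes)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (out.size() < maxBytes) {
            int want = Math.min(buffer.length, maxBytes - out.size());
            int n = in.read(buffer, 0, want);
            if (n < 0) {
                break; // end of stream before the limit was reached
            }
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

Note that readTruncated alone cannot tell "exactly maxBytes long" from "truncated"; setting a truncation flag would need one further read to see whether more content follows.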

So please feel free to contribute an Http protocol implementation which uses this library. If it is at least as robust as what we have, then we should probably use it as our default http implementation.

On a related note, I've been thinking that the host delay logic (blockAddr() in Http.java) should probably be moved to Fetcher.java, as this is not unique to Http. Does that make sense to others?
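As a rough illustration of what protocol-independent delay logic might look like, here is a minimal sketch. It is not the blockAddr() implementation; the class HostDelaySketch, its reserve() method, and keying on a host string rather than an address are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-host politeness delays that does not depend
// on any particular protocol. Not the existing blockAddr() code.
class HostDelaySketch {
    private final long delayMs;
    // Earliest time (in ms) at which each host may be contacted again.
    private final Map<String, Long> nextAllowed = new HashMap<String, Long>();

    HostDelaySketch(long delayMs) {
        this.delayMs = delayMs;
    }

    // Reserves the next free slot for the host and returns how many
    // milliseconds the caller should wait before contacting it.
    // The current time is passed in so the logic is easy to test.
    synchronized long reserve(String host, long nowMs) {
        Long next = nextAllowed.get(host);
        long start = (next == null || next.longValue() < nowMs)
            ? nowMs
            : next.longValue();
        nextAllowed.put(host, Long.valueOf(start + delayMs));
        return start - nowMs;
    }
}
```

A fetcher thread would call reserve(host, System.currentTimeMillis()) and sleep for the returned time before handing the URL to whichever protocol implementation handles it.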

Doug


Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers