Doug Cutting wrote:
>... protocol-http is capable of faster crawling than protocol-httpclient.
> So I don't think we should discard protocol-http just yet.
>
> What do others think?

I think:

The HttpClient-based plugin [protocol-httpclient] creates its own threads;
[protocol-http] does not create any threads.

We should address this. [protocol-httpclient] is only a temporary solution
for cookies, proxies, HTTPS, etc. It still caches DNS-to-IP mappings forever,
and the thread-related issues are very important...

Additionally, we should have a setting such as:
"Wait 5 seconds between requests to SLOW servers"

That is, Nutch could dynamically classify servers as fast or slow and crawl
each one faster or slower accordingly.
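A rough sketch of how such a fast/slow setting might work. It assumes that a
server's average response time is the signal for "slow". The class
AdaptiveDelay, its parameters (fastDelayMs, slowDelayMs, slowThresholdMs,
alpha) and the 2-second threshold in the example are all hypothetical and are
not part of Nutch's existing fetcher configuration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not part of Nutch): classify hosts as fast or slow
 * from observed response times and pick a per-host delay between requests.
 */
public class AdaptiveDelay {
    private final long fastDelayMs;      // delay between requests to fast hosts
    private final long slowDelayMs;      // delay between requests to slow hosts (e.g. 5000 ms)
    private final long slowThresholdMs;  // average response time above which a host counts as slow
    private final double alpha;          // weight of the newest sample in the moving average
    private final Map<String, Double> avgResponseMs = new HashMap<>();

    public AdaptiveDelay(long fastDelayMs, long slowDelayMs, long slowThresholdMs, double alpha) {
        this.fastDelayMs = fastDelayMs;
        this.slowDelayMs = slowDelayMs;
        this.slowThresholdMs = slowThresholdMs;
        this.alpha = alpha;
    }

    /** Record how long one fetch from this host took (exponential moving average). */
    public synchronized void recordResponse(String host, long elapsedMs) {
        Double prev = avgResponseMs.get(host);
        double next = (prev == null) ? elapsedMs : alpha * elapsedMs + (1 - alpha) * prev;
        avgResponseMs.put(host, next);
    }

    /** A host with no recorded responses yet is treated as fast. */
    public synchronized boolean isSlow(String host) {
        Double avg = avgResponseMs.get(host);
        return avg != null && avg > slowThresholdMs;
    }

    /** Delay to wait before the next request to this host. */
    public synchronized long delayFor(String host) {
        return isSlow(host) ? slowDelayMs : fastDelayMs;
    }

    public static void main(String[] args) {
        AdaptiveDelay d = new AdaptiveDelay(1000, 5000, 2000, 0.5);
        d.recordResponse("fast.example.com", 200);
        d.recordResponse("slow.example.com", 4000);
        d.recordResponse("slow.example.com", 3000);
        System.out.println(d.delayFor("fast.example.com"));    // 1000
        System.out.println(d.delayFor("slow.example.com"));    // 5000
        System.out.println(d.delayFor("unknown.example.com")); // 1000
    }
}
```

The moving average lets a host move between the fast and slow groups as its
behaviour changes, instead of fixing the classification after one request.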

Fuad

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers