Doug Cutting wrote:
> I was recently benchmarking fetching at a site with lots of bandwidth,
> and it seemed to me that protocol-http is capable of faster crawling
> than protocol-httpclient. So I don't think we should discard
> protocol-http just yet. But there's a lot of duplicate code between
> these, which is difficult to maintain.
Where do you think the performance loss in protocol-httpclient comes from?
> I think we should thus merge these, with a configuration parameter
> determining which http backend is used, much like parse-html, which
> can switch between neko and tagsoup.
> What do others think?
I think it's a good idea. The logic for authentication, robots rules,
redirects, SSL setup and HTTP result-code handling is nearly identical
in both plugins.
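A rough sketch of how the merged plugin could be structured, assuming Java since that is what Nutch is written in. Every name here (the `http.backend` property, `HttpBackend`, `Result`, `BACKENDS`, `fetch`) is hypothetical and not part of Nutch's actual plugin API. The idea: shared logic such as redirect and status handling is written once, and only the single-request step differs per backend, chosen by a configuration value in the same spirit as parse-html choosing between neko and tagsoup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch: one merged protocol plugin with interchangeable backends. */
public class MergedHttpSketch {

  /** The only part that differs per backend: issuing a single request. */
  interface HttpBackend {
    Result fetchOnce(String url);
  }

  /** Minimal fetch result; a hypothetical stand-in, not Nutch's own response type. */
  static final class Result {
    final int code;
    final String location; // value of the Location header, or null if absent

    Result(int code, String location) {
      this.code = code;
      this.location = location;
    }
  }

  /**
   * Hypothetical registry of backends keyed by configuration value. A merged plugin
   * would register its real backends here (e.g. one per former plugin).
   */
  static final Map<String, Supplier<HttpBackend>> BACKENDS = new HashMap<>();

  /** Picks a backend from the hypothetical "http.backend" property (default "http"). */
  static HttpBackend selectBackend(Map<String, String> conf) {
    String name = conf.getOrDefault("http.backend", "http");
    Supplier<HttpBackend> supplier = BACKENDS.get(name);
    if (supplier == null) {
      throw new IllegalArgumentException("Unknown http.backend: " + name);
    }
    return supplier.get();
  }

  /**
   * Shared logic, written once for all backends: follows 3xx responses that carry a
   * Location header, up to maxRedirects hops; any other result is returned as-is.
   */
  static Result fetch(HttpBackend backend, String url, int maxRedirects) {
    for (int hop = 0; hop <= maxRedirects; hop++) {
      Result r = backend.fetchOnce(url);
      boolean redirect = r.code >= 300 && r.code < 400 && r.location != null;
      if (!redirect) {
        return r;
      }
      url = r.location;
    }
    throw new IllegalStateException("Too many redirects");
  }

  public static void main(String[] args) {
    // In-memory stand-in backend so the sketch runs without a network.
    Map<String, Result> canned = new HashMap<>();
    canned.put("http://a/", new Result(301, "http://b/"));
    canned.put("http://b/", new Result(200, null));
    BACKENDS.put("fake", () -> url -> canned.getOrDefault(url, new Result(404, null)));

    Map<String, String> conf = new HashMap<>();
    conf.put("http.backend", "fake");
    Result r = fetch(selectBackend(conf), "http://a/", 5);
    System.out.println(r.code); // prints 200
  }
}
```

With this split, a fix to redirect or status handling lands in one place, and benchmarking the two backends against each other comes down to changing a single property value.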
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers