Fuad Efendi (JIRA) wrote:
[ http://issues.apache.org/jira/browse/NUTCH-109?page=all ]
Fuad Efendi updated NUTCH-109:
------------------------------
Summary: Nutch - Fetcher - Performance Test - new
Protocol-HTTPClient-Innovation (was: Nutch - Fetcher - HTTP - Performance Testing
& Tuning)
I ran performance tests against a default Apache HTTPD web server
installation serving 120,000 crawled pages (I used Teleport Ultra to crawl the
HTML pages from www.apache.org, which took probably 10 hours).
Everything ran on a separate LAN: Windows XP (client with Nutch 0.7.1) and
Suse Linux 9.3 (server with Apache).
I measured the crawl for 21,000 pages (Depth=6, Threads=20); it seems it would
take a few days to crawl all 120,000 pages:
Protocol-HTTPClient-Innovation:
1,321,470 milliseconds
Protocol-HTTP:
26,946,076 milliseconds
Protocol-HttpClient:
27,062,854 milliseconds
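
To put the three timings above on a common scale, here is a minimal sketch that converts them to minutes, pages per second, and a slowdown factor relative to the fastest run. It assumes each figure is the wall-clock time for the same 21,000-page crawl; the class and method names (FetchTimings, pagesPerSecond, speedup) are made up for illustration and are not part of Nutch. On those assumptions the first run works out to roughly 22 minutes, against about 7.5 hours for each of the other two, a factor of about 20.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: compares the timings reported above.
class FetchTimings {

    // Throughput in pages per second for a crawl of `pages` pages.
    static double pagesPerSecond(int pages, long elapsedMillis) {
        return pages / (elapsedMillis / 1000.0);
    }

    // How many times longer the baseline run took than the candidate run.
    static double speedup(long baselineMillis, long candidateMillis) {
        return (double) baselineMillis / candidateMillis;
    }

    public static void main(String[] args) {
        int pages = 21000; // assumed identical for all three runs

        // Elapsed times in milliseconds, copied from the report above.
        Map<String, Long> timings = new LinkedHashMap<>();
        timings.put("Protocol-HTTPClient-Innovation", 1321470L);
        timings.put("Protocol-HTTP", 26946076L);
        timings.put("Protocol-HttpClient", 27062854L);

        long fastest = timings.get("Protocol-HTTPClient-Innovation");
        for (Map.Entry<String, Long> e : timings.entrySet()) {
            long ms = e.getValue();
            System.out.printf("%-32s %8.1f min %7.2f pages/s %6.1fx slower%n",
                    e.getKey(),
                    ms / 60000.0,
                    pagesPerSecond(pages, ms),
                    speedup(ms, fastest));
        }
    }
}
```

These are single measurements, so the ratio is only a rough indication and not a statistically robust result.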
This is interesting. Could you please check how the results of this
benchmark change if you set HttpVersion.HTTP_1_1 in
protocol-httpclient/HttpResponse.java:92?
Unfortunately, Nutch cannot use that library because it's LGPL.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com