Fuad Efendi (JIRA) wrote:
     [ http://issues.apache.org/jira/browse/NUTCH-109?page=all ]

Fuad Efendi updated NUTCH-109:
------------------------------

    Summary: Nutch - Fetcher - Performance Test - new 
Protocol-HTTPClient-Innovation  (was: Nutch - Fetcher - HTTP - Performance Testing 
& Tuning)

I ran performance tests against a default Apache HTTPD web server 
installation serving 120,000 crawled pages (I used Teleport Ultra to mirror 
the HTML pages from www.apache.org, which took probably about 10 hours).

Everything ran on a separate LAN: Windows XP (client with Nutch 0.7.1) and 
Suse Linux 9.3 (server with Apache).

I measured the crawl time for 21,000 pages (Depth=6, Threads=20); crawling 
all 120,000 pages would apparently take a few days:

Protocol-HTTPClient-Innovation: 1,321,470 milliseconds

Protocol-HTTP: 26,946,076 milliseconds

Protocol-HttpClient: 27,062,854 milliseconds
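
To put the three timings side by side, here is a minimal sketch that converts them into a speedup ratio and pages per second. It assumes all three runs covered the same 21,000 pages, which the message implies but does not state outright; the class and method names are made up for illustration and are not part of Nutch. On these numbers the new plugin comes out roughly 20 times faster than either existing one.

```java
public class CrawlTimings {

    // Ratio of the baseline elapsed time to the candidate elapsed time.
    static double speedup(long baselineMs, long candidateMs) {
        return (double) baselineMs / candidateMs;
    }

    // Average throughput, given a page count and elapsed milliseconds.
    static double pagesPerSecond(int pages, long elapsedMs) {
        return pages / (elapsedMs / 1000.0);
    }

    public static void main(String[] args) {
        int pages = 21_000;              // assumed identical for all runs
        long innovation = 1_321_470L;    // Protocol-HTTPClient-Innovation
        long http = 26_946_076L;         // Protocol-HTTP
        long httpClient = 27_062_854L;   // Protocol-HttpClient

        System.out.printf("speedup vs Protocol-HTTP:       %.2f%n",
                speedup(http, innovation));
        System.out.printf("speedup vs Protocol-HttpClient: %.2f%n",
                speedup(httpClient, innovation));
        System.out.printf("pages/s (Innovation):   %.2f%n",
                pagesPerSecond(pages, innovation));
        System.out.printf("pages/s (Protocol-HTTP): %.2f%n",
                pagesPerSecond(pages, http));
    }
}
```

The ratio only describes this one LAN setup (Depth=6, Threads=20); it says nothing about crawls against remote hosts.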

This is interesting. Could you please check how the results of this benchmark change if you set HttpVersion.HTTP_1_1 in protocol-httpclient/HttpResponse.java:92?

Unfortunately, Nutch cannot use that library because it's LGPL.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
