[ http://issues.apache.org/jira/browse/NUTCH-109?page=comments#action_12332070 ]

Fuad Efendi commented on NUTCH-109:
-----------------------------------

All three plugins performed the same.
However, the first two plugins used a single shared Socket for all 20 threads;
the third plugin used three shared Sockets for 20 threads.
The third one (the new plugin based on the old Innovation HTTPClient framework)
had deadlocks when I tried to run 20 threads over a single HTTPClient instance.
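A rough sketch of spreading 20 fetcher threads over a small fixed set of shared connections per host (3 here, matching the default of "http.clients.per.host" in the issue). HostConnectionCache and connectionFor are hypothetical names, not the plugin's HttpFactory; the connection type is left generic, and whether a real connection object tolerates concurrent use is an assumption the client library itself must satisfy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical per-host cache of shared connection objects (illustration only,
 * not the plugin's HttpFactory). Each host gets a fixed pool of clientsPerHost
 * connections, and fetcher threads are spread over that pool round-robin.
 */
final class HostConnectionCache<C> {
    private final int clientsPerHost;
    private final Supplier<C> factory;
    private final Map<String, List<C>> pools = new ConcurrentHashMap<>();

    HostConnectionCache(int clientsPerHost, Supplier<C> factory) {
        if (clientsPerHost < 1) {
            throw new IllegalArgumentException("clientsPerHost must be >= 1");
        }
        this.clientsPerHost = clientsPerHost;
        this.factory = factory;
    }

    /** Returns the shared connection the given fetcher thread should use for host. */
    C connectionFor(String host, int threadIndex) {
        // computeIfAbsent applies the pool builder at most once per host, even under contention.
        List<C> pool = pools.computeIfAbsent(host, h -> {
            List<C> created = new ArrayList<>(clientsPerHost);
            for (int i = 0; i < clientsPerHost; i++) {
                created.add(factory.get());
            }
            return created;
        });
        // Threads 0..19 map onto pool slots 0,1,2,0,1,2,...
        return pool.get(Math.floorMod(threadIndex, clientsPerHost));
    }

    public static void main(String[] args) {
        AtomicInteger ids = new AtomicInteger();
        // Stand-in "connections" are plain integer ids; a real fetcher would open
        // an HTTP connection to the host here instead.
        HostConnectionCache<Integer> cache = new HostConnectionCache<>(3, ids::getAndIncrement);
        for (int thread = 0; thread < 20; thread++) {
            System.out.println("thread " + thread + " -> connection "
                    + cache.connectionFor("www.example.com", thread));
        }
        System.out.println("connections created: " + ids.get());
    }
}
```

Only three connection objects are ever created for the host, no matter how many threads ask for one; the design choice is a fixed pool per host rather than one connection per thread.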

It is possible to configure a Linux box (1Mb RAM) with 6,000 client threads in
the Worker model; it is limited only by the amount of available RAM. I used such
a configuration in production: 6 Apache servers sustained 75,000 concurrent
users, each performing 1 request per minute for 4 KB HTML pages, in load/stress
tests by Compuware.
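To put the production figures in perspective, the arithmetic behind them as a small Java sketch: 75,000 users at one request per minute is 1,250 requests per second in total, roughly 208 per server across 6 servers, and about 5,000 KB/s of 4 KB pages. This assumes the load was spread evenly over the servers, which the figures above do not state; LoadTestArithmetic and its methods are illustrative names.

```java
/**
 * Back-of-the-envelope check of the load-test figures quoted above.
 * Class and method names are illustrative only; even load distribution is assumed.
 */
final class LoadTestArithmetic {
    /** Aggregate request rate when each user issues requestsPerMinute requests. */
    static double totalRequestsPerSecond(int users, double requestsPerMinute) {
        return users * requestsPerMinute / 60.0;
    }

    /** Request rate per server if the load is spread evenly over the servers. */
    static double perServer(double totalRequestsPerSecond, int servers) {
        return totalRequestsPerSecond / servers;
    }

    public static void main(String[] args) {
        double total = totalRequestsPerSecond(75_000, 1.0); // 1250 req/s
        double perServerRps = perServer(total, 6);           // ~208.3 req/s
        double kbPerSecond = total * 4;                      // 5000 KB/s of 4 KB pages
        System.out.println("total req/s: " + total);
        System.out.println("per-server req/s: " + perServerRps);
        System.out.println("payload KB/s: " + kbPerSecond);
    }
}
```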

A default installation of the Apache Web Server allows 150 client threads.

What does this mean for us? One shared TCP transport connection per Web Server
occupies one client thread on Apache. It is impossible to overload Apache with
a single TCP connection performing 100 requests per second; the other 149
threads remain free to handle other clients' requests.

This proposed behavior of a search engine should not be considered a Denial of
Service attack; we are using a single TCP connection for multiple requests.
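Sending many requests over one TCP connection is what HTTP/1.1 persistent connections (keep-alive) provide. Below is a minimal, self-contained sketch in modern Java (9+), not the plugin's code and not the Innovation HTTPClient API: a toy in-process server counts accepted TCP connections while a client sends several GET requests one after another over a single Socket, adding "Connection: close" only to the last. KeepAliveDemo, startServer and fetchAll are hypothetical names; the toy server ignores everything except the request line and the Connection header, and the client only understands Content-Length bodies.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: several HTTP/1.1 GETs sent sequentially over ONE TCP connection. */
final class KeepAliveDemo {
    /** Reads one LF-terminated line (a preceding CR is dropped); null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') sb.append((char) c);
        }
        return (c == -1 && sb.length() == 0) ? null : sb.toString();
    }

    /** Toy server: counts accepted TCP connections and answers every request on each one. */
    static void startServer(ServerSocket server, AtomicInteger accepted) {
        Thread t = new Thread(() -> {
            while (!server.isClosed()) {
                try (Socket s = server.accept()) {
                    accepted.incrementAndGet();
                    InputStream in = new BufferedInputStream(s.getInputStream());
                    OutputStream out = s.getOutputStream();
                    String requestLine;
                    while ((requestLine = readLine(in)) != null && !requestLine.isEmpty()) {
                        boolean close = false;
                        String header;
                        while ((header = readLine(in)) != null && !header.isEmpty()) {
                            if (header.equalsIgnoreCase("Connection: close")) close = true;
                        }
                        byte[] body = ("ok " + requestLine).getBytes(StandardCharsets.US_ASCII);
                        out.write(("HTTP/1.1 200 OK\r\nContent-Length: " + body.length + "\r\n\r\n")
                                .getBytes(StandardCharsets.US_ASCII));
                        out.write(body);
                        out.flush();
                        if (close) break; // client asked to end the persistent connection
                    }
                } catch (IOException e) {
                    // accept() throws once the server socket is closed; the loop then exits
                }
            }
        });
        t.setDaemon(true);
        t.start();
    }

    /** Client: sends every path over one Socket; only the last request says "Connection: close". */
    static List<String> fetchAll(String host, int port, List<String> paths) throws IOException {
        List<String> bodies = new ArrayList<>();
        try (Socket s = new Socket(host, port)) {
            DataInputStream in = new DataInputStream(new BufferedInputStream(s.getInputStream()));
            OutputStream out = s.getOutputStream();
            for (int i = 0; i < paths.size(); i++) {
                boolean last = i == paths.size() - 1;
                String request = "GET " + paths.get(i) + " HTTP/1.1\r\nHost: " + host + "\r\n"
                        + (last ? "Connection: close\r\n" : "") + "\r\n";
                out.write(request.getBytes(StandardCharsets.US_ASCII));
                out.flush();
                readLine(in); // status line, ignored in this sketch
                int length = 0;
                String header;
                while ((header = readLine(in)) != null && !header.isEmpty()) {
                    if (header.regionMatches(true, 0, "Content-Length:", 0, 15)) {
                        length = Integer.parseInt(header.substring(15).trim());
                    }
                }
                byte[] body = new byte[length];
                in.readFully(body); // Content-Length marks where the next response starts
                bodies.add(new String(body, StandardCharsets.US_ASCII));
            }
        }
        return bodies;
    }

    public static void main(String[] args) throws IOException {
        AtomicInteger accepted = new AtomicInteger();
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            startServer(server, accepted);
            String host = server.getInetAddress().getHostAddress();
            System.out.println(fetchAll(host, server.getLocalPort(), List.of("/a", "/b", "/c")));
            System.out.println("TCP connections accepted by the server: " + accepted.get());
        }
    }
}
```

Because the requests go out one after another on the same socket, threads sharing such a connection would have to take turns on it (or rely on a client library that serializes or pipelines requests safely); this sketch only shows the single-threaded case.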

> Nutch - Fetcher - Performance Test - new Protocol-HTTPClient-Innovation
> -----------------------------------------------------------------------
>
>          Key: NUTCH-109
>          URL: http://issues.apache.org/jira/browse/NUTCH-109
>      Project: Nutch
>         Type: Improvement
>   Components: fetcher
>     Versions: 0.7, 0.6, 0.7.1, 0.8-dev
>  Environment: Nutch: Windows XP, J2SE 1.4.2_09
> Web Server: Suse Linux, Apache HTTPD, apache2-worker,  v. 2.0.53
>     Reporter: Fuad Efendi
>  Attachments: protocol-httpclient-innovation-0.1.0.zip, test_results.txt
>
> 1. TCP connections cost a lot, not only for Nutch and the end-point web
> servers, but also for intermediary network equipment.
> 2. The web server creates a client thread and hopes that Nutch really uses
> HTTP/1.1, or at least that Nutch sends "Connection: close" before calling
> "Socket.close()" in the JVM...
> I need to perform very objective tests, probably over 2-3 days; the new plugin
> crawled/parsed 23,000 pages in 1,321 seconds; it seems that the existing
> http plugin needs a few days...
> I am using a separate network segment with Windows XP (Nutch) and Suse Linux
> (Apache HTTPD + 120,000 pages).
> Please find attached the new plugin based on
> http://www.innovation.ch/java/HTTPClient/
> Please note:
> Class HttpFactory contains a cache of HTTPConnection objects; each object is
> shared across threads; each object is absolutely thread-safe, so we can send
> multiple GET requests using a single instance:
>    private static int CLIENTS_PER_HOST =
>        NutchConf.get().getInt("http.clients.per.host", 3);
> I'll add more comments after finishing tests...

-- 
This message is automatically generated by JIRA.



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
