Sam Varshavchik writes: 

> Dan Melomedman writes:  
>  
>> I could be wrong here, but running a process per connection is wasteful  
>> on very busy servers in any case. In other words, correctly designed  
>> multi-threaded (not necessarily on Linux, and not necessarily with  
>> pthreads) servers conserve memory and require less kernel scheduling  
>> overhead if any. 

> Only on platforms where processes are expensive.  I really don't see much  
> of a difference between a process and a thread. 

Expensive compared to what: process start-up time on another platform, or 
scheduling time? Scheduling processes is much more expensive than scheduling 
user-space threads, since switching between user-space threads never goes 
through the kernel scheduler. 

On which platforms (excluding anything M$)? 

There's a difference between a thread the kernel doesn't even see (a 
user-space threading library) and a thread the kernel actually has to 
schedule. With processes, the kernel keeps a bookkeeping structure for every 
process. 
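
To make the user-space case concrete, here is a minimal sketch of two cooperative "threads" that the kernel never schedules. It assumes a glibc-style system where the obsolescent POSIX `<ucontext.h>` calls (getcontext, makecontext, swapcontext) are still available; `run_user_threads`, `task_body` and the trace array are hypothetical names for illustration only. Note that glibc's swapcontext() still makes a sigprocmask system call, but no kernel scheduling decision is involved.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ucontext.h>

#define NTASKS 2
#define STACK_SIZE (64 * 1024)

static ucontext_t sched_ctx;                /* the user-space "scheduler" */
static ucontext_t task_ctx[NTASKS];         /* one saved context per user thread */
static char task_stack[NTASKS][STACK_SIZE];
static int trace[8];                        /* order in which task steps ran */
static int trace_len;

/* Each user thread records two steps, yielding to the scheduler after each.
   The switch is a register and stack swap done by the library; the kernel
   sees only one process and makes no scheduling decision here. */
static void task_body(int id)
{
    for (int step = 0; step < 2; step++) {
        assert(trace_len < 8);
        trace[trace_len++] = id * 10 + step;
        swapcontext(&task_ctx[id], &sched_ctx);   /* cooperative yield */
    }
}   /* returning resumes uc_link, i.e. the scheduler */

/* Round-robin over the tasks; returns the number of recorded steps. */
int run_user_threads(void)
{
    trace_len = 0;
    for (int i = 0; i < NTASKS; i++) {
        getcontext(&task_ctx[i]);
        task_ctx[i].uc_stack.ss_sp = task_stack[i];
        task_ctx[i].uc_stack.ss_size = sizeof task_stack[i];
        task_ctx[i].uc_link = &sched_ctx;
        makecontext(&task_ctx[i], (void (*)(void))task_body, 1, i);
    }
    /* Three rounds: two steps per task, plus one resume that lets each finish. */
    for (int round = 0; round < 3; round++)
        for (int i = 0; i < NTASKS; i++)
            swapcontext(&sched_ctx, &task_ctx[i]);
    return trace_len;
}
```

The steps interleave as 0, 10, 1, 11: the tasks take turns purely because each one yields, which is exactly the kind of switching a user-space threading library does behind its API.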

If you are talking about Linux and its default native threading library, 
creating a thread means creating a process, so there's no difference. This 
does NOT scale well. 
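
A Linux-specific sketch for checking this on a given system, assuming glibc and the raw `SYS_gettid` system call. With the old LinuxThreads library each thread even had its own PID; with NPTL, which replaced it, getpid() is shared, but every thread is still a separate kernel task with its own task ID that the kernel schedules. The helper names are hypothetical.

```c
#define _GNU_SOURCE               /* for syscall() on glibc */
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static pid_t thread_pid, thread_tid;

/* Linux-specific: the kernel task ID of the calling thread. */
static pid_t kernel_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

static void *record_ids(void *arg)
{
    (void)arg;
    thread_pid = getpid();       /* shared with the creating thread under NPTL */
    thread_tid = kernel_tid();   /* but this is a separate kernel task */
    return NULL;
}

/* Returns 1 if a new pthread reports the same process ID as its creator
   but a different kernel task ID, 0 otherwise, -1 on error. */
int thread_is_kernel_task(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, record_ids, NULL) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    assert(thread_tid != 0);     /* the thread really ran */
    return thread_pid == getpid() && thread_tid != kernel_tid();
}
```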

>> In threaded programs memory is shared, whereas in multi-process forking  
>> servers it's allocated per process. This means higher memory overhead. 
>  
> No.  Immediately after a fork(), you have processes that run in the same  
> memory space.  All that fork() does on modern systems is duplicate a  
> process structure, and the associated resources, and goes through each  
> data page and marks it copy-on-write.  If one of the processes writes to a  
> data page, it will fault, and the kernel will make a copy of the data  
> page.  
>  
> For processes that are mostly code, with very little data, fork()s  
> introduce very little overhead at all.
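
A minimal sketch of what a program can observe after fork(), assuming a POSIX system: the child's write to a global goes to its own copy of the page, so the parent's value is untouched. The copy-on-write mechanism itself is invisible to the program; only the separate address spaces can be observed. `cow_demo` is a hypothetical name.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_counter = 1;   /* ordinary global data page */

/* Forks a child that writes to shared_counter. On a copy-on-write system the
   write faults and the kernel gives the child a private copy of the page, so
   the parent's value is unchanged. The child reports its own view through
   its exit status. Returns the parent's value, or -1 on error. */
int cow_demo(int *child_view)
{
    assert(child_view != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        shared_counter = 42;      /* first write: page is copied for the child */
        _exit(shared_counter);    /* exit status carries the child's view */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    *child_view = WEXITSTATUS(status);
    return shared_counter;        /* still 1 in the parent */
}
```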

Threads do not require duplicating the process structure or its resources, 
and data is accessible to all threads; that is what I was trying to convey 
(unless threads are implemented the way they are in the Linux kernel). As I 
understand it, in the user-space thread case the kernel also doesn't spend 
time on the steps required to set up copy-on-write. 
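
For contrast with the fork() case, a minimal pthreads sketch (assuming a POSIX system with pthreads): a write made by one thread is simply visible to the others, because there is only one address space and nothing is copied. `thread_share_demo` is a hypothetical name.

```c
#include <assert.h>
#include <pthread.h>

static int shared_value = 1;   /* a single copy, in one address space */

static void *writer(void *arg)
{
    assert(arg == NULL);       /* no argument expected */
    (void)arg;
    shared_value = 42;         /* no copy-on-write copy is made */
    return NULL;
}

/* Returns what the creating thread reads after the other thread's write,
   or -1 on error. pthread_join() orders the write before the read. */
int thread_share_demo(void)
{
    pthread_t t;
    shared_value = 1;
    if (pthread_create(&t, NULL, writer, NULL) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return shared_value;       /* the other thread's write is visible here */
}
```

Sharing is also the cost: any data touched by several threads without a join or a mutex in between needs explicit synchronization.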

>> Also, why does apache prefork a number of servers for instance? 
>  
> Because it can.  That's the direct answer.  

I thought preforking was a performance feature. Correct me if I am wrong. 
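
Roughly, preforking means the parent starts a pool of children ahead of time, each blocking in accept() on the same listening socket, so no fork() happens on the per-connection path. The following is a minimal sketch of that pattern under stated assumptions (IPv4 loopback, a fixed "ok" reply, a hypothetical `prefork_demo` helper in which the parent plays the client); it is not Apache's code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pre-forks nworkers children that all block in accept() on one listening
   socket; the kernel hands each incoming connection to one of them. The
   parent then connects nrequests times and returns how many "ok" replies it
   received, or -1 on error. */
int prefork_demo(int nworkers, int nrequests)
{
    assert(nworkers > 0 && nworkers <= 16);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                          /* let the kernel pick a port */

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    socklen_t len = sizeof addr;
    if (lfd < 0 || bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 16) < 0 || getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        return -1;

    pid_t workers[16];
    for (int i = 0; i < nworkers; i++) {
        workers[i] = fork();
        if (workers[i] < 0) {                   /* fork failed: shrink the pool */
            nworkers = i;
            break;
        }
        if (workers[i] == 0) {                  /* worker: serve until killed */
            for (;;) {
                int cfd = accept(lfd, NULL, NULL);
                if (cfd < 0)
                    continue;
                (void)write(cfd, "ok", 2);
                close(cfd);
            }
        }
    }
    if (nworkers == 0) {
        close(lfd);
        return -1;
    }

    int replies = 0;
    for (int r = 0; r < nrequests; r++) {       /* parent plays the client */
        int cfd = socket(AF_INET, SOCK_STREAM, 0);
        char buf[2];
        if (cfd >= 0 && connect(cfd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
            read(cfd, buf, sizeof buf) == 2 && memcmp(buf, "ok", 2) == 0)
            replies++;
        if (cfd >= 0)
            close(cfd);
    }

    for (int i = 0; i < nworkers; i++) {        /* tear the pool down */
        kill(workers[i], SIGTERM);
        waitpid(workers[i], NULL, 0);
    }
    close(lfd);
    return replies;
}
```

The fork() cost is paid once at start-up rather than per request, which is the usual performance argument for preforking.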

>>  
>> Also, here's an interesting unrelated blurb about copy-on-write: 
>> http://www.netbsd.org/Documentation/kernel/vfork.html   
>>  
>>  
>> Forking is many times more expensive than threading, both memory- and  
>> kernel-wise. 
>  
> Only on broken platforms.  Forking itself does very little.  Only when you  
> begin to modify the data pages of one of the forked processes will you run  
> into overhead.  
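
The NetBSD page linked above is about vfork(), which skips even the copy-on-write setup: the child borrows the parent's address space, and the parent is suspended until the child calls an exec function or _exit(). A minimal sketch, assuming `/bin/sh` exists and that the C library still declares vfork() (it was dropped from POSIX.1-2008, so `_DEFAULT_SOURCE` is defined for glibc); `run_via_vfork` is a hypothetical helper.

```c
#define _DEFAULT_SOURCE            /* glibc: expose vfork() */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs /bin/sh -c cmd in a vfork()ed child and returns its exit status,
   or -1 on error. Between vfork() and exec the child may only call an
   exec function or _exit(); it must not return or modify parent data. */
int run_via_vfork(const char *cmd)
{
    assert(cmd != NULL);
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                /* exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```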

What about scheduling the processes? How bad is the overhead there? 
And which platforms are broken (Unices only)? 

>>> From what I can tell the state-threads project aren't really threads.   
>>> It looks like an event-driven library. 
>>  
>> Exactly. And a very cool one indeed, since it allows good system and load  
>> scalability while at the same time letting you avoid thread-safety  
>> concerns; mutexes are not usually needed. 
>  
> Event-driven execution models are very cumbersome ones to work with for  
> anything other than GUI applications.  

This is a generalization. This library was not designed for GUI 
applications; on the contrary, it was designed for scalable servers. Have you 
looked at the example.c that ships with the library? 
I don't see anything Windows- or GUI-like in it, and the API seems very 
clean. Correct me if I am wrong. 
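
For readers unfamiliar with the event-driven model itself (as opposed to the GUI flavor), a minimal sketch of the core loop, assuming POSIX poll(): one thread services many "connections" by handling only the descriptors that poll() reports as ready. Pipes stand in for client sockets, and `event_loop_demo` is a hypothetical name; this is not State Threads code.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_CONN 16

/* One thread, many connections: poll() reports which descriptors are ready
   and the loop services only those, with no thread or process per
   connection. Returns the total number of bytes read, or -1 on error. */
int event_loop_demo(int nconn)
{
    assert(nconn > 0 && nconn <= MAX_CONN);
    struct pollfd pfd[MAX_CONN];
    int open_conns = 0, total = 0;

    for (int i = 0; i < nconn; i++) {
        int p[2];
        if (pipe(p) < 0)
            return -1;
        (void)write(p[1], "hello", 5);    /* the "client" sends a request... */
        close(p[1]);                      /* ...and hangs up */
        pfd[i].fd = p[0];
        pfd[i].events = POLLIN;
        open_conns++;
    }

    while (open_conns > 0) {
        if (poll(pfd, (nfds_t)nconn, -1) < 0)
            return -1;
        for (int i = 0; i < nconn; i++) {
            if (pfd[i].fd < 0 || !(pfd[i].revents & (POLLIN | POLLHUP)))
                continue;
            char buf[64];
            ssize_t n = read(pfd[i].fd, buf, sizeof buf);
            if (n > 0) {
                total += (int)n;          /* handle this connection's data */
            } else {                      /* EOF or error: connection done */
                close(pfd[i].fd);
                pfd[i].fd = -1;           /* poll() ignores negative fds */
                open_conns--;
            }
        }
    }
    return total;
}
```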

> A very good example of an application platform that foists an event-driven  
> model on everyone is Windows.  The end result are applications that are  
> utter piles of crap, from a technical viewpoint.  No longer can you go 

Yes, but what does this have to do with state threads?
In the author's own words: 

"The State Threads is a small application library which provides a foundation
for writing fast and highly scalable Internet applications (such as web
servers, proxy servers, mail transfer agents, and so on) on UNIX-like
platforms.  It combines the simplicity of the multithreaded programming
paradigm, in which one thread supports each simultaneous connection,
with the performance and scalability of an event-driven state machine
architecture.  In other words, this library offers a threading API for
structuring an Internet application as a state machine." 
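
What the quote means by "state machine" is the per-connection bookkeeping a hand-written event-driven server must keep: the stack unwinds after every event, so all progress lives in a struct. The State Threads library lets you write that as sequential per-thread code instead. For contrast, a minimal hand-rolled state machine; the struct, the states and the line-based protocol are hypothetical and not part of the State Threads API.

```c
#include <assert.h>
#include <stddef.h>

enum conn_state { READING_REQUEST, WRITING_REPLY, DONE };

struct conn {
    enum conn_state state;
    char request[128];   /* bytes accumulated so far */
    size_t len;
};

/* Called whenever the event loop has new bytes for this connection.
   Because control returns to the loop after every chunk, all progress
   lives in *c rather than on the stack. */
enum conn_state conn_on_readable(struct conn *c, const char *data, size_t n)
{
    assert(c != NULL);
    for (size_t i = 0; i < n && c->state == READING_REQUEST; i++) {
        if (data[i] == '\n') {
            c->request[c->len] = '\0';
            c->state = WRITING_REPLY;      /* full request line received */
        } else if (c->len + 1 < sizeof c->request) {
            c->request[c->len++] = data[i];
        } else {
            c->state = DONE;               /* request too long: drop it */
        }
    }
    return c->state;
}

/* Called when the socket is writable; here it only marks the reply sent. */
enum conn_state conn_on_writable(struct conn *c)
{
    assert(c != NULL);
    if (c->state == WRITING_REPLY)
        c->state = DONE;
    return c->state;
}
```

A request split across reads ("HEL", then "LO\n") is reassembled only because the partial state was stored explicitly; with a threading API, the same logic is a plain read loop on the thread's own stack.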

> Only if the ISP can deal with a single point of failure.  ISPs that have  
> some amount of technical experience and know-how prefer to use multiple,  
> redundant servers, where there is no single point of failure.  

By stand-alone I meant a webmail server that doesn't have to run inside a 
web server; hardware was not even meant to come into that sentence. 
Clustering is not excluded. 

> In order to provide fault-tolerant storage you do not see Netapp plonk a  
> single fat disk into their boxes.  Same basic concept.  
> --  
> Sam  
>  

-- 
Dan
Three days of testing can save 10
minutes reading manuals. 
