Sam Varshavchik writes:
> Dan Melomedman writes:
>
>> I could be wrong here, but running a process per connection is wasteful
>> on very busy servers in any case. In other words, correctly designed
>> multi-threaded (not necessarily on Linux, and not necessarily with
>> pthreads) servers conserve memory and require less kernel scheduling
>> overhead if any.
> Only on platforms where processes are expensive. I really don't see much
> of a difference between a process and a thread.
Expensive in comparison to what? Process start-up time on a different
platform, or scheduling time? Scheduling processes is much more expensive
than scheduling user-space threads.
On which platforms (excluding anything M$)?
There's a difference between a thread that the kernel doesn't even see
(user-space threading library) and a thread that the kernel has to actually
schedule. In the process case you have a kernel bookkeeping structure per
process.
If you are talking about Linux and its default native threading library,
creating a thread means creating a process, so there's no difference. This
does NOT scale well.
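To make the user-space case concrete, here is a minimal sketch using the
ucontext calls (getcontext, makecontext, swapcontext) as found in glibc. The
function and variable names are made up for illustration, and this is not
meant to show how any particular threading library is implemented. The only
point is that the program itself decides which context runs next, so the
kernel has no extra schedulable entity to keep track of. As far as I know,
glibc's swapcontext on Linux still makes a system call to save and restore
the signal mask, so the switch is not entirely free.

```c
#include <assert.h>
#include <ucontext.h>

/* Hypothetical demo names; nothing here comes from a real threading library. */
static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];
static volatile int trace[4];
static volatile int trace_len;

/* Record the order in which the two contexts ran. */
static void record(int step)
{
    assert(trace_len < 4);
    trace[trace_len] = step;
    trace_len = trace_len + 1;
}

static void task(void)
{
    record(1);                          /* first slice of the task */
    swapcontext(&task_ctx, &main_ctx);  /* yield back to the caller */
    record(3);                          /* second slice, after being resumed */
}                                       /* returning resumes uc_link (main_ctx) */

/* Returns 1 if the contexts interleaved in the order 1, 2, 3, 4,
 * 0 if they did not, and -1 if a ucontext call failed. */
int run_user_space_switch_demo(void)
{
    trace_len = 0;

    if (getcontext(&task_ctx) == -1)
        return -1;
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, task, 0);

    if (swapcontext(&main_ctx, &task_ctx) == -1)  /* run task until it yields */
        return -1;
    record(2);
    if (swapcontext(&main_ctx, &task_ctx) == -1)  /* resume task until it returns */
        return -1;
    record(4);

    return trace_len == 4 && trace[0] == 1 && trace[1] == 2 &&
           trace[2] == 3 && trace[3] == 4;
}
```

Calling run_user_space_switch_demo() from main() and checking for a return
value of 1 is enough to try it. Both contexts run inside one process and one
kernel-visible thread the whole time.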
>> In threaded programs memory is shared, whereas in multi-process forking
>> servers it's allocated per process. This means higher memory overhead.
>
> No. Immediately after a fork(), you have processes that run in the same
> memory space. All that fork() does on modern systems is duplicate a
> process structure, and the associated resources, and goes through each
> data page and marks it copy-on-write. If one of the processes writes to a
> data page, it will fault, and the kernel will make a copy of the data
> page.
>
> For processes that are mostly code, with very little data, fork()s
> introduce very little overhead at all.
Threads do not require duplicating the process structure or its resources,
and data is accessible to all threads; this is what I tried to convey. That
is, unless they're implemented the way they are in the Linux kernel. As I
understand it, in the user-space thread case the kernel doesn't spend time
on the steps required for copy-on-write either.
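Here is a small sketch of the difference in data visibility that I mean,
using POSIX threads and fork(). The function names are made up for
illustration. A write made by a thread is seen by its creator because both
run in one address space; a write made by a forked child lands in the
child's private copy of the page (on a copy-on-write system the copy is made
at that moment), so the parent never sees it.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter;   /* ordinary data-segment variable */

static void *thread_body(void *arg)
{
    assert(arg != NULL);
    *(int *)arg = 42;   /* same address space as the creating thread */
    return NULL;
}

/* Returns the value the creator sees after a thread wrote 42, or -1. */
int value_seen_after_thread_write(void)
{
    pthread_t tid;

    counter = 0;
    if (pthread_create(&tid, NULL, thread_body, &counter) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return counter;
}

/* Returns the value the parent sees after a forked child wrote 42, or -1. */
int value_seen_after_child_write(void)
{
    int status;
    pid_t pid;

    counter = 0;
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        counter = 42;   /* modifies only the child's copy of the page */
        _exit(0);       /* skip stdio flushing in the child */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return counter;     /* the parent's copy is untouched */
}
```

The first function returns 42 and the second returns 0. Depending on the
system, the program may need to be built with -pthread.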
>> Also, why does apache prefork a number of servers for instance?
>
> Because it can. That's the direct answer.
I thought this was a performance feature. Correct me if I am wrong.
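To show what I mean by pre-forking, here is a rough sketch. It is not
Apache's code: a pipe stands in for the listening socket, one byte stands in
for a request, and all the names (run_prefork_demo, worker_loop,
MAX_WORKERS) are made up. The idea is only that the fork() cost is paid
before any request arrives, and that idle workers then compete for incoming
work.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16   /* arbitrary limit for this sketch */

/* Worker: take one-byte "requests" off the shared pipe until EOF.
 * The count is reported through the exit status, so it must stay <= 255. */
static void worker_loop(int request_fd)
{
    char req;
    int count = 0;

    while (read(request_fd, &req, 1) == 1)
        count++;
    _exit(count);
}

/* Pre-forks `workers` children, sends `jobs` requests, and returns the
 * total number of requests the workers report having handled, or -1. */
int run_prefork_demo(int workers, int jobs)
{
    pid_t pids[MAX_WORKERS];
    int fds[2];
    int started = 0, handled = 0, i;

    assert(workers >= 1 && workers <= MAX_WORKERS);
    assert(jobs >= 0 && jobs <= 255);

    if (pipe(fds) == -1)
        return -1;

    /* Pay the fork() cost up front, before any request exists. */
    for (i = 0; i < workers; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0) {
            close(fds[1]);        /* keep only the read end, or EOF never arrives */
            worker_loop(fds[0]);  /* does not return */
        }
        pids[started++] = pid;
    }
    close(fds[0]);

    /* Requests arrive; whichever worker reads first handles each one. */
    for (i = 0; started > 0 && i < jobs; i++) {
        char req = 'r';
        if (write(fds[1], &req, 1) != 1)
            break;
    }
    close(fds[1]);                /* EOF tells the workers to exit */

    for (i = 0; i < started; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i] && WIFEXITED(status))
            handled += WEXITSTATUS(status);
    }
    return started == workers ? handled : -1;
}
```

For example, run_prefork_demo(4, 20) starts four workers and should report
20 requests handled in total, however they end up split between the workers.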
>>
>> Also, here's an interesting unrelated blurb about copy-on-write:
>> http://www.netbsd.org/Documentation/kernel/vfork.html
>>
>>
>> Forking is multitudes more expensive than threading, memory and
>> kernel-wise.
>
> Only on broken platforms. Forking itself does very little. Only when you
> begin to modify the data pages of one of the forked processes will you run
> into overhead.
What about scheduling the processes? How bad is the overhead there?
Which platforms are broken (Unices only)?
>>> From what I can tell the state-threads project aren't really threads.
>>> It looks like an event-driven library.
>>
>> Exactly. And a very cool one indeed since it allows good system and load
>> scalability, while at the same time allowing to avoid thread safety
>> concerns, and mutexes are not usually needed.
>
> Event-driven execution models are very cumbersome ones to work with for
> anything other than GUI applications.
This is a generalization. On the contrary, this library was designed for
scalable servers, not GUI applications. Have you looked at the example.c
that ships with the library?
I don't see anything Windows- or GUI-like, and the API seems very clean.
Correct me if I am wrong.
> A very good example of an application platform that foists an event-driven
> model on everyone is Windows. The end result is applications that are
> utter piles of crap, from a technical viewpoint. No longer can you go
Yes, but what does this have to do with state threads?
In the author's own words:
"The State Threads is a small application library which provides a foundation
for writing fast and highly scalable Internet applications (such as web
servers, proxy servers, mail transfer agents, and so on) on UNIX-like
platforms. It combines the simplicity of the multithreaded programming
paradigm, in which one thread supports each simultaneous connection,
with the performance and scalability of an event-driven state machine
architecture. In other words, this library offers a threading API for
structuring an Internet application as a state machine."
> Only if the ISP can deal with a single point of failure. ISPs that have
> some amount of technical experience and know-how prefer to use multiple,
> redundant servers, where there is no single point of failure.
Stand-alone as in a webmail server that does not need to run under a web
server. That sentence was not meant to say anything about hardware.
Clustering is not excluded.
> In order to provide fault-tolerant storage you do not see Netapp plonk a
> single fat disk into their boxes. Same basic concept.
>
>
>
> --
> Sam
>
--
Dan
Three days of testing can save 10
minutes reading manuals.