Re: Httpd 3.0 or something else

2009-11-13 Thread Greg Stein
On Fri, Nov 13, 2009 at 14:01, Arturo 'Buanzo' Busleiman wrote:
> Matthieu Estrade wrote:
>> What about non-HTTP protocols like FTP or SMTP, tested during Summer of
>> Code? The temptation to have a powerful core that we could adapt to any
>> protocol we want...
>
> And Google just released SPDY ("Speedy"), a non-http protocol for web 
> transport...

Paul and I briefly discussed adding some stuff to serf that could
allow serf to do SPDY. For example, add the notion of "priority" into
the request system. It would be ignored in a normal connection, but
could then take effect in a SPDY connection.
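Purely as a hypothetical illustration (serf has no such field today, and
every name below is made up), the idea is just a per-request priority that
a SPDY-capable connection could map onto stream priorities, and that a
plain HTTP/1.x connection would never look at:

  /* Hypothetical sketch only; not part of serf's current request API. */
  typedef enum {
      REQ_PRIORITY_LOW,
      REQ_PRIORITY_NORMAL,
      REQ_PRIORITY_HIGH
  } req_priority_t;

  typedef struct {
      req_priority_t priority;  /* consulted only on a SPDY connection */
      /* ... existing per-request fields ... */
  } req_options_t;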

Cheers,
-g


Re: Httpd 3.0 or something else

2009-11-13 Thread Arturo 'Buanzo' Busleiman

Matthieu Estrade wrote:
> What about non-HTTP protocols like FTP or SMTP, tested during Summer of
> Code? The temptation to have a powerful core that we could adapt to any
> protocol we want...

And Google just released SPDY ("Speedy"), a non-http protocol for web 
transport...

--
Arturo "Buanzo" Busleiman
Independent Linux and Security Consultant - OWASP - SANS - OISSG
http://www.buanzo.com.ar/pro/eng.html


Re: Httpd 3.0 or something else

2009-11-13 Thread Matthieu Estrade
Woow =) Very nice and interesting thread =)

It's very hard to think about how to design httpd 3.0 before knowing the
real aim of this new web server. Much of the feedback here comes from
very specific problems people have spotted.
I started at the end of 1.3 and the beta releases of 2.0, and I must say
that application architectures and people's needs have changed a lot.
IMHO, the real question is: what do we want to do with it? Still a very
flexible and compatible web server, providing interfaces for many
languages, with a very interesting API for developing modules? Chasing
the performance that nginx, httpd, haproxy or other web
servers/load-balancers/reverse proxies can deliver? Providing an
event-based design to handle event-driven applications and
infrastructure like XMPP? Being able to do SOAP or web service message
routing?
What about non-HTTP protocols like FTP or SMTP, tested during Summer of
Code? The temptation to have a powerful core that we could adapt to any
protocol we want...

IMHO, I don't see how we stay competitive without an MPM that handles
event-driven applications, which could also solve many
performance/reliability problems.
Then maybe have two big categories: delivery (reverse proxy, load
balancing, content caching, gzip/deflate, etc.) and applications and
languages (PHP, Perl, Python, Ruby, external filters, etc.).

my 2 cts.

Matthieu



Graham Leggett wrote:
> Jean-Marc Desperrier wrote:
>
>   
>> Last time I've heard about a large scale server thinking about switching
>> from Apache to lighttpd, the one problem that site wanted to solve was a
>> massive number slow clients simultaneously connected to the server, with
>> the http server mostly just serving as a pipe between the client and
>> php, and where the ideal solution had to consume as little resource per
>> client as possible.
>>
>> Did the admin of that site just miss what the solution should have been
>> to handle this properly with Apache ?
>> 
>
> Dedicated reverse proxy servers like varnish have appeared to solve this
> problem, and apparently work quite well for the narrow problem they are
> designed to solve (I say apparently because we're still at the evaluate
> stage on this).
>
> I would prefer in the long term that the two-layered approach wasn't
> necessary, which is why I am so keen to make sure httpd v3.0's
> architecture can optionally do what varnish does out of the box.
>
> Regards,
> Graham
> --
>
>   



Re: Httpd 3.0 or something else

2009-11-12 Thread Graham Leggett
Jean-Marc Desperrier wrote:

> Last time I've heard about a large scale server thinking about switching
> from Apache to lighttpd, the one problem that site wanted to solve was a
> massive number slow clients simultaneously connected to the server, with
> the http server mostly just serving as a pipe between the client and
> php, and where the ideal solution had to consume as little resource per
> client as possible.
> 
> Did the admin of that site just miss what the solution should have been
> to handle this properly with Apache ?

Dedicated reverse proxy servers like varnish have appeared to solve this
problem, and apparently work quite well for the narrow problem they are
designed to solve (I say apparently because we're still at the evaluate
stage on this).

I would prefer in the long term that the two-layered approach wasn't
necessary, which is why I am so keen to make sure httpd v3.0's
architecture can optionally do what varnish does out of the box.

Regards,
Graham
--


Re: Httpd 3.0 or something else

2009-11-12 Thread Jean-Marc Desperrier

Greg Stein wrote:

>  we have to take into account that some of those httpd's, like lighttpd, are
>  replacing Apache plain and simple. [...]

[...] I'm just trying to say those
aren't necessarily *better* than Apache, but that they are
*better-suited* to their admin's scenarios. [...]


The last time I heard about a large scale server thinking about switching
from Apache to lighttpd, the one problem that site wanted to solve was a
massive number of slow clients simultaneously connected to the server, with
the http server mostly just serving as a pipe between the client and
PHP, and where the ideal solution had to consume as little resource per
client as possible.


Did the admin of that site just miss what the solution should have been 
to handle this properly with Apache ?


Re: Httpd 3.0 or something else

2009-11-12 Thread Greg Stein
On Thu, Nov 12, 2009 at 09:59, Arturo 'Buanzo' Busleiman wrote:
> Greg Stein wrote:
>> Apache remains the broad solution, but for narrow requirements, people
>> will select something that is easier to handle for their particular
>> situation.
>>
>> I wouldn't say "wrong", but more along the lines of "not as well-suited"
>
> I partially agree, but we have to take into account that some of those 
> httpd's, like lighttpd, are
> replacing Apache plain and simple. Don't get me wrong. I love Apache. I've 
> written tons of articles
> about it since the very early days. And I haven't released any mod_openpgp 
> code for any other thing
> other than Apache for a reason: i love it.

Yeah... I think we're in agreement. I'm just trying to say those
aren't necessarily *better* than Apache, but that they are
*better-suited* to their admin's scenarios. As the swiss army knife of
web servers, Apache is very heavy in the pocket. In many scenarios,
one little blade is all you need, and it is much easier to use and
maintain.

I'm not sure that is a solvable problem for us, unfortunately. We
would need a drastic overhaul of how we approach configuration. (not
to mention setup/building and module loading/handling)  In essence, I
think the project has concentrated on backwards-compat rather than an
overhaul for usability.

Cheers,
-g


Re: Httpd 3.0 or something else

2009-11-12 Thread Arturo 'Buanzo' Busleiman

Greg Stein wrote:
> Apache remains the broad solution, but for narrow requirements, people
> will select something that is easier to handle for their particular
> situation.
> 
> I wouldn't say "wrong", but more along the lines of "not as well-suited"

I partially agree, but we have to take into account that some of those httpd's,
like lighttpd, are replacing Apache plain and simple. Don't get me wrong. I love
Apache. I've written tons of articles about it since the very early days. And I
haven't released any mod_openpgp code for anything other than Apache for a
reason: I love it.

--
Arturo "Buanzo" Busleiman
Independent Linux and Security Consultant - OWASP - SANS - OISSG
http://www.buanzo.com.ar/pro/eng.html


Re: Httpd 3.0 or something else

2009-11-12 Thread Jim Jagielski

On Nov 11, 2009, at 2:14 PM, Akins, Brian wrote:

> On 11/10/09 6:20 PM, "Greg Stein"  wrote:
> 
>> I'd like to see a few "network" threads multiplexing all the writing
>> to clients. 
> 
> That's what I meant. I just didn't state it properly.
> 
> 
>> Then take all of *that*, and spread it across several processes for
>> solid uptime, with a master monitor process.
> 
> And then you have nginx ;)
> 

Well, nginx is, after all, a fork of httpd


Re: Httpd 3.0 or something else

2009-11-11 Thread Greg Stein
On Wed, Nov 11, 2009 at 15:00, Arturo 'Buanzo' Busleiman wrote:
> Greg Stein wrote:
>> Right. But they don't have the depth/breadth of modules like we do.
>
> ... yet. Keep going, but if there are great things like lighttpd and nginx 
> (and even more) http
> daemons out there, then that means more than one thing is wrong with current 
> Apache.

Oh, definitely. HTTP serving is commodity functionality now. Thus, it
is very easy to serve niches with a specialized HTTP server.

Apache remains the broad solution, but for narrow requirements, people
will select something that is easier to handle for their particular
situation.

I wouldn't say "wrong", but more along the lines of "not as well-suited"

Cheers,
-g


Re: Httpd 3.0 or something else

2009-11-11 Thread Arturo 'Buanzo' Busleiman

Greg Stein wrote:
> Right. But they don't have the depth/breadth of modules like we do.

... yet. Keep going, but if there are great things like lighttpd and nginx (and
even more) HTTP daemons out there, then that means more than one thing is wrong
with current Apache.

Great thread.

--
Arturo "Buanzo" Busleiman
Independent Linux and Security Consultant - OWASP - SANS - OISSG
http://www.buanzo.com.ar/pro/eng.html


Re: Httpd 3.0 or something else

2009-11-11 Thread Greg Stein
On Wed, Nov 11, 2009 at 14:14, Akins, Brian  wrote:
> On 11/10/09 6:20 PM, "Greg Stein"  wrote:
>
>> I'd like to see a few "network" threads multiplexing all the writing
>> to clients.
>
> That's what I meant. I just didn't state it properly.
>
>
>> Then take all of *that*, and spread it across several processes for
>> solid uptime, with a master monitor process.
>
> And then you have nginx ;)

Right. But they don't have the depth/breadth of modules like we do. As
long as we can keep that ecosystem, then Apache will always be a
leader.

Cheers,
-g


Re: Httpd 3.0 or something else

2009-11-11 Thread Akins, Brian
On 11/10/09 6:20 PM, "Greg Stein"  wrote:

> I'd like to see a few "network" threads multiplexing all the writing
> to clients. 

That's what I meant. I just didn't state it properly.

 
> Then take all of *that*, and spread it across several processes for
> solid uptime, with a master monitor process.

And then you have nginx ;)

-- 
Brian Akins



Re: Httpd 3.0 or something else

2009-11-11 Thread Jim Jagielski

On Nov 11, 2009, at 6:09 AM, Graham Leggett wrote:

> William A. Rowe Jr. wrote:
> 
>>> - Supporting prefork as httpd does now; and
>> 
>> I'm very happy to see prefork die it's timely death.
>> 
>> Let's go about working out where out-of-process magic happens.
>> Gated, single threaded handlers may be sensible in some cases.
>> But for the core server it makes async worthless, and supporting
>> both digs us deeper into the bad-old-days of the 1.3 codebase.
> 
> I disagree strongly, for a number of reasons.
> 
> The first is that in our experience of a very high traffic collection of
> websites, the more "hops" you have, the more performance starts to suck,
> with the added complication that you run the risk of bumping your head
> into the ceiling of filehandle limits, and other issues.
> 
> If you move from "httpd-prefork" to "httpd-something proxied to
> random-appserver-X-doing-prefork-for-you" you aren't removing prefork -
> you just moving it somewhere else and adding an extra hop.
> 
> You're also making it more complicated, and more complicated means less
> reliable.
> 
> People like to harp on about how they want "speed speed speed". Right up
> to the point where it first starts becoming unreliable. At that point
> they suddenly start crying "reliable reliable reliable".
> 
> Apache httpd does lots of things right.
> 
> We must resist the temptation to throw out what we do right, while we
> try move forward fixing what we do wrong.
> 

I must say I agree. Having a method to avoid the 1:1 mapping of request/response
to a specific "entity" (worker or thread) is nice, but that solves a
different problem than that solved by prefork. I'd like for us to solve
the one while also being able to continue to solve the other. When, for
example, nginx works, it works well. When it doesn't, it is simply
completely unsuitable. I'd like for us to continue to avoid that being
the case for httpd.


Re: Httpd 3.0 or something else

2009-11-11 Thread Bojan Smojver
On Wed, 2009-11-11 at 13:09 +0200, Graham Leggett wrote:
> Apache httpd does lots of things right.
> 
> We must resist the temptation to throw out what we do right, while we
> try move forward fixing what we do wrong.

And there is also a reason why Google's Chrome is essentially (pre)fork.
This model is simply unrivalled when it comes to reliability.

-- 
Bojan



Re: Httpd 3.0 or something else

2009-11-11 Thread Graham Leggett
William A. Rowe Jr. wrote:

>> - Supporting prefork as httpd does now; and
> 
> I'm very happy to see prefork die it's timely death.
> 
> Let's go about working out where out-of-process magic happens.
> Gated, single threaded handlers may be sensible in some cases.
> But for the core server it makes async worthless, and supporting
> both digs us deeper into the bad-old-days of the 1.3 codebase.

I disagree strongly, for a number of reasons.

The first is that in our experience of a very high traffic collection of
websites, the more "hops" you have, the more performance starts to suck,
with the added complication that you run the risk of bumping your head
into the ceiling of filehandle limits, and other issues.

If you move from "httpd-prefork" to "httpd-something proxied to
random-appserver-X-doing-prefork-for-you" you aren't removing prefork -
you're just moving it somewhere else and adding an extra hop.

You're also making it more complicated, and more complicated means less
reliable.

People like to harp on about how they want "speed speed speed". Right up
to the point where it first starts becoming unreliable. At that point
they suddenly start crying "reliable reliable reliable".

Apache httpd does lots of things right.

We must resist the temptation to throw out what we do right, while we
try to move forward fixing what we do wrong.

Regards,
Graham
--


Re: Httpd 3.0 or something else

2009-11-10 Thread Basant Kukreja
On Tue, Nov 10, 2009 at 05:30:34PM -0500, Akins, Brian wrote:
> On 11/10/09 1:56 PM, "Greg Stein"  wrote:
> 
> 
> > But some buckets might be performing gzip or SSL encryption. That
> > consumes CPU within the network thread.
> 
> You could just run x times CPU cores number of "network" threads.  You can't
> use more than 100% of a CPU anyway.
> 
> The model that some of us discussed -- Greg, you may have invented it ;) --
> was to have a small pool of acceptor threads (maybe just one) and a pool of
> "worker" threads. The acceptor threads accept connections and move them into
> worker threads - that's it.  A single fd is then entirely owned by that
> worker thread until it (the fd) goes away - network/disk io, gzip, ssl, etc.
Sun Web Server (originated from Netscape; also the Open Web Server) currently
handles things this way. It has a pool of acceptor threads which accept
connections; the acceptor threads push each connection onto a connection
queue, and worker threads pull connections from the queue and serve the
requests. The keep-alive daemon is also multi-threaded, so multiple keep-alive
threads poll the various sets of connections for future HTTP requests. The
above architecture is highly scalable. Recently Sun published a SPECweb record
using this web server on 128 CMT threads (a 32-core system).
http://www.spec.org/web2005/results/res2009q4/web2005-20091013-00143.txt
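A minimal sketch of that acceptor/queue/worker pattern using APR
primitives (apr_queue_t from apr-util); the names are illustrative, not
from the Sun sources, and the keep-alive poller is left out:

  #include <apr_queue.h>
  #include <apr_network_io.h>
  #include <apr_thread_proc.h>

  /* Shared state: the listening socket plus the connection queue. */
  typedef struct {
      apr_socket_t *listener;
      apr_queue_t  *connections;   /* created with apr_queue_create() */
  } server_ctx_t;

  /* Acceptor thread: accept sockets and push them onto the queue. */
  static void * APR_THREAD_FUNC acceptor(apr_thread_t *thd, void *data)
  {
      server_ctx_t *ctx = data;
      apr_pool_t *pool = apr_thread_pool_get(thd);

      for (;;) {
          apr_socket_t *client = NULL;
          if (apr_socket_accept(&client, ctx->listener, pool) != APR_SUCCESS)
              continue;
          apr_queue_push(ctx->connections, client); /* blocks if queue is full */
      }
      return NULL;
  }

  /* Worker thread: pull a connection off the queue and serve it. */
  static void * APR_THREAD_FUNC worker(apr_thread_t *thd, void *data)
  {
      server_ctx_t *ctx = data;

      for (;;) {
          void *item = NULL;
          if (apr_queue_pop(ctx->connections, &item) != APR_SUCCESS)
              continue;
          serve_requests_on((apr_socket_t *)item);   /* placeholder handler */
      }
      return NULL;
  }

(A real server would use a per-connection pool rather than the acceptor
thread's pool, but that bookkeeping is omitted here.)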

You can see the sources from Open Web Server code if you are interested.
http://wikis.sun.com/display/wsFOSS/Open+Web+Server

Regards,
Basant.



Re: Httpd 3.0 or something else

2009-11-10 Thread William A. Rowe Jr.
Greg Stein wrote:
> On Mon, Nov 9, 2009 at 14:21, Paul Querna  wrote:
>> ...
>> I agree in general, a serf-based core does give us a good start.
>>
>> But Serf Buckets and the event loop definitely do need some more work
>> -- simple things, like if the backend bucket is a socket, how do you
>> tell the event loop, that a would block rvalue maps to a file
>> descriptor talking to an origin server.   You don't want to just keep
>> looping over it until it returns data, you want to poll on the origin
>> socket, and only try to read when data is available.
> 
> The goal would be that the handler's (aka content generator, aka serf
> bucket) socket would be process in the same select() as the client
> connections. When the bucket has no more data from the backend, then
> it returns "done for now". Eventually, all network reads/writes
> finalize and control returns to the core loop. If data comes in the
> backend, then the core opens and that bucket can read/return data.
> 
> There are two caveats that I can think of, right off hand:
> 
> 1) Each client connection is associated with one bucket generating the
> response. Ideally, you would not bother to read that bucket
> unless/until the client connection is ready for reading. But that
> could create a deadlock internal to the bucket -- *some* data may need
> to be consumed from the backend, processed, and returned to the
> backend to "unstick" the entire flow (think SSL). Even though nothing
> pops out the top of the bucket, internal processing may need to
> happen.
> 
> 2) If you have 10,000 client connections, and some number of sockets
> in the system ready for read/write... how do you quickly determine
> *which* buckets to poll to get those sockets processed? You don't want
> to poll  idle connections/buckets if only one is ready for
> read/write. (note: there are optimizations around this; if the bucket
> wants to return data, but wasn't asked to, then next-time-around it
> has the same data; no need to drill way down to the source bucket to
> attempt to read network data; tho this kinda sets up a busy loop until
> that bucket's client is ready for writing)
> 
> Are either of these the considerations you were thinking of?
> 
> I can certainly see some kind of system to associate buckets and the
> sockets that affect their behavior. Though that could get pretty crazy
> since it doesn't have to be a 1:1 mapping. One backend socket might
> actually service multiple buckets, and vice-versa.
> 
>> I am also concerned about the patterns of sendfile() in the current
>> serf bucket architecture, and making a whole pipeline do sendfile
>> correctly seems quite difficult.
> 
> Well... it generally *is* quite difficult in the presence of SSL,
> gzip, and chunking. Invariably, content is mangled before hitting the
> network, so sendfile() rarely gets a chance to play ball.

This brings us straight back to our discussions from 2000-01 timeframe
when we discussed poll buckets.  Pass it up as metadata that we are stalled
on an event (at the socket, ssl, etc) - sometimes multiple events (ext_filter
blocked and either needs to read more from the socket, or was blocked on its
read, or now has something to write).


Re: Httpd 3.0 or something else

2009-11-10 Thread William A. Rowe Jr.
Graham Leggett wrote:
> - Supporting prefork as httpd does now; and

I'm very happy to see prefork die its timely death.

Let's go about working out where out-of-process magic happens.
Gated, single threaded handlers may be sensible in some cases.
But for the core server it makes async worthless, and supporting
both digs us deeper into the bad-old-days of the 1.3 codebase.


Re: Httpd 3.0 or something else

2009-11-10 Thread Greg Stein
On Tue, Nov 10, 2009 at 17:30, Akins, Brian  wrote:
> On 11/10/09 1:56 PM, "Greg Stein"  wrote:
>
>
>> But some buckets might be performing gzip or SSL encryption. That
>> consumes CPU within the network thread.
>
> You could just run x times CPU cores number of "network" threads.  You can't
> use more than 100% of a CPU anyway.

One of those buckets might (ahem) block on a file read. While it is
doing that, you want to pass control to another bucket.

The buckets should avoid indeterminate blocks like a socket or a
pipe; we've basically stated that a file is okay. If we had async I/O,
then we'd want to disallow that, too.

Mutexes/semaphores can be used in a bucket, as long as they attempt to
lock with "nowait" semantics.
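For instance, a bucket guarding shared state could do something like this
(a sketch, not serf source; the context type and field names are
illustrative):

  /* Try the lock; if it is busy, report "nothing available right now"
   * instead of blocking the whole network loop. */
  static apr_status_t locked_bucket_read(my_bucket_ctx_t *ctx,
                                         const char **data, apr_size_t *len)
  {
      apr_status_t rv = apr_thread_mutex_trylock(ctx->mutex);
      if (APR_STATUS_IS_EBUSY(rv)) {
          *len = 0;
          return APR_EAGAIN;   /* "this is what I had without blocking" */
      }
      /* ... fill *data / *len from the shared state ... */
      apr_thread_mutex_unlock(ctx->mutex);
      return APR_SUCCESS;
  }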

> The model that some of us discussed -- Greg, you may have invented it ;) --
> was to have a small pool of acceptor threads (maybe just one) and a pool of
> "worker" threads. The acceptor threads accept connections and move them into
> worker threads - that's it.  A single fd is then entirely owned by that
> worker thread until it (the fd) goes away - network/disk io, gzip, ssl, etc.

Those worker threads are what we have today. It means that you have a
1:1 mapping of client connections to threads. That places serious
bounds on your scaling.

I'd like to see a few "network" threads multiplexing all the writing
to clients. Then you have "worker" threads parsing the request and
assembling the response buckets. The resulting buckets might
generate-as-they-go, so the worker thread will complete very quickly.
Or the worker thread could build a response bucket that already has
all of its data, taking a while to do so. It all depends upon the
implementation of the buckets and their construction.

Then take all of *that*, and spread it across several processes for
solid uptime, with a master monitor process.

Cheers,
-g


Re: Httpd 3.0 or something else

2009-11-10 Thread Greg Stein
On Tue, Nov 10, 2009 at 16:33, Lieven Govaerts  wrote:
> On Tue, Nov 10, 2009 at 6:10 PM, Greg Stein  wrote:
>...
>> You have 10k buckets representing the response for 10k clients. The
>> core loop reads the response from the bucket, and writes that to the
>> network.
>>
>> Now. A client socket wakes up as writable. I think it is pretty easy
>> to say "read THAT bucket" to get data for writing.
>>
>> Consider the scenario where one of those responses is proxied -- it is
>> arriving from a backend origin server. That underlying read-socket is
>> stuffed into the core loop. When that read-socket becomes available
>> for reading, *which* client response bucket do you start reading from?
>> And what happens if the client socket is not writable?
>>
>> You could just zip thru the 10k response buckets and poll each one for
>> data to read, and the serf design states that the underlying
>> read-socket *will* get read. But you've gotta do a lot of polling to
>> get there.
>>
>> I think that will be an interesting problem to solve. I believe it
>> would be something like this:
>>
>> Consider when a request arrives. The core looks at the Request-URI and
>> the Headers. From these inputs, it determines the appropriate
>> response. In this case, that response is identified by a bucket,
>> configured with those inputs. (and somewhere in here, any Request-Body
>> is managed; but ignore that for now)  As that response bucket is
>> constructed, along with all interior/nested buckets, that construction
>> can say "I've got an FD here. Please add this to the core loop." The
>> FD would be added, and would then be associated with the response
>> bucket, so we know which to read when the FD wakes up.
>>
> Suppose this is the diagram of the proxy scenario, where A and B are
> buckets wrapping the socket bucket:
>
> browser -->  (client fd)  [core loop]  [A [B [socket bucket  (server
> fd) <-- server
>
> If there's an event on the client fd, the core loop can read bytes
> from bucket A - as much as the client socket can handle.

Right, and right.

> But if only the server fd wakes up,  the core loop can't really read
> anything as it has nowhere to forward the data to.
> The best thing it can do, is tell bucket A: somewhere deep down
> there's data to read and considering I (the core loop) was alerted of
> that fact there must be one of the other buckets B, C.. interested in
> buffering/proactively transforming that data, so please forward this
> trigger.

Buckets have a peek() function.

Hmm. Theoretically, the bucket is *empty* of contents, or you would
not have returned to the event loop. Thus, when the peek() rolls
around, the bucket is going to figure out what it can provide without
blocking.
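For reference, the peek call itself is non-destructive; roughly (a
sketch, bucket name illustrative):

  const char *data;
  apr_size_t len;

  /* peek never consumes: it reports what could be read right now
   * without blocking, so the caller can decide whether a real read
   * is worthwhile. */
  apr_status_t status = serf_bucket_peek(response_bkt, &data, &len);
  if (len > 0) {
      /* something is available immediately */
  }
  else if (APR_STATUS_IS_EAGAIN(status)) {
      /* nothing right now; wait for the next event */
  }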

But... the buckets were designed for client-side operation. Buckets
are supposed to be emptied completely. That isn't true on the server:
the client socket might not be available for writing, so we don't
empty a response bucket to completion.

It does sound like something more may be needed, in order to propagate
some reading down the stack of buckets. But there is also a worry of:
if we read, then where do we put that, if the network isn't ready for
writing?

These read/status/nesting/etc. concepts exist in order to prevent
deadlocks. Ideally, *everything* is read and written to completion. An
appserver might not be able to provide you with more content, until
you give it something first. So the trick is to flush all writes, and
to flush all reads (because the latter might signal another write in
order to continue generating content... ad nauseam).

> I don't think the buckets interface already has a function for that,
> but something similar to 'read 0 bytes' would do.
>
> So, did I understand your proposal correctly?

Yes. But we may have some refining to do, as you've raised, and
looking more closely at the flows.

Cheers,
-g


Re: Httpd 3.0 or something else

2009-11-10 Thread Akins, Brian
On 11/10/09 1:56 PM, "Greg Stein"  wrote:


> But some buckets might be performing gzip or SSL encryption. That
> consumes CPU within the network thread.

You could just run x times CPU cores number of "network" threads.  You can't
use more than 100% of a CPU anyway.

The model that some of us discussed -- Greg, you may have invented it ;) --
was to have a small pool of acceptor threads (maybe just one) and a pool of
"worker" threads. The acceptor threads accept connections and move them into
worker threads - that's it.  A single fd is then entirely owned by that
worker thread until it (the fd) goes away - network/disk io, gzip, ssl, etc.


-- 
Brian Akins



Re: Httpd 3.0 or something else

2009-11-10 Thread Lieven Govaerts
On Tue, Nov 10, 2009 at 6:10 PM, Greg Stein  wrote:
> On Tue, Nov 10, 2009 at 11:14, Akins, Brian  wrote:
>> On 11/9/09 3:08 PM, "Greg Stein"  wrote:
>>
>>> 2) If you have 10,000 client connections, and some number of sockets
>>> in the system ready for read/write... how do you quickly determine
>>> *which* buckets to poll to get those sockets processed? You don't want
>>> to poll  idle connections/buckets if only one is ready for
>>> read/write.
>>
>> Epoll/kqueue/etc. Takes care of that for you.
>
> Sorry. I wasn't clear.
>
> You have 10k buckets representing the response for 10k clients. The
> core loop reads the response from the bucket, and writes that to the
> network.
>
> Now. A client socket wakes up as writable. I think it is pretty easy
> to say "read THAT bucket" to get data for writing.
>
> Consider the scenario where one of those responses is proxied -- it is
> arriving from a backend origin server. That underlying read-socket is
> stuffed into the core loop. When that read-socket becomes available
> for reading, *which* client response bucket do you start reading from?
> And what happens if the client socket is not writable?
>
> You could just zip thru the 10k response buckets and poll each one for
> data to read, and the serf design states that the underlying
> read-socket *will* get read. But you've gotta do a lot of polling to
> get there.
>
> I think that will be an interesting problem to solve. I believe it
> would be something like this:
>
> Consider when a request arrives. The core looks at the Request-URI and
> the Headers. From these inputs, it determines the appropriate
> response. In this case, that response is identified by a bucket,
> configured with those inputs. (and somewhere in here, any Request-Body
> is managed; but ignore that for now)  As that response bucket is
> constructed, along with all interior/nested buckets, that construction
> can say "I've got an FD here. Please add this to the core loop." The
> FD would be added, and would then be associated with the response
> bucket, so we know which to read when the FD wakes up.
>
Suppose this is the diagram of the proxy scenario, where A and B are
buckets wrapping the socket bucket:

browser -->  (client fd)  [core loop]  [A [B [socket bucket  (server fd) <-- server

If there's an event on the client fd, the core loop can read bytes
from bucket A - as much as the client socket can handle.

But if only the server fd wakes up, the core loop can't really read
anything, as it has nowhere to forward the data to.
The best thing it can do is tell bucket A: somewhere deep down
there's data to read, and considering I (the core loop) was alerted of
that fact, there must be one of the other buckets B, C... interested in
buffering/proactively transforming that data, so please forward this
trigger.

I don't think the buckets interface already has a function for that,
but something similar to 'read 0 bytes' would do.

So, did I understand your proposal correctly?

Lieven


Re: Httpd 3.0 or something else

2009-11-10 Thread Greg Stein
On Tue, Nov 10, 2009 at 12:54, Graham Leggett  wrote:
> Greg Stein wrote:
>
>>> Who is "you"?
>>
>> Anybody who reads from a bucket. In this case, the core network loop
>> when a client connection is ready for writing.
>
> So would it be correct to say that in this theoretical httpd, the httpd
> core, and nobody else, would read from the serf bucket?

Correct. That bucket represents the response to the client, and only
the core reads that.

>...
>> No module *anywhere* ever writes to the network.
>>
>> The core loop reads/pulls from a bucket when it needs more data (for
>> writing to the network).
>>
>> When your cache bucket reads from its interior bucket, it can also
>> drop the content into a file, off to the side. Think of this bucket as
>> a filter. All content that is read through it will be dumped into a
>> file, too.
>
> Makes sense, but what happens when the cache has finished reading the
> interior bucket after the first pass through the code?

If the interior has returned EOF, then the caching bucket can destroy
it, if it likes.

> At this point, my cache needs to make a decision, and before it can make
> that decision it wants to know whether upstream is capable of swallowing
> the data right now without blocking.

No no... the core only asked for as much as it can handle. You return
*no more* than that. It isn't your problem to make blocking decisions
for the reader of your bucket.

If you read more from the interior than the caller wants from you,
then that's your problem :-)  You need to hold that in memory, dump it
to disk, or ... dunno.

> If the answer is yes, I cache the data and pass the data upstream and
> wait to be called again immediately, because I know upstream won't block.
>
> If the answer is no, I *don't* pass data upstream (because it would
> block from my perspective), and I read from the interior bucket again,
> cache some more, and then ask again whether to pass the two data chunks
> upstream.

Again: you don't make that decision. You just return what the caller
asked you for. It may decide to call you again, but that isn't up to
you.

If you return "this is all I have for you right now", then it won't
call you again until some (network) event occurs which may provide
more data for reading.

If you return EOF, then it shouldn't call you again, tho I believe our
rules state that if it *does*, then just return EOF again.

> How does my cache get the answer to its question?
>
> And how does my cache code know when it is safe to read from the
> interior bucket without blocking?

Buckets *never* block. The interior bucket will give you data saying
"I have more", give you data saying "I have no more right now", or say
"no more" (EOF). But in no case should it ever block.

(note: we do "block" on reading a file, but if we had portable async
I/O file operations, then we'd switch to those)

>...
> I figure there are no better people to explain how serf works than they
> who wrote serf ;)

Happy to. Unfortunately, we have a dearth of documentation :-(

Hopefully, this thread will help to educate several (httpd) developers
on the serf model.

>...
> Imagine big bloated expensive application server, the kind that's
> typically built by the lowest bidder.
>
> Imagine this server is fronted by an httpd reverse proxy.
>
> Image at the end of the chain, there is a glacially slow (in computing
> terms) browser waiting to consume the response.
>
> A request is processed, and the httpd proxy receives an EOS from the big
> bloated application server. Ideally it wants to drop the backend
> connection ASAP, no point handing around, but it can't, because the
> cleanup for the backend connection is tied to the pool from the request.
> And the request pool is only complete when the last byte of the request
> has been finally acknowledged by the glacially slow browser.
>
> So httpd, and the big bloated expensive application server, sit around
> waiting, waiting and waiting with memory allocated, database connections
> left open, for the browser to finally say "got it, gimme some more"
> before httpd's event loops goes "that was it,
> apr_pool_destroy(serf_bucket->pool), next!".

Okay. The bucket system is different. We have a somewhat-confusing
blend between explicit and region-based freeing. If you're done with a
bucket, then kill it. Don't wait for the pool to be cleared.

In your above scenario, the reverse-proxy-bucket can kill the
socket-bucket once the latter returns EOF, and that will drop the
connection.
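In bucket terms that is roughly (a sketch; the variable names are
illustrative):

  /* Once the wrapped socket bucket reports EOF, free it right away
   * instead of waiting for the request pool to be destroyed. */
  status = serf_bucket_read(backend_bkt, SERF_READ_ALL_AVAIL, &data, &len);
  /* ... hand the data on to the caller ... */
  if (APR_STATUS_IS_EOF(status)) {
      serf_bucket_destroy(backend_bkt);
      backend_bkt = NULL;
  }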

Now... all that said, the above scenario is a bit problematic. If the
appserver returns 2G of content to the frontend server, then where does
it go? Any type of bucket that reads-to-EOF is going to have to spool
its results somewhere (memory or disk). Otherwise, you keep a
small-ish read buffer in memory and you stream through the buffer at
whatever read-rate your caller is providing (potentially the client
browser's speed).

>...
> I can see us solve this problem simply by making the filter

Re: Httpd 3.0 or something else

2009-11-10 Thread Graham Leggett
Greg Stein wrote:

>> Who is "you"?
> 
> Anybody who reads from a bucket. In this case, the core network loop
> when a client connection is ready for writing.

So would it be correct to say that in this theoretical httpd, the httpd
core, and nobody else, would read from the serf bucket?

>> Up till now, my understanding is that "you" is the core, and therefore
>> not under control of a module writer.
>>
>> Let me put it another way. Imagine I am a cache module. I want to read
>> as much as possible as fast as possible from a backend, and I want to
>> write this data to two places simultaneously: the cache, and the
>> downstream network. I know the cache is always writable, but the
>> downstream network I am not sure of, I only want to write to the
>> downstream network when the downstream network is ready for me.
>>
>> How would I do this in a serf model?
> 
> No module *anywhere* ever writes to the network.
> 
> The core loop reads/pulls from a bucket when it needs more data (for
> writing to the network).
> 
> When your cache bucket reads from its interior bucket, it can also
> drop the content into a file, off to the side. Think of this bucket as
> a filter. All content that is read through it will be dumped into a
> file, too.

Makes sense, but what happens when the cache has finished reading the
interior bucket after the first pass through the code?

At this point, my cache needs to make a decision, and before it can make
that decision it wants to know whether upstream is capable of swallowing
the data right now without blocking.

If the answer is yes, I cache the data and pass the data upstream and
wait to be called again immediately, because I know upstream won't block.

If the answer is no, I *don't* pass data upstream (because it would
block from my perspective), and I read from the interior bucket again,
cache some more, and then ask again whether to pass the two data chunks
upstream.

How does my cache get the answer to its question?

And how does my cache code know when it is safe to read from the
interior bucket without blocking?

>> That I understand, but it makes no difference as I see it - your loop
>> only reads from the bucket and jams it into the client socket if the
>> client socket is good and ready to accept data.
>>
>> If the client socket isn't good and ready, the bucket doesn't get pulled
>> from, and resources used by the bucket are left in limbo until the
>> client is done. If the bucket wants to do something clever, like cache,
>> or release resources early, it can't - because as soon as it returns the
>> data it has to wait for the client socket to be good and ready all over
>> again. The server runs as slow as the browser, which in computing terms
>> is glacially slow.
> 
> I'm not sure that I understand you, and that you're familiar with the
> serf bucket model.

You are 100% right, I am not completely familiar with the serf bucket
model, which is why I'm asking these questions.

I figure there are no better people to explain how serf works than they
who wrote serf ;)

> The bucket can certainly cache data as it flows through. No problem
> there. Once the bucket has returned all of its data, it can close its
> file handle or socket or whatever resources it may have.
> 
> Buckets are one-time use, so once it has returned all of its data, it
> can throw out any resources.
> 
> And no... the server does NOT run as slow as the browser. There are N
> browsers connected, and the server is processing ALL of them. One
> single response bucket is running as fast as its client, sure, but the
> server certainly is not idle.

That isn't what I meant.

Imagine big bloated expensive application server, the kind that's
typically built by the lowest bidder.

Imagine this server is fronted by an httpd reverse proxy.

Imagine at the end of the chain, there is a glacially slow (in computing
terms) browser waiting to consume the response.

A request is processed, and the httpd proxy receives an EOS from the big
bloated application server. Ideally it wants to drop the backend
connection ASAP, no point hanging around, but it can't, because the
cleanup for the backend connection is tied to the pool from the request.
And the request pool is only complete when the last byte of the response
has finally been acknowledged by the glacially slow browser.

So httpd, and the big bloated expensive application server, sit around
waiting, waiting and waiting with memory allocated, database connections
left open, for the browser to finally say "got it, gimme some more"
before httpd's event loop goes "that was it,
apr_pool_destroy(serf_bucket->pool), next!".

And the reason why this happened was that all of this was driven by the
core's event loop, timed against the speed of the glacially slow browser.

Obviously a second browser next door is being serviced at same time as
you pointed out, but it too waits, waits, waits for that browser to
eventually acknowledge the end of the request.

This is the reason why peop

Re: Httpd 3.0 or something else

2009-11-10 Thread Greg Stein
On Tue, Nov 10, 2009 at 12:01, Jim Jagielski  wrote:
> On Nov 9, 2009, at 2:19 PM, Akins, Brian wrote:
>> On 11/9/09 2:06 PM, "Greg Stein"  wrote:
>>
>>> These issues are already solved by moving to a Serf core. It is fully
>>> asynchronous.
>>
>> Okay that's one convert, any others? ;)

Convert? Bah. Justin and I *started* serf. I'm rather biased, and
have never been a simple convert. Messiah, maybe. ;-)

>> That's what Paul and I discussed a lot last week.
>>
>> My ideal httpd 3.0 is:
>>
>> Libev + serf + lua
>
> +1
>
> For 3.0, I see us breaking the mold and the API in a pretty
> substantial way.

+1 and ditto.

(tho I think we can provide for old handlers thru the pipe mechanism I
described earlier on this thread)

Cheers,
-g


Re: Httpd 3.0 or something else

2009-11-10 Thread Greg Stein
On Tue, Nov 10, 2009 at 11:14, Akins, Brian  wrote:
> On 11/9/09 3:08 PM, "Greg Stein"  wrote:
>
>> 2) If you have 10,000 client connections, and some number of sockets
>> in the system ready for read/write... how do you quickly determine
>> *which* buckets to poll to get those sockets processed? You don't want
>> to poll  idle connections/buckets if only one is ready for
>> read/write.
>
> Epoll/kqueue/etc. Takes care of that for you.

Sorry. I wasn't clear.

You have 10k buckets representing the response for 10k clients. The
core loop reads the response from the bucket, and writes that to the
network.

Now. A client socket wakes up as writable. I think it is pretty easy
to say "read THAT bucket" to get data for writing.

Consider the scenario where one of those responses is proxied -- it is
arriving from a backend origin server. That underlying read-socket is
stuffed into the core loop. When that read-socket becomes available
for reading, *which* client response bucket do you start reading from?
And what happens if the client socket is not writable?

You could just zip thru the 10k response buckets and poll each one for
data to read, and the serf design states that the underlying
read-socket *will* get read. But you've gotta do a lot of polling to
get there.

I think that will be an interesting problem to solve. I believe it
would be something like this:

Consider when a request arrives. The core looks at the Request-URI and
the Headers. From these inputs, it determines the appropriate
response. In this case, that response is identified by a bucket,
configured with those inputs. (and somewhere in here, any Request-Body
is managed; but ignore that for now)  As that response bucket is
constructed, along with all interior/nested buckets, that construction
can say "I've got an FD here. Please add this to the core loop." The
FD would be added, and would then be associated with the response
bucket, so we know which to read when the FD wakes up.
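A very rough sketch of that registration step (every name below is
hypothetical; no such core API exists yet):

  /* Hypothetical core-loop call: associate an fd with the response
   * bucket that should be read when the fd becomes readable. */
  void core_loop_watch_fd(core_loop_t *loop, apr_socket_t *fd,
                          serf_bucket_t *response_bkt);

  /* During construction of a proxied response bucket: */
  serf_bucket_t *proxy_response_create(req_info_t *req, core_loop_t *loop,
                                       serf_bucket_alloc_t *alloc)
  {
      apr_socket_t *backend = open_backend_connection(req);      /* hypothetical */
      serf_bucket_t *bkt = proxy_bucket_create(backend, alloc);  /* hypothetical */

      /* "I've got an FD here. Please add this to the core loop." */
      core_loop_watch_fd(loop, backend, bkt);
      return bkt;
  }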

Cheers,
-g


Re: Httpd 3.0 or something else

2009-11-10 Thread Greg Stein
On Mon, Nov 9, 2009 at 18:47, Graham Leggett  wrote:
>...
>> When you read from a serf bucket, it will return however much you ask
>> for, or as much as it has without blocking. When it gives you that
>> data, it can say "I have more", "I'm done", or "This is what I had
>> without blocking".
>
> Who is "you"?

Anybody who reads from a bucket. In this case, the core network loop
when a client connection is ready for writing.

> Up till now, my understanding is that "you" is the core, and therefore
> not under control of a module writer.
>
> Let me put it another way. Imagine I am a cache module. I want to read
> as much as possible as fast as possible from a backend, and I want to
> write this data to two places simultaneously: the cache, and the
> downstream network. I know the cache is always writable, but the
> downstream network I am not sure of, I only want to write to the
> downstream network when the downstream network is ready for me.
>
> How would I do this in a serf model?

No module *anywhere* ever writes to the network.

The core loop reads/pulls from a bucket when it needs more data (for
writing to the network).

When your cache bucket reads from its interior bucket, it can also
drop the content into a file, off to the side. Think of this bucket as
a filter. All content that is read through it will be dumped into a
file, too.
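A sketch of what that read path could look like; this is not serf's
actual vtable plumbing (a real bucket type would wrap this in the bucket
interface), and the cache file is assumed to have been opened beforehand:

  /* Read from the interior bucket and append the same bytes to a cache
   * file on the side; the caller sees exactly what the interior returned. */
  static apr_status_t cache_tee_read(serf_bucket_t *interior,
                                     apr_file_t *cache_file,
                                     apr_size_t requested,
                                     const char **data, apr_size_t *len)
  {
      apr_status_t status = serf_bucket_read(interior, requested, data, len);

      if (!SERF_BUCKET_READ_ERROR(status) && *len > 0) {
          /* side effect: spool the bytes to disk (short-write handling
           * omitted for brevity) */
          apr_file_write_full(cache_file, *data, *len, NULL);
      }
      return status;  /* APR_SUCCESS, APR_EAGAIN or APR_EOF, as the interior said */
  }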

>...
> That I understand, but it makes no difference as I see it - your loop
> only reads from the bucket and jams it into the client socket if the
> client socket is good and ready to accept data.
>
> If the client socket isn't good and ready, the bucket doesn't get pulled
> from, and resources used by the bucket are left in limbo until the
> client is done. If the bucket wants to do something clever, like cache,
> or release resources early, it can't - because as soon as it returns the
> data it has to wait for the client socket to be good and ready all over
> again. The server runs as slow as the browser, which in computing terms
> is glacially slow.

I'm not sure that I understand you, and that you're familiar with the
serf bucket model.

The bucket can certainly cache data as it flows through. No problem
there. Once the bucket has returned all of its data, it can close its
file handle or socket or whatever resources it may have.

Buckets are one-time use, so once it has returned all of its data, it
can throw out any resources.

And no... the server does NOT run as slow as the browser. There are N
browsers connected, and the server is processing ALL of them. One
single response bucket is running as fast as its client, sure, but the
server certainly is not idle.

>...
> One event loop handling many requests each == event MPM (speed and
> resource efficient, but we'd better be bug free).
> Many event loops handling many requests each == worker MPM (compromise).
> Many event loops handling one request each == prefork (reliable old
> workhorse).

These have no bearing. The current MPM model is based on
content-generators writing/pushing data into the network.

A serf-based model reads from content-generators.

> In theory if we turn the content handler into a filter and bootstrap the
> filter stack with a bucket of some kind, this may work.
>
> In fact, using both "push" and "pull" at the same time might also make
> some sense - your event loop creates a bucket from which data is
> "pulled" (serf model), which is in turn "pulled" by a filter stack
> (existing filter stack model) and "pushed" upstream.

That is NOT the design that Paul, Justin, and I envision. The
core is serf. So *everything* is read/pull-based.

The old-style handlers and filters get their own thread and push into
a pipe, or an in-memory data queue. The core loop uses a bucket which
reads out of that pipe.

>...

Cheers,
-g


Re: Httpd 3.0 or something else

2009-11-10 Thread Jim Jagielski

On Nov 9, 2009, at 2:19 PM, Akins, Brian wrote:

> On 11/9/09 2:06 PM, "Greg Stein"  wrote:
> 
>> These issues are already solved by moving to a Serf core. It is fully
>> asynchronous.
> 
> Okay that's one convert, any others? ;)
> 

I said the same thing back on the 4th ;)

> That's what Paul and I discussed a lot last week.
> 
> My ideal httpd 3.0 is:
> 
> Libev + serf + lua

+1

For 3.0, I see us breaking the mold and the API in a pretty
substantial way.


Re: Httpd 3.0 or something else

2009-11-10 Thread Graham Leggett
Greg Stein wrote:

>> I am also concerned about the patterns of sendfile() in the current
>> serf bucket architecture, and making a whole pipeline do sendfile
>> correctly seems quite difficult.
> 
> Well... it generally *is* quite difficult in the presence of SSL,
> gzip, and chunking. Invariably, content is mangled before hitting the
> network, so sendfile() rarely gets a chance to play ball.

Not necessarily - a sensible cache that writes an interim response to
disk should ideally replace the current in-memory response with a
sendfile-capable file bucket.

Having done whatever filtering magic is required, the server just goes
"here kernel, give this file to the network, I'm off to serve the next
request, bye".
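With serf's stock buckets, that swap could look roughly like this (a
sketch; how the cache obtained the open apr_file_t is elided):

  /* Replace the in-memory response with a file bucket over the spooled
   * copy, so the core (and, where possible, sendfile) streams it directly. */
  serf_bucket_t *file_bkt = serf_bucket_file_create(cached_file, allocator);

  /* hand file_bkt to the core loop in place of the original response bucket */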

Regards,
Graham
--


Re: Httpd 3.0 or something else

2009-11-10 Thread Graham Leggett
Paul Querna wrote:

> But Serf Buckets and the event loop definitely do need some more work
> -- simple things, like if the backend bucket is a socket, how do you
> tell the event loop, that a would block rvalue maps to a file
> descriptor talking to an origin server.   You don't want to just keep
> looping over it until it returns data, you want to poll on the origin
> socket, and only try to read when data is available.

I think it can probably be generally stated that every request processed
by the server has N descriptors associated with that request (instead of
1 descriptor, in the current code).

In the case of a simple file transfer, there are two descriptors, one
belonging to the file, the other belonging to the network socket.

In the case of a proxy, one socket belongs to the backend connection,
and the other belongs to a frontend network socket.

And descriptors might need to be polled for read, or for write, or both
(SSL).

If a mechanism existed whereby all descriptors associated with a request
could be given to the event loop, we could be completely asynchronous
throughout the server, from the reading from the backend, to the writing
to the frontend.
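Such a mechanism could be as simple as registering both descriptors with
one APR pollset, each tagged with the request it belongs to. A rough
sketch (the request type and its fields are illustrative):

  #include <apr_poll.h>

  static apr_status_t watch_request_fds(apr_pollset_t *pollset,
                                        my_request_t *req)
  {
      apr_pollfd_t backend = { 0 };
      apr_pollfd_t frontend = { 0 };
      apr_status_t rv;

      backend.p = req->pool;
      backend.desc_type = APR_POLL_SOCKET;
      backend.desc.s = req->backend_socket;
      backend.reqevents = APR_POLLIN;          /* read from the origin;   */
      backend.client_data = req;               /* add APR_POLLOUT for SSL */

      frontend.p = req->pool;
      frontend.desc_type = APR_POLL_SOCKET;
      frontend.desc.s = req->client_socket;
      frontend.reqevents = APR_POLLOUT;        /* write to the browser */
      frontend.client_data = req;

      rv = apr_pollset_add(pollset, &backend);
      if (rv == APR_SUCCESS)
          rv = apr_pollset_add(pollset, &frontend);
      return rv;
  }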

Regards,
Graham
--


Re: Httpd 3.0 or something else

2009-11-10 Thread Akins, Brian
On 11/9/09 3:08 PM, "Greg Stein"  wrote:

> 2) If you have 10,000 client connections, and some number of sockets
> in the system ready for read/write... how do you quickly determine
> *which* buckets to poll to get those sockets processed? You don't want
> to poll  idle connections/buckets if only one is ready for
> read/write.

Epoll/kqueue/etc. Takes care of that for you.

-- 
Brian Akins



Re: Httpd 3.0 or something else

2009-11-10 Thread Niklas Edmundsson

On Mon, 9 Nov 2009, Graham Leggett wrote:


> Akins, Brian wrote:
> 
>> FWIW, nginx "buffers" backend stuff to a file, then sendfiles it out -  I
>> think this is what perlbal does as well.  Same can be done outside apache
>> using X-sendfile like methods.  Seems like we could move this "inside"
>> apache fairly easy.  May can do it with a filter.  I tried once and got it
>> to filter "most" backend stuff to a temp file, but it tended to miss and
>> block.  That was a while ago, but I haven't learned anymore about the
>> filters since then to think it would work any better.
>> 
>> Maybe a mod_buffer that goes to a file?
> 
> mod_disk_cache can be made to do this quite trivially (it's on the list
> of things to do When I Have Time(TM)).
> 
> In theory, a mod_disk_buffer could do this quite easily, on condition
> upstream writes didn't block.


I'm guessing that this would be the good-looking implementation of my
ugly-but-working making-disk-cache-work-for-large-files patchset
(version for 2.2.9 at
https://issues.apache.org/bugzilla/show_bug.cgi?id=39380, I'm in the
process of respinning it for 2.2.14 but ENOTIME makes testing slow).

The main issue I had when cobbling that together was dealing with the
fact that stuff wants to block, and it really isn't obvious in the
current httpd core how to do this nicely when you have a one-to-many
situation.

As you might remember, I "solved" it by spawning a thread to deal with
caching files in the background when needed. Since our usecase is
delivering static files it works, but it sure would be nice to have an
infrastructure that tried to help you instead of being damn near
hostile at times.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | ni...@acc.umu.se
---
 Quantum Trek: Time travel with a twist!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Httpd 3.0 or something else

2009-11-09 Thread Graham Leggett
Greg Stein wrote:

>> How is "pull" different from "push"[1]?
> 
> The network loop pulls data from the content-generator.
> 
> Apache 1.x and 2.x had a handler that pushed data at the network.
> There is no loop, of course, since each worker had direct control of
> the socket to push data into.

As I said in [1], apart from the obvious ;)

>> Pull, by definition, is blocking behaviour.
> 
> You may want to check your definitions.
> 
> When you read from a serf bucket, it will return however much you ask
> for, or as much as it has without blocking. When it gives you that
> data, it can say "I have more", "I'm done", or "This is what I had
> without blocking".

Who is "you"?

Up till now, my understanding is that "you" is the core, and therefore
not under control of a module writer.

Let me put it another way. Imagine I am a cache module. I want to read
as much as possible as fast as possible from a backend, and I want to
write this data to two places simultaneously: the cache, and the
downstream network. I know the cache is always writable, but the
downstream network I am not sure of, I only want to write to the
downstream network when the downstream network is ready for me.

How would I do this in a serf model?

>> You will only run as often as you are pulled, and never more often. And
>> if the pull is controlled by how quickly the client is accepting the
>> data, which is typically orders of magnitude slower than the backend can
>> push, you have no opportunity to try speed up the server in any way.
> 
> Eh? Are you kidding me?
> 
> One single network thread can manage N client connections. As each
> becomes writable, the loop reads ("pulls") from the bucket and jams it
> into the client socket. If you're really fancy, then you know what the
> window is, and you ask the bucket for that much data.

That I understand, but it makes no difference as I see it - your loop
only reads from the bucket and jams it into the client socket if the
client socket is good and ready to accept data.

If the client socket isn't good and ready, the bucket doesn't get pulled
from, and resources used by the bucket are left in limbo until the
client is done. If the bucket wants to do something clever, like cache,
or release resources early, it can't - because as soon as it returns the
data it has to wait for the client socket to be good and ready all over
again. The server runs as slow as the browser, which in computing terms
is glacially slow.

>> Push however, gives you a choice: the push either worked (yay! go
>> browser!), or it didn't (sensible alternative behaviour, like cache it
>> for later in a connection filter). Push happens as fast the backend, not
>> as slow as the frontend.
> 
> Push means that you have a worker per connection, pushing the response
> onto the network. I really would like to see us get away from a worker
> per connection.

Only if you write it that way (which we have done till now).

There is no reason why one event loop can't handle many requests at the
same time.

One event loop handling many requests each == event MPM (speed and
resource efficient, but we'd better be bug free).
Many event loops handling many requests each == worker MPM (compromise).
Many event loops handling one request each == prefork (reliable old
workhorse).

In theory if we turn the content handler into a filter and bootstrap the
filter stack with a bucket of some kind, this may work.

In fact, using both "push" and "pull" at the same time might also make
some sense - your event loop creates a bucket from which data is
"pulled" (serf model), which is in turn "pulled" by a filter stack
(existing filter stack model) and "pushed" upstream.

Functions that work better as a "pull" (proxy and friends) can be
pulled, functions that work better as a "push" (like caching) can be
filters.

Regards,
Graham
--


Re: Httpd 3.0 or something else

2009-11-09 Thread Greg Stein
On Mon, Nov 9, 2009 at 16:19, Graham Leggett  wrote:
> Greg Stein wrote:
>> These issues are already solved by moving to a Serf core. It is fully
>> asynchronous.
>>
>> Backend handlers will no longer "push" bits towards the network. The
>> core will "pull" them from a bucket. *Which* bucket is defined by a
>> {URL,Headers}->Bucket mapping system.
>
> How is "pull" different from "push"[1]?

The network loop pulls data from the content-generator.

Apache 1.x and 2.x had a handler that pushed data at the network.
There is no loop, of course, since each worker had direct control of
the socket to push data into.

> Pull, by definition, is blocking behaviour.

You may want to check your definitions.

When you read from a serf bucket, it will return however much you ask
for, or as much as it has without blocking. When it gives you that
data, it can say "I have more", "I'm done", or "This is what I had
without blocking".
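
In code, the three cases come out roughly like this (going from memory of the
serf bucket API, so treat the exact names and signatures as approximate):

#include <serf.h>

static const char *classify_read(serf_bucket_t *bkt,
                                 const char **data, apr_size_t *len)
{
    apr_status_t status = serf_bucket_read(bkt, SERF_READ_ALL_AVAIL, data, len);

    if (SERF_BUCKET_READ_ERROR(status))
        return "error";                      /* none of the three cases */
    if (status == APR_EOF)
        return "I'm done";                   /* *data/*len still hold the tail */
    if (APR_STATUS_IS_EAGAIN(status))
        return "this is what I had without blocking";
    return "I have more";                    /* APR_SUCCESS: read again at will */
}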

> You will only run as often as you are pulled, and never more often. And
> if the pull is controlled by how quickly the client is accepting the
> data, which is typically orders of magnitude slower than the backend can
> push, you have no opportunity to try to speed up the server in any way.

Eh? Are you kidding me?

One single network thread can manage N client connections. As each
becomes writable, the loop reads ("pulls") from the bucket and jams it
into the client socket. If you're really fancy, then you know what the
window is, and you ask the bucket for that much data.
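
A stripped-down version of that loop (the bucket read follows serf's model;
the client struct and the 16K standing in for "the window" are invented for
illustration):

#include <poll.h>
#include <unistd.h>
#include <serf.h>

struct client {
    int fd;                     /* client socket */
    serf_bucket_t *response;    /* built by a worker, handed to this loop */
};

static void network_loop(struct client *clients, struct pollfd *pfds, int n)
{
    for (;;) {
        poll(pfds, n, -1);                        /* wait for writable sockets */
        for (int i = 0; i < n; i++) {
            if (!(pfds[i].revents & POLLOUT))
                continue;
            const char *data;
            apr_size_t len = 0;
            /* ask for roughly what the connection can take; 16K stands in
               for "the window" here */
            apr_status_t st = serf_bucket_read(clients[i].response, 16384,
                                               &data, &len);
            if (len > 0)
                write(clients[i].fd, data, len);  /* partial writes ignored */
            if (st == APR_EOF)
                pfds[i].events = 0;               /* response finished */
            /* on APR_EAGAIN there is simply nothing to send yet */
        }
    }
}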

> Push however, gives you a choice: the push either worked (yay! go
> browser!), or it didn't (sensible alternative behaviour, like cache it
> for later in a connection filter). Push happens as fast as the backend, not
> as slow as the frontend.

Push means that you have a worker per connection, pushing the response
onto the network. I really would like to see us get away from a worker
per connection.

Once a worker thread determines which bucket to create/build, then it
passes it along to the network thread, and returns for more work. The
network thread can then manage N connections with their associated
response buckets.

If one network thread cannot read/generate the content fast enough,
then you use multiple threads to keep the connections full.

Then you want to add in a bit of control around reading of requests in
order to manage the backlog of responses (and any potential memory
buildup that entails). If the network thread is consuming 100M and 20k
sockets, you may want to stop accepting connections or accept but read
them slowly until the pressure eases. etc...
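
Even a crude check along these lines would capture that (thresholds and names
invented for illustration):

#include <stddef.h>

#define MAX_PENDING_BYTES (100 * 1024 * 1024)  /* ~100M of buffered responses */
#define MAX_OPEN_SOCKETS  20000

static int should_accept_more(size_t pending_bytes, int open_sockets)
{
    /* below both limits: keep accepting; otherwise stop accepting (or accept
       but read the new connections slowly) until the pressure eases */
    return pending_bytes < MAX_PENDING_BYTES && open_sockets < MAX_OPEN_SOCKETS;
}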

Cheers,
-g


Re: Httpd 3.0 or something else

2009-11-09 Thread Graham Leggett
Greg Stein wrote:

> These issues are already solved by moving to a Serf core. It is fully
> asynchronous.
> 
> Backend handlers will no longer "push" bits towards the network. The
> core will "pull" them from a bucket. *Which* bucket is defined by a
> {URL,Headers}->Bucket mapping system.

How is "pull" different from "push"[1]?

Pull, by definition, is blocking behaviour.

You will only run as often as you are pulled, and never more often. And
if the pull is controlled by how quickly the client is accepting the
data, which is typically orders of magnitude slower than the backend can
push, you have no opportunity to try to speed up the server in any way.

Push however, gives you a choice: the push either worked (yay! go
browser!), or it didn't (sensible alternative behaviour, like cache it
for later in a connection filter). Push happens as fast as the backend, not
as slow as the frontend.

So far I'm not convinced it is a step forward, will have to think about
it more.

[1] Apart from the obvious.

Regards,
Graham
--


Re: Httpd 3.0 or something else

2009-11-09 Thread Nick Kew

Akins, Brian wrote:
> What we discussed, some on list and some at ApacheCon, was having a really good
> and simple process manager.  Mod_fcgid is too much work to configure for
> mere mortals.  If we just had something like:
>
> AssociateExternal .php /path/to/my/php-cgi

Sounds interesting.  Any notes from ApacheCon or otherwise on
that discussion?

--
Nick Kew


Re: Httpd 3.0 or something else

2009-11-09 Thread Greg Stein
On Mon, Nov 9, 2009 at 14:21, Paul Querna  wrote:
>...
> I agree in general, a serf-based core does give us a good start.
>
> But Serf Buckets and the event loop definitely do need some more work
> -- simple things, like if the backend bucket is a socket, how do you
> tell the event loop that a would-block rvalue maps to a file
> descriptor talking to an origin server. You don't want to just keep
> looping over it until it returns data; you want to poll on the origin
> socket, and only try to read when data is available.

The goal would be that the handler's (aka content generator, aka serf
bucket) socket would be processed in the same select() as the client
connections. When the bucket has no more data from the backend, then
it returns "done for now". Eventually, all network reads/writes
finalize and control returns to the core loop. If data comes in on the
backend, then the core loop runs again and that bucket can read/return data.

There are two caveats that I can think of, right off hand:

1) Each client connection is associated with one bucket generating the
response. Ideally, you would not bother to read that bucket
unless/until the client connection is ready for reading. But that
could create a deadlock internal to the bucket -- *some* data may need
to be consumed from the backend, processed, and returned to the
backend to "unstick" the entire flow (think SSL). Even though nothing
pops out the top of the bucket, internal processing may need to
happen.

2) If you have 10,000 client connections, and some number of sockets
in the system ready for read/write... how do you quickly determine
*which* buckets to poll to get those sockets processed? You don't want
to poll idle connections/buckets if only one is ready for
read/write. (note: there are optimizations around this; if the bucket
wants to return data, but wasn't asked to, then next-time-around it
has the same data; no need to drill way down to the source bucket to
attempt to read network data; tho this kinda sets up a busy loop until
that bucket's client is ready for writing)

Are either of these the considerations you were thinking of?

I can certainly see some kind of system to associate buckets and the
sockets that affect their behavior. Though that could get pretty crazy
since it doesn't have to be a 1:1 mapping. One backend socket might
actually service multiple buckets, and vice-versa.
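
One way to picture that association (purely illustrative, not an existing
API): a small registry that maps each backend socket to the bucket(s)
interested in it, so the core only pokes those buckets when the fd becomes
readable:

#include <apr_poll.h>
#include <serf.h>

typedef struct watch watch;
struct watch {
    serf_bucket_t *bucket;   /* bucket to nudge when this fd becomes readable */
    watch *next;             /* one backend socket may feed several buckets */
};

/* called by bucket code when a read on the backend would block */
static apr_status_t watch_backend(apr_pollset_t *pollset,
                                  apr_socket_t *backend,
                                  watch *interested, apr_pool_t *pool)
{
    apr_pollfd_t pfd = { 0 };
    pfd.p = pool;
    pfd.desc_type = APR_POLL_SOCKET;
    pfd.desc.s = backend;
    pfd.reqevents = APR_POLLIN;
    pfd.client_data = interested;   /* ...and a bucket may watch several fds */
    return apr_pollset_add(pollset, &pfd);
}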

> I am also concerned about the patterns of sendfile() in the current
> serf bucket architecture, and making a whole pipeline do sendfile
> correctly seems quite difficult.

Well... it generally *is* quite difficult in the presence of SSL,
gzip, and chunking. Invariably, content is mangled before hitting the
network, so sendfile() rarely gets a chance to play ball.

But if you really are just dealing with plain files (maybe prezipped),
then the read_for_sendfile() should be workable. Most buckets can't do
squat with it, and should just use a default function. But the file
bucket can return a proper handle.
(and it is entirely possible/reasonable that the signature should be
adjusted to simplify the process)
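
Roughly, the split could look like this (the vtable and all names are invented
for illustration; serf's real read_for_sendfile signature may well differ):

#include <apr_file_io.h>

typedef struct bucket bucket;
typedef struct bucket_type {
    /* normal read: copy bytes into memory */
    apr_status_t (*read)(bucket *b, const char **data, apr_size_t *len);
    /* optional: expose an fd plus range so the core can sendfile() it */
    apr_status_t (*read_for_sendfile)(bucket *b, apr_file_t **file,
                                      apr_off_t *offset, apr_size_t *len);
} bucket_type;

struct bucket {
    const bucket_type *type;
    void *data;
};

/* default for buckets that mangle data (SSL, gzip, chunking): no sendfile */
static apr_status_t default_read_for_sendfile(bucket *b, apr_file_t **file,
                                              apr_off_t *offset, apr_size_t *len)
{
    return APR_ENOTIMPL;   /* caller falls back to read() plus write() */
}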

Cheers,
-g


Re: Httpd 3.0 or something else

2009-11-09 Thread Paul Querna
On Mon, Nov 9, 2009 at 11:06 AM, Greg Stein  wrote:
> On Mon, Nov 9, 2009 at 13:59, Graham Leggett  wrote:
>> Akins, Brian wrote:
>>
> It works really well for proxy.
 Aka "static data" :)
>>>
>>> Nah, we proxy to fastcgi php stuff, http java stuff, some horrid HTTP perl
>>> stuff, etc (Full disclosure, I wrote the horrid perl stuff.)
>>
>> Doesn't matter, once httpd proxy gets hold of it, it's just shifting
>> static bits.
>>
>> Something I want to teach httpd to do is buffer up data for output, and
>> then forget about the output to focus on releasing the backend resources
>> ASAP, ready for the next request when it (eventually) comes. The fact
>> that network writes block makes this painful to achieve.
>>
>> Proxy had an optimisation that released proxied backend resources when
>> it detected EOS from the backend but before attempting to pass it to the
>> frontend, but someone refactored that away at some point. It would be
>> good if such an optimisation was available server wide.
>>
>> I want to be able to write something to the filter stack, and get an
>> EWOULDBLOCK (or similar) back if it isn't ready. I could then make
>> intelligent decisions based on this. For example, if I were a cache, I
>> would carry on reading from the backend and writing the data to the
>> cache, while the frontend was saying "not now, slow browser ahead". I
>> could have long since finished caching and closed the backend connection
>> and freed the resources, before the frontend returned "cool, ready for
>> you now", at which point I answer "no worries, have the cached content I
>> prepared earlier".
>
> These issues are already solved by moving to a Serf core. It is fully
> asynchronous.
>
> Backend handlers will no longer "push" bits towards the network. The
> core will "pull" them from a bucket. *Which* bucket is defined by a
> {URL,Headers}->Bucket mapping system.

I was talking to Aaron about this at ApacheCon.

I agree in general, a serf-based core does give us a good start.

But Serf Buckets and the event loop definitely do need some more work
-- simple things, like if the backend bucket is a socket, how do you
tell the event loop that a would-block rvalue maps to a file
descriptor talking to an origin server. You don't want to just keep
looping over it until it returns data; you want to poll on the origin
socket, and only try to read when data is available.

I am also concerned about the patterns of sendfile() in the current
serf bucket architecture, and making a whole pipeline do sendfile
correctly seems quite difficult.

-Paul


Re: Httpd 3.0 or something else

2009-11-09 Thread Akins, Brian
On 11/9/09 2:06 PM, "Greg Stein"  wrote:

> These issues are already solved by moving to a Serf core. It is fully
> asynchronous.

Okay that's one convert, any others? ;)

That's what Paul and I discussed a lot last week.

My ideal httpd 3.0 is:

Libev + serf + lua

-- 
Brian Akins



Re: Httpd 3.0 or something else

2009-11-09 Thread Graham Leggett
Akins, Brian wrote:

> FWIW, nginx "buffers" backend stuff to a file, then sendfiles it out -  I
> think this is what perlbal does as well.  Same can be done outside apache
> using X-sendfile like methods.  Seems like we could move this "inside"
> apache fairly easily.  Maybe we can do it with a filter.  I tried once and got it
> to filter "most" backend stuff to a temp file, but it tended to miss and
> block.  That was a while ago, but I haven't learned anymore about the
> filters since then to think it would work any better.
> 
> Maybe a mod_buffer that goes to a file?

mod_disk_cache can be made to do this quite trivially (it's on the list
of things to do When I Have Time(TM)).

In theory, a mod_disk_buffer could do this quite easily, on condition
upstream writes didn't block.

Regards,
Graham
--



Re: Httpd 3.0 or something else

2009-11-09 Thread Greg Stein
On Mon, Nov 9, 2009 at 13:59, Graham Leggett  wrote:
> Akins, Brian wrote:
>
 It works really well for proxy.
>>> Aka "static data" :)
>>
>> Nah, we proxy to fastcgi php stuff, http java stuff, some horrid HTTP perl
>> stuff, etc (Full disclosure, I wrote the horrid perl stuff.)
>
> Doesn't matter, once httpd proxy gets hold of it, it's just shifting
> static bits.
>
> Something I want to teach httpd to do is buffer up data for output, and
> then forget about the output to focus on releasing the backend resources
> ASAP, ready for the next request when it (eventually) comes. The fact
> that network writes block makes this painful to achieve.
>
> Proxy had an optimisation that released proxied backend resources when
> it detected EOS from the backend but before attempting to pass it to the
> frontend, but someone refactored that away at some point. It would be
> good if such an optimisation was available server wide.
>
> I want to be able to write something to the filter stack, and get an
> EWOULDBLOCK (or similar) back if it isn't ready. I could then make
> intelligent decisions based on this. For example, if I were a cache, I
> would carry on reading from the backend and writing the data to the
> cache, while the frontend was saying "not now, slow browser ahead". I
> could have long since finished caching and closed the backend connection
> and freed the resources, before the frontend returned "cool, ready for
> you now", at which point I answer "no worries, have the cached content I
> prepared earlier".

These issues are already solved by moving to a Serf core. It is fully
asynchronous.

Backend handlers will no longer "push" bits towards the network. The
core will "pull" them from a bucket. *Which* bucket is defined by a
{URL,Headers}->Bucket mapping system.

Cheers,
-g


Re: Httpd 3.0 or something else

2009-11-09 Thread Akins, Brian
On 11/9/09 1:59 PM, "Graham Leggett"  wrote:


> Doesn't matter, once httpd proxy gets hold of it, it's just shifting
> static bits.

True.

> Something I want to teach httpd to do is buffer up data for output, and
> then forget about the output to focus on releasing the backend resources
> ASAP, ready for the next request when it (eventually) comes. The fact
> that network writes block makes this painful to achieve.

FWIW, nginx "buffers" backend stuff to a file, then sendfiles it out -  I
think this is what perlbal does as well.  Same can be done outside apache
using X-sendfile like methods.  Seems like we could move this "inside"
apache fairly easy.  May can do it with a filter.  I tried once and got it
to filter "most" backend stuff to a temp file, but it tended to miss and
block.  That was a while ago, but I haven't learned anymore about the
filters since then to think it would work any better.

Maybe a mod_buffer that goes to a file?

Also, all these temp files are normally in tmpfs for us.
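
For what it's worth, a bare-bones version of such a filter could look
something like this (an uncompiled sketch; error handling, setaside and
metadata buckets other than EOS are ignored):

#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"
#include "apr_strings.h"
#include "apr_file_io.h"

static apr_status_t buffer_to_file_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    request_rec *r = f->r;
    apr_file_t *tmp = f->ctx;
    apr_bucket *b;

    if (!tmp) {
        /* template could point at tmpfs, as we do today */
        char *templ = apr_pstrcat(r->pool, "/tmp/mod_buffer.XXXXXX", NULL);
        apr_file_mktemp(&tmp, templ, 0, r->pool);
        f->ctx = tmp;
    }

    for (b = APR_BRIGADE_FIRST(bb); b != APR_BRIGADE_SENTINEL(bb);
         b = APR_BUCKET_NEXT(b)) {
        if (APR_BUCKET_IS_EOS(b)) {
            /* backend finished: hand the spooled file downstream as one
               file bucket, which the core can then sendfile() */
            apr_off_t size = 0;
            apr_bucket_brigade *out = apr_brigade_create(r->pool,
                                                         f->c->bucket_alloc);
            apr_file_seek(tmp, APR_END, &size);
            APR_BRIGADE_INSERT_TAIL(out,
                apr_bucket_file_create(tmp, 0, (apr_size_t)size, r->pool,
                                       f->c->bucket_alloc));
            APR_BRIGADE_INSERT_TAIL(out,
                apr_bucket_eos_create(f->c->bucket_alloc));
            return ap_pass_brigade(f->next, out);
        }
        else {
            const char *data;
            apr_size_t len;
            if (apr_bucket_read(b, &data, &len, APR_BLOCK_READ) == APR_SUCCESS)
                apr_file_write_full(tmp, data, len, NULL);
        }
    }
    return APR_SUCCESS;   /* everything spooled so far; nothing sent yet */
}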

-- 
Brian Akins



Re: Httpd 3.0 or something else

2009-11-09 Thread Graham Leggett
Akins, Brian wrote:

>>> It works really well for proxy.
>> Aka "static data" :)
> 
> Nah, we proxy to fastcgi php stuff, http java stuff, some horrid HTTP perl
> stuff, etc (Full disclosure, I wrote the horrid perl stuff.)

Doesn't matter, once httpd proxy gets hold of it, it's just shifting
static bits.

Something I want to teach httpd to do is buffer up data for output, and
then forget about the output to focus on releasing the backend resources
ASAP, ready for the next request when it (eventually) comes. The fact
that network writes block makes this painful to achieve.

Proxy had an optimisation that released proxied backend resources when
it detected EOS from the backend but before attempting to pass it to the
frontend, but someone refactored that away at some point. It would be
good if such an optimisation was available server wide.

I want to be able to write something to the filter stack, and get an
EWOULDBLOCK (or similar) back if it isn't ready. I could then make
intelligent decisions based on this. For example, if I were a cache, I
would carry on reading from the backend and writing the data to the
cache, while the frontend was saying "not now, slow browser ahead". I
could have long since finished caching and closed the backend connection
and freed the resources, before the frontend returned "cool, ready for
you now", at which point I answer "no worries, have the cached content I
prepared earlier".
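
In pseudo-C, the behaviour I am after looks something like this (the
non-blocking filter write and every helper name here are hypothetical; nothing
like it exists in the current filter API):

#include <apr.h>
#include <apr_errno.h>

typedef struct backend backend;        /* hypothetical handles */
typedef struct cache   cache;
typedef struct filters filters;

apr_status_t backend_read(backend *be, const char **data, apr_size_t *len);
void         cache_write(cache *c, const char *data, apr_size_t len);
apr_status_t filters_try_write(filters *fs, const char *data, apr_size_t len);
void         backend_release(backend *be);
void         serve_remainder_from_cache(cache *c, filters *fs);

static void cache_and_serve(backend *be, cache *c, filters *fs)
{
    const char *data;
    apr_size_t len;
    int client_stalled = 0;

    while (backend_read(be, &data, &len) == APR_SUCCESS) {
        cache_write(c, data, len);              /* the cache is always writable */
        if (!client_stalled
            && filters_try_write(fs, data, len) == APR_EAGAIN) {
            /* "not now, slow browser ahead": stop feeding the client directly,
               keep caching at the speed of the backend */
            client_stalled = 1;
        }
    }
    backend_release(be);   /* backend freed long before the client is done */

    if (client_stalled) {
        /* "no worries, have the cached content I prepared earlier" */
        serve_remainder_from_cache(c, fs);
    }
}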

Regards,
Graham
--


Re: Httpd 3.0 or something else

2009-11-09 Thread Akins, Brian
On 11/9/09 1:40 PM, "Brian Akins"  wrote:

> On 11/9/09 1:36 PM, "Graham Leggett"  wrote:
> 
>>> It works really well for proxy.
>> 
>> Aka "static data" :)
> 
> Nah, we proxy to fastcgi php stuff, http java stuff, some horrid HTTP perl
> stuff, etc (Full disclosure, I wrote the horrid perl stuff.)

Replying to my own post:

What we discussed, some on list and some at ApacheCon, was having a really good
and simple process manager.  Mod_fcgid is too much work to configure for
mere mortals.  If we just had something like:

AssociateExternal .php /path/to/my/php-cgi

And it did the sensible thing (whether fcgi, http, wscgi, etc.) then all the
"config" is in one place.  Obviously, we could have some "advanced" process
management directives.

If your app needed some special config stuff, we could easily pass it across
somehow.
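
Something along these lines is what I have in mind (the container and its
knobs below are invented for illustration; only the AssociateExternal
one-liner was suggested above, and none of these directives exist today):

AssociateExternal .php /path/to/my/php-cgi

<ExternalProcess /path/to/my/php-cgi>
    MinProcesses   2
    MaxProcesses   16
    IdleTimeout    300
    ExtraEnv       APP_MODE production
</ExternalProcess>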

-- 
Brian Akins



Re: Httpd 3.0 or something else

2009-11-09 Thread Akins, Brian
On 11/9/09 1:36 PM, "Graham Leggett"  wrote:

>> It works really well for proxy.
> 
> Aka "static data" :)

Nah, we proxy to fastcgi php stuff, http java stuff, some horrid HTTP perl
stuff, etc (Full disclosure, I wrote the horrid perl stuff.)


-- 
Brian Akins



Re: Httpd 3.0 or something else

2009-11-09 Thread Graham Leggett
Akins, Brian wrote:

>> and we know
>> from the same period of experience from others that a pure event driven
>> model is useful for shipping static data and not much further.
> 
> It works really well for proxy.

Aka "static data" :)

The key advantage to doing both prefork and event behaviour in the same
server is that operationally, it is one beast to feed and care for. You
might deploy them differently in different environments, but it is one
set of skills to manage.

Regards,
Graham
--


Re: Httpd 3.0 or something else

2009-11-09 Thread Akins, Brian
On 11/9/09 1:18 PM, "Graham Leggett"  wrote:

> and we know
> from the same period of experience from others that a pure event driven
> model is useful for shipping static data and not much further.

It works really well for proxy.

-- 
Brian Akins



Re: Httpd 3.0 or something else

2009-11-09 Thread Graham Leggett
Akins, Brian wrote:

>> This gives us the option of prefork reliability, and event driven speed,
>> as required by the admin.
> 
> I think if we try to do both, we will wind up with the worst of both worlds.
> (Or is it worse??)  Blocking/buggy "modules" should be run out of process
> (FastCGI/HTTP/Thrift).

That is exactly what prefork means - to run something out of process, so
that it can leak and crash at will.

I disagree we'll end up with the worst of both worlds. A lot of head
banging in the cache code has been caused because we are doing blocking
reads and blocking writes on the filter stacks.

When I say "be asynchronous" I mean use non-blocking reads and writes
everywhere, in prefork, worker and event alike.

We know from 15+ years of experience that prefork works, and we know
from the same period of experience from others that a pure event driven
model is useful for shipping static data and not much further. But some
people have a need to just ship static data, and there is no reason why
httpd and an event MPM can't do that job well too.

Regards,
Graham
--


Re: Httpd 3.0 or something else

2009-11-09 Thread Akins, Brian
On 11/9/09 12:52 PM, "Graham Leggett"  wrote:

> This gives us the option of prefork reliability, and event driven speed,
> as required by the admin.

I think if we try to do both, we will wind up with the worst of both worlds.
(Or is it worse??)  Blocking/buggy "modules" should be run out of process
(FastCGI/HTTP/Thrift).

-- 
Brian Akins



Re: Httpd 3.0 or something else

2009-11-09 Thread Graham Leggett
Akins, Brian wrote:

> FWIW, nginx delivers on its performance promises, but is a horrible hairball
> of code (my opinion).  We (httpd-dev type folks) could do much better - if
> we just would. (Easy for the guy with no time to say, I know...)

I think it is entirely reasonable for the httpd v3.0 codebase to do this
as a goal:

- Be asynchronous throughout; while
- Supporting prefork as httpd does now; and
- Allow variable levels of event-driven-ness in between.

This gives us the option of prefork reliability, and event driven speed,
as required by the admin.

Regards,
Graham
--


Re: Httpd 3.0 or something else

2009-11-09 Thread Akins, Brian
On 11/9/09 12:32 AM, "Brian McCallister"  wrote:

> A 3.0, a fundamental architectural shift, would be interesting to
> discuss, I am not sure there is a ton of value in it, though, to be
> honest.

So I should continue to investigate nginx? ;)

FWIW, nginx delivers on its performance promises, but is a horrible hairball
of code (my opinion).  We (httpd-dev type folks) could do much better - if
we just would. (Easy for the guy with no time to say, I know...)

 
-- 
Brian Akins



Re: Httpd 3.0 or something else

2009-11-08 Thread Brian McCallister
On Wed, Nov 4, 2009 at 10:26 AM, Akins, Brian  wrote:
> So, after several conversations at Apachecon and on the list, we still have
> no real "vision" of how we want to move ahead with httpd "3.0."  Or, if we
> do, it's not communicated very well.
>
> Some have suggested we just create a new server project. Others want to keep
> hacking away at the current code base.
>
> Thoughts?

I see no reason to call what we have been working on anything other than 2.4.

A 3.0, a fundamental architectural shift, would be interesting to
discuss; I am not sure there is a ton of value in it, though, to be
honest.


Re: Httpd 3.0 or something else

2009-11-08 Thread Mladen Turk

On 06/11/09 20:07, Jim Jagielski wrote:
>> I'd like we remove the entire forwarding proxy stuff
>> for example.
>
> So we have mod_forward_proxy and mod_reverse_proxy? Interesting
> take. Would make some sense to make mod_proxy a top-level "framework"
> and forward/reverse as submodules.

I'd like that we clear the current dependency and API mess.
E.g. common code depends on balancer (which was a dirty hack
I did so we could set up the workers). This would obviously
require some decent shared memory code that would allow dynamic
config instead of stealing the space from the scoreboard. I think
that shared memory rewrite was one of the major topics in the
'Amsterdam' discussion a few years back.


Regards
--
^TM



Re: Httpd 3.0 or something else

2009-11-05 Thread Greg Stein
On a phone, so pls excuse my brevity...

I think a lot of your discussion can be easily passed off to Apache Thrift.
Let it handle all the message passing to external processes, and use its
provided multi-language support.

On Nov 5, 2009 4:31 PM, "Graham Dumpleton" 
wrote:

2009/11/5 Graham Leggett :

> Jim Jagielski wrote:
>
>> Let's get 2.4 out. And then let's rip it to shreds and drop
>> buckets/b...
Sorry, long post but it was inevitable that I was going to air all
this at some point. Now seems as good a time as any.

I'd like to see a more radical architecture change, one that
recognises that it isn't just about serving static files any more and
provides much better builtin support for safe hosting of content
generating web applications constructed using alternate languages.

Before anyone jumps to the conclusion that I want to start seeing even
more heavy weight applications being run direct in the Apache server
child processes that accept initial requests, know that I don't want
that and that I actually want to promote a model which is the opposite
and which would encourage people not to do that.

As a first step, like Jim, I would like to see the current Apache server
child processes (workers) being asynchronous. In addition to that
though, I would like to see as part of core Apache, and running in
parent process, a means for spawning and monitoring of distinct
processes outside of the set of worker processes.

There is currently support in APR and in part in Apache for 'other'
processes via 'apr_proc_other_child_???()' functions, but this is
quite basic and you still need to a large degree need to roll your own
management routines around that for (re)spawning etc. As a result, you
see modules such as mod_cgid, mod_fastcgi, mod_fcgid, mod_wsgi all
having their own process management code for managing either their
daemon processes and/or manager process.

Technically one could implement this as a distinct module called
mod_procd which had an API which could be utilised by other modules
and stop duplication of all this stuff, but perhaps needs to go a step
further than that as far as being integrated into core. This is
because at present any 'other' processes are dealt with rather harshly
on graceful restarts because they are still simply killed off after a
few seconds if they don't shut down. Being able to extend graceful
restart semantics into other processes may be worthwhile for some
applications.

The next thing I want to see is for the whole FastCGI-type ecosystem to be
revisited, and for a better version of this concept for hosting web
applications in disparate languages to be developed, one which modernises it
and brings it in as a core feature of Apache. The intent here is to
simplify the task for implementers as well as for those who wish to deploy
applications.

An important part of this would be to switch away from the interface
being a socket protocol. Instead, let the web server control both
halves of the communication channel between the Apache worker process and
the application daemon process. What would replace the socket protocol
as the interface would be a C API, and instead of the application having to
implement the socket protocol as a foreign process, specific language
support would be provided by way of a dynamically loaded plugin. That
plugin would then use embedding to access support for a particular
language and just execute code in the file that the enclosing code of
the web server system told it to execute.

By way of example, imagine languages such as Python, Perl or Ruby
which in turn now have simplified web server interfaces in the form of
WSGI, PSGI and RACK, or even PHP. In the Apache configuration one
would simply say that a specific file extension is implemented by a
specific named language plugin. One would also indicate that a
separate manager process should be started up for managing processes
for handling any requests for that language.

Only after that separate manager process had been spawned be it by
just straight fork or preferably fork/exec would the specific language
plugin be loaded. This eliminates the problems caused by complex
language modules being preloaded into Apache parent process and
causing conflicts with other languages. The existing mod_php module is
a good example for causing lots of problems because of it dragging in
libraries which aren't multithread safe.

That manager process would then spawn its own language specific worker
processes as configured for handling actual requests. When the main
asynchronous Apache worker processes receive a request and determine
that the target resource file is related to a specific language, they
then determine how to connect to those language-specific worker
processes and proxy the request to them for handling.

On the language worker process side, the web server part of the code in
that process receives the proxied request and then calls into the
plugin code to have the request handled against the target file.

Because most language solutions for web

Re: Httpd 3.0 or something else

2009-11-05 Thread Akins, Brian
On 11/5/09 4:30 PM, "Graham Dumpleton"  wrote:

> Thoughts?

Still digesting, but generally +1 to the entire post.


-- 
Brian Akins



Re: Httpd 3.0 or something else

2009-11-05 Thread Graham Dumpleton
2009/11/5 Graham Leggett :
> Jim Jagielski wrote:
>
>> Let's get 2.4 out. And then let's rip it to shreds and drop
>> buckets/brigades and fold in serf.
>
> I think we should decide on exactly what problem we're trying to solve,
> before we start thinking about how it is to be solved.
>
> I'm keen to teach httpd v3.0 to work asynchronously throughout - still
> maintaining the prefork behaviour as a sensible default[1], but being
> asynchronous and non blocking throughout.
>
> [1] The fact that dodgy module code can leak, crash and be otherwise
> unsociable, and yet the server remains functional, is one of the key
> reasons why httpd still endures.

Sorry, long post but it was inevitable that I was going to air all
this at some point. Now seems as good a time as any.

I'd like to see a more radical architecture change, one that
recognises that it isn't just about serving static files any more and
provides much better builtin support for safe hosting of content
generating web applications constructed using alternate languages.

Before anyone jumps to the conclusion that I want to start seeing even
more heavy weight applications being run direct in the Apache server
child processes that accept initial requests, know that I don't want
that and that I actually want to promote a model which is the opposite
and which would encourage people not to do that.

As a first step, like Jim, I would like to see the current Apache server
child processes (workers) being asynchronous. In addition to that
though, I would like to see as part of core Apache, and running in
parent process, a means for spawning and monitoring of distinct
processes outside of the set of worker processes.

There is currently support in APR and in part in Apache for 'other'
processes via 'apr_proc_other_child_???()' functions, but this is
quite basic and you still, to a large degree, need to roll your own
management routines around that for (re)spawning etc. As a result, you
see modules such as mod_cgid, mod_fastcgi, mod_fcgid, mod_wsgi all
having their own process management code for managing either their
daemon processes and/or manager process.

Technically one could implement this as a distinct module called
mod_procd which had an API which could be utilised by other modules
and stop duplication of all this stuff, but perhaps needs to go a step
further than that as far as being integrated into core. This is
because at present any 'other' processes are dealt with rather harshly
on graceful restarts because they are still simply killed off after a
few seconds if they don't shut down. Being able to extend graceful
restart semantics into other processes may be worthwhile for some
applications.
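
To make the idea concrete, the kind of API such a mod_procd could expose
might look like this (every name is invented; nothing like it exists today):

#include <apr_pools.h>
#include <apr_time.h>
#include <apr_network_io.h>

typedef struct procd_daemon procd_daemon;

typedef struct procd_spec {
    const char *name;                      /* e.g. "php-manager" */
    const char * const *argv;              /* command to (re)spawn */
    int min_instances;
    int max_instances;
    apr_interval_time_t graceful_timeout;  /* honour graceful restarts properly */
} procd_spec;

/* register a daemon with the parent process: it is spawned, watched and
   respawned on the module's behalf, instead of each module rolling its own */
apr_status_t procd_register(apr_pool_t *pconf, const procd_spec *spec,
                            procd_daemon **out);

/* obtain a channel to one of the daemon's processes to proxy a request to */
apr_status_t procd_connect(procd_daemon *d, apr_pool_t *p, apr_socket_t **sock);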

The next thing I want to see is for the whole FastCGI-type ecosystem to be
revisited, and for a better version of this concept for hosting web
applications in disparate languages to be developed, one which modernises it
and brings it in as a core feature of Apache. The intent here is to
simplify the task for implementers as well as for those who wish to deploy
applications.

An important part of this would be to switch away from the interface
being a socket protocol. Instead, let the web server control both
halves of the communication channel between the Apache worker process and
the application daemon process. What would replace the socket protocol
as the interface would be a C API, and instead of the application having to
implement the socket protocol as a foreign process, specific language
support would be provided by way of a dynamically loaded plugin. That
plugin would then use embedding to access support for a particular
language and just execute code in the file that the enclosing code of
the web server system told it to execute.
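
As a sketch of what such a per-language plugin interface could look like
(all names invented, illustration only):

#include <apr_pools.h>
#include <apr_tables.h>
#include <apr_errno.h>

typedef struct lang_request {
    const char *filename;         /* the target file the server mapped to us */
    const char *method;
    apr_table_t *headers_in;
    apr_table_t *headers_out;
    /* body read/write callbacks would go here */
} lang_request;

typedef struct lang_plugin {
    const char *name;             /* "python", "perl", "ruby", "php", ... */
    /* called once in the language worker process, after fork/exec, so the
       embedded interpreter never touches the Apache parent process */
    apr_status_t (*init)(apr_pool_t *pool);
    /* run the target file (WSGI/PSGI/Rack/PHP entry point) for one request */
    apr_status_t (*handle)(lang_request *req, apr_pool_t *pool);
    void (*shutdown)(void);
} lang_plugin;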

By way of example, imagine languages such as Python, Perl or Ruby
which in turn now have simplified web server interfaces in the form of
WSGI, PSGI and RACK, or even PHP. In the Apache configuration one
would simply say that a specific file extension is implemented by a
specific named language plugin. One would also indicate that a
separate manager process should be started up for managing processes
for handling any requests for that language.

Only after that separate manager process had been spawned be it by
just straight fork or preferably fork/exec would the specific language
plugin be loaded. This eliminates the problems caused by complex
language modules being preloaded into Apache parent process and
causing conflicts with other languages. The existing mod_php module is
a good example for causing lots of problems because of it dragging in
libraries which aren't multithread safe.

That manager process would then spawn its own language specific worker
processes as configured for handling actual requests. When the main
asynchronous Apache worker processes receive a request and determine
that the target resource file is related to a specific language, they
then determine how to connect to those language-specific worker
processes and proxy the request to them for

Re: Httpd 3.0 or something else

2009-11-05 Thread Mladen Turk

On 05/11/09 12:38, Graham Leggett wrote:
> Jim Jagielski wrote:
>> Let's get 2.4 out. And then let's rip it to shreds and drop
>> buckets/brigades and fold in serf.
>
> I think we should decide on exactly what problem we're trying to solve,
> before we start thinking about how it is to be solved.

+1

I'd like we remove the entire forwarding proxy stuff
for example. There are also a few other things that simply
don't fit inside 'that web server thing' thought.
Others might simply have different ideas.

So IMHO we should define what we wanna do first.


Regards
--
^TM



Re: Httpd 3.0 or something else

2009-11-05 Thread Jie Gao
How about support of openmp?

Regards,



Jie


Re: Httpd 3.0 or something else

2009-11-05 Thread Bojan Smojver
On Thu, 2009-11-05 at 13:38 +0200, Graham Leggett wrote:
> I'm keen to teach httpd v3.0 to work asynchronously throughout - still
> maintaining the prefork behaviour as a sensible default[1], but being
> asynchronous and non blocking throughout.
> 
> [1] The fact that dodgy module code can leak, crash and be otherwise
> unsociable, and yet the server remains functional, is one of the key
> reasons why httpd still endures.

+1

To see that the concept is not outdated, we just need to look at Google's
Chrome.

-- 
Bojan



Re: Httpd 3.0 or something else

2009-11-04 Thread Jorge Schrauwen
I'm with Jim,

Head for 2.4 first.

IIRC there was some talk about moving to a 'd' project, since httpd
now does ftp (mod_ftp), echo, pop3,... and some other protocols.
I don't remember much from it though. I did like the idea back then
but that's about the only thing I remember from that.

Maybe we could also poll the user base? I know there was a restart on
the debate about the current conf and lua/perl/whatever not so long
ago so maybe these are all things to look into again for a 3.0?

Just my .2 cents

/me off to studying windows 2008 server -_-

~Jorge



On Wed, Nov 4, 2009 at 8:30 PM, Jim Jagielski  wrote:
> Let's get 2.4 out. And then let's rip it to shreds and drop
> buckets/brigades and fold in serf.
>
> On Nov 4, 2009, at 1:26 PM, Akins, Brian wrote:
>
>> So, after several conversations at Apachecon and on the list, we still
>> have
>> no real "vision" of how we want to move ahead with httpd "3.0."  Or, if we
>> do, it's not communicated very well.
>>
>> Some have suggested we just create a new server project. Others want to
>> keep
>> hacking away at the current code base.
>>
>> Thoughts?
>>
>> --
>> Brian Akins
>>
>
>


Re: Httpd 3.0 or something else

2009-11-04 Thread Jim Jagielski

Let's get 2.4 out. And then let's rip it to shreds and drop
buckets/brigades and fold in serf.

On Nov 4, 2009, at 1:26 PM, Akins, Brian wrote:

> So, after several conversations at Apachecon and on the list, we still have
> no real "vision" of how we want to move ahead with httpd "3.0."  Or, if we
> do, it's not communicated very well.
>
> Some have suggested we just create a new server project. Others want to keep
> hacking away at the current code base.
>
> Thoughts?

--
Brian Akins





Httpd 3.0 or something else

2009-11-04 Thread Akins, Brian
So, after several conversations at Apachecon and on the list, we still have
no real "vision" of how we want to move ahead with httpd "3.0."  Or, if we
do, it's not communicated very well.

Some have suggested we just create a new server project. Others want to keep
hacking away at the current code base.

Thoughts? 

-- 
Brian Akins