Re: Is async the answer?

2008-01-28 Thread Olaf van der Spek
On Jan 25, 2008 6:18 PM, Akins, Brian [EMAIL PROTECTED] wrote:
 On 1/24/08 3:14 PM, Olaf van der Spek [EMAIL PROTECTED] wrote:


  Working on making a FastCGI based setup the recommended approach
  instead of mod_php is probably more important than async. Actually,
  it's a prerequisite.

 Fastcgi is the recommended way of using php and httpd 2, AFAIK. Isn't it???

Where can I read about that recommendation?
I can't find it in the Apache or PHP manuals.
mod_php appears to be *the* solution.

  What about a hybrid approach?
  Async for network IO and other stuff that doesn't require sync calls,
  worker threads for other parts?

 That's kind of what I was thinking after Apachecon US this year.  I won't
 speak for others, but it seemed reasonable to most.  However, after doing
 several real world tests, I just don't honestly see that async will be a
 huge improvement.  Please prove me wrong with real world results.  I'd be
 more than happy to be wrong on this, really.

I don't have real world test results.
Have you tested the 30k scenario with an async web server?
And do all platforms have such cheap threading as your test platform?

 To be honest, I don't have strong feelings either way.  I was surprised by
 my results.  I, now, think that completely rewriting the core to be async
 *may be* a waste of resources. If it fits nicely into some ideas on
 reengineering buckets and brigades (ala serf stuff), and does not actually
 decrease overall performance, then by all means do it.

 Remember, I'm partially playing devil's advocate as well...

I noticed. ;)


Re: Is async the answer?

2008-01-28 Thread Olaf van der Spek
On Jan 28, 2008 8:04 PM, Eric Covener [EMAIL PROTECTED] wrote:
 On Jan 28, 2008 12:36 PM, Olaf van der Spek [EMAIL PROTECTED] wrote:
  On Jan 25, 2008 6:18 PM, Akins, Brian [EMAIL PROTECTED] wrote:
   On 1/24/08 3:14 PM, Olaf van der Spek [EMAIL PROTECTED] wrote:
  
  
Working on making a FastCGI based setup the recommended approach
instead of mod_php is probably more important than async. Actually,
it's a prerequisite.
  
   Fastcgi is the recommended way of using php and httpd 2, AFAIK. Isn't 
   it???
 
  Where can I read about that recommendation?
  I can't find it in the Apache or PHP manuals.
  mod_php appears to be *the* solution.

 http://www.php.net/manual/en/faq.installation.php#faq.installation.apache2

If you feel you have to use a threaded MPM, look at a FastCGI
configuration where PHP is running in its own memory space.

Is that what is meant by "Fastcgi is the recommended way of using php
and httpd 2, AFAIK. Isn't it???"
A single line seems a bit odd for the recommended approach.


Re: Is async the answer?

2008-01-28 Thread Eric Covener
On Jan 28, 2008 12:36 PM, Olaf van der Spek [EMAIL PROTECTED] wrote:
 On Jan 25, 2008 6:18 PM, Akins, Brian [EMAIL PROTECTED] wrote:
  On 1/24/08 3:14 PM, Olaf van der Spek [EMAIL PROTECTED] wrote:
 
 
   Working on making a FastCGI based setup the recommended approach
   instead of mod_php is probably more important than async. Actually,
   it's a prerequisite.
 
  Fastcgi is the recommended way of using php and httpd 2, AFAIK. Isn't it???

 Where can I read about that recommendation?
 I can't find it in the Apache or PHP manuals.
 mod_php appears to be *the* solution.

http://www.php.net/manual/en/faq.installation.php#faq.installation.apache2

-- 
Eric Covener
[EMAIL PROTECTED]


Re: Is async the answer?

2008-01-28 Thread William A. Rowe, Jr.

Olaf van der Spek wrote:


I agree that FastCGI is the better technical solution, I'm just
stating that neither the Apache documentation nor the PHP
documentation seems to state that. Even worse, they hardly document
the FastCGI way at all.


FastCGI is a technically subpar way to execute trusted, valid PHP.
So is the handler method, the most efficient is the httpd 2 filter
method which should work fine since John and I spent a bunch of time
on it.  However, only a CGI sapi or FastCGI can compartmentalize your
untrusted PHP applications.

People have always been under some preconception that it's good to run
untrusted code in-process within httpd, while numerous vulnerability
reports in the past (and many to appear over the future) all bear out
that it's a really stupid idea.

FastCGI is also a so-so way to get around libraries which aren't thread-
safe, running worker or event mpm's.  Of course, using the 21st century
equivalents of those libraries probably isn't a bad solution either.

Bill


Re: Is async the answer?

2008-01-28 Thread Rich Bowen


On Jan 28, 2008, at 15:41, Akins, Brian wrote:


On 1/28/08 3:29 PM, Olaf van der Spek [EMAIL PROTECTED] wrote:


I agree that FastCGI is the better technical solution, I'm just
stating that neither the Apache documentation nor the PHP
documentation seems to state that. Even worse, they hardly document
the FastCGI way at all.


The only reason I know is because at Apachecon in Austin (?) the php and
httpd guys kissed and made up and said a bunch of stuff about fastcgi in a
presentation.


Unfortunately, neither he (John Coggeshall) nor I followed up by  
doing anything useful to the documentation of either product to  
reflect that. :-(



--
Speech is conveniently located midway between thought and action,  
where it often substitutes for both.

John Andrew Holmes






Re: Is async the answer?

2008-01-28 Thread Akins, Brian
On 1/28/08 3:29 PM, Olaf van der Spek [EMAIL PROTECTED] wrote:
 
 I agree that FastCGI is the better technical solution, I'm just
 stating that neither the Apache documentation nor the PHP
 documentation seems to state that. Even worse, they hardly document
 the FastCGI way at all.

The only reason I know is because at Apachecon in Austin (?) the php and
httpd guys kissed and made up and said a bunch of stuff about fastcgi in a
presentation.


-- 
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies



Re: Is async the answer?

2008-01-28 Thread Olaf van der Spek
On Jan 28, 2008 9:22 PM, Jim Jagielski [EMAIL PROTECTED] wrote:
  http://www.php.net/manual/en/faq.installation.php#faq.installation.apache2
 
  If you feel you have to use a threaded MPM, look at a FastCGI
  configuration where PHP is running in its own memory space.
 
  Is that what is meant by "Fastcgi is the recommended way of using php
  and httpd 2, AFAIK. Isn't it???"
  A single line seems a bit odd for the recommended approach.
 

 Consider that, for many people, the main advantage of Apache2 over
 Apache1 is the worker MPM. Also consider that a threaded MPM and
 mod_php aren't a happy couple. If using prefork, mod_php works
 just dandy... but for other reasons you'd likely want to
 consider FastCGI anyway...

I agree that FastCGI is the better technical solution, I'm just
stating that neither the Apache documentation nor the PHP
documentation seems to state that. Even worse, they hardly document
the FastCGI way at all.


Re: Is async the answer?

2008-01-28 Thread Jim Jagielski


On Jan 28, 2008, at 2:35 PM, Olaf van der Spek wrote:


On Jan 28, 2008 8:04 PM, Eric Covener [EMAIL PROTECTED] wrote:
On Jan 28, 2008 12:36 PM, Olaf van der Spek [EMAIL PROTECTED] wrote:
On Jan 25, 2008 6:18 PM, Akins, Brian [EMAIL PROTECTED] wrote:
On 1/24/08 3:14 PM, Olaf van der Spek [EMAIL PROTECTED] wrote:




Working on making a FastCGI based setup the recommended approach
instead of mod_php is probably more important than async. Actually,
it's a prerequisite.


Fastcgi is the recommended way of using php and httpd 2, AFAIK. Isn't it???


Where can I read about that recommendation?
I can't find it in the Apache or PHP manuals.
mod_php appears to be *the* solution.


http://www.php.net/manual/en/faq.installation.php#faq.installation.apache2


If you feel you have to use a threaded MPM, look at a FastCGI
configuration where PHP is running in its own memory space.

Is that what is meant by "Fastcgi is the recommended way of using php
and httpd 2, AFAIK. Isn't it???"
A single line seems a bit odd for the recommended approach.



Consider that, for many people, the main advantage of Apache2 over
Apache1 is the worker MPM. Also consider that a threaded MPM and
mod_php aren't a happy couple. If using prefork, mod_php works
just dandy... but for other reasons you'd likely want to
consider FastCGI anyway...


Re: Is async the answer?

2008-01-28 Thread Olaf van der Spek
On Jan 28, 2008 9:57 PM, William A. Rowe, Jr. [EMAIL PROTECTED] wrote:
 Olaf van der Spek wrote:
 
  I agree that FastCGI is the better technical solution, I'm just
  stating that neither the Apache documentation nor the PHP
  documentation seems to state that. Even worse, they hardly document
  the FastCGI way at all.

 FastCGI is a technically subpar way to execute trusted, valid PHP.

Why?
Isn't memory (and other resource) consumption a lot lower because you
don't need a PHP 'engine' for every thread/process?

Even valid PHP code can crash, given bugs in PHP itself.
And I think tons of users sometimes run untrusted or invalid PHP.

 People have always been under some preconception that it's good to run
 untrusted code in-process within httpd, while numerous vulnerability
 reports in the past (and many to appear over the future) all bear out
 that it's a really stupid idea.

Given that the alternative (FastCGI) isn't well documented, I don't
think that's strange.

 FastCGI is also a so-so way to get around libraries which aren't thread-
 safe, running worker or event mpm's.  Of course, using the 21st century
 equivalents of those libraries probably isn't a bad solution either.

Olaf


Re: Is async the answer?

2008-01-25 Thread Akins, Brian
On 1/24/08 3:14 PM, Olaf van der Spek [EMAIL PROTECTED] wrote:


 Working on making a FastCGI based setup the recommended approach
 instead of mod_php is probably more important than async. Actually,
 it's a prerequisite.

Fastcgi is the recommended way of using php and httpd 2, AFAIK. Isn't it???

 Having 30k threads still seems like a waste of resource to me though.

Not if the system is handling the load very well and needs 30k threads.  My
point was that 30k threads did not seem to be a waste of resources.  I
doubt an async server would have used a significantly lower amount of
resources because worker did not use a significant amount of resources.

 What about a hybrid approach?
 Async for network IO and other stuff that doesn't require sync calls,
 worker threads for other parts?

That's kind of what I was thinking after Apachecon US this year.  I won't
speak for others, but it seemed reasonable to most.  However, after doing
several real world tests, I just don't honestly see that async will be a
huge improvement.  Please prove me wrong with real world results.  I'd be
more than happy to be wrong on this, really.

To be honest, I don't have strong feelings either way.  I was surprised by
my results.  I, now, think that completely rewriting the core to be async
*may be* a waste of resources. If it fits nicely into some ideas on
reengineering buckets and brigades (ala serf stuff), and does not actually
decrease overall performance, then by all means do it.

Remember, I'm partially playing devil's advocate as well...

-- 
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies



Re: Is async the answer?

2008-01-24 Thread Olaf van der Spek
 We were using normal worker MPM with keepalives for this test.  The current
 stable event would have helped with idle keepalive threads, but the system
 didn't seem to care.

But when using mod_php, worker is not recommended, right?
I doubt prefork scales as well as worker.
Working on making a FastCGI based setup the recommended approach
instead of mod_php is probably more important than async. Actually,
it's a prerequisite.

Having 30k threads still seems like a waste of resource to me though.
What about a hybrid approach?
Async for network IO and other stuff that doesn't require sync calls,
worker threads for other parts?

Olaf


Re: Is async the answer

2008-01-22 Thread Akins, Brian
On 1/20/08 10:44 AM, Graham Leggett [EMAIL PROTECTED] wrote:

 In terms of space, caches are not infinite in size, but then neither are
 the majority of backend websites either.

73GB is pretty big for a reverse proxy cache.  And fast SAS drives are
pretty cheap.  

 Sure, but I think the point that Brian was making was that you could
 support the kind of large load sizes that are traditionally associated
 with event based models using a prefork or worker setup, simply by
 making sure you have enough RAM.

And to stimulate some conversation.  I just don't want us to buy into the
"async is better" idea because that's the trend in servers nowadays.  If async
truly is better, then let's use it.  I just don't want to do it because
everyone else is.

Also, this test included all sorts of clients (slow, fast, in between).  A
blocking thread didn't seem to hurt the server.  I'm guessing that 48k
blocking threads wouldn't hurt it too bad either.

Also, I'm going to look at the serf buckets when I get time.  Story of my
life, though, no time...

-- 
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies



Re: Is async the answer

2008-01-22 Thread Akins, Brian
On 1/19/08 6:29 PM, Davi Arnaut [EMAIL PROTECTED] wrote:

 This is true for expensive hardware and very well designed operating
 systems and file systems.. and the space is not infinite.

It depends on your definition of expensive.  All of our servers are fairly
commodity.  The new linux fileserver I built at home is faster than most
of ours, and it cost me less than $1k.  It's all the management/redundancy
stuff that makes real servers so expensive.  A dual dual-core Opteron with
8GB RAM is not all that exotic.


-- 
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies



Re: Is async the answer

2008-01-20 Thread Graham Leggett

Davi Arnaut wrote:


This is true for expensive hardware and very well designed operating
systems and file systems.. and the space is not infinite.


Not at all - commodity hardware will serve just as well.

The real killer in this case is the slow client, which can be one, two 
or three orders of magnitude slower than the average client. This means 
that it will hog one, two or three orders of magnitude more of the 
server backend's resources than the average request, and this is where a 
cache can be most effective.
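
To put rough numbers on the slow-client point, here is a minimal Python sketch. The response size and both client speeds are hypothetical placeholders, not measurements from this thread; the only thing it shows is that the time a backend worker stays tied up scales inversely with client speed when nothing buffers the response in between.

```python
def hold_time_seconds(response_bytes: int, client_bytes_per_sec: float) -> float:
    """Seconds a backend worker stays busy if it feeds the client directly."""
    return response_bytes / client_bytes_per_sec


RESPONSE = 1_000_000  # hypothetical 1 MB response

average = hold_time_seconds(RESPONSE, 1_000_000)  # hypothetical 1 MB/s client
slow = hold_time_seconds(RESPONSE, 10_000)        # a client 100x slower
ratio = slow / average

print(f"average client: {average:.0f} s, slow client: {slow:.0f} s, ratio: {ratio:.0f}x")
```

With a cache (or any spooling layer) in front, the backend would only be held for roughly the time it takes to fill the cache, and the slow client is then served from the cached copy.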


In terms of space, caches are not infinite in size, but then neither are 
the majority of backend websites either.



But... OK. Back to the topic: I thought that one of the key points of
async/event based servers was that we use software to scale and not
hardware (so that hardware is not the bottleneck)... like serving
thousands of slow clients from commodity hardware.


Sure, but I think the point that Brian was making was that you could 
support the kind of large load sizes that are traditionally associated 
with event based models using a prefork or worker setup, simply by 
making sure you have enough RAM.
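
A back-of-the-envelope version of "enough RAM", as a Python sketch. The per-thread figure of 256 KiB is purely an assumed placeholder (the thread does not state one), and real resident memory depends on the OS, the configured stack size and what each thread actually touches; the thread counts are the 30k and 48k figures mentioned elsewhere in this discussion.

```python
def stack_memory_gib(threads: int, stack_kib_per_thread: int) -> float:
    """Upper bound on stack memory, in GiB, for a given number of threads."""
    return threads * stack_kib_per_thread / (1024 * 1024)


ASSUMED_STACK_KIB = 256  # hypothetical per-thread stack budget

for threads in (30_000, 48_000):
    gib = stack_memory_gib(threads, ASSUMED_STACK_KIB)
    print(f"{threads} threads x {ASSUMED_STACK_KIB} KiB = {gib:.2f} GiB")
```

Under that assumption the totals land in the single-digit to low double-digit GiB range, which is the order of magnitude of the commodity machines described in this thread.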


Very useful information to know.

Regards,
Graham
--




Re: Is async the answer

2008-01-19 Thread Niklas Edmundsson

On Fri, 18 Jan 2008, Ruediger Pluem wrote:


The proxy that the LiveJournal folks wrote, I think, copies all the data
from the origin server into a file and then uses sendfile to send to the
client...


Erm, so does the one we wrote, mod_disk_cache ;p


IMHO it doesn't for the first request of the entity (the request that causes
the entity to be cached)


Which is why it doesn't scale with large files, and I hacked it to do 
that to be usable with DVD images on ftp.acc.umu.se 
(http://issues.apache.org/bugzilla/show_bug.cgi?id=39380 - you might 
remember the first try to merge some of it). Yes, it has its flaws, 
but it solves the problem for us. I think that some people have tried 
it in a proxy setting too with pretty OK results. But this was really 
off-topic ;)


Getting to the point, I share Brian's concerns about going async just 
for the sake of async, for similar reasons:


- People have trouble making modules even thread safe
  (see mod_example); forcibly adding async to the mix will raise
  the bar even higher for people who need to whip up a simple module.
- Callback semantics are messy when they go wrong, debugging can be a
  pain.
- Threads are rather cheap, even on linux since the advent of NPTL.
- Performance benefits are unclear.

Given that, there are obvious optimisations that can be, and have 
been, made. The ones in trunk aimed at not hogging a worker thread for 
simply writing the remaining data to the client for example. From what 
I've understood this class of changes doesn't really affect modules.


Also, if there is a way of adding async having it optional in modules 
then I see no problem with adding it as long as there are cases where 
it actually helps, other than adding it to the supported buzzwords 
list ;)


/Nikke - who probably ended up off topic after all ;)
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 I'd love to, but I'm worried about my vertical hold.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Is async the answer

2008-01-19 Thread Davi Arnaut
Akins, Brian wrote:
 On 1/18/08 3:07 PM, Colm MacCarthaigh [EMAIL PROTECTED] wrote:
 That's not even a consideration,
 async is really for dynamic content, proxies, and other non-sendfile
 content.
 
 For dynamic stuff, X-sendfile works well. (Just really starting to play
 with that, liking it so far).
 
 The proxy that the LiveJournal folks wrote, I think, copies all the data
 from the origin server into a file and then uses sendfile to send to the
 client...

Doesn't this limit the network bandwidth to the bandwidth of the disk
and/or file system?

--
Davi Arnaut


Re: Is async the answer?

2008-01-19 Thread Jim Jagielski


On Jan 18, 2008, at 12:03 PM, Akins, Brian wrote:

This is just some ramblings based on some observations, theories, and tests.
Partially devil's advocate as well.

Most of us seem to have convinced ourselves that high performance network
applications (including web servers) must be asynchronous in order to scale.

Is this still valid? For that matter, was it ever?



http://www.jimjag.com/imo/index.php?/archives/150-Long-time.html


Re: Is async the answer

2008-01-19 Thread Jim Jagielski


On Jan 18, 2008, at 2:16 PM, Justin Erenkrantz wrote:


On Jan 18, 2008 10:52 AM, Akins, Brian [EMAIL PROTECTED] wrote:
Which is why I hate to see a ton of work go into async core if it actually
does very little to help performance (or if it hurts it) and makes writing
modules harder.  It's braindead simple nowadays to write well behaved high
performance modules (well, mostly) because you rarely worry about threads,
reads/writes, etc.  Full async programming is just as challenging as
handling a ton of threads yourself.


Speaking for myself, I think writing and using buckets with serf is
more straightforward than our complicated bucket brigade system with
mixed push/pull paradigms.



+1... Although the whole concept of buckets and their brigades
has some cool advantages, they are also a semi-constant source
of issues...


Re: Is async the answer

2008-01-19 Thread Graham Leggett

Davi Arnaut wrote:


The proxy that the LiveJournal folks wrote, I think, copies all the data
from the origin server into a file and then uses sendfile to send to the
client...


Doesn't this limit the network bandwidth to the bandwidth of the disk
and/or file system?


Yes, and the effective bandwidth of the disk can be significantly higher 
than both the cache backend (which is often expensive) and the network 
frontend (which has potentially slow clients tying up your resources).


Don't forget that your cache disk is most often RAM backed, meaning 
effectively your cache disk is a ramdisk, with all the speed advantages 
that go with it.


Regards,
Graham
--




Re: Is async the answer

2008-01-19 Thread Henrik Nordström
On Fri, 2008-01-18 at 16:17 -0500, Akins, Brian wrote:

 Paul Q and I have been kicking around the idea that even if we go to a
 completely async core, etc. that modules could mark some hooks as blocking
 and they would run basically how they do today. (One day, Paul, I'll
 actually think about this more...)

In the end you need a bit of mixture between the models to work out.
threads or even processes for complex processing or libraries outside
your control, and async for the basic core to keep it lightweight in
resources/request/connection.
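
The mixed model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything from httpd or Squid: `blocking_lookup` is a hypothetical stand-in for a library call outside your control, and the pool size of 4 is arbitrary. The event loop owns the connections; anything that blocks is pushed to a small thread pool.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_lookup(key: str) -> str:
    """Hypothetical stand-in for a blocking library call (database, stat(), ...)."""
    time.sleep(0.05)
    return f"value-for-{key}"


async def handle_request(key: str, pool: ThreadPoolExecutor) -> str:
    loop = asyncio.get_running_loop()
    # The event loop stays free for other connections while a pool thread blocks.
    value = await loop.run_in_executor(pool, blocking_lookup, key)
    return f"200 OK {value}"


async def serve_all(keys):
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await asyncio.gather(*(handle_request(k, pool) for k in keys))


responses = asyncio.run(serve_all(["a", "b", "c"]))
print(responses)
```

The design choice is the same one argued for in the text: the core stays light per connection, while the number of threads is bounded by how much blocking work is in flight rather than by how many clients are connected.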

There is quite a bit of research in programming models supporting mixed
async/threaded/tasklet scheduling without forcing the programmer to know
all details. Quite interesting reading if you haven't read those papers
yet.

For example the tame approach (C++ preprocessor using libasync) used by
OKWS and its related cousin tamer (more lightweight library) is quite
fun to work with, at least in theory.


Regarding CPU performance, you need a more complex workload than
pure sendfile() shuffling of data to see much of a difference between
threaded or async models. Especially if you look at smaller requests,
where the two almost converge to the same model (N threads doing fast
successive batch processing one request at a time with no wait time, or
an event loop doing pretty much the same batching).

Regards
Henrik




Re: Is async the answer

2008-01-19 Thread Henrik Nordström

On Sat, 2008-01-19 at 09:57 -0500, Davi Arnaut wrote:

 Doesn't this limit the network bandwidth to the bandwidth of the disk
 and/or file system?

Depends on the working set and your amount of memory.

If it's just temporary storage then no, as most data won't even hit the
disk.

If it's more of a cache then partially. Updates will use write bandwidth
to the disks, and not so frequently accessed objects will use read
bandwidth as well.

The filesystem overhead is pretty negligible these days. Very close to
raw I/O + memory cache.

Regards
Henrik




Re: Is async the answer

2008-01-19 Thread Davi Arnaut
Graham Leggett wrote:
 Davi Arnaut wrote:
 
 The proxy that the LiveJournal folks wrote, I think, copies all the data
 from the origin server into a file and then uses sendfile to send to the
 client...
 Doesn't this limit the network bandwidth to the bandwidth of the disk
 and/or file system?
 
 Yes, and the effective bandwidth of the disk can be significantly higher 
 than both the cache backend (which is often expensive) and the network 
 frontend (which has potentially slow clients tying up your resources).
 
 Don't forget that your cache disk is most often RAM backed, meaning 
 effectively your cache disk is a ramdisk, with all the speed advantages 
 that go with it.
 

This is true for expensive hardware and very well designed operating
systems and file systems.. and the space is not infinite.

But... OK. Back to the topic: I thought that one of the key points of
async/event based servers was that we use software to scale and not
hardware (so that hardware is not the bottleneck)... like serving
thousands of slow clients from commodity hardware.

--
Davi Arnaut


Re: Is async the answer?

2008-01-18 Thread Colm MacCarthaigh
On Fri, Jan 18, 2008 at 12:03:02PM -0500, Akins, Brian wrote:
 Most of us seem to have convinced ourselves that high performance network
 applications (including web servers) must be asynchronous in order to scale.
 Is this still valid? For that matter, was it ever?

Hmmm, it depends what you mean by scale really. Async doesn't help a
daemon scale in terms of concurrency or throughput, if anything it might
even impede it, but it certainly can help improve latency and
responsivity greatly. On the whole, it's easy to see how it might make
the end user experience of a very busy server much more pleasant.

 It seems that modern OS's (this was Linux 2.6.something) deal with the
 thread overhead and all the context switches very well. All the stuff
 mentioned in the c10k problem (http://www.kegel.com/c10k.html) didn't
 seem to apply.  We could have easily doubled the amount of connections to
 the server, I think.

The c10k page has been hopelessly out of date for a long long time, I
wrote to Dan Kegel some time ago (maybe 3 or 4 years) pointing this
out, but there's been no update :/

 Response time never increased in any measurable amount.

I suspect it might though if the scheduler became bound, async would
route the interrupts more efficiently. 

 Yes, we are using sendfile, mmap, etc., so zero-copy helps us a lot.
 
 So, do we need apache 3 (or whatever it's called) to be fully asynchronous?
 Is that just us reacting to the market trends, ie, lighttpd?

Who knows, no harm in doing it anyway, if it's what interests people,
cool. Personally I find comparisons between webservers and most
discussions on scalability baffling, the reality is that modern hardware
can outscale pretty much any amount of bandwidth you can buy regardless
of the software. And to that end, the software is all near identical in
the pipelines of syscall's used (hell even IIS) - which is what really
matters. 

Most discussions seem to centre on some mindlessly ignorant comparison
based on the suitability of defaults to a particular set of circumstances
coupled with religion. The scalability wars should really be over,
everyone won - kernel's rule :-)

 All the "apache httpd is bloated and slow" talk is just plain horse crap.  It's
 not that hard to configure apache to be fast.  C programming is my
 hobby, and it's not that hard to write modules that don't do stupid things
 and kill the performance.

Yep!

-- 
Colm MacCárthaigh    Public Key: [EMAIL PROTECTED]


Re: Is async the answer

2008-01-18 Thread Justin Erenkrantz
On Jan 18, 2008 2:30 PM, Ruediger Pluem [EMAIL PROTECTED] wrote:
 IMHO it doesn't for the first request of the entity (the request that causes
 the entity to be cached)

I'd expect the predominance of large numbers would reduce the impact
of the one-time performance hit...but that conversion away from a fd
and into flat buffers, I feel, is more of a bucket brigade problem
than an intrinsic fault of mod_disk_cache.  -- justin


Re: Is async the answer

2008-01-18 Thread Ruediger Pluem


On 01/18/2008 10:29 PM, Colm MacCarthaigh wrote:
 On Fri, Jan 18, 2008 at 04:17:16PM -0500, Akins, Brian wrote:
 For dynamic stuff, X-sendfile works well. (Just really starting to play
 with that, liking it so far).
 
 It's not a solve-all though, I mean even though CGI's or whatever
 /could/ write their output to a file and then call X-sendfile, it'd be a
 disaster latency-wise. Ironically enough the only way to solve that is
 ... async ;-)
 
 The proxy that the LiveJournal folks wrote, I think, copies all the data
 from the origin server into a file and then uses sendfile to send to the
 client...
 
 Erm, so does the one we wrote, mod_disk_cache ;p

IMHO it doesn't for the first request of the entity (the request that causes
the entity to be cached)

Regards

Rüdiger




Re: Is async the answer

2008-01-18 Thread Colm MacCarthaigh
On Fri, Jan 18, 2008 at 04:17:16PM -0500, Akins, Brian wrote:
 For dynamic stuff, X-sendfile works well. (Just really starting to play
 with that, liking it so far).

It's not a solve-all though, I mean even though CGI's or whatever
/could/ write their output to a file and then call X-sendfile, it'd be a
disaster latency-wise. Ironically enough the only way to solve that is
... async ;-)

 The proxy that the LiveJournal folks wrote, I think, copies all the data
 from the origin server into a file and then uses sendfile to send to the
 client...

Erm, so does the one we wrote, mod_disk_cache ;p

-- 
Colm MacCárthaigh    Public Key: [EMAIL PROTECTED]


Re: Is async the answer

2008-01-18 Thread Akins, Brian
On 1/18/08 2:16 PM, Justin Erenkrantz [EMAIL PROTECTED] wrote:

 Speaking for myself, I think writing and using buckets with serf is
 more straightforward than our complicated bucket brigade system with
 mixed push/pull paradigms.

It very well may be.

Async may be easy.  Except when my db connection blocks.. On stat calls..
Etc.

I am by no means defending the buckets!  Or anything for that matter... Just
some observations.  I just no longer buy into the idea that async is somehow
inherently superior.  It sounds good in theory, but in the real world I am
just not seeing it.

The whole reason I brought this up was to stimulate discussion.  I really
really would hate for us to spend many months porting everything over to
async to discover that it made no positive impact on performance. Worse, it
made extending httpd (or D) much harder.



-- 
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies



Re: Is async the answer

2008-01-18 Thread Akins, Brian
On 1/18/08 3:07 PM, Colm MacCarthaigh [EMAIL PROTECTED] wrote:
 That's not even a consideration,
 async is really for dynamic content, proxies, and other non-sendfile
 content.

For dynamic stuff, X-sendfile works well. (Just really starting to play
with that, liking it so far).
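
For readers who have not used it, the X-Sendfile idea looks roughly like this from the application side. This is a minimal Python WSGI sketch under assumptions: the `REPORTS` mapping and its path are hypothetical, and it only works as intended if the front-end server is configured to honour the `X-Sendfile` response header; otherwise the client simply receives the empty body.

```python
REPORTS = {"q1": "/srv/reports/q1.pdf"}  # hypothetical name-to-file mapping


def app(environ, start_response):
    """Decide dynamically which file to serve, then let the server send it."""
    path = REPORTS.get(environ.get("QUERY_STRING", ""))
    if path is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no such report\n"]
    # The application never reads the file; it only names it in a header.
    start_response("200 OK", [("Content-Type", "application/pdf"),
                              ("X-Sendfile", path)])
    return [b""]


# Tiny driver that records what the application asked the server to do.
seen = {}


def record(status, headers):
    seen["status"] = status
    seen["headers"] = dict(headers)


body = app({"QUERY_STRING": "q1"}, record)
print(seen["status"], seen["headers"]["X-Sendfile"])
```

The dynamic code finishes almost immediately, and the byte shuffling is left to the server's own file-sending path.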

The proxy that the LiveJournal folks wrote, I think, copies all the data
from the origin server into a file and then uses sendfile to send to the
client...
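
The spool-then-sendfile pattern described here can be sketched in Python as follows. This is an illustration of the general technique only, not the LiveJournal proxy's or mod_disk_cache's actual code: `spool_and_send` is a hypothetical helper, and a local socket pair stands in for the real client connection.

```python
import socket
import tempfile


def spool_and_send(origin_chunks, client_sock: socket.socket) -> int:
    """Copy the origin response into a temp file, then send the file."""
    with tempfile.TemporaryFile() as spool:
        for chunk in origin_chunks:  # the origin server is free once this loop ends
            spool.write(chunk)
        spool.flush()
        spool.seek(0)
        # socket.sendfile() uses os.sendfile() where available, else plain send().
        return client_sock.sendfile(spool)


server_side, client_side = socket.socketpair()
sent = spool_and_send([b"hello ", b"world"], server_side)
server_side.close()
data = client_side.recv(1024)
client_side.close()
print(sent, data)
```

The point of the pattern is that the expensive backend is released as soon as the spool file is complete, however slowly the client drains it.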

Also, we have driven apache as a proxy as far as we have squid...

Paul Q and I have been kicking around the idea that even if we go to a
completely async core, etc. that modules could mark some hooks as blocking
and they would run basically how they do today. (One day, Paul, I'll
actually think about this more...)
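
One way to picture "modules could mark some hooks as blocking" is the following Python sketch. Everything in it is hypothetical (the `hook` decorator, the `HOOKS` registry, the two example hooks); it is not httpd's hook API, just a small model of the idea that blocking hooks keep their synchronous style and get run off the event loop.

```python
import asyncio

HOOKS = []  # (function, is_blocking) pairs; a hypothetical registry


def hook(blocking: bool = False):
    """Register a hook and record whether it may block."""
    def register(fn):
        HOOKS.append((fn, blocking))
        return fn
    return register


@hook(blocking=True)
def check_access(request: dict) -> None:
    # Ordinary synchronous code, written the way modules are written today.
    request["allowed"] = True


@hook()
async def add_header(request: dict) -> None:
    request["headers"] = {"X-Example": "1"}


async def run_hooks(request: dict) -> dict:
    for fn, blocking in HOOKS:
        if blocking:
            await asyncio.to_thread(fn, request)  # runs in a worker thread
        else:
            await fn(request)                     # runs on the event loop
    return request


result = asyncio.run(run_hooks({"uri": "/index.html"}))
print(result)
```

A module author who does nothing special would stay in the blocking category and see no change in how their code is written.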

Having a request tied to one thread for its lifetime does make some things
easier.  If the underlying IO is asynchronous and it's faster/scalable/fun,
then, all the better.  I just am not a big fan of the callback method that
squid uses (or used last time I looked at it).  Yes, it's doable, but just
seems not quite right to me. That's just my opinion.  I'd like to be able
to say, "hey httpd, write this stuff to the client" and have it just happen
wonderfully fast :)  Currently, worker is doing a great job for us.  Maybe
async would be fine as well, especially if the serf buckets are as easy to
use as Justin says.  I just don't want us to say we must be async with no
real reason other than we must.


-- 
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies



Re: Is async the answer

2008-01-18 Thread Colm MacCarthaigh
On Fri, Jan 18, 2008 at 02:31:11PM -0500, Akins, Brian wrote:
 On 1/18/08 2:20 PM, Colm MacCarthaigh [EMAIL PROTECTED] wrote:
  
  I think so, in some environments anyway. If you have a server tuned for
  high throughput across large bandwidth-delay product links then you
  have the general problem of equal-priority threads sitting around with
  quite a lot of large impending writes.
 
 Doesn't sendfile (and others) help in that case?  Also RAM is cheap,
 bandwidth isn't :)

Oh if you can use sendfile, you use it sure, and whether it's used async
or not isn't going to make a big difference, all of the benefits are the zero
copy, the DMA, the TOE, and so on. That's not even a consideration,
async is really for dynamic content, proxies, and other non-sendfile
content.

-- 
Colm MacCárthaigh    Public Key: [EMAIL PROTECTED]


Re: Is async the answer

2008-01-18 Thread Akins, Brian
On 1/18/08 2:20 PM, Colm MacCarthaigh [EMAIL PROTECTED] wrote:
 
 I think so, in some environments anyway. If you have a server tuned for
 high throughput across large bandwidth-delay product links then you
 have the general problem of equal-priority threads sitting around with
 quite a lot of large impending writes.

Doesn't sendfile (and others) help in that case?  Also RAM is cheap,
bandwidth isn't :)



-- 
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies



Re: Is async the answer

2008-01-18 Thread Colm MacCarthaigh
On Fri, Jan 18, 2008 at 01:52:02PM -0500, Akins, Brian wrote:
 On 1/18/08 12:18 PM, Colm MacCarthaigh [EMAIL PROTECTED] wrote:
  Hmmm, it depends what you mean by scale really. Async doesn't help a
  daemon scale in terms of concurrency or throughput, if anything it might
  even impede it, but it certainly can help improve latency and
  responsivity greatly. On the whole, it's easy to see how it might make
  the end user experience of a very busy server much more pleasant.
 
 I also wonder if that has actually been tested or if it's just a factoid?

I've tested, and it met my expectations on Linux 2.6 on Itanium, but I
can't guarantee that the experiments were free from my own bias I guess. 

  Response time never increased in any measurable amount.
  
  I suspect it might though if the scheduler became bound, async would
  route the interrupts more efficiently.
 
 But, I wonder if the scheduler would become bound in a reasonable amount
 of traffic.

I think so, in some environments anyway. If you have a server tuned for
high throughput across large bandwidth-delay product links then you
have the general problem of equal-priority threads sitting around with
quite a lot of large impending writes. Having them all in the polling
loop is inefficient, and async is going to reduce the latency a little,
though granted these days we may be talking about nanoseconds. And
I guess responsivity and high BDP don't go together anyway, due to the
speed of light.
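
To put a number on "large impending writes": the bandwidth-delay product
is the link bandwidth multiplied by the round-trip time, and it is roughly
how much data can be in flight (and so buffered for sending) on one
connection. A small Python sketch of the arithmetic; the function name and
the example figures are illustrative assumptions, not measurements from
this thread.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes.

    bandwidth_bps: link bandwidth in bits per second
    rtt_ms:        round-trip time in milliseconds
    """
    # bits/s * ms / 1000 gives bits in flight; / 8 converts to bytes.
    return bandwidth_bps * rtt_ms // 8000


if __name__ == "__main__":
    # Illustrative figures only: 1 Gbit/s at 100 ms, 100 Mbit/s at 200 ms.
    print(bdp_bytes(1_000_000_000, 100))  # 12500000
    print(bdp_bytes(100_000_000, 200))    # 2500000
```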

  The scalability wars should really be over,
  everyone won - kernels rule :-)
 
 Which is why I hate to see a ton of work go into async core if it actually
 does very little to help performance (or if it hurts it) and makes writing
 modules harder.  It's braindead simple nowadays to write well behaved high
 performance modules (well, mostly) because you rarely worry about threads,
 reads/writes, etc.  Full async programming is just as challenging as
 handling a ton of threads yourself.

I think if it interests people and they want to work on it, cool stuff,
but don't necessarily expect any actual pay-off in terms of
performance. One of the great things about an open source project is
that sometimes what gets worked on isn't driven by considerations other
than what people feel like working on. 

I'd be less worried about the effect on modules; many module authors
already can't be bothered to make their modules thread-safe, but
prefork still exists (and scales quite well, on many platforms).

-- 
Colm MacCárthaigh                        Public Key: [EMAIL PROTECTED]


Re: Is async the answer

2008-01-18 Thread Justin Erenkrantz
On Jan 18, 2008 10:52 AM, Akins, Brian [EMAIL PROTECTED] wrote:
 Which is why I hate to see a ton of work go into async core if it actually
 does very little to help performance (or if it hurts it) and makes writing
 modules harder.  It's braindead simple nowadays to write well behaved high
 performance modules (well, mostly) because you rarely worry about threads,
 reads/writes, etc.  Full async programming is just as challenging as
 handling a ton of threads yourself.

Speaking for myself, I think writing and using buckets with serf is
more straightforward than our complicated bucket brigade system with
mixed push/pull paradigms.

YMMV.  -- justin


Re: Is async the answer

2008-01-18 Thread Akins, Brian
On 1/18/08 12:18 PM, Colm MacCarthaigh [EMAIL PROTECTED] wrote:


 Hmmm, it depends what you mean by scale really. Async doesn't help a
 daemon scale in terms of concurrency or throughput, if anything it might
 even impede it, but it certainly can help improve latency and
 responsivity greatly. On the whole, it's easy to see how it might make
 the end user experience of a very busy server much more pleasant.


I also wonder if that has actually been tested or if it's just a factoid?


 Response time never increased in any measurable amount.
 
 I suspect it might though if the scheduler became bound, async would
 route the interrupts more efficiently.


But, I wonder if the scheduler would become bound in a reasonable amount
of traffic.


 discussions on scalability baffling, the reality is that modern hardware
 can outscale pretty much any amount of bandwidth you can buy regardless
 of the software. 

Bandwidth generally isn't an issue for us anymore (thanks to gzip).  We can
still overrun the CPU with small object requests/responses.  On large
objects (i.e., over 16k or so), the CPU is bored when multiple gig interfaces
are full.
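
A back-of-the-envelope way to see why small objects stress the CPU rather
than the network: the number of responses per second needed to fill a link
grows as the object size shrinks. The Python sketch below ignores protocol
overhead and compression; the function name and the 1 KiB "small object"
size are assumptions chosen for illustration.

```python
def responses_per_second_to_fill(link_bps: int, object_bytes: int) -> int:
    """Whole responses per second needed to saturate a link.

    Ignores TCP/HTTP overhead, so real figures would be somewhat lower.
    """
    return link_bps // (object_bytes * 8)


if __name__ == "__main__":
    gigabit = 1_000_000_000
    print(responses_per_second_to_fill(gigabit, 16 * 1024))  # 7629
    print(responses_per_second_to_fill(gigabit, 1024))       # 122070
```

Each response carries per-request CPU cost (parsing, logging, syscalls), so
on this rough model a link full of 1 KiB objects needs about sixteen times
as many requests handled per second as one full of 16 KiB objects.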



 The scalability wars should really be over,
 everyone won - kernels rule :-)

Which is why I hate to see a ton of work go into async core if it actually
does very little to help performance (or if it hurts it) and makes writing
modules harder.  It's braindead simple nowadays to write well behaved high
performance modules (well, mostly) because you rarely worry about threads,
reads/writes, etc.  Full async programming is just as challenging as
handling a ton of threads yourself.


My $.02 US worth (which ain't much).


-- 
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
