Re: The Case for a Universal Web Server Load Value

2012-11-15 Thread Tim Bannister
On 15 Nov 2012, at 07:01, Issac Goldstand wrote:
 On 15/11/2012 00:48, Tim Bannister wrote:
 On 14 Nov 2012, at 22:19, Ask Bjørn Hansen wrote:
 The backend should/can know if it can take more requests.  When it can't it 
 shouldn't and the load balancer shouldn't pass that back to the end-user 
 but rather just find another available server or hold on to the request 
 until one becomes available (or some timeout value is met if things are 
 that bad).
 
 This only makes sense for idempotent requests. What about a POST or PUT?
 
 What's the problem?  LB will get the request, send OPTIONS * to the backends 
 to find an available one and only then push the POST/PUT back to it...

Sorry; I was trying to be brief but that meant skipping some details.

We have to assume that at some point we have uneven loading and that there is a 
backend with spare capacity (otherwise, yeah, no load balancer will help). A 
backend that started off responsive may slow down due to load but still be able 
to keep the TCP connection alive. With GET, we can just chuck requests at the 
backends and only decide what to do when a request goes bad or the response is 
late. GET's idempotency means we can retry the same request with a different 
backend. This strategy doesn't work with POST etc.

Uneven load could arise through imperfect balancing by a reverse proxy, or it 
could be exogenous – maybe one of the backends has fired off an expensive 
scheduled task?


PS. If we are doing load skewing or otherwise managing the number of active 
backends, we definitely want a way to learn the load on each backend. A bit of 
standardisation would be nice here (de facto or otherwise). Apache httpd is a 
good place to start off, because of its market share, even if this goes beyond 
the scope of httpd itself.

-- 
Tim Bannister – is...@jellybaby.net





Re: The Case for a Universal Web Server Load Value

2012-11-15 Thread Jim Jagielski

On Nov 14, 2012, at 5:19 PM, Ask Bjørn Hansen a...@develooper.com wrote:

 
 On Nov 14, 2012, at 11:01, Tim Bannister is...@jellybaby.net wrote:
 
 I really like how Perlbal does it:
 
 It opens a connection when it thinks it needs more and issues a (by 
 default, it's configurable) OPTIONS * request and only after getting a 
 successful response to the test will it send real requests on that 
 connection (and then it will keep the connection open with Keep-Alive for 
 further requests).
 
 X-Server-Load: would still be an improvement, eg with this response to 
 OPTIONS:
 HTTP/1.1 200 OK
 Date: Wed, 14 Nov 2012 19:00:00 GMT
 Server: Apache/2.5.x
 X-Server-Load: 0.999
 
 …the balancer might decide to use a backend that is reporting a lower load.
 
 I know I am fighting the tide here, but it's really the wrong smarts to put 
 in the load balancer.
 
 The backend should/can know if it can take more requests.  When it can't it 
 shouldn't and the load balancer shouldn't pass that back to the end-user but 
 rather just find another available server or hold on to the request until one 
 becomes available (or some timeout value is met if things are that bad).
 

Without a doubt, I agree that the load info should not be passed
back to the end user (I state as much in the blog).



Re: The Case for a Universal Web Server Load Value

2012-11-14 Thread Ask Bjørn Hansen
I really like how Perlbal does it:

It opens a connection when it thinks it needs more and issues a (by default, 
it's configurable) OPTIONS * request and only after getting a successful 
response to the test will it send real requests on that connection (and then it 
will keep the connection open with Keep-Alive for further requests).


Ask

Re: The Case for a Universal Web Server Load Value

2012-11-14 Thread Tim Bannister
On 14 Nov 2012, at 18:49, Ask Bjørn Hansen wrote:

 I really like how Perlbal does it:
 
 It opens a connection when it thinks it needs more and issues a (by default, 
 it's configurable) OPTIONS * request and only after getting a successful 
 response to the test will it send real requests on that connection (and then 
 it will keep the connection open with Keep-Alive for further requests).

X-Server-Load: would still be an improvement, eg with this response to OPTIONS:
HTTP/1.1 200 OK
Date: Wed, 14 Nov 2012 19:00:00 GMT
Server: Apache/2.5.x
X-Server-Load: 0.999

…the balancer might decide to use a backend that is reporting a lower load.
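
As a sketch of the balancer side (everything here is hypothetical; `loads'
would hold the most recent X-Server-Load reported by each backend):

    /* Hypothetical sketch: choose the backend whose last OPTIONS
     * response reported the lowest X-Server-Load.  Entries below 0.0
     * mean "no report yet" and are skipped. */
    static int pick_backend(const double *loads, int n)
    {
        int i, best = -1;
        for (i = 0; i < n; i++) {
            if (loads[i] < 0.0)
                continue;
            if (best < 0 || loads[i] < loads[best])
                best = i;
        }
        return best;   /* index of least-loaded backend, or -1 */
    }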

-- 
Tim Bannister – is...@jellybaby.net





Re: The Case for a Universal Web Server Load Value

2012-11-14 Thread Ask Bjørn Hansen

On Nov 14, 2012, at 11:01, Tim Bannister is...@jellybaby.net wrote:

 I really like how Perlbal does it:
 
 It opens a connection when it thinks it needs more and issues a (by default, 
 it's configurable) OPTIONS * request and only after getting a successful 
 response to the test will it send real requests on that connection (and then 
 it will keep the connection open with Keep-Alive for further requests).
 
 X-Server-Load: would still be an improvement, eg with this response to 
 OPTIONS:
 HTTP/1.1 200 OK
 Date: Wed, 14 Nov 2012 19:00:00 GMT
 Server: Apache/2.5.x
 X-Server-Load: 0.999
 
 …the balancer might decide to use a backend that is reporting a lower load.

I know I am fighting the tide here, but it's really the wrong smarts to put in 
the load balancer.

The backend should/can know if it can take more requests.  When it can't it 
shouldn't and the load balancer shouldn't pass that back to the end-user but 
rather just find another available server or hold on to the request until one 
becomes available (or some timeout value is met if things are that bad).

With the Perlbal model, the backend can control how much work it will take on, 
and the load balancer will never send traffic to an overloaded or hung server, 
so users will always get to the first truly available backend.

The load balancer smarts should be in managing these "let's see if you are 
ready" requests and pending connections.
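
A minimal sketch of such a readiness probe in C, assuming a POSIX socket
that is already connected to the backend (the Host value and the
single-read response handling are illustrative only):

    #include <string.h>
    #include <unistd.h>

    /* Send "OPTIONS *" on connected socket fd; returns 1 if the
     * backend answers 2xx and looks ready, 0 otherwise. */
    static int backend_ready(int fd)
    {
        static const char probe[] =
            "OPTIONS * HTTP/1.1\r\n"
            "Host: backend\r\n"            /* hypothetical host name */
            "Connection: keep-alive\r\n"
            "\r\n";
        char buf[512];
        ssize_t n;

        if (write(fd, probe, sizeof(probe) - 1)
                != (ssize_t)(sizeof(probe) - 1))
            return 0;
        n = read(fd, buf, sizeof(buf) - 1);
        if (n <= 0)
            return 0;
        buf[n] = '\0';
        /* Real requests follow on this connection only after a 2xx. */
        return strncmp(buf, "HTTP/1.1 2", 10) == 0;
    }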


Ask

-- 
Ask Bjørn Hansen, http://askask.com/





Re: The Case for a Universal Web Server Load Value

2012-11-14 Thread Tim Bannister
On 14 Nov 2012, at 22:19, Ask Bjørn Hansen wrote:

 I know I am fighting the tide here, but it's really the wrong smarts to put 
 in the load balancer.
 
 The backend should/can know if it can take more requests.  When it can't it 
 shouldn't and the load balancer shouldn't pass that back to the end-user but 
 rather just find another available server or hold on to the request until one 
 becomes available (or some timeout value is met if things are that bad).

This only makes sense for idempotent requests. What about a POST or PUT?


For a plausible example that mixes POST and GET: a cluster of N webservers 
providing SPARQL HTTP access to a triplestore. Most queries will use GET but 
some might use POST, either because they are too long for GET or because the 
query is an update.

The reverse proxy / balancer manager might want to:
 • balance query workload across the active set of webservers
 • spin up an extra backend as required by load
 • skew load onto the minimum number of webservers (and suspend any spares)

SPARQL is an example of a varying workload where none of httpd's existing 
lbmethods is perfect. One complex query can punish a backend whilst its peers, 
handling multiple cheap concurrent requests, remain close to idle. SPARQL 
sometimes means POST requests; a subset of these are safely repeatable, but 
determining which ones is too complex for any HTTP proxy.

-- 
Tim Bannister – is...@jellybaby.net





Re: The Case for a Universal Web Server Load Value

2012-11-14 Thread Graham Leggett
On 15 Nov 2012, at 12:48 AM, Tim Bannister is...@jellybaby.net wrote:

 This only makes sense for idempotent requests. What about a POST or PUT?
 
 
 For a plausible example that mixes POST and GET: a cluster of N webservers 
 providing SPARQL HTTP access to a triplestore. Most queries will use GET but 
 some might use POST, either because they are too long for GET or because the 
 query is an update.
 
 The reverse proxy / balancer manager might want to:
 • balance query workload across the active set of webservers
 • spin up an extra backend as required by load
 • skew load onto the minimum number of webservers (and suspend any spares)
 
 SPARQL is an example of a varying workload where none of httpd's existing 
 lbmethods is perfect. One complex query can punish a backend whilst its peers 
 are idle handling multiple concurrent requests. SPARQL sometimes means POST 
 requests; a subset of these are safely repeatable but determining which ones 
 is too complex for any HTTP proxy.

There is no reason why a load balancer can't take into account existing 
requests in addition to new requests when making load balancing decisions. When 
there are a number of connections to a backend that are in flight but taking a 
while to complete, this is a sign the backend may be busy and should be avoided.
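
A rough sketch of that heuristic (names invented; in_flight would be the
balancer's own count of outstanding requests per backend):

    /* Hypothetical sketch: prefer the backend with the fewest requests
     * currently in flight, a cheap stand-in for "probably least busy". */
    static int pick_least_busy(const int *in_flight, int n)
    {
        int i, best = 0;
        for (i = 1; i < n; i++)
            if (in_flight[i] < in_flight[best])
                best = i;
        return best;
    }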

That said, if you have pathologically expensive requests coming into your 
backends, no load balancer is going to help you.

Regards,
Graham
--





Re: The Case for a Universal Web Server Load Value

2012-11-14 Thread Issac Goldstand

On 15/11/2012 00:48, Tim Bannister wrote:

On 14 Nov 2012, at 22:19, Ask Bjørn Hansen wrote:


I know I am fighting the tide here, but it's really the wrong smarts to put in 
the load balancer.

The backend should/can know if it can take more requests.  When it can't it 
shouldn't and the load balancer shouldn't pass that back to the end-user but 
rather just find another available server or hold on to the request until one 
becomes available (or some timeout value is met if things are that bad).


This only makes sense for idempotent requests. What about a POST or PUT?



What's the problem?  LB will get the request, send OPTIONS * to the 
backends to find an available one and only then push the POST/PUT back 
to it...


  Issac


Re: The Case for a Universal Web Server Load Value

2012-11-13 Thread Tim Bannister
On 12 Nov 2012, at 15:04, Jim Jagielski wrote:

 Booting the discussion:
 
   
 http://www.jimjag.com/imo/index.php?/archives/248-The-Case-for-a-Universal-Web-Server-Load-Value.html


There's bound to be more than one way to do it :-)

I'm afraid I don't favour providing status data in every response. Doing it 
that way means that the reverse proxy has to filter something out and it isn't 
really clean HTTP. Would a strict implementation need to throw in a Vary: * as 
well?


Instead, I would rather have load information provided via something broadly 
RESTful. httpd already has server-status and a machine-readable variant, but 
there's room to improve it. I'd start with offering status via JSON and/or 
XML. I'd prefer XML because of the designed-in extensibility.

With this approach, peers that want frequent server-status updates can request 
this status as often as they like, and can use the usual HTTP performance 
tweaks such as keepalive. A load-balancing reverse proxy can read this 
information, or a separate tool can track it and update the load balancer's 
weightings.
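
Purely as an illustration of the shape such a resource might take (these
element names don't exist anywhere today):

    <?xml version="1.0"?>
    <!-- hypothetical machine-readable server-status document -->
    <server-status>
      <server>Apache/2.5.x</server>
      <load>0.42</load>
      <workers busy="17" idle="23"/>
    </server-status>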



As for how to express load? How about a float where 0.0 represents idle and 1.0 
represents running flat out? A trivial implementation for Unix would take the 
load average and divide it by the number of CPUs.
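
A minimal sketch of that in C, assuming getloadavg(3) and sysconf(3) are
available (so BSD or Linux); the clamp at 1.0 is my own guess at sensible
behaviour:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        double avg[1];
        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);

        if (getloadavg(avg, 1) < 1 || ncpu < 1)
            return 1;

        double load = avg[0] / (double)ncpu;
        if (load > 1.0)
            load = 1.0;    /* running flat out (or worse) */
        printf("X-Server-Load: %.3f\n", load);
        return 0;
    }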




I would keep all of this separate from whether or not the backend has outright 
failed. Perlbal, and maybe some other software, will check an HTTP connection 
via an initial “OPTIONS *”, and will of course remember when a connection goes 
bad either via a TCP close or a 5xx response.


-- 
Tim Bannister – is...@jellybaby.net





Re: The Case for a Universal Web Server Load Value

2012-11-13 Thread Nick Kew
A protocol for backends to communicate load to balancers in real time
has appeal.  You could hack it in HTTP or similar with
X-Server-Load: 0.1234
Perhaps a series of numbers representing different moving averages, etc.

As to what that represents, that must surely depend on the bottlenecks
in a particular system.  A backend doing heavy number-crunching and
one doing lots of complex SQL queries have different loads, and a
good load measure for one may be meaningless if applied to the other.
How would a 'universal' measure reflect that kind of difference?

Where I think you could usefully focus is on standardising a protocol
for backends to communicate loads to balancers.  That then becomes
something we can implement in an lb method module in HTTPD.
But it has to be left to individual backend systems exactly how they
measure their own loads.
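
To make that concrete, here is a skeleton of where such a thing could plug
in, assuming the mod_proxy lbmethod provider interface from httpd 2.4; the
"byload" name and the finder are hypothetical, and the finder body is
elided:

    #include "mod_proxy.h"

    static proxy_worker *find_by_reported_load(proxy_balancer *balancer,
                                               request_rec *r)
    {
        /* Would walk balancer->workers and return the worker whose
         * backend last reported the lowest load; elided in this sketch. */
        return NULL;
    }

    static const proxy_balancer_method byload =
    {
        "byload",
        &find_by_reported_load,
        NULL,    /* context */
        NULL,    /* reset */
        NULL,    /* age */
        NULL     /* updatelbstatus */
    };

    static void register_hooks(apr_pool_t *p)
    {
        ap_register_provider(p, PROXY_LBMETHOD, "byload", "0", &byload);
    }

    AP_DECLARE_MODULE(lbmethod_byload) = {
        STANDARD20_MODULE_STUFF,
        NULL, NULL, NULL, NULL, NULL,
        register_hooks
    };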

-- 
Nick Kew


Re: The Case for a Universal Web Server Load Value

2012-11-13 Thread Graham Leggett
On 12 Nov 2012, at 5:04 PM, Jim Jagielski j...@jagunet.com wrote:

   
 http://www.jimjag.com/imo/index.php?/archives/248-The-Case-for-a-Universal-Web-Server-Load-Value.html

+1 to the idea of a header; it is simple and unobtrusive, and doesn't give you 
the security headaches that an out-of-band mechanism would give you.

As to the format of the header, perhaps an application/x-www-form-urlencoded 
string of some kind? It allows us to be extensible if we need to be. For 
example:

X-Server-Load: av1=5.76&av5=0.44&av15=0.10

or

X-Server-Load: av1=5.76&av5=0.44&av15=0.10&going-offline-in=22

(You get the idea)
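
Parsing that on the balancer side is cheap. A rough sketch in C (the
function name is made up, and a real parser would also have to handle
percent-encoding):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Return the value for `key' from a form-encoded header value, or
     * -1.0 if absent.  Modifies `header' in place, so pass a scratch
     * copy. */
    static double load_field(char *header, const char *key)
    {
        char *save, *pair;
        size_t klen = strlen(key);

        for (pair = strtok_r(header, "&", &save); pair != NULL;
             pair = strtok_r(NULL, "&", &save)) {
            if (strncmp(pair, key, klen) == 0 && pair[klen] == '=')
                return atof(pair + klen + 1);
        }
        return -1.0;
    }

    int main(void)
    {
        char scratch[] = "av1=5.76&av5=0.44&av15=0.10";
        printf("av5 = %.2f\n", load_field(scratch, "av5"));
        return 0;
    }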

If a load balancer wants to query a server that might be offline, it might send 
an OPTIONS request and, if the X-Server-Load value permits, ramp that server 
back up again.

Regards,
Graham
--





Re: The Case for a Universal Web Server Load Value

2012-11-13 Thread Nick Kew

On 13 Nov 2012, at 11:17, Graham Leggett wrote:

 As to the format of the header,

If Jim's thinking Universal, then the forum for discussion at
that level of detail isn't going to be this list!  Head over to IETF ….

-- 
Nick Kew


Re: The Case for a Universal Web Server Load Value

2012-11-13 Thread Graham Leggett
On 13 Nov 2012, at 1:20 PM, Nick Kew n...@webthing.com wrote:

 As to the format of the header,
 
 If Jim's thinking Universal, then the forum for discussion at
 that level of detail isn't going to be this list!  Head over to IETF ….

I would love that such a thing ended up at the IETF, but it needs to start 
somewhere.

Regards,
Graham
--





Re: The Case for a Universal Web Server Load Value

2012-11-13 Thread Jim Jagielski
That's the idea...

On Nov 13, 2012, at 7:05 AM, Graham Leggett minf...@sharp.fm wrote:

 On 13 Nov 2012, at 1:20 PM, Nick Kew n...@webthing.com wrote:
 
 As to the format of the header,
 
 If Jim's thinking Universal, then the forum for discussion at
 that level of detail isn't going to be this list!  Head over to IETF ….
 
 I would love that such a thing ended up at the IETF, but it needs to start 
 somewhere.
 
 Regards,
 Graham
 --
 



Re: The Case for a Universal Web Server Load Value

2012-11-13 Thread Jim Jagielski

On Nov 13, 2012, at 5:58 AM, Nick Kew n...@webthing.com wrote:

 As to what that represents, that must surely depend on the bottlenecks
 in a particular system.  A backend doing heavy number-crunching and
 one doing lots of complex SQL queries have different loads, and a
 good load measure for one may be meaningless if applied to the other.
 How would a 'universal' measure reflect that kind of difference?
 
 Where I think you could usefully focus is on standardising a protocol
 for backends to communicate loads to balancers.  That then becomes
 something we can implement in an lb method module in HTTPD.
 But it has to be left to individual backend systems exactly how they
 measure their own loads.

Yeah, and that can be tricky because it opens it up to
becoming a marketing tool, rather than a useful load balancing
tool. That's why I like a simple 0.0-1.0 scale at a minimum.

Also, let's not forget that, at least with httpd, we also have
load-factors, so we can weight even what's returned by the
backend. So even if they skew the results, we can always
adjust for the real world.
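
Something as simple as this (hypothetical) rescaling would do it:

    /* Hypothetical sketch: let the balancer's own per-worker factor
     * rescale whatever the backend reports, so a gamed or skewed value
     * can still be corrected from the proxy side. */
    static double effective_load(double reported, double lbfactor)
    {
        return reported / (lbfactor > 0.0 ? lbfactor : 1.0);
    }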


The Case for a Universal Web Server Load Value

2012-11-12 Thread Jim Jagielski
Booting the discussion:

   
http://www.jimjag.com/imo/index.php?/archives/248-The-Case-for-a-Universal-Web-Server-Load-Value.html



Re: The Case for a Universal Web Server Load Value

2012-11-12 Thread Graham Dumpleton
You say:

I have traditional Unix-type load-average and the percentage of how
idle and busy the web-server is. But is that enough info? Or is that
too much? How much data should the front-end want or need? Maybe a single
agreed-upon value (ala load average) is best... maybe not. These are the
kinds of questions to answer.

How are the 'idle' and 'busy' measures being calculated?

Now to deviate a bit into a related topic…

One of the concerns I have had when looking over how MPMs work of late is
that the measure of how many threads are busy, used to determine whether
processes should be created or destroyed, is a spot measure. At least that
is how I interpret the code, and I could well be wrong, so please correct me
if I am :-)

That is, only the number of threads in use at the moment the maintenance
cycle runs is taken into consideration.

In the Python world, where for various reasons one cannot preload the
Python interpreter or your application in the Apache parent and has to defer
that to the child worker processes, recycling processes can be an expensive
exercise, as everything is done in the child after the fork.

What worries me is that the current MPM calculation, using a spot measure,
isn't really a true indication of how much the server is being utilised over
time. Imagine the worst case: you were under load, with a large number of
concurrent requests and a commensurate number of processes, but a
substantial number of requests finished just before the maintenance cycle
ran. The spot measure could see a quite low number which doesn't truly
reflect the request load on the server in the period just before that, and
what may therefore come after.

As a result of a low number for a specific maintenance cycle, it could
think it had more idle threads than needed and kill off one process. One
second later, the next maintenance cycle may hit at a moment of high
concurrent requests and think it has to create a process again.

Another case is where you had a momentary network issue, so requests
weren't getting through; for a short period the busy measure would be low,
and processes would progressively get killed off at a rate of one a second.

Using a spot measure, rather than looking at busyness over an extended
window of time, especially when killing processes, could cause process
recycling when it isn't warranted, or when it would be better simply not to
do it.

The potential for this is in part avoided by what the min/max idle threads
are set to. That is, it effectively smooths out small fluctuations. But
because the busy measure is a spot metric, I am still concerned that the
randomness of when requests run means the spot metric could still jump
around quite a lot between maintenance cycles, to the extent that it could
exceed the min/max levels and so kill off processes.

Now for a Python site where recycling processes is expensive, the solution
is to reconfigure the MPM settings to start more servers at the outset and
allow a lot more idle capacity. But we know how many people actually bother
to tune these settings properly.

Anyway, that has had me wondering, and is why I ask how you are calculating
'idle' and 'busy': whether such busy measures should perhaps be done
differently, so that they can look back in time at prior traffic during the
period since the last maintenance cycle, or even beyond that.

One way of doing this is a measure I call thread utilisation, or what some
also refer to as instance busy.

At this point it is going to be easier for me to refer to:

http://blog.newrelic.com/2012/09/11/introducing-capacity-analysis-for-python/

which has some nice pictures and description to help explain this thread
utilisation measure.

The thread utilisation over the time since the last maintenance cycle could
therefore be used, perhaps weighted in some way with the current spot busy
value and with prior time periods, to better smooth the value being used in
the decision.
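
One simple way to do that weighting is an exponential moving average over
the per-cycle spot values; a sketch, with alpha as a made-up tunable:

    /* Hypothetical sketch: smooth the per-cycle "busy threads" spot
     * measure so one quiet sample between maintenance cycles doesn't
     * trigger a process kill. */
    static double smoothed_busy = 0.0;

    /* Call once per maintenance cycle with the current spot count. */
    static double update_busy(int spot_busy_threads)
    {
        const double alpha = 0.3;   /* weight of the newest sample */
        smoothed_busy = alpha * (double)spot_busy_threads
                      + (1.0 - alpha) * smoothed_busy;
        return smoothed_busy;
    }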

I am guessing that some systems do have something more elaborate than the
simplistic mechanism that MPM appears to use, by my reading of the code. So
what, for example, does mod_fcgid do?

Even using thread utilisation, one thing it cannot capture is queueing
time. That is, how long was a request sitting in the listener queue waiting
to be accepted.

Unfortunately I don't know of any way to calculate this directly from the
operating system, so it generally relies on some front end sticking a time
stamp into a header and the backend looking at the elapsed time when the
request hits it. If the front end and the backend are on different machines,
though, you have clock skew to deal with.
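
For example, with a front end that stamps something like
X-Request-Start: t=<microseconds> (the convention New Relic and others
use), the backend could estimate queue time, clock skew permitting:

    #include <stdio.h>
    #include <sys/time.h>

    /* Hypothetical sketch: elapsed seconds since the front end stamped
     * the request.  Returns -1.0 if the header value doesn't parse. */
    static double queue_time_seconds(const char *x_request_start)
    {
        struct timeval now;
        long long start_usec;

        if (sscanf(x_request_start, "t=%lld", &start_usec) != 1)
            return -1.0;
        gettimeofday(&now, NULL);
        return ((double)now.tv_sec * 1e6 + now.tv_usec - start_usec) / 1e6;
    }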

Anyway, sorry for the long ramble.

I guess I am just curious how busy is being calculated. Are there better ways
of calculating what busy is which are more accurate? Or does it mostly not
matter, because when you start to reach higher levels of utilisation the
spot metric will tend towards becoming more reflective of actual
utilisation? Can additional measures, if