Re: The Case for a Universal Web Server Load Value
On 15 Nov 2012, at 07:01, Issac Goldstand wrote:
> On 15/11/2012 00:48, Tim Bannister wrote:
>> On 14 Nov 2012, at 22:19, Ask Bjørn Hansen wrote:
>>> The backend should/can know if it can take more requests. When it can't it shouldn't, and the load balancer shouldn't pass that back to the end-user but rather just find another available server, or hold on to the request until one becomes available (or some timeout value is met if things are that bad).
>>
>> This only makes sense for idempotent requests. What about a POST or PUT?
>
> What's the problem? The LB will get the request, send OPTIONS * to the backends to find an available one, and only then push the POST/PUT on to it...

Sorry; I was trying to be brief, but that meant skipping some details.

We have to assume that at some point we have uneven loading and that there is a backend with spare capacity (otherwise, yeah, no load balancer will help). A backend that started off responsive may slow down due to load but still be able to keep the TCP connection alive.

With GET, we can just chuck requests at the backends and only decide what to do when a request goes bad or the response is late. GET's idempotency means we can retry the same request with a different backend. This strategy doesn't work with POST etc.

Uneven load could arise through imperfect balancing by a reverse proxy, or it could be exogenous – maybe one of the backends has fired off an expensive scheduled task?

PS. If we are doing load skewing or otherwise managing the number of active backends, we definitely want a way to learn the load on each backend. A bit of standardisation would be nice here (de facto or otherwise). Apache httpd is a good place to start, because of its market share, even if this goes beyond the scope of httpd itself.

--
Tim Bannister – is...@jellybaby.net
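[Editor's sketch] The retry rule Tim describes – replay a failed request on another backend only when the method makes replay harmless – could be expressed as below. This is an illustrative helper, not code from the thread; it conservatively treats only safe methods as retryable, matching the caution above about POST and PUT.

```python
# Illustrative: a balancer may replay a failed request on another backend
# only when the HTTP method makes replay harmless. POST (and, cautiously,
# PUT) are excluded, per the concern raised in the thread.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

def may_retry(method: str) -> bool:
    """Return True if a failed request with this method can safely be
    re-sent to a different backend."""
    return method.upper() in SAFE_METHODS
```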
Re: The Case for a Universal Web Server Load Value
On Nov 14, 2012, at 5:19 PM, Ask Bjørn Hansen <a...@develooper.com> wrote:
> On Nov 14, 2012, at 11:01, Tim Bannister <is...@jellybaby.net> wrote:
>>> I really like how Perlbal does it: It opens a connection when it thinks it needs more, issues an OPTIONS * request (by default; it's configurable), and only after getting a successful response to the test will it send real requests on that connection (and then it will keep the connection open with Keep-Alive for further requests).
>>
>> X-Server-Load: would still be an improvement, eg with this response to OPTIONS:
>>
>>   HTTP/1.1 200 OK
>>   Date: Wed, 14 Nov 2012 19:00:00 GMT
>>   Server: Apache/2.5.x
>>   X-Server-Load: 0.999
>>
>> …the balancer might decide to use a backend that is reporting a lower load.
>
> I know I am fighting the tide here, but it's really the wrong smarts to put in the load balancer. The backend should/can know if it can take more requests. When it can't it shouldn't, and the load balancer shouldn't pass that back to the end-user but rather just find another available server or hold on to the request until one becomes available (or some timeout value is met if things are that bad).

Without a doubt, I agree that the load info should not be passed back to the end user (I state as much in the blog).
Re: The Case for a Universal Web Server Load Value
I really like how Perlbal does it: It opens a connection when it thinks it needs more, issues an OPTIONS * request (by default; the request is configurable), and only after getting a successful response to that test will it send real requests on the connection (and then it will keep the connection open with Keep-Alive for further requests).

Ask
Re: The Case for a Universal Web Server Load Value
On 14 Nov 2012, at 18:49, Ask Bjørn Hansen wrote:
> I really like how Perlbal does it: It opens a connection when it thinks it needs more, issues an OPTIONS * request (by default; it's configurable), and only after getting a successful response to the test will it send real requests on that connection (and then it will keep the connection open with Keep-Alive for further requests).

X-Server-Load: would still be an improvement, eg with this response to OPTIONS:

  HTTP/1.1 200 OK
  Date: Wed, 14 Nov 2012 19:00:00 GMT
  Server: Apache/2.5.x
  X-Server-Load: 0.999

…the balancer might decide to use a backend that is reporting a lower load.

--
Tim Bannister – is...@jellybaby.net
Re: The Case for a Universal Web Server Load Value
On Nov 14, 2012, at 11:01, Tim Bannister <is...@jellybaby.net> wrote:
>> I really like how Perlbal does it: It opens a connection when it thinks it needs more, issues an OPTIONS * request (by default; it's configurable), and only after getting a successful response to the test will it send real requests on that connection (and then it will keep the connection open with Keep-Alive for further requests).
>
> X-Server-Load: would still be an improvement, eg with this response to OPTIONS:
>
>   HTTP/1.1 200 OK
>   Date: Wed, 14 Nov 2012 19:00:00 GMT
>   Server: Apache/2.5.x
>   X-Server-Load: 0.999
>
> …the balancer might decide to use a backend that is reporting a lower load.

I know I am fighting the tide here, but it's really the wrong smarts to put in the load balancer. The backend should/can know if it can take more requests. When it can't it shouldn't, and the load balancer shouldn't pass that back to the end-user but rather just find another available server or hold on to the request until one becomes available (or some timeout value is met if things are that bad).

With the Perlbal model the backend can control how much work it will take on, and the load balancer will never send traffic to an overloaded or hung server, so users will always get to the first truly available backend. The load balancer smarts should be in managing these "let's see if you are ready" requests and pending connections.

Ask

--
Ask Bjørn Hansen, http://askask.com/
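[Editor's sketch] The Perlbal-style probe Ask describes could be roughed out as follows. This is not Perlbal's code (Perlbal is written in Perl); it is a minimal Python illustration of the idea – probe with OPTIONS *, and only hand real traffic to a backend whose connection answered the probe:

```python
import http.client

def probe_backend(host, port, timeout=2.0):
    """Send "OPTIONS *" to a backend and return (ready, connection).
    A balancer following the Perlbal model only sends real requests on
    a connection whose probe got a 2xx response, and then keeps that
    connection open (Keep-Alive) for further requests."""
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request("OPTIONS", "*")  # server-wide OPTIONS, as Perlbal uses
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection can be reused
        if 200 <= resp.status < 300:
            return True, conn
        conn.close()
        return False, None
    except OSError:
        return False, None
```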
Re: The Case for a Universal Web Server Load Value
On 14 Nov 2012, at 22:19, Ask Bjørn Hansen wrote:
> I know I am fighting the tide here, but it's really the wrong smarts to put in the load balancer. The backend should/can know if it can take more requests. When it can't it shouldn't, and the load balancer shouldn't pass that back to the end-user but rather just find another available server or hold on to the request until one becomes available (or some timeout value is met if things are that bad).

This only makes sense for idempotent requests. What about a POST or PUT?

For a plausible example that mixes POST and GET: a cluster of N webservers providing SPARQL HTTP access to a triplestore. Most queries will use GET, but some might use POST, either because they are too long for GET or because the query is an update. The reverse proxy / balancer manager might want to:

• balance query workload across the active set of webservers
• spin up an extra backend as required by load
• skew load onto the minimum number of webservers (and suspend any spares)

SPARQL is an example of a varying workload where none of httpd's existing lbmethods is perfect. One complex query can punish a backend whilst its peers comfortably handle multiple concurrent requests. SPARQL sometimes means POST requests; a subset of these are safely repeatable, but determining which ones is too complex for any HTTP proxy.

--
Tim Bannister – is...@jellybaby.net
Re: The Case for a Universal Web Server Load Value
On 15 Nov 2012, at 12:48 AM, Tim Bannister <is...@jellybaby.net> wrote:
> This only makes sense for idempotent requests. What about a POST or PUT?
>
> For a plausible example that mixes POST and GET: a cluster of N webservers providing SPARQL HTTP access to a triplestore. Most queries will use GET but some might use POST, either because they are too long for GET or because the query is an update. SPARQL is an example of a varying workload where none of httpd's existing lbmethods is perfect. One complex query can punish a backend whilst its peers are idle handling multiple concurrent requests.

There is no reason why a load balancer can't take into account existing requests, in addition to new requests, when making load balancing decisions. When there are a number of connections to a backend that are in flight but taking a while to complete, this is a sign that the backend may be busy and should be avoided.

That said, if you have pathologically expensive requests coming into your backends, no load balancer is going to help you.

Regards,
Graham
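[Editor's sketch] Graham's point – treat slow in-flight connections as a busyness signal – is roughly what a least-pending-requests policy does. A minimal illustrative sketch (invented names, not httpd code):

```python
class LeastPendingBalancer:
    """Prefer the backend with the fewest in-flight requests. A backend
    accumulating slow, unfinished requests naturally attracts less new
    traffic without any explicit load reporting."""

    def __init__(self, backends):
        self.pending = {b: 0 for b in backends}  # in-flight request counts

    def acquire(self):
        """Pick the least-loaded backend and count the new request."""
        backend = min(self.pending, key=self.pending.get)
        self.pending[backend] += 1
        return backend

    def release(self, backend):
        """Mark one of the backend's in-flight requests as finished."""
        self.pending[backend] -= 1
```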
Re: The Case for a Universal Web Server Load Value
On 15/11/2012 00:48, Tim Bannister wrote:
> On 14 Nov 2012, at 22:19, Ask Bjørn Hansen wrote:
>> I know I am fighting the tide here, but it's really the wrong smarts to put in the load balancer. The backend should/can know if it can take more requests. When it can't it shouldn't, and the load balancer shouldn't pass that back to the end-user but rather just find another available server or hold on to the request until one becomes available (or some timeout value is met if things are that bad).
>
> This only makes sense for idempotent requests. What about a POST or PUT?

What's the problem? The LB will get the request, send OPTIONS * to the backends to find an available one, and only then push the POST/PUT on to it...

Issac
Re: The Case for a Universal Web Server Load Value
On 12 Nov 2012, at 15:04, Jim Jagielski wrote:
> Booting the discussion:
> http://www.jimjag.com/imo/index.php?/archives/248-The-Case-for-a-Universal-Web-Server-Load-Value.html

There's bound to be more than one way to do it :-)

I'm afraid I don't favour providing status data in every response. Doing it that way means that the reverse proxy has to filter something out, and it isn't really clean HTTP. Would a strict implementation need to throw in a Vary: * as well?

Instead, I would rather have load information provided via something broadly RESTful. httpd already has server-status and a machine-readable variant, but there's room to improve it. I'd start with offering status via JSON and/or XML; I'd prefer XML because of its designed-in extensibility. With this approach, peers that want frequent server-status updates can request this status as often as they like, and can use the usual HTTP performance tweaks such as keepalive. A load-balancing reverse proxy can read this information, or a separate tool can track it and update the load balancer's weightings.

As for how to express load? How about a float where 0.0 represents idle and 1.0 represents running flat out. A trivial implementation for Unix would take the load average and divide it by the number of CPUs.

I would keep all of this separate from whether or not the backend has outright failed. Perlbal, and maybe some other software, will check an HTTP connection via an initial "OPTIONS *", and will of course remember when a connection goes bad, either via a TCP close or a 5xx response.

--
Tim Bannister – is...@jellybaby.net
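[Editor's sketch] Tim's "trivial implementation for Unix" could look something like this; the JSON field name is invented for illustration, not an agreed format:

```python
import json
import os

def server_load() -> float:
    """The trivial Unix measure proposed above: 1-minute load average
    divided by the number of CPUs, clamped to the 0.0-1.0 scale."""
    load1, _, _ = os.getloadavg()
    return min(load1 / (os.cpu_count() or 1), 1.0)

def status_document() -> str:
    """A machine-readable body that a server-status-like endpoint might
    return; the "load" key is a hypothetical name."""
    return json.dumps({"load": round(server_load(), 3)})
```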
Re: The Case for a Universal Web Server Load Value
A protocol for backends to communicate load to balancers in real time has appeal. You could hack it in HTTP or similar with:

  X-Server-Load: 0.1234

Perhaps a series of numbers representing different moving averages, etc.

As to what that represents, that must surely depend on the bottlenecks in a particular system. A backend doing heavy number-crunching and one doing lots of complex SQL queries have different loads, and a good load measure for one may be meaningless if applied to the other. How would a 'universal' measure reflect that kind of difference?

Where I think you could usefully focus is on standardising a protocol for backends to communicate loads to balancers. That then becomes something we can implement in an lb method module in HTTPD. But it has to be left to individual backend systems exactly how they measure their own loads.

--
Nick Kew
Re: The Case for a Universal Web Server Load Value
On 12 Nov 2012, at 5:04 PM, Jim Jagielski <j...@jagunet.com> wrote:
> http://www.jimjag.com/imo/index.php?/archives/248-The-Case-for-a-Universal-Web-Server-Load-Value.html

+1 to the idea of a header; it is simple and unobtrusive, and doesn't give you the security headaches that an out-of-band mechanism would. As to the format of the header, perhaps an application/x-www-form-urlencoded string of some kind? It allows us to be extensible if we need to be. For example:

  X-Server-Load: av1=5.76&av5=0.44&av15=0.10

or

  X-Server-Load: av1=5.76&av5=0.44&av15=0.10&going-offline-in=22

(You get the idea.)

If a load balancer wants to query a server that might be offline, it might send an OPTIONS request, and if the X-Server-Load permits, the load balancer might ramp that server back up again.

Regards,
Graham
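[Editor's sketch] A balancer could unpack such a header with a standard form-urlencoded parser. A minimal sketch (the av1/av5/av15 keys are just Graham's examples, not a settled vocabulary):

```python
from urllib.parse import parse_qs

def parse_server_load(header_value: str) -> dict:
    """Parse a form-urlencoded X-Server-Load header such as
    "av1=5.76&av5=0.44&av15=0.10" into a dict, converting values to
    floats where possible."""
    parsed = {}
    for key, values in parse_qs(header_value).items():
        try:
            parsed[key] = float(values[0])
        except ValueError:
            parsed[key] = values[0]  # leave non-numeric values as strings
    return parsed
```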
Re: The Case for a Universal Web Server Load Value
On 13 Nov 2012, at 11:17, Graham Leggett wrote:
> As to the format of the header,

If Jim's thinking Universal, then the forum for discussion at that level of detail isn't going to be this list! Head over to IETF.

--
Nick Kew
Re: The Case for a Universal Web Server Load Value
On 13 Nov 2012, at 1:20 PM, Nick Kew <n...@webthing.com> wrote:
>> As to the format of the header,
>
> If Jim's thinking Universal, then the forum for discussion at that level of detail isn't going to be this list! Head over to IETF.

I would love for such a thing to end up at the IETF, but it needs to start somewhere.

Regards,
Graham
Re: The Case for a Universal Web Server Load Value
That's the idea...

On Nov 13, 2012, at 7:05 AM, Graham Leggett <minf...@sharp.fm> wrote:
> On 13 Nov 2012, at 1:20 PM, Nick Kew <n...@webthing.com> wrote:
>>> As to the format of the header,
>>
>> If Jim's thinking Universal, then the forum for discussion at that level of detail isn't going to be this list! Head over to IETF.
>
> I would love that such a thing ended up at the IETF, but it needs to start somewhere.
>
> Regards,
> Graham
Re: The Case for a Universal Web Server Load Value
On Nov 13, 2012, at 5:58 AM, Nick Kew <n...@webthing.com> wrote:
> As to what that represents, that must surely depend on the bottlenecks in a particular system. A backend doing heavy number-crunching and one doing lots of complex SQL queries have different loads, and a good load measure for one may be meaningless if applied to the other. How would a 'universal' measure reflect that kind of difference?
>
> Where I think you could usefully focus is on standardising a protocol for backends to communicate loads to balancers. That then becomes something we can implement in an lb method module in HTTPD. But it has to be left to individual backend systems exactly how they measure their own loads.

Yeah, and that can be tricky, because it opens it up to becoming a marketing tool rather than a useful load balancing tool. That's why I like a simple 0.0-1.0 scale at a minimum. Also, let's not forget that, at least with httpd, we also have load-factors, and so we can weight even what's returned by the backend. So even if they skew the results, we can always adjust for the real world.
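[Editor's sketch] The adjustment Jim mentions could be sketched like this. It is a hypothetical illustration: httpd's actual lbfactor handling in mod_proxy_balancer is more involved, and the mapping from a reported load to an effective one (division here) is an assumption made for the example.

```python
def effective_load(reported_load: float, lbfactor: float) -> float:
    """Weight a backend's self-reported 0.0-1.0 load by a locally
    configured factor. Dividing is an illustrative choice: a higher
    factor makes a backend look less loaded, so the balancer can
    discount servers it knows to be beefier (or reports it trusts less)."""
    return reported_load / lbfactor

def pick_backend(backends):
    """backends: iterable of (name, reported_load, lbfactor) tuples.
    Choose the backend with the lowest effective load."""
    return min(backends, key=lambda b: effective_load(b[1], b[2]))[0]
```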
The Case for a Universal Web Server Load Value
Booting the discussion: http://www.jimjag.com/imo/index.php?/archives/248-The-Case-for-a-Universal-Web-Server-Load-Value.html
Re: The Case for a Universal Web Server Load Value
You say:

> I have traditional Unix-type load-average and the percentage of how idle and busy the web-server is. But is that enough info? Or is that too much? How much data should the front-end want or need? Maybe a single agreed-upon value (ala load average) is best... maybe not. These are the kinds of questions to answer.

How are the 'idle' and 'busy' measures being calculated?

Now to deviate a bit into a related topic. One of the concerns I have had when looking over how MPMs work of late is that the measure of how many threads are busy, used to determine whether processes should be created or destroyed, is a spot measure. At least that is how I interpret the code, and I could well be wrong, so please correct me if I am :-) That is, only the number of threads in use at the time the maintenance cycle is run is taken into consideration.

In the Python world, where one cannot preload the Python interpreter or your application in the Apache parent for various reasons and must defer it to the child worker processes, recycling processes can be an expensive exercise, as everything is done in the child after the fork. What worries me is that the current MPM calculation, using a spot measure, isn't really a true indication of how much the server is being utilised over time.

Imagine the worst case, where you were under load and had a large number of concurrent requests and a commensurate number of processes, but a substantial number finished just before the maintenance cycle ran. The spot measure could see quite a low number which doesn't truly reflect the request load on the server in the period just before that, and what may therefore come after. As a result of a low number for a specific maintenance cycle, it could think it had more idle threads than needed and kill off one process. On the next cycle one second later, the maintenance cycle may hit a high number of concurrent requests again and think it has to create a process again.
Another case is where you have a momentary network issue and requests aren't getting through, so for a short period the busy measure is low and processes progressively get killed off at a rate of one a second.

Using a spot measure, rather than looking at busyness over an extended window of time, especially when killing processes, could cause process recycling when it is not warranted, or when it would be better simply not to do it. The potential for this is in part avoided by what the min/max idle threads are set to. That is, they effectively smooth out small fluctuations; but because the busy measure is a spot metric, I am still concerned that the randomness of when requests run means the spot metric could jump around quite a lot between maintenance cycles, to the extent that it could exceed the min/max levels and so kill off processes.

Now, for a Python site where recycling processes is expensive, the solution is to reconfigure the MPM settings to start more servers at the outset and allow a lot more idle capacity. But we know how many people actually bother to tune these settings properly.

Anyway, that has had me wondering (and is why I ask how you are calculating 'idle' and 'busy') whether such busy measures should not perhaps be done differently, so that they can look back in time at prior traffic during the period since the last maintenance cycle, or even beyond that. One way of doing this is a measure I call thread utilisation, or what some also refer to as instance busy. At this point it is going to be easier for me to refer to:

http://blog.newrelic.com/2012/09/11/introducing-capacity-analysis-for-python/

which has some nice pictures and a description to help explain this thread utilisation measure. The thread utilisation over time since the last maintenance cycle could therefore be used, perhaps weighted in some way with the current spot busy value and also prior time periods, to better smooth the value being used in the decision.
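[Editor's sketch] The smoothing suggested above could be as simple as an exponentially weighted moving average of the busy-thread count, sampled once per maintenance cycle. A minimal sketch; the alpha value is an arbitrary illustration, not something from the thread or from httpd:

```python
class SmoothedBusy:
    """Exponentially weighted moving average of the busy-thread count,
    sampled once per maintenance cycle, so a momentary dip does not by
    itself trigger process reaping."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha  # weight given to the newest sample
        self.value = None   # smoothed busy count; None until first sample

    def sample(self, busy_threads: int) -> float:
        if self.value is None:
            self.value = float(busy_threads)
        else:
            self.value = (self.alpha * busy_threads
                          + (1.0 - self.alpha) * self.value)
        return self.value
```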
I am guessing that some systems do have something more elaborate than the simplistic mechanism that the MPM appears to use, by my reading of the code. So what, for example, does mod_fcgid do?

Even using thread utilisation, one thing it cannot capture is queueing time: that is, how long a request was sitting in the listener queue waiting to be accepted. Unfortunately, I don't know of any way to calculate this directly from the operating system, so it generally relies on some front end sticking in a header with a timestamp, and looking at the elapsed time when the request hits the backend server. If the front end and backend are on different machines, though, you have issues of clock skew to deal with.

Anyway, sorry for the long ramble. I guess I am just curious how busy is being calculated. Are there better, more accurate ways of calculating what busy is? Or does it mostly not matter, because when you start to reach higher levels of utilisation the spot metric will tend towards becoming more reflective of actual utilisation? Can additional measures, if