Re: Backend per-server rate limiting

2012-08-08 Thread Willy Tarreau
Hi Andrew,

On Tue, Aug 07, 2012 at 11:44:53PM -0600, Andrew Davidoff wrote:
 Hi,
 
 I'm trying to determine if haproxy can be configured to solve a rate
 limiting based problem I have. I believe that it can, but that I am not
 seeing how to put the configuration together to get it done. Here's what
 I'm trying to do:
 
 I have a set of servers (backends) that can each handle a specific number
 of requests per second (the same rate for each backend). I'd like haproxy
 to accept requests and farm them out to these backends so that each request
 is sent to the first backend that isn't over its rate limit. If all
 backends are over their rate limits, ideally the client connection would
 just block and wait, but if haproxy has to return a rejection, I think I
 can deal with this.
 
 My first thought was to use the frontend's rate-limit sessions directive,
 set to n*rate-limit where n is the number of backends I have serving these
 requests. Additionally, those backends would be balanced round-robin.
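 Roughly something like this, as a sketch (names and numbers are made up,
 assuming 3 backends each limited to 10 requests per second):
 
     frontend api_in
         bind :80
         # 3 backends x 10 requests/s each = 30 (made-up numbers)
         rate-limit sessions 30
         default_backend api_servers
 
     backend api_servers
         balance roundrobin
         server srv1 192.168.0.1:80 check
         server srv2 192.168.0.2:80 check
         server srv3 192.168.0.3:80 check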
 
 The problem with this is that if a backend falls out, the frontend rate
 limit is then too high, since there are fewer backends available than there
 were when it was originally configured. The only way I see to dynamically
 change the frontend rate limit as backends rise and fall is to write
 something that watches the logs for rise/fall messages and adjusts the
 global rate limit setting via the haproxy socket. This might work, but the
 biggest drawback is that one instance of haproxy could then only handle
 requests at a single rate limit, since modifications after startup would
 have to be global (not per frontend).
 
 I guess, in other words, I am trying to apply rate limits to individual
 backend servers, and to have a frontend cycle through all available
 backend servers until it either finds one that can handle the request or
 exhausts them all, at which point it would ideally just block and keep
 trying, or, less ideally, send some sort of failure/rejection to the client.
 
 I feel like there's a simple solution here that I'm not seeing. Any help is
 appreciated.

What you're asking for is on the 1.6 roadmap, and the road will be long before
we reach that point.

Maybe in the meantime we could develop a new LB algorithm which considers
each server's request rate and forwards the traffic to the least used one.
In parallel, an ACL which computes the average per-server request rate
would allow requests to be rejected when there's a risk of overloading
the servers. But that doesn't seem trivial and I have doubts about its real
usefulness.
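
That said, a per-backend approximation is already possible with the
be_sess_rate ACL. Here is a rough sketch (the backend name and the 30
requests per second threshold are only placeholders, and it measures the
whole backend rather than each individual server):

   backend api_servers
       # refuse new requests while this backend's average session
       # creation rate exceeds 30 per second (placeholder threshold)
       acl being_flooded be_sess_rate gt 30
       http-request deny if being_flooded
       server srv1 192.168.0.1:80 check
       server srv2 192.168.0.2:80 check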

What is needed is to convert a rate into a concurrency in order to queue
excess requests. What you can do at the moment, if you don't have too many
servers, is to have one proxy per server, each with its own rate limit. This
way you will be able to smooth the load in the first stage between all
servers, and even reject requests when the load is too high. You have to run
the health checks against the real servers though, otherwise the checks would
cause flapping when the second-level proxies are saturated. This would
basically look like this:

   listen front
       bind :80
       balance leastconn
       server srv1 127.0.0.1:8000 maxconn 100 track back1/srv
       server srv2 127.0.0.2:8000 maxconn 100 track back2/srv
       server srv3 127.0.0.3:8000 maxconn 100 track back3/srv

   listen back1
       bind 127.0.0.1:8000
       rate-limit sessions 10
       server srv 192.168.0.1:80 check

   listen back2
       bind 127.0.0.2:8000
       rate-limit sessions 10
       server srv 192.168.0.2:80 check

   listen back3
       bind 127.0.0.3:8000
       rate-limit sessions 10
       server srv 192.168.0.3:80 check

Then you have to play with the maxconn, maxqueue and timeout queue in
order to evict requests that are queued for too long a time, but you
get the idea.
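
As a sketch of that tuning applied to the first stage (the values are only
placeholders to adapt to your own traffic):

   listen front
       bind :80
       balance leastconn
       # give up on requests that wait more than 5 seconds in a queue
       timeout queue 5s
       # keep at most 50 requests waiting for any single second-level proxy
       server srv1 127.0.0.1:8000 maxconn 100 maxqueue 50 track back1/srv
       server srv2 127.0.0.2:8000 maxconn 100 maxqueue 50 track back2/srv
       server srv3 127.0.0.3:8000 maxconn 100 maxqueue 50 track back3/srv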

Could I know what use case makes your servers sensitive to the request rate?
This is quite abnormal, since a request rate should necessarily translate
into a number of concurrent connections somewhere in the server. If the
server responds quickly, there should be no reason it cannot accept high
request rates. It's important to understand the complete model in order to
build a rock-solid configuration rather than just a workaround for a symptom.

Regards,
Willy




Re: HA queues when it should not (yet)?

2012-08-08 Thread Baptiste
I strongly advise you to read the documentation about minconn (and
fullconn as well, they are linked together) :)
minconn is the value at which HAProxy will start queueing.
Your fullconn value looks too low to me: it should be greater. The
fullconn value is the backend load at which you want your servers'
maxconn to be reached. Below this value, HAProxy adjusts each server's
effective limit on a ramp between minconn and maxconn: its purpose is
to avoid using too much of a server's capacity when you know you're
already in the dangerous area (the zone where your servers start
slowing down but still have enough capacity).
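
As a purely illustrative sketch (names and numbers are made up, not taken
from your configuration): with the settings below, each server always
accepts at least 10 concurrent requests, never more than 100, and the
effective limit ramps up between those two values as the backend's total
load approaches fullconn.

   backend app
       balance roundrobin
       fullconn 1000
       server app1 192.168.0.101:80 minconn 10 maxconn 100 check
       server app2 192.168.0.102:80 minconn 10 maxconn 100 check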

In your case, as soon as 4 requests are running on a server, the next
one might be queued, depending on the status of the other servers
(queueing or not).
HAProxy won't send a request to a server which is queueing as long as
some other server still has slots available (unless you use a
predictable algorithm or persistence).

Baptiste


On Tue, Aug 7, 2012 at 4:52 PM, Christian Parpart tra...@gmail.com wrote:
 On Tue, Aug 7, 2012 at 4:44 PM, Baptiste bed...@gmail.com wrote:

 Hi,

 If you have enabled minconn, it's an expected behavior :)
 otherwise, sharing your conf, screenshot and haproxy version would help a
 lot.


 Sorry for that. Yep, we're using minconn (don't ask me why, I did not set it
 up) and maxconn to avoid overloading the backends, and requests are only
 queued per LB, not per backend.

 our $fullconn is the sum of all $maxconn's:

 listen dawanda_cluster *:8700
     option httpchk GET /home/alive
     option httpclose
     balance roundrobin
     fullconn 361
     server c5 192.168.2.25: weight 1 minconn 4 maxconn 15 check inter 1
     server app2 192.168.3.2: weight 1 minconn 4 maxconn 8 check inter 1
     server app3 192.168.3.3: weight 1 minconn 4 maxconn 8 check inter 1
     server app5 192.168.3.5: weight 1 minconn 4 maxconn 12 check inter 1
     server app6 192.168.3.6: weight 1 minconn 4 maxconn 12 check inter 1
     server app11 192.168.3.11: weight 1 minconn 4 maxconn 12 check inter 1
     server app16 192.168.3.16: weight 1 minconn 4 maxconn 12 check inter 1
     server app17 192.168.3.17: weight 1 minconn 4 maxconn 12 check inter 1
     server app31 192.168.3.31: weight 1 minconn 4 maxconn 30 check inter 1
     server app32 192.168.3.32: weight 1 minconn 4 maxconn 30 check inter 1
     server app33 192.168.3.33: weight 1 minconn 4 maxconn 30 check inter 1
     server app35 192.168.3.35: weight 1 minconn 4 maxconn 30 check inter 1
     server app36 192.168.3.36: weight 1 minconn 4 maxconn 30 check inter 1
     server app37 192.168.3.37: weight 1 minconn 4 maxconn 30 check inter 1
     server app50 10.10.40.9: weight 1 minconn 4 maxconn 30 check inter 1
     server app51 10.10.40.10: weight 1 minconn 4 maxconn 30 check inter 1
     server app52 10.10.40.5: weight 1 minconn 4 maxconn 30 check inter 1

 Regards,
 Christian.



Re: Backend per-server rate limiting

2012-08-08 Thread Andrew Davidoff
Willy,

Thanks for the quick response. I haven't fully digested your example
suggestion yet but I will sit down with it and the haproxy configuration
documentation and sort it out in my brain.

Here's the basic idea of the use case. Let me go ahead and state that maybe
haproxy just isn't the right solution here. There are many ways to solve
this; it just seemed to me like haproxy might be a magic answer.

We make a bunch of requests to an API that rate limits based on source IP.
To maximize our overall request rate, we utilize proxies to afford us more
source IPs. Even if those proxies can handle a ton of work themselves, if
we push them, individually, over the API's rate limits, they can be
temporarily or permanently disallowed from accessing the API.

Right now our API clients (scripts) handle rate limiting themselves. The
way they currently do this involves knowledge of the per-source-IP rate
limit for the API they're talking to, and how many proxies live behind a
squid instance that all their requests go through. That squid instance
hands out proxies round-robin, which is what makes the request rate work.

Based on how the scripts currently handle the rate limiting, we start
running into problems if we want multiple scripts accessing the same API to
run at the same time. Basically, each running script must then know about
any other scripts that are running and talking to the same API, so it can
adjust its request rate accordingly, and anything already running needs to
be informed that more scripts accessing the same API have started up, so it
can do the same.

Additionally, we run into the problem of proxies failing. If a proxy fails
and the scripts don't learn of it and adjust their rate limits, then the
per-proxy request rate has inadvertently increased across all remaining
proxies.

So, again, there are many ways to solve this and maybe haproxy just isn't
the answer, but I thought maybe it would be. At the moment I'm very much in
"don't reinvent the wheel" mode, and I thought maybe haproxy had already
solved this.

Thanks again for your help.
Andy


On Wed, Aug 8, 2012 at 12:11 AM, Willy Tarreau w...@1wt.eu wrote:

 Hi Andrew,

 On Tue, Aug 07, 2012 at 11:44:53PM -0600, Andrew Davidoff wrote:
  Hi,
 
  I'm trying to determine if haproxy can be configured to solve a rate
  limiting based problem I have. I believe that it can, but that I am not
  seeing how to put the configuration together to get it done. Here's what
  I'm trying to do:
 
  I have a set of servers (backends) that can each handle a specific number
  of requests per second (the same rate for each backend). I'd like haproxy
  to accept requests and farm them out to these backends so that each
 request
  is sent to the first backend that isn't over its rate limit. If all
  backends are over their rate limits, ideally the client connection would
  just block and wait, but if haproxy has to return a rejection, I think I
  can deal with this.
 
  My first thought was to use frontend's rate-limit sessions, setting it to
  n*rate-limit where n is the number of backends I have to serve these
  requests. Additionally, those backends would be balanced round-robin.
 
  The problem with this is that if a backend falls out, the front end rate
  limit is then too high since there are less backends available than there
  were when it was originally configured. The only way I see that I could
  dynamically change the frontend rate-limit as backends rise and fall is
 to
  write something that watches the logs for rise/fall messages and uses the
  global rate limit setting via the haproxy socket. This might work, but
 the
  biggest drawback is that one instance of haproxy could only handle
 requests
  of a single rate limit, since modifications after starting would have to
 be
  global (not per frontend).
 
  I guess in other words, I am trying to apply rate limits to individual
  backend servers, and to have a front end cycle through all available
  backend servers until it either finds one that can handle the request, or
  exhausts them all, at which time it'd ideally just block and keep trying,
  or less ideally send some sort of failure/rejection to the client.
 
  I feel like there's a simple solution here that I'm not seeing. Any help
 is
  appreciated.

 What you're asking for is in the 1.6 roadmap and the road will be long
 before
 we reach this point.

 Maybe in the mean time we could develop a new LB algorithm which considers
 each server's request rate, and forwards the traffic to the least used one.
 In parallel, having an ACL which computes the average per-server request
 rate would allow requests to be rejected when there's a risk to overload
 the servers. But that doesn't seem trivial and I have doubts about its real
 usefulness.

 What is needed is to convert a rate into a concurrency in order to queue
 excess requests. What you can do at the moment, if you don't have too many
 servers, is to have one proxy per server with its own rate limit. This way
 you 

Re: Backend per-server rate limiting

2012-08-08 Thread Willy Tarreau
On Wed, Aug 08, 2012 at 12:51:23AM -0600, Andrew Davidoff wrote:
 Willy,
 
 Thanks for the quick response. I haven't fully digested your example
 suggestion yet but I will sit down with it and the haproxy configuration
 documentation and sort it out in my brain.
 
 Here's the basic idea of the use case. Let me go ahead and state that maybe
 haproxy just isn't the right solution here. There are many ways to solve
 this, It just seemed to me like haproxy might have been a magic answer.
 
 We make a bunch of requests to an API that rate limits based on source IP.
 To maximize our overall request rate, we utilize proxies to afford us more
 source IPs. Even if those proxies can handle a ton of work themselves, if
 we push them, individually, over the API's rate limits, they can be
 temporarily or permanently disallowed from accessing the API.

OK I see now. It's not a performance limit but a contractual limit you
must not go over if you don't want to be banned.

Why don't you use a larger number of IP addresses to access your API then?

 Right now our API clients (scripts) handle rate limiting themselves. The
 way they currently do this involves knowledge of the per-source-IP rate
 limit for the API they're talking to, and how many proxies live behind a
 squid instance that all their requests go through. That squid instance
 hands out proxies round-robin, which is what makes the request rate work.
 
 Based on how the scripts currently handle the rate limiting, we start
 running into problems if we want multiple scripts accessing the same API to
 run at the same time. Basically, each running script must then know about
 any other scripts that are running and talking to the same API, so it can
 adjust its request rate accordingly, and anything already running needs be
 informed that more scripts access the same API have started up, so it can
 do the same.
 
 Additionally, we run into the problem of proxies failing. If a proxy fails
 and the scripts don't learn then and adjust their rate limits, then the
 per-proxy rate limit has inadvertently increased across all proxies.

These are precisely the kinds of issues that you won't have with the
two-stage LB I gave in the example, because the load sent to the servers
will be smoothed as much as possible and will never go beyond the configured
limit. In my tests I have always observed the effective rate match the
configured limit to 3-4 significant digits.

The first stage will ensure a global distribution and the second stage will
ensure you never go over the limit for each server.

You can even use this to increase the number of source addresses, with all
servers going to the same destination address (some people use this to
browse via multiple outgoing ADSL lines). For instance, let's say your API
is at 192.168.0.1:80 and you want to use 3 different source IP addresses
(192.168.0.11..13), each limited to exactly 10 requests per second, and you
want to distribute the load across them so that they are always used at the
maximum possible rate:

   listen front
       bind :80
       balance leastconn
       server srv1 127.0.0.1:8000 maxconn 100 track back1/srv
       server srv2 127.0.0.2:8000 maxconn 100 track back2/srv
       server srv3 127.0.0.3:8000 maxconn 100 track back3/srv

   listen back1
       bind 127.0.0.1:8000
       rate-limit sessions 10
       server srv 192.168.0.1:80 source 192.168.0.11 check

   listen back2
       bind 127.0.0.2:8000
       rate-limit sessions 10
       server srv 192.168.0.1:80 source 192.168.0.12 check

   listen back3
       bind 127.0.0.3:8000
       rate-limit sessions 10
       server srv 192.168.0.1:80 source 192.168.0.13 check

Regards,
Willy




Re: Backend per-server rate limiting

2012-08-08 Thread David Birdsong
On Tue, Aug 7, 2012 at 11:51 PM, Andrew Davidoff david...@qedmf.net wrote:
 Willy,

 Thanks for the quick response. I haven't fully digested your example
 suggestion yet but I will sit down with it and the haproxy configuration
 documentation and sort it out in my brain.

 Here's the basic idea of the use case. Let me go ahead and state that maybe
 haproxy just isn't the right solution here. There are many ways to solve
 this, It just seemed to me like haproxy might have been a magic answer.

 We make a bunch of requests to an API that rate limits based on source IP.
 To maximize our overall request rate, we utilize proxies to afford us more
 source IPs. Even if those proxies can handle a ton of work themselves, if we
 push them, individually, over the API's rate limits, they can be temporarily
 or permanently disallowed from accessing the API.

You could also consider solving this socially instead. Have you tried
reaching out to this service to ask for a higher rate limit? Your work
would be wasted if they improved their scraper detection beyond source
IP addresses.

What service is it?


 Right now our API clients (scripts) handle rate limiting themselves. The way
 they currently do this involves knowledge of the per-source-IP rate limit
 for the API they're talking to, and how many proxies live behind a squid
 instance that all their requests go through. That squid instance hands out
 proxies round-robin, which is what makes the request rate work.

 Based on how the scripts currently handle the rate limiting, we start
 running into problems if we want multiple scripts accessing the same API to
 run at the same time. Basically, each running script must then know about
 any other scripts that are running and talking to the same API, so it can
 adjust its request rate accordingly, and anything already running needs be
 informed that more scripts access the same API have started up, so it can do
 the same.

 Additionally, we run into the problem of proxies failing. If a proxy fails
 and the scripts don't learn then and adjust their rate limits, then the
 per-proxy rate limit has inadvertently increased across all proxies.

 So, again, there are many ways to solve this and maybe haproxy just isn't
 the answer, but I thought maybe it would be. At the moment I'm very much in
 don't reinvent the wheel mode, and I thought maybe haproxy had solved
 this.

 Thanks again for your help.
 Andy


 On Wed, Aug 8, 2012 at 12:11 AM, Willy Tarreau w...@1wt.eu wrote:

 Hi Andrew,

 On Tue, Aug 07, 2012 at 11:44:53PM -0600, Andrew Davidoff wrote:
  Hi,
 
  I'm trying to determine if haproxy can be configured to solve a rate
  limiting based problem I have. I believe that it can, but that I am not
  seeing how to put the configuration together to get it done. Here's what
  I'm trying to do:
 
  I have a set of servers (backends) that can each handle a specific
  number
  of requests per second (the same rate for each backend). I'd like
  haproxy
  to accept requests and farm them out to these backends so that each
  request
  is sent to the first backend that isn't over its rate limit. If all
  backends are over their rate limits, ideally the client connection would
  just block and wait, but if haproxy has to return a rejection, I think I
  can deal with this.
 
  My first thought was to use frontend's rate-limit sessions, setting it
  to
  n*rate-limit where n is the number of backends I have to serve these
  requests. Additionally, those backends would be balanced round-robin.
 
  The problem with this is that if a backend falls out, the front end rate
  limit is then too high since there are less backends available than
  there
  were when it was originally configured. The only way I see that I could
  dynamically change the frontend rate-limit as backends rise and fall is
  to
  write something that watches the logs for rise/fall messages and uses
  the
  global rate limit setting via the haproxy socket. This might work, but
  the
  biggest drawback is that one instance of haproxy could only handle
  requests
  of a single rate limit, since modifications after starting would have to
  be
  global (not per frontend).
 
  I guess in other words, I am trying to apply rate limits to individual
  backend servers, and to have a front end cycle through all available
  backend servers until it either finds one that can handle the request,
  or
  exhausts them all, at which time it'd ideally just block and keep
  trying,
  or less ideally send some sort of failure/rejection to the client.
 
  I feel like there's a simple solution here that I'm not seeing. Any help
  is
  appreciated.

 What you're asking for is in the 1.6 roadmap and the road will be long
 before
 we reach this point.

 Maybe in the mean time we could develop a new LB algorithm which considers
 each server's request rate, and forwards the traffic to the least used
 one.
 In parallel, having an ACL which computes the average per-server request
 rate would allow 

Re: Backend per-server rate limiting

2012-08-08 Thread Andrew Davidoff
On Wed, Aug 8, 2012 at 1:12 AM, Willy Tarreau w...@1wt.eu wrote:


 These are precisely the types of issues that you won't have with the
 two-stage LB I gave in the example, because the load sent to the servers
 will be smoothed as much as possible and will never go beyond the
 configured
 limit. In my tests I have always observed 3-4 digits stability in
 rate-limits.


Thanks Willy. I have taken a longer look at your example and the
configuration guide and I think I understand how it works, and how it'd
solve my problems. I will give it a try soon.

Thanks again for your quick and thorough responses.
Andy