Deny http connection

2011-11-25 Thread Sander Klein

Hi,

I was wondering if it is possible to start rate-limiting or denying a 
connection based on response codes from the backend.


For instance, I would like to start rejecting or rate-limiting an HTTP 
connection when a client triggers more than 20 HTTP 500s within a 
certain time frame.


Is this possible?

Greets,

Sander



Re: Deny http connection

2011-11-25 Thread Baptiste
Hi,

You could do that using a stick table storing the http_err_rate counter.
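
For example, with a recent haproxy version a minimal sketch could look
like this (names, sizes and thresholds are placeholders; if you
specifically want to count 5xx responses from the servers, the newer
http_fail_rate counter may be a closer match than http_err_rate):

  frontend fe_web
    bind :80
    mode http
    # one entry per client IP, tracking its HTTP error rate over 10 minutes
    stick-table type ip size 100k expire 10m store http_err_rate(10m)
    http-request track-sc0 src
    # reject clients that triggered more than 20 errors in that period
    http-request deny if { sc0_http_err_rate gt 20 }
    default_backend bk_app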

cheers


On Fri, Nov 25, 2011 at 1:50 PM, Sander Klein roe...@roedie.nl wrote:
 Hi,

 I was wondering if it is possible to start rate-limiting or denying a
 connection based on response codes from the backend.

 For instance, I would like to start rejecting or rate-limiting an HTTP
 connection when a client triggers more than 20 HTTP 500s within a certain
 time frame.

 Is this possible?

 Greets,

 Sander





Re: hashing + roundrobin algorithm

2011-11-25 Thread Rerngvit Yanggratoke
Thank you, Baptiste. Let me elaborate in more detail then. We have over
three million files. Each static file is rather small (< 5 MB) and has a
unique identifier that is also used as its URL. I refer to this URL as the
file key. The performance goal we want from our system is high availability
and maximized throughput under a predefined constant latency, for example
300 ms.


I think what you suggest would work if I could change the files' URLs. I
would be able to manually divide all files into groups and create a
separate directory path for each group. Then, based on an ACL matching each
group, HAProxy would forward requests to a particular backend running the
roundrobin strategy.


However, I cannot change the files' URLs by adding a prefix. The client
software requesting the files from this cluster is legacy software, and it
is simply not feasible to change it at the moment.


On Tue, Nov 22, 2011 at 10:05 PM, Baptiste bed...@gmail.com wrote:

 Hi,

 Unless you share more details on how your files are accessed
 and what makes each URL unique, I can't help much.
 As I said, splitting your files by directory path or by Host header may be
 a good approach.

 Concerning the example in haproxy, having the following in your frontend
 will do the trick:
   acl dir1 path_beg /dir1/
   use_backend bk_dir1 if dir1
   acl dir2 path_beg /dir2/
   use_backend bk_dir2 if dir2
   ...

 then create the backends:
   backend bk_dir1
     balance roundrobin
     server srv1
     server srv2
   backend bk_dir2
     balance roundrobin
     server srv3
     server srv4
   ...

 Hope this helps

 On Mon, Nov 21, 2011 at 3:24 PM, Rerngvit Yanggratoke rerng...@kth.se
 wrote:
  Dear Baptiste,
  Could you please give an example of a criterion that would reduce the
  number of files per backend? And, if possible, how to implement that
  with HAProxy?
 
  On Sat, Nov 19, 2011 at 8:29 PM, Baptiste bed...@gmail.com wrote:
 
  On Fri, Nov 18, 2011 at 5:48 PM, Rerngvit Yanggratoke rerng...@kth.se
  wrote:
   Hello All,
   First of all, pardon me if I'm not communicating very well. English
   is not my native language. We are running a static file distribution
   cluster. The cluster consists of many web servers serving static
   files over HTTP. We have a very large number of files, such that a
   single server simply cannot keep them all (not enough disk space).
   In particular, a file can be served only from a subset of servers.
   Each file is uniquely identified by its URI; I will refer to this
   URI later as the key.
   I am investigating deploying HAProxy as a front end to this
   cluster. We want HAProxy to provide load balancing and automatic
   failover. In other words, a request comes first to HAProxy and
   HAProxy should forward the request to an appropriate backend server.
   More precisely, for a particular key, there should be at least two
   servers being forwarded to from HAProxy for the sake of load
   balancing. My question is: what load balancing strategy should I
   use?
   I could use hashing (based on the key) or consistent hashing.
   However, each file would end up being served by a single server at
   any particular moment. That means I wouldn't have load balancing and
   failover for a particular key. Is there something like a combination
   of the hashing and roundrobin strategies? In particular, for a
   particular key, there would be multiple servers serving the requests
   and HAProxy would select one of them according to a roundrobin
   policy. If there isn't such a strategy, any suggestions on how to
   implement this in HAProxy? Any other comments are welcome as well.
  
   --
   Best Regards,
   Rerngvit Yanggratoke
  
 
  Hi,
 
  You could create several backends and redirect requests based on an
  arbitrary criterion to reduce the number of files per backend. Using a
  URL path prefix might be a good idea.
  Then, inside a backend, you can use the URL hash load-balancing
  algorithm.
 
  cheers
 
 
 
  --
  Best Regards,
  Rerngvit Yanggratoke
 




-- 
Best Regards,
Rerngvit Yanggratoke


Re: hashing + roundrobin algorithm

2011-11-25 Thread Rerngvit Yanggratoke
Dear Willy,
   Thank you for your help. We have a clear performance goal for our
cluster: high availability and maximized throughput under a predefined
constant latency. However, we don't have a clear idea yet of what
architecture or software would allow us to achieve that. Let me provide
more details and try to answer your questions.
   We have over three million files. Each static file is rather small
(< 5 MB) and has a unique identifier that is also used as its URL. As a
result, we are in the second case you mentioned; in particular, we have
to worry about everybody downloading the same file simultaneously. We
replicate each file on at least two servers to provide failover and load
balancing, so that if a server temporarily fails, users can retrieve the
files kept on the failing server from another server.
   We do not have a caching layer at the moment. More precisely, every
request is served directly from the web servers. We want the system to
scale linearly with its size: when a new server is added, we want traffic
to be channeled to the new server equally compared to existing servers.
   I will investigate Varnish cache and see if it fits our system.

On Wed, Nov 23, 2011 at 8:15 AM, Willy Tarreau w...@1wt.eu wrote:

 Hi,

 On Fri, Nov 18, 2011 at 05:48:54PM +0100, Rerngvit Yanggratoke wrote:
  Hello All,
  First of all, pardon me if I'm not communicating very well. English
  is not my native language. We are running a static file distribution
  cluster. The cluster consists of many web servers serving static
  files over HTTP. We have a very large number of files, such that a
  single server simply cannot keep them all (not enough disk space).
  In particular, a file can be served only from a subset of servers.
  Each file is uniquely identified by its URI; I will refer to this
  URI later as the key.
  I am investigating deploying HAProxy as a front end to this
  cluster. We want HAProxy to provide load balancing and automatic
  failover. In other words, a request comes first to HAProxy and
  HAProxy should forward the request to an appropriate backend server.
  More precisely, for a particular key, there should be at least two
  servers being forwarded to from HAProxy for the sake of load
  balancing. My question is: what load balancing strategy should I use?
  I could use hashing (based on the key) or consistent hashing.
  However, each file would end up being served by a single server at
  any particular moment. That means I wouldn't have load balancing and
  failover for a particular key.

 This question is much more a question of architecture than of
 configuration. What is important is not what you can do with haproxy,
 but how you want your service to run. I suspect that if you acquired
 hardware and bandwidth to build your service, you have pretty clear
 ideas of how your files will be distributed and/or replicated between
 your servers. You also know whether you'll serve millions of files or
 just a few tens, which means in the first case that you can safely
 have one server per URL, and in the latter that you would risk
 overloading a server if everybody downloads the same file at a time.
 Maybe you have installed caches to avoid overloading some servers.
 You have probably planned what will happen when you add new servers,
 and what is supposed to happen when a server temporarily fails.

 All of these are very important questions, they determine whether your site
 will work or fail.

 Once you're able to respond to these questions, it becomes much more
 obvious what the LB strategy can be, if you want to dedicate server
 farms to some URLs, or load-balance each hash among a few servers
 because you have a particular replication strategy. And once you know
 what you need, then we can study how haproxy can respond to this need.
 Maybe it can't at all, maybe it's easy to modify it to respond to your
 needs, maybe it does respond pretty well.

 My guess from what you describe is that it could make a lot of sense to
 have one layer of haproxy in front of Varnish caches. The first layer of
 haproxy chooses a cache based on a consistent hash of the URL, and each
 varnish is then configured to address a small bunch of servers in round
 robin. But this means that you need to assign servers to farms, and that
 if you lose a varnish, all the servers behind it are lost too.
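
 As an illustrative sketch of such a first layer (cache names and
 addresses are placeholders, and 'mode http' is assumed in the defaults
 section):

   backend bk_caches
     balance uri
     # consistent hashing keeps most URLs mapped to the same cache when
     # a cache is added or removed
     hash-type consistent
     server varnish1 10.0.0.1:80 check
     server varnish2 10.0.0.2:80 check
     server varnish3 10.0.0.3:80 check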

 If your files are present on all servers, it might make sense to use
 varnish as explained above but which would round-robin across all servers.
 That way you make the cache layer and the server layer independent of each
 other. But this can imply complex replication strategies.

 As you see, there is no single response, you really need to define how you
 want your architecture to work and to scale first.

 Regards,
 Willy





-- 
Best Regards,
Rerngvit Yanggratoke


Re: Re: hashing + roundrobin algorithm

2011-11-25 Thread Rerngvit Yanggratoke
Hello wsq003,
   That sounds very interesting. It would be great if you could share
your patch. If that is not possible, providing a guideline on how to
implement it would be helpful as well. Thank you!

2011/11/23 wsq003 wsq...@sina.com


 I've made a private patch to haproxy (just a few lines of code, but not
 elegant) which supports this feature.

 My setup is just like what you imagined: consistent hashing to a group,
 then round-robin within that group.

 Our design is that several 'server' entries share a physical machine, and
 the 'server' entries of one group are distributed across several physical
 machines. So if one physical machine is down, nothing will pass through
 the cache layer, because every group still works. Then we will get a
 chance to recover the cluster as we want.
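
 Without patching, a rough way to approximate this behaviour is to chain
 two haproxy layers: the first consistently hashes the URI across
 per-group listeners, and each group then round-robins across its own
 servers. A minimal sketch (all names, ports and addresses below are
 placeholders):

   # first layer: pick a group by consistent hash of the URI
   frontend fe_files
     mode http
     bind :80
     default_backend bk_groups

   backend bk_groups
     mode http
     balance uri
     hash-type consistent
     server group1 127.0.0.1:8001
     server group2 127.0.0.1:8002

   # second layer: round-robin inside each group
   listen grp1
     mode http
     bind 127.0.0.1:8001
     balance roundrobin
     server srvA 10.0.1.1:80 check
     server srvB 10.0.1.2:80 check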


 From: Willy Tarreau w...@1wt.eu
 Date: 2011-11-23 15:15
 To: Rerngvit Yanggratoke rerng...@kth.se
 CC: haproxy haproxy@formilux.org; Baptiste bed...@gmail.com
 Subject: Re: hashing + roundrobin algorithm

 Hi,

 On Fri, Nov 18, 2011 at 05:48:54PM +0100, Rerngvit Yanggratoke wrote:
  Hello All,
  First of all, pardon me if I'm not communicating very well. English
  is not my native language. We are running a static file distribution
  cluster. The cluster consists of many web servers serving static
  files over HTTP. We have a very large number of files, such that a
  single server simply cannot keep them all (not enough disk space).
  In particular, a file can be served only from a subset of servers.
  Each file is uniquely identified by its URI; I will refer to this
  URI later as the key.
  I am investigating deploying HAProxy as a front end to this
  cluster. We want HAProxy to provide load balancing and automatic
  failover. In other words, a request comes first to HAProxy and
  HAProxy should forward the request to an appropriate backend server.
  More precisely, for a particular key, there should be at least two
  servers being forwarded to from HAProxy for the sake of load
  balancing. My question is: what load balancing strategy should I use?
  I could use hashing (based on the key) or consistent hashing.
  However, each file would end up being served by a single server at
  any particular moment. That means I wouldn't have load balancing and
  failover for a particular key.


 This question is much more a question of architecture than of
 configuration. What is important is not what you can do with haproxy,
 but how you want your service to run. I suspect that if you acquired
 hardware and bandwidth to build your service, you have pretty clear
 ideas of how your files will be distributed and/or replicated between
 your servers. You also know whether you'll serve millions of files or
 just a few tens, which means in the first case that you can safely
 have one server per URL, and in the latter that you would risk
 overloading a server if everybody downloads the same file at a time.
 Maybe you have installed caches to avoid overloading some servers.
 You have probably planned what will happen when you add new servers,
 and what is supposed to happen when a server temporarily fails.

 All of these are very important questions, they determine whether your
 site will work or fail.

 Once you're able to respond to these questions, it becomes much more
 obvious what the LB strategy can be, if you want to dedicate server
 farms to some URLs, or load-balance each hash among a few servers
 because you have a particular replication strategy. And once you know
 what you need, then we can study how haproxy can respond to this need.
 Maybe it can't at all, maybe it's easy to modify it to respond to your
 needs, maybe it does respond pretty well.

 My guess from what you describe is that it could make a lot of sense to
 have one layer of haproxy in front of Varnish caches. The first layer of
 haproxy chooses a cache based on a consistent hash of the URL, and each
 varnish is then configured to address a small bunch of servers in round
 robin. But this means that you need to assign servers to farms, and that
 if you lose a varnish, all the servers behind it are lost too.

 If your files are present on all servers, it might make sense to use
 varnish as explained above but which would round-robin across all servers.
 That way you make the cache layer and the server layer independent of each
 other. But this can imply complex replication strategies.

 As you see, there is no single response, you really need to define how you
 want your architecture to work and to scale first.

 Regards,
 Willy







-- 
Best Regards,
Rerngvit Yanggratoke