Deny HTTP connection
Hi,

I was wondering if it is possible to start rate-limiting or denying a connection based on response codes from the backend. For instance, I would like to start rejecting or rate-limiting an HTTP connection when a client triggers more than 20 HTTP 500s within a certain time frame. Is this possible?

Greets, Sander
Re: Deny HTTP connection
Hi,

You could do that using a stick table that stores the http_err_rate counter.

Cheers

On Fri, Nov 25, 2011 at 1:50 PM, Sander Klein roe...@roedie.nl wrote:
[...]
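For reference, here is a minimal sketch of what such a configuration could look like. The section names, table size, threshold and the 10-minute window are illustrative assumptions, and the sc1 counter syntax assumes a 1.5-dev build with stick-table counters:

    frontend fe_web
        bind :80
        mode http
        # one entry per client IP, storing that client's rate of
        # HTTP errors over a sliding 10-minute window
        stick-table type ip size 100k expire 10m store http_err_rate(10m)
        # track each incoming connection against the table
        tcp-request connection track-sc1 src
        # deny clients whose error rate exceeded 20 in the window
        acl abuser sc1_http_err_rate gt 20
        http-request deny if abuser
        default_backend bk_web

    backend bk_web
        mode http
        server srv1 10.0.0.1:80 check

Note that http_err_rate counts HTTP errors in general rather than 500s specifically, so this is an approximation of the original 20-500s requirement.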
Re: hashing + roundrobin algorithm
Thank you, Baptiste. Let me elaborate in more detail then. We have over three million files. Each static file is rather small (< 5MB) and has a unique identifier that is also used as its URL. I referred to this URL as a file key. The performance goal we want from our system is high availability and maximized throughput under a predefined constant latency, for example, 300ms.

I think what you suggest would work if I could change the files' URLs. I would be able to manually divide all files into different groups and create separate directory paths for each group. Then, based on an ACL matching each group, HAProxy would forward a request to a particular backend running the roundrobin strategy. However, I cannot change a file's URL by adding a prefix. The client software requesting the files from this cluster is legacy software. It is simply not feasible to change it at the moment.

On Tue, Nov 22, 2011 at 10:05 PM, Baptiste bed...@gmail.com wrote:

Hi,

As long as you don't share more details on how your files are accessed and what makes each URL unique, I can't help. As I said, splitting your files by directory path or by Host header may be good. Concerning an example in haproxy, having the following in your frontend will do the stuff:

    acl dir1 path_beg /dir1/
    use_backend bk_dir1 if dir1
    acl dir2 path_beg /dir2/
    use_backend bk_dir2 if dir2
    ...

then create the backends:

    backend bk_dir1
        balance roundrobin
        server srv1 ...
        server srv2 ...

    backend bk_dir2
        balance roundrobin
        server srv3 ...
        server srv4 ...
    ...

Hope this helps

On Mon, Nov 21, 2011 at 3:24 PM, Rerngvit Yanggratoke rerng...@kth.se wrote:

Dear Baptiste,

Could you please exemplify a criterion that would reduce the number of files per backend? And, if possible, how to implement that with HAProxy?

On Sat, Nov 19, 2011 at 8:29 PM, Baptiste bed...@gmail.com wrote:

On Fri, Nov 18, 2011 at 5:48 PM, Rerngvit Yanggratoke rerng...@kth.se wrote:

Hello All,

First of all, pardon me if I'm not communicating very well. English is not my native language. We are running a static file distribution cluster. The cluster consists of many web servers serving static files over HTTP. We have a very large number of files, such that a single server simply cannot keep all the files (it does not have enough disk space). In particular, a file can be served only from a subset of servers. Each file is uniquely identified by the file's URI. I will refer to this URI later as a key.

I am investigating deploying HAProxy as a front end to this cluster. We want HAProxy to provide load balancing and automatic failover. In other words, a request comes first to HAProxy and HAProxy should forward the request to an appropriate backend server. More precisely, for a particular key, there should be at least two servers being forwarded to by HAProxy for the sake of load balancing.

My question is: what load balancing strategy should I use? I could use hashing (based on the key) or consistent hashing. However, each file would end up being served by a single server at a particular moment. That means I wouldn't have load balancing and failover for a particular key. Is there something like a combination of the hashing and roundrobin strategies? In particular, for a particular key, there would be multiple servers serving the requests and HAProxy would select one of them according to a roundrobin policy. If there isn't such a strategy, any suggestions on how to implement this in HAProxy? Any other comments are welcome as well.

-- Best Regards, Rerngvit Yanggratoke

Hi,

You could create several backends and redirect requests based on an arbitrary criterion to reduce the number of files per backend. Using a URL path prefix might be a good idea. Then inside a backend, you can use the URL hash load-balancing algorithm.

Cheers

-- Best Regards, Rerngvit Yanggratoke
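To illustrate the URL hash algorithm Baptiste mentions, a backend could look like the sketch below. The server names and addresses are placeholders; hash-type consistent is optional and keeps the key-to-server mapping mostly stable when servers are added or removed:

    backend bk_files
        mode http
        # hash the request URI so a given file key always reaches
        # the same server while that server is up
        balance uri
        hash-type consistent
        server srv1 10.0.0.1:80 check
        server srv2 10.0.0.2:80 check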
Re: hashing + roundrobin algorithm
Dear Willy,

Thank you for your help. We have a clear performance goal for our cluster. The goal is high availability and maximized throughput under a predefined constant latency. However, we don't have a clear idea yet what architecture or software would allow us to achieve that. Let me provide more details and try to answer your questions then.

We have over three million files. Each static file is rather small (< 5MB) and has a unique identifier that is also used as its URL. As a result, we are in the second case you mentioned. In particular, we should be concerned about everybody downloading the same file simultaneously. We replicate each file on at least two servers to provide failover and load balancing. In particular, if a server temporarily fails, users can retrieve the files kept on the failing server from another server. We do not have a caching layer at the moment. More precisely, every request is served directly from the web servers. We want the system to scale linearly with the system size. In particular, when a new server is added, we want traffic to be channeled to the new server equally compared to existing servers. I will investigate Varnish cache and see if it fits our system then.

On Wed, Nov 23, 2011 at 8:15 AM, Willy Tarreau w...@1wt.eu wrote:

Hi,

On Fri, Nov 18, 2011 at 05:48:54PM +0100, Rerngvit Yanggratoke wrote:
[...]

This question is much more a question of architecture than of configuration. What is important is not what you can do with haproxy, but how you want your service to run. I suspect that if you acquired hardware and bandwidth to build your service, you have pretty clear ideas of how your files will be distributed and/or replicated between your servers. You also know whether you'll serve millions of files or just a few tens, which means in the first case that you can safely have one server per URL, and in the latter that you would risk overloading a server if everybody downloads the same file at the same time. Maybe you have installed caches to avoid overloading some servers. You have probably planned what will happen when you add new servers, and what is supposed to happen when a server temporarily fails. All of these are very important questions; they determine whether your site will work or fail.

Once you're able to respond to these questions, it becomes much more obvious what the LB strategy can be: whether you want to dedicate server farms to some URLs, or load-balance each hash among a few servers because you have a particular replication strategy. And once you know what you need, then we can study how haproxy can respond to this need. Maybe it can't at all, maybe it's easy to modify it to respond to your needs, maybe it does respond pretty well.

My guess from what you describe is that it could make a lot of sense to have one layer of haproxy in front of Varnish caches. The first layer of haproxy chooses a cache based on a consistent hash of the URL, and each Varnish is then configured to address a small bunch of servers in round robin. But this means that you need to assign servers to farms, and that if you lose a Varnish, all the servers behind it are lost too. If your files are present on all servers, it might make sense to use Varnish as explained above but have it round-robin across all servers. That way you make the cache layer and the server layer independent of each other. But this can imply complex replication strategies.

As you see, there is no single response; you really need to define how you want your architecture to work and to scale first.

Regards, Willy

-- Best Regards, Rerngvit Yanggratoke
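As a rough illustration of Willy's two-layer idea, the first haproxy layer could look like the sketch below. The Varnish names, addresses and ports are assumptions; each Varnish instance would itself be configured to round-robin across its own small group of file servers:

    frontend fe_static
        bind :80
        mode http
        default_backend bk_caches

    # pick a Varnish cache by consistent hash of the URL, so a given
    # file key is always served from the same cache while it is up
    backend bk_caches
        mode http
        balance uri
        hash-type consistent
        server varnish1 10.0.1.1:6081 check
        server varnish2 10.0.1.2:6081 check
        server varnish3 10.0.1.3:6081 check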
Re: Re: hashing + roundrobin algorithm
Hello wsq003,

That sounds very interesting. It would be great if you could share your patch. If that is not possible, providing a guideline on how to implement it would be helpful as well. Thank you!

2011/11/23 wsq003 wsq...@sina.com

I've made a private patch to haproxy (just a few lines of code, but not elegant), which can support this feature. My condition is just like your imagination: consistent-hashing to a group, then round-robin in this group. Our design is that several 'servers' will share a physical machine, and the 'servers' of one group will be distributed over several physical machines. So, if one physical machine is down, nothing will pass through the cache layer, because every group still works. Then we will get a chance to recover the cluster as we want.

From: Willy Tarreau w...@1wt.eu
Date: 2011-11-23 15:15
To: Rerngvit Yanggratoke rerng...@kth.se
CC: haproxy haproxy@formilux.org; Baptiste bed...@gmail.com
Subject: Re: hashing + roundrobin algorithm
[...]
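For readers who cannot maintain a private patch, consistent-hashing to a group and then round-robining inside it can be approximated with stock haproxy by chaining two layers of proxies on the same host. This is only a sketch under invented assumptions (group layout, ports, addresses), not wsq003's patch:

    defaults
        mode http
        timeout connect 5s
        timeout client 30s
        timeout server 30s

    # layer 1: consistent-hash the URL to one of the group listeners
    frontend fe_static
        bind :80
        default_backend bk_groups

    backend bk_groups
        balance uri
        hash-type consistent
        server group1 127.0.0.1:8001
        server group2 127.0.0.1:8002

    # layer 2: each group round-robins across members that live on
    # different physical machines, so losing one machine leaves
    # every group with at least one working member
    listen grp1
        bind 127.0.0.1:8001
        balance roundrobin
        server s1 10.0.0.1:80 check
        server s2 10.0.0.2:80 check

    listen grp2
        bind 127.0.0.1:8002
        balance roundrobin
        server s3 10.0.0.3:80 check
        server s4 10.0.0.4:80 check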