Re: clarification on peers and inclusion into 1.4 soon?
Hi David,

On Thu, Jul 12, 2012 at 11:56:40AM -0700, David Birdsong wrote:
On Tue, Apr 24, 2012 at 2:33 PM, David Birdsong david.birds...@gmail.com wrote:
On Tue, Apr 24, 2012 at 2:18 PM, David Birdsong david.birds...@gmail.com wrote:
On Tue, Apr 24, 2012 at 12:21 PM, Willy Tarreau w...@1wt.eu wrote:

Hi David,

On Tue, Apr 24, 2012 at 11:46:52AM -0700, David Birdsong wrote:

I'm not seeing my response that I swear I sent last night...

I swear I didn't see it :-)

So strange, gmail doesn't even have a saved draft to recall from.

Yes, this would solve our issues and would be very, very useful.

OK, I'll have to think about it. Now I remember what the hard part was: an ACL is a list of possible expressions, and each expression supports multiple pattern sources. On the socket we could only reload one pattern source of one type at a time, of course. So the complexity comes in naming what we want to reload. For instance:

Oh, I forgot to ask whether these future efforts could include support for other match types, like arbitrary string or regex comparison?

    acl bad_guys src 0.0.0.0/8 127.0.0.0/8 224.0.0.0/3
    acl bad_guys hdr_ip(x-forwarded-for) 0.0.0.0/8 127.0.0.0/8 224.0.0.0/3
    acl bad_guys hdr_ip(x-forwarded-for) -f manual.lst -f automatic.lst

Now when trying to reload the bad_guys ACL, in fact we'd like to reload only one of the files. Probably we should find a way to name a specific expression (one ACL statement) that will have to be reloaded, I don't know. Or maybe at first we could simply reject reload requests for ACLs that have more than one statement.

Wow, yeah, this could get tough to implement without adding some sort of name or keyword, which I'm sure is a deal-breaker for backwards-compatibility reasons. Would the operations be read/update/insert or read/replace? The latter might be easier to implement; we (the user) could be asked to supply the filename exactly as it was named in the configuration, which would then be followed by the new data (replace).

I admit this makes it an odd case where the haproxy config on disk doesn't match the running state, and where a restart would revert back to the values found on disk. But perhaps haproxy has already gone down that road with weights being settable both over the socket and in the config file.

I also remember that some pattern parsing errors were sent to stderr and will have to be disabled when fed this way. In summary, nothing terribly complex but nothing trivial either.

Is that because stderr is long gone after haproxy has daemonized? Could you send parsing errors to syslog? Have you ever considered implementing a ring buffer to log the way that varnish logs? It's easy to solve the 'never block on disk' problem by putting the buffer file in a tmpfs, and it would provide a place to send errors for these odd corner cases. I'm mostly thinking aloud here. I really do love that haproxy never touches the filesystem after it's up and running.

Regards,
Willy

Willy, have you given any cycles to adding support to add/remove ACLs via the socket interface? I'm eagerly awaiting being able to inject/remove IP addresses into a continuously running haproxy.

No, I haven't made progress. I'm around 500 mails late in my mbox and around 50 of them need a reply; I'm trying to balance work at customers with work at my company and mail reading (around 1/3 each). So no progress at the moment.

I'm considering taking holidays soon to catch up with e-mails and hopefully to progress on the connection management in haproxy, which is much more important since I began breaking it and can't continuously work on it. I must absolutely finish that before switching to anything else.

Can't promise more, I'm really sorry. Days are only 19 hours long :-(

Willy
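David's ring-buffer logging idea can be sketched in a few lines. This is only an illustration of the general technique he describes (an in-memory buffer that never blocks on disk, varnishlog-style); it is not anything haproxy ships, and all names are illustrative:

```python
import collections
import logging

class RingBufferHandler(logging.Handler):
    """Keep only the most recent `capacity` log records in memory."""
    def __init__(self, capacity=1000):
        super().__init__()
        # deque with maxlen silently discards the oldest entry on overflow
        self.records = collections.deque(maxlen=capacity)

    def emit(self, record):
        self.records.append(self.format(record))

logger = logging.getLogger("acl-reload")
handler = RingBufferHandler(capacity=4)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Emit six messages; only the last four survive in the buffer.
for i in range(6):
    logger.info("pattern parse error %d", i)

print(list(handler.records))
```

A consumer (or an operator over a debug socket) could then drain `handler.records` on demand, which is roughly the corner-case error reporting David is thinking aloud about.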
Re: clarification on peers and inclusion into 1.4 soon?
On Wed, Jul 18, 2012 at 3:01 PM, Willy Tarreau w...@1wt.eu wrote:

> Hi David,
>
> ...
>
> No, I haven't made progress. I'm around 500 mails late in my mbox and around 50 of them need a reply; I'm trying to balance work at customers with work at my company and mail reading (around 1/3 each). So no progress at the moment.

Ok, thanks for the update.

> I'm considering taking holidays soon to catch up with e-mails and hopefully to progress on the connection management in haproxy, which is much more important since I began breaking it and can't continuously work on it. I must absolutely finish that before switching to anything else.

I totally understand. I've been working in start-ups for a few years and can empathize with the lack of time in the day to get to everything.

I think for the time being I may take a look at ngx-resty and use global Lua tables, which means an authorized POST can update the IP ACL list. If this is ever available in haproxy, I'd switch back in a heartbeat.

> Can't promise more, I'm really sorry. Days are only 19 hours long :-(
>
> Willy
Re: clarification on peers and inclusion into 1.4 soon?
On Tue, Apr 24, 2012 at 2:33 PM, David Birdsong david.birds...@gmail.com wrote:
> On Tue, Apr 24, 2012 at 12:21 PM, Willy Tarreau w...@1wt.eu wrote:
>
> ...

Willy, have you given any cycles to adding support to add/remove ACLs via the socket interface? I'm eagerly awaiting being able to inject/remove IP addresses into a continuously running haproxy.
Re: clarification on peers and inclusion into 1.4 soon?
On Mon, Apr 23, 2012 at 5:07 PM, Kevin Heatwole ke...@heatwoles.us wrote:

> On Apr 23, 2012, at 7:31 PM, David Birdsong wrote:
>> ... nginx is already in front of haproxy, but nginx is not the first listener, so it sees the IP addresses as HTTP headers too. The last time I checked, nginx only blocks IP addresses from layer-4 connections. Any other blocking would require nginx to compare the IP addresses as strings or regexes, which I want to avoid doing on every single request. If the list grows long, every request suffers. IP comparison on long lists of IPs is one area where haproxy is the clear winner.
>
> I'm no nginx expert, but I use the geo keyword to quickly search for banned IPs:

Great idea, this could be very simple to get running.

>     geo $is_banned_ip {
>         default 0;
>         include /www/nginx/banned_ips.conf;
>     }
>
> where banned_ips.conf lists the IPs followed by a 1, as in:
>
>     X.X.X.X/32 1;
>     X.X.X.Y/32 1;
>
> Then, inside the location, I simply test $is_banned_ip and reject those requests, as in:
>
>     location / {
>         if ($is_banned_ip) { return 401; }
>         index index.php index.html;
>     }
>
> The geo keyword is primarily used for getting the geolocation of an IP, so I think you can have a pretty large list of IPs and still be very efficient.
>
> Kevin

> On Mon, Apr 23, 2012 at 2:48 PM, Kevin Heatwole ke...@heatwoles.us wrote:
>> You might want to block the IPs before they get into haproxy. Maybe put an nginx reverse proxy in front of haproxy? I use nginx to dynamically block/allow HTTP requests by IP. Another possibility, if you just need to block a list of IPs, would be to use a firewall/iptables in front of haproxy to do the blocking.
>
> - nginx is already in front of haproxy, but nginx is not the first listener, so it sees the IP addresses as HTTP headers too. The last time I checked, nginx only blocks IP addresses from layer-4 connections. Any other blocking would require nginx to compare the IP addresses as strings or regexes, which I want to avoid doing on every single request. If the list grows long, every request suffers. IP comparison on long lists of IPs is one area where haproxy is the clear winner.
>
> - iptables won't work either; iptables works on TCP/IP, not HTTP. I'd like to keep IP blocking in haproxy.
>
> On Apr 23, 2012, at 2:45 PM, David Birdsong wrote:
>> Hi,
>>
>> I've got a situation where I need to update haproxy every 1-2 minutes to apprise it of a new list of IP addresses to tarpit. I've rigged up a fairly hacky pipeline to detect scrapers on our site based on entries found in X-Forwarded-For. To get around the fact that stick-table entries are currently only keyed off of protocols lower than HTTP, I need to reload haproxy for every new IP address that I detect. It's ugly, but I've decided to reload haproxy on our site every 2 minutes. This means that all load-balancing info is lost very frequently and maxconns per backend are reset, and at deploy time, when we rely on slowstart to warm our backends, we have to completely disable reloads of haproxy altogether. That opens us up to heavy scraping for ~1-3 hours per day during our code deploy, which makes it tough to differentiate between slowness induced by recent code changes and scrapers sucking up resources.
>>
>> I'd love to not reload haproxy and let it learn about IPs to block internally, but my understanding is that IP addresses found at the HTTP level will not work their way into stick tables for some time. Will peers help to maintain state inside of haproxy between graceful reloads? Will connection counts to backends be maintained? Are any stats back-populated to the new process? Also, how is the stability of peer mode? It's going to take some arguing and hand-wringing to convince others in our organization to put a 1.5 version out in front of the site, despite the fact that haproxy is generally the most stable piece of software in our stack, dev version or not. Are there any efforts to port peer mode into 1.4 soon?
>>
>> Thanks again for one of the most useful, fast, and stable tools that the community has come to rely on so heavily.
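David's point that "IP comparison on long lists" should be prefix matching, not string or regex comparison, is easy to illustrate. The sketch below is a deliberately naive linear scan just to show the semantics (haproxy ACL trees and nginx's geo module use tree/radix lookups for performance); the addresses and list contents are illustrative:

```python
import ipaddress

# Banned list as CIDR blocks, mirroring the acl/geo examples in the thread.
banned = [ipaddress.ip_network(n)
          for n in ("0.0.0.0/8", "127.0.0.0/8", "224.0.0.0/3")]

def is_banned(addr: str) -> bool:
    """True if addr falls inside any banned prefix (CIDR containment,
    which a string or regex comparison cannot express directly)."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in banned)

print(is_banned("127.0.0.1"))    # inside 127.0.0.0/8
print(is_banned("203.0.113.9"))  # matches no prefix
print(is_banned("239.1.2.3"))    # inside 224.0.0.0/3
```

Note that a single /3 entry here covers billions of addresses; expressed as strings or regexes, the same ban list would explode in size, which is exactly the per-request cost David wants to avoid.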
RE: clarification on peers and inclusion into 1.4 soon?
The map module is for comparing strings. The geo module is for comparing IPs, or multiple IPs in CIDR notation... it took a while for me to figure this out.

Regards,
Jens Dueholm Christensen

-----Original Message-----
From: Aleksandar Lazic [mailto:al-hapr...@none.at]
Sent: Tuesday, April 24, 2012 2:04 AM
To: haproxy@formilux.org
Subject: Re: clarification on peers and inclusion into 1.4 soon?

Dear David,

On 24-04-2012 01:31, David Birdsong wrote:
> ...
>
> If the list grows long, every request suffers. IP comparison on long lists of IPs is one area where haproxy is the clear winner.

Depending on the list size, maybe you can use the map module from nginx.

http://nginx.org/en/docs/http/ngx_http_map_module.html

The map module can also handle regex matches. For example:

http://serverfault.com/questions/316541/check-several-user-agent-in-nginx
http://www.ruby-forum.com/topic/2440219
http://redant.com.au/blog/manage-ssl-redirection-in-nginx-using-maps-and-save-the-universe/

> - iptables won't work either; iptables works on TCP/IP, not HTTP.

Depending on your iptables setup, maybe you can use the string matching module.

http://spamcleaner.org/en/misc/w00tw00t.html

> I'd like to keep IP blocking in haproxy.

ok

BR
Aleks
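To make Jens's distinction concrete: a map-based variant matches string (or regex) keys, so it fits things like User-Agent checks rather than CIDR lists. A rough sketch of what such a map might look like (variable names and patterns are illustrative, not from the thread; the `map` block lives at `http` level and the upstream name is assumed to be defined elsewhere):

```nginx
map $http_user_agent $is_scraper {
    default             0;
    "~*(curl|scrapy)"   1;
}

server {
    listen 80;
    location / {
        if ($is_scraper) { return 403; }
        proxy_pass http://haproxy_backend;
    }
}
```

For IP lists in CIDR notation, the geo module shown earlier in the thread remains the right tool; map keys are not interpreted as network prefixes.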
Re: clarification on peers and inclusion into 1.4 soon?
On Mon, Apr 23, 2012 at 10:47 PM, Willy Tarreau w...@1wt.eu wrote:

> Hi David,
>
> On Mon, Apr 23, 2012 at 11:45:51AM -0700, David Birdsong wrote:
>> Hi,
>>
>> I've got a situation where I need to update haproxy every 1-2 minutes to apprise it of a new list of IP addresses to tarpit. I've rigged up a fairly hacky pipeline to detect scrapers on our site based on entries found in X-Forwarded-For. To get around the fact that stick-table entries are currently only keyed off of protocols lower than HTTP, I need to reload haproxy for every new IP address that I detect. It's ugly, but I've decided to reload haproxy on our site every 2 minutes. This means that all load-balancing info is lost very frequently and maxconns per backend are reset, and at deploy time, when we rely on slowstart to warm our backends, we have to completely disable reloads of haproxy altogether. That opens us up to heavy scraping for ~1-3 hours per day during our code deploy, which makes it tough to differentiate between slowness induced by recent code changes and scrapers sucking up resources.
>
> I see, this is quite dirty.
>
>> I'd love to not reload haproxy and let it learn about IPs to block internally, but my understanding is that IP addresses found at the HTTP level will not work their way into stick tables for some time.
>
> That's true for now and will possibly change in a few days, though the resulting code will have to be handled with some care!
>
>> Will peers help to maintain state inside of haproxy between graceful reloads?
>
> No, because counters are not synchronized over peers; only stickiness information is (e.g. server id).
>
>> Will connection counts to backends be maintained? Are any stats back-populated to the new process?
>
> Similarly, there is no such thing either.
>
>> Also, how is the stability of peer mode? It's going to take some arguing and hand-wringing to convince others in our organization to put a 1.5 version out in front of the site, despite the fact that haproxy is generally the most stable piece of software in our stack, dev version or not.
>
> Well, I would see it two ways:
>
> - all fixes in stable are backported from -dev, so there is no bug that is fixed in stable but not in -dev
> - most new regressions are first found in -dev, so it regularly happens that we have some annoying bugs in -dev
>
> Overall, -dev works fine when you wait a bit after a release, because I know that there are a number of users running it in production, and for this reason, if I spot problematic issues after a release, I emit a new one with those issues fixed. But I certainly understand why a number of people would feel uneasy with such a version in production, precisely because it's being actively developed and subject to bugs.
>
>> Are there any efforts to port peer mode into 1.4 soon?
>
> No, there is no such plan, and it would require a huge amount of changes that are 1.5-specific.
>
> However, there is something I've long wanted to be able to add: the ability to add/remove ACL patterns from the socket interface. I think it should exactly match your needs, and it should not be too complex, so we could backport it into 1.4. I think that in your config you have an ACL which loads its IP addresses from an external file, something like this:
>
>     acl scraper hdr_ip(x-forwarded-for) -f scrapers.txt
>
> The idea is that we should be able to add/remove entries in this ACL. I even planned to be able to re-load the whole file over the CLI so that it's easier to keep coherent. I remember there was something difficult in doing it, but I don't precisely remember what.
>
> Do you think that it would solve your issues?

I'm not seeing my response that I swear I sent last night...

Yes, this would solve our issues and would be very, very useful.

> Regards,
> Willy
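Wired into a config, the file-based ACL Willy describes might be used roughly like this. This is a sketch, not from the thread: the frontend, backend, and file names are illustrative, and the blocking directive varies by version (`block if` exists in 1.4; tarpitting was done differently at the time):

```
frontend www
    bind :80
    # scrapers.txt: one IP or CIDR per line, regenerated by the
    # external detection pipeline
    acl scraper hdr_ip(x-forwarded-for) -f scrapers.txt
    block if scraper
    default_backend app
```

The appeal of the proposal in the mail above is that the in-memory pattern set behind `-f scrapers.txt` could then be updated over the stats socket, instead of rewriting the file and reloading the whole process.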
Re: clarification on peers and inclusion into 1.4 soon?
On Tue, Apr 24, 2012 at 12:21 PM, Willy Tarreau w...@1wt.eu wrote:

> Hi David,
>
> On Tue, Apr 24, 2012 at 11:46:52AM -0700, David Birdsong wrote:
>> I'm not seeing my response that I swear I sent last night...
>
> I swear I didn't see it :-)

So strange, gmail doesn't even have a saved draft to recall from.

>> Yes, this would solve our issues and would be very, very useful.
>
> OK, I'll have to think about it. Now I remember what the hard part was: an ACL is a list of possible expressions, and each expression supports multiple pattern sources. On the socket we could only reload one pattern source of one type at a time, of course. So the complexity comes in naming what we want to reload. For instance:

Oh, I forgot to ask whether these future efforts could include support for other match types, like arbitrary string or regex comparison?

>     acl bad_guys src 0.0.0.0/8 127.0.0.0/8 224.0.0.0/3
>     acl bad_guys hdr_ip(x-forwarded-for) 0.0.0.0/8 127.0.0.0/8 224.0.0.0/3
>     acl bad_guys hdr_ip(x-forwarded-for) -f manual.lst -f automatic.lst
>
> Now when trying to reload the bad_guys ACL, in fact we'd like to reload only one of the files. Probably we should find a way to name a specific expression (one ACL statement) that will have to be reloaded, I don't know. Or maybe at first we could simply reject reload requests for ACLs that have more than one statement.

Wow, yeah, this could get tough to implement without adding some sort of name or keyword, which I'm sure is a deal-breaker for backwards-compatibility reasons. Would the operations be read/update/insert or read/replace? The latter might be easier to implement; we (the user) could be asked to supply the filename exactly as it was named in the configuration, which would then be followed by the new data (replace).

I admit this makes it an odd case where the haproxy config on disk doesn't match the running state, and where a restart would revert back to the values found on disk. But perhaps haproxy has already gone down that road with weights being settable both over the socket and in the config file.

> I also remember that some pattern parsing errors were sent to stderr and will have to be disabled when fed this way. In summary, nothing terribly complex but nothing trivial either.

Is that because stderr is long gone after haproxy has daemonized? Could you send parsing errors to syslog? Have you ever considered implementing a ring buffer to log the way that varnish logs? It's easy to solve the 'never block on disk' problem by putting the buffer file in a tmpfs, and it would provide a place to send errors for these odd corner cases. I'm mostly thinking aloud here. I really do love that haproxy never touches the filesystem after it's up and running.

> Regards,
> Willy
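The read/replace flow David describes also has a client-side half: whatever tool feeds new patterns must rewrite the on-disk file so that a later restart converges with the running state. A small sketch of that file side only (the socket command itself was hypothetical at the time of this thread and is not shown; the function name and file name are illustrative). The rename step is atomic on POSIX filesystems, so a reader never sees a half-written list:

```python
import os
import tempfile

def replace_pattern_file(path, patterns):
    """Atomically replace an ACL pattern file (one pattern per line):
    write a temp file in the same directory, then rename it over the
    original so readers see either the old list or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".acl.")
    try:
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(patterns) + "\n")
        os.replace(tmp, path)  # atomic rename over the old file
    except BaseException:
        os.unlink(tmp)
        raise

replace_pattern_file("automatic.lst", ["203.0.113.7", "198.51.100.0/24"])
print(open("automatic.lst").read(), end="")
```

Writing to a temp file in the same directory (rather than, say, /tmp) matters because atomic rename only works within one filesystem.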
Re: clarification on peers and inclusion into 1.4 soon?
You might want to block the IPs before they get into haproxy. Maybe put an nginx reverse proxy in front of haproxy? I use nginx to dynamically block/allow HTTP requests by IP.

Another possibility, if you just need to block a list of IPs, would be to use a firewall/iptables in front of haproxy to do the blocking.

On Apr 23, 2012, at 2:45 PM, David Birdsong wrote:
> Hi,
>
> I've got a situation where I need to update haproxy every 1-2 minutes to apprise it of a new list of IP addresses to tarpit.
>
> ...
Re: clarification on peers and inclusion into 1.4 soon?
On Mon, Apr 23, 2012 at 2:48 PM, Kevin Heatwole ke...@heatwoles.us wrote:

> You might want to block the IPs before they get into haproxy. Maybe put an nginx reverse proxy in front of haproxy? I use nginx to dynamically block/allow HTTP requests by IP. Another possibility, if you just need to block a list of IPs, would be to use a firewall/iptables in front of haproxy to do the blocking.

- nginx is already in front of haproxy, but nginx is not the first listener, so it sees the IP addresses as HTTP headers too. The last time I checked, nginx only blocks IP addresses from layer-4 connections. Any other blocking would require nginx to compare the IP addresses as strings or regexes, which I want to avoid doing on every single request. If the list grows long, every request suffers. IP comparison on long lists of IPs is one area where haproxy is the clear winner.

- iptables won't work either; iptables works on TCP/IP, not HTTP.

I'd like to keep IP blocking in haproxy.

On Apr 23, 2012, at 2:45 PM, David Birdsong wrote:
> Hi,
>
> I've got a situation where I need to update haproxy every 1-2 minutes to apprise it of a new list of IP addresses to tarpit.
>
> ...
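For context on the peers question raised here: in 1.5-dev, a peers section is declared roughly as below (a sketch with illustrative names and addresses, not a configuration from the thread). As Willy notes elsewhere in this thread, peers synchronize stickiness entries only, not counters or stats:

```
peers lb_peers
    peer lb1 10.0.0.1:1024
    peer lb2 10.0.0.2:1024

backend app
    stick-table type ip size 200k expire 30m peers lb_peers
    stick on src
```

Each haproxy instance is started with its own peer name (matching one `peer` line), and entries in the table are pushed to the other peers, including to a freshly started replacement process during a reload.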
Re: clarification on peers and inclusion into 1.4 soon?
Dear David,

On 24-04-2012 01:31, David Birdsong wrote:

> On Mon, Apr 23, 2012 at 2:48 PM, Kevin Heatwole ke...@heatwoles.us wrote:
>> You might want to block the IPs before they get into haproxy. Maybe put an nginx reverse proxy in front of haproxy? I use nginx to dynamically block/allow HTTP requests by IP. Another possibility, if you just need to block a list of IPs, would be to use a firewall/iptables in front of haproxy to do the blocking.
>
> - nginx is already in front of haproxy, but nginx is not the first listener, so it sees the IP addresses as HTTP headers too. The last time I checked, nginx only blocks IP addresses from layer-4 connections. Any other blocking would require nginx to compare the IP addresses as strings or regexes, which I want to avoid doing on every single request. If the list grows long, every request suffers. IP comparison on long lists of IPs is one area where haproxy is the clear winner.

Depending on the list size, maybe you can use the map module from nginx.

http://nginx.org/en/docs/http/ngx_http_map_module.html

The map module can also handle regex matches. For example:

http://serverfault.com/questions/316541/check-several-user-agent-in-nginx
http://www.ruby-forum.com/topic/2440219
http://redant.com.au/blog/manage-ssl-redirection-in-nginx-using-maps-and-save-the-universe/

> - iptables won't work either; iptables works on TCP/IP, not HTTP.

Depending on your iptables setup, maybe you can use the string matching module.

http://spamcleaner.org/en/misc/w00tw00t.html

> I'd like to keep IP blocking in haproxy.

ok

BR
Aleks
Re: clarification on peers and inclusion into 1.4 soon?
On Apr 23, 2012, at 7:31 PM, David Birdsong wrote:
> ...
> - nginx is already in front of haproxy, but nginx is not the first
> listener, so it sees the IP addresses as HTTP headers too. The last time
> I checked, nginx only blocks IP addresses from layer 4 connections; any
> other blocking would require nginx to compare the IP addresses as strings
> or regexes, which I want to avoid doing on every single request. If the
> list grows long, every request suffers. IP comparison on long lists of
> IPs is one area where haproxy is the clear winner.

I'm no nginx expert, but I use the geo keyword to quickly search for
banned IPs:

    geo $is_banned_ip {
        default 0;
        include /www/nginx/banned_ips.conf;
    }

where banned_ips.conf lists the IPs followed by a 1, as in:

    X.X.X.X/32 1;
    X.X.X.Y/32 1;

Then, inside the location, I simply test $is_banned_ip and reject those
requests, as in:

    location / {
        if ($is_banned_ip) { return 401; }
        index index.php index.html;
    }

The geo keyword is primarily used for getting the geolocation of an IP, so
I think you can have a pretty large list of IPs and still be very
efficient.

Kevin

On Apr 23, 2012, at 2:45 PM, David Birdsong wrote:
> Hi, I've got a situation where I need to update haproxy every 1-2 minutes
> to apprise it of a new list of IP addresses to tarpit. I've rigged up a
> fairly hacky pipeline to detect scrapers on our site based on entries
> found in X-Forwarded-For. To get around the fact that stick-table entries
> are currently only keyed off of protocols lower than HTTP, I need to
> reload haproxy for every new IP address that I detect.
>
> It's ugly, but I've decided to reload haproxy on our site every 2
> minutes. This means that all load balancing info is lost very frequently,
> maxconns per backend are reset, and during deploy time, when we rely on
> slowstart to warm our backends, we have to completely disable reloads of
> haproxy altogether. That opens us up to heavy scraping for ~1-3 hours per
> day during our code deploy, which makes it tough to differentiate between
> slowness induced by recent code changes and scrapers sucking up
> resources.
>
> I'd love to not reload haproxy and let it learn about IPs to block
> internally, but my understanding is that IP addresses found at the HTTP
> level will not work their way into stick tables for some time.
>
> Will peers help to maintain state inside of haproxy between graceful
> reloads? Will connection counts to backends be maintained? Are any stats
> back-populated to the new process?
>
> Also, how is the stability of peer mode? It's going to take some arguing
> and hand-wringing to convince others in our organization to put a 1.5
> version out in front of the site, despite the fact that haproxy is
> generally the most stable piece of software in our stack, dev version or
> not.
>
> Are there any efforts to port peer mode into 1.4 soon?
>
> Thanks again for one of the most useful, fast, and stable tools that the
> community has come to rely on so heavily.
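[Editorial note: the periodic reload David describes is presumably haproxy's standard hot-reconfiguration sequence; the paths below are common defaults, not taken from the thread.]

    # Graceful reload: start a new haproxy process and tell it (-sf) to
    # signal the old process to finish serving in-flight connections and
    # then exit. Load-balancing state is not carried over, which is the
    # pain point described above.
    haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid \
            -sf $(cat /var/run/haproxy.pid)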
Re: clarification on peers and inclusion into 1.4 soon?
Hi David,

On Mon, Apr 23, 2012 at 11:45:51AM -0700, David Birdsong wrote:
> Hi, I've got a situation where I need to update haproxy every 1-2 minutes
> to apprise it of a new list of IP addresses to tarpit. I've rigged up a
> fairly hacky pipeline to detect scrapers on our site based on entries
> found in X-Forwarded-For. To get around the fact that stick-table entries
> are currently only keyed off of protocols lower than HTTP, I need to
> reload haproxy for every new IP address that I detect. It's ugly, but
> I've decided to reload haproxy on our site every 2 minutes. This means
> that all load balancing info is lost very frequently, maxconns per
> backend are reset, and during deploy time, when we rely on slowstart to
> warm our backends, we have to completely disable reloads of haproxy
> altogether, which opens us up to heavy scraping for ~1-3 hours per day
> during our code deploy and makes it tough to differentiate between
> slowness induced by recent code changes and scrapers sucking up
> resources.

I see, this is quite dirty.

> I'd love to not reload haproxy and let it learn about IPs to block
> internally, but my understanding is that IP addresses found at the HTTP
> level will not work their way into stick tables for some time.

That's true for now and will possibly change in a few days, though the
resulting code will have to be handled with some care!

> Will peers help to maintain state inside of haproxy between graceful
> reloads?

No, because counters are not synchronized over peers; only stickiness
information is (e.g. the server ID).

> Will connection counts to backends be maintained? Are any stats
> back-populated to the new process?

Similarly, there is no such thing either.

> Also, how is the stability of peer mode? It's going to take some arguing
> and hand-wringing to convince others in our organization to put a 1.5
> version out in front of the site, despite the fact that haproxy is
> generally the most stable piece of software in our stack, dev version or
> not.
Well, I would see it two ways:

  - all fixes in stable are backported from -dev, so there is no bug that
    is fixed in stable but not in -dev;

  - most new regressions are found in -dev, so it regularly happens that
    we have some annoying bugs in -dev.

Overall, -dev works fine when you wait a bit after a release, because I
know that there are a number of users running it in production, and for
this reason, if I spot problematic issues after a release, I emit a new
one with those issues fixed. But I certainly understand why a number of
people would feel uneasy with such a version in production, precisely
because it's being actively developed and subject to bugs.

> Are there any efforts to port peer mode into 1.4 soon?

No, there is no such plan, and it would require a huge amount of changes
that are 1.5-specific.

However, there is something I've long wanted to add: the ability to
add/remove ACL patterns from the socket interface. I think it should
exactly match your needs, and it should not be too complex, so we could
backport it into 1.4. I think that in your config you have an ACL which
loads its IP addresses from an external file, something like this:

    acl scraper hdr_ip(x-forwarded-for) -f scrapers.txt

The idea is that we should be able to add/remove entries in this ACL. I
even planned to be able to reload the whole file over the CLI so that it's
easier to keep coherent. I remember there was something difficult in doing
it, but I don't precisely remember what.

Do you think that it would solve your issues?

Regards,
Willy
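[Editorial sketch of the configuration Willy describes. The file path, frontend/backend names, and addresses below are hypothetical; `block` simply rejects the matching request in 1.4, whereas David's pipeline tarpits.]

    # ACL patterns are loaded from an external file at startup; the
    # socket commands proposed above would let this list be updated at
    # runtime instead of reloading the whole process.
    frontend www
        bind :80
        acl scraper hdr_ip(x-forwarded-for) -f /etc/haproxy/scrapers.txt
        block if scraper
        default_backend app

    backend app
        server web1 127.0.0.1:8080 check

For what it's worth, this capability did eventually land in the 1.5 series as runtime CLI commands on the stats socket (`add acl`, `del acl`, `show acl`).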