Re: clarification on peers and inclusion into 1.4 soon?

2012-07-18 Thread Willy Tarreau
Hi David,

On Thu, Jul 12, 2012 at 11:56:40AM -0700, David Birdsong wrote:
 On Tue, Apr 24, 2012 at 2:33 PM, David Birdsong
 david.birds...@gmail.com wrote:
 
 
  On Tue, Apr 24, 2012 at 2:18 PM, David Birdsong david.birds...@gmail.com
  wrote:
 
 
 
  On Tue, Apr 24, 2012 at 12:21 PM, Willy Tarreau w...@1wt.eu wrote:
 
  Hi David,
 
  On Tue, Apr 24, 2012 at 11:46:52AM -0700, David Birdsong wrote:
   i'm not seeing my response that i swear i sent last night...
 
  I swear I didn't see it :-)
 
 
  so strange, gmail doesn't even have a saved draft to recall from.
 
 
 
 
   yes, this would solve our issues and would be very, very useful.
 
  OK, I'll have to think about it. Now I remember what the hard part
  was, an ACL is a list of possible expressions, and each expression
  supports multiple pattern sources. On the socket we could only reload
  one pattern source of one type at a time, of course. So the complexity
  comes in naming what we want to reload. For instance :
 
 
  oh, i forgot to ask if these future efforts could include support for other
  match types, like arbitrary string or regex comparison?
 
 
 
    acl bad_guys src 0.0.0.0/8 127.0.0.0/8 224.0.0.0/3
    acl bad_guys hdr_ip(x-forwarded-for) 0.0.0.0/8 127.0.0.0/8 224.0.0.0/3
    acl bad_guys hdr_ip(x-forwarded-for) -f manual.lst -f automatic.lst
 
   Now when trying to reload the bad_guys ACL, in fact we'd like to reload
   one of the files. Probably we should find a way to name a specific
   expression (one ACL statement) that will have to be reloaded, I don't
   know. Or maybe at first we could reject reload requests for ACLs that
   have more than one statement.
 
 
  wow, yeah this could get tough to implement without adding some sort of
  name or keyword, which i'm sure is a deal-breaker for backwards
  compatibility reasons.
 
  would the operations be read/update/insert or read/replace? the latter
  might be easier to implement; we (the users) could be asked to supply the
  filename exactly as it was named in the configuration, which would then
  be followed by the new data (replace).
 
  i admit this makes it an odd case where the haproxy config on disk doesn't
  match the running state, where a restart would revert back to values found
  on disk--but perhaps haproxy has already gone down that road with weights
  being settable over the socket and in the config file.
 
  I also remember that some pattern parsing errors were sent to stderr and
  will have to be disabled when fed this way. In summary, nothing terribly
  complex but nothing trivial either.
 
 
  is that because stderr is long gone after haproxy has daemonized? could
  you send parsing errors to syslog?
 
  have you ever considered implementing a ring buffer to log the way that
  varnish logs? it's easy to solve the 'never block on disk' problem by
  putting the buffer file in a tmpfs, and it would provide a place to send
  errors for these odd corner cases. i'm mostly thinking aloud here. i really
  do love that haproxy never touches the filesystem after it's up and running.
 
 
  Regards,
  Willy
 
 
 
 
 Willy, have you given any cycles to adding support for add/remove of
 acls via the socket interface?
 
 I'm eagerly awaiting being able to inject/remove ip addresses into a
 continuously running haproxy.

No, I haven't made progress. I'm around 500 mails late in my mbox and around
50 of them need a reply, and I'm trying to balance work at customers with
work at my company and mail reading (around 1/3 each).

So no progress at the moment.

I'm considering taking holidays soon to catch up with e-mails and hopefully
to make progress on the connection management in haproxy, which is much more
important since I began breaking it and can't continuously work on it. I must
absolutely finish it before switching to anything else.

Can't promise more, I'm really sorry. Days are only 19 hours long :-(

Willy




Re: clarification on peers and inclusion into 1.4 soon?

2012-07-18 Thread David Birdsong
On Wed, Jul 18, 2012 at 3:01 PM, Willy Tarreau w...@1wt.eu wrote:
 Hi David,

 On Thu, Jul 12, 2012 at 11:56:40AM -0700, David Birdsong wrote:
 On Tue, Apr 24, 2012 at 2:33 PM, David Birdsong
 david.birds...@gmail.com wrote:
 
 
  On Tue, Apr 24, 2012 at 2:18 PM, David Birdsong david.birds...@gmail.com
  wrote:
 
 
 
  On Tue, Apr 24, 2012 at 12:21 PM, Willy Tarreau w...@1wt.eu wrote:
 
  Hi David,
 
  On Tue, Apr 24, 2012 at 11:46:52AM -0700, David Birdsong wrote:
    i'm not seeing my response that i swear i sent last night...
 
  I swear I didn't see it :-)
 
 
  so strange, gmail doesn't even have a saved draft to recall from.
 
 
 
 
   yes, this would solve our issues and would be very, very useful.
 
  OK, I'll have to think about it. Now I remember what the hard part
  was, an ACL is a list of possible expressions, and each expression
  supports multiple pattern sources. On the socket we could only reload
  one pattern source of one type at a time, of course. So the complexity
  comes in naming what we want to reload. For instance :
 
 
  oh, i forgot to ask if these future efforts could include support for other
  match types, like arbitrary string or regex comparison?
 
 
 
   acl bad_guys src 0.0.0.0/8 127.0.0.0/8 224.0.0.0/3
   acl bad_guys hdr_ip(x-forwarded-for) 0.0.0.0/8 127.0.0.0/8 224.0.0.0/3
   acl bad_guys hdr_ip(x-forwarded-for) -f manual.lst -f automatic.lst
 
  Now when trying to reload the bad_guys ACL, in fact we'd like to reload
  one of the files. Probably we should find a way to name a specific
  expression (one ACL statement) that will have to be reloaded, I don't
  know. Or maybe at first we could reject reload requests for ACLs that
  have more than one statement.
 
 
  wow, yeah this could get tough to implement without adding some sort of
  name or keyword, which i'm sure is a deal-breaker for backwards
  compatibility reasons.
 
  would the operations be read/update/insert or read/replace? the latter
  might be easier to implement; we (the users) could be asked to supply the
  filename exactly as it was named in the configuration, which would then
  be followed by the new data (replace).
 
  i admit this makes it an odd case where the haproxy config on disk doesn't
  match the running state, where a restart would revert back to values found
  on disk--but perhaps haproxy has already gone down that road with weights
  being settable over the socket and in the config file.
 
  I also remember that some pattern parsing errors were sent to stderr and
  will have to be disabled when fed this way. In summary, nothing terribly
  complex but nothing trivial either.
 
 
  is that because stderr is long gone after haproxy has daemonized? could
  you send parsing errors to syslog?
 
  have you ever considered implementing a ring buffer to log the way that
  varnish logs? it's easy to solve the 'never block on disk' problem by
  putting the buffer file in a tmpfs, and it would provide a place to send
  errors for these odd corner cases. i'm mostly thinking aloud here. i really
  do love that haproxy never touches the filesystem after it's up and running.
 
 
  Regards,
  Willy
 
 
 

 Willy, have you given any cycles to adding support for add/remove of
 acls via the socket interface?

 I'm eagerly awaiting being able to inject/remove ip addresses into a
 continuously running haproxy.

 No, I haven't made progress. I'm around 500 mails late in my mbox and around
 50 of them need a reply, and I'm trying to balance work at customers with
 work at my company and mail reading (around 1/3 each).

 So no progress at the moment.

Ok, thanks for the update.


 I'm considering taking holidays soon to catch up with e-mails and hopefully
 to make progress on the connection management in haproxy, which is much more
 important since I began breaking it and can't continuously work on it. I must
 absolutely finish it before switching to anything else.


I totally understand. I've been working in start-ups for a few years
and can empathize with the lack of time in the day to get to
everything.

I think for the time being I may take a look at ngx-resty and use
global lua tables, which means an authorized POST can update the IP acl
list. If this is ever available in haproxy, I'd switch back in a
heartbeat.
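
(a rough sketch of that kind of ngx_lua setup, for illustration -- names are
hypothetical, and a shared dict is shown instead of plain per-worker lua
tables, since plain tables aren't shared across nginx workers:)

    # nginx.conf fragment -- illustrative only
    http {
        lua_shared_dict banned_ips 10m;   # shared across all workers

        server {
            listen 80;

            # reject requests whose client address is in the dict
            location / {
                access_by_lua '
                    if ngx.shared.banned_ips:get(ngx.var.remote_addr) then
                        return ngx.exit(403)
                    end
                ';
                proxy_pass http://127.0.0.1:8080;  # hypothetical haproxy address
            }

            # an authorized POST adds an address, e.g. POST /ban?ip=192.0.2.1
            location /ban {
                allow 127.0.0.1;
                deny all;
                content_by_lua '
                    ngx.shared.banned_ips:set(ngx.var.arg_ip, true)
                    ngx.say("banned ", ngx.var.arg_ip)
                ';
            }
        }
    }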

 Can't promise more, I'm really sorry. Days are only 19 hours long :-(.

 Willy




Re: clarification on peers and inclusion into 1.4 soon?

2012-07-12 Thread David Birdsong
On Tue, Apr 24, 2012 at 2:33 PM, David Birdsong
david.birds...@gmail.com wrote:


 On Tue, Apr 24, 2012 at 2:18 PM, David Birdsong david.birds...@gmail.com
 wrote:



 On Tue, Apr 24, 2012 at 12:21 PM, Willy Tarreau w...@1wt.eu wrote:

 Hi David,

 On Tue, Apr 24, 2012 at 11:46:52AM -0700, David Birdsong wrote:
  i'm not seeing my response that i swear i sent last night...

 I swear I didn't see it :-)


 so strange, gmail doesn't even have a saved draft to recall from.




  yes, this would solve our issues and would be very, very useful.

 OK, I'll have to think about it. Now I remember what the hard part
 was, an ACL is a list of possible expressions, and each expression
 supports multiple pattern sources. On the socket we could only reload
 one pattern source of one type at a time, of course. So the complexity
 comes in naming what we want to reload. For instance :


 oh, i forgot to ask if these future efforts could include support for other
 match types, like arbitrary string or regex comparison?



  acl bad_guys src 0.0.0.0/8 127.0.0.0/8 224.0.0.0/3
  acl bad_guys hdr_ip(x-forwarded-for) 0.0.0.0/8 127.0.0.0/8 224.0.0.0/3
  acl bad_guys hdr_ip(x-forwarded-for) -f manual.lst -f automatic.lst

 Now when trying to reload the bad_guys ACL, in fact we'd like to reload
 one of the files. Probably we should find a way to name a specific
 expression (one ACL statement) that will have to be reloaded, I don't
 know. Or maybe at first we could reject reload requests for ACLs that
 have more than one statement.


 wow, yeah this could get tough to implement without adding some sort of
 name or keyword, which i'm sure is a deal-breaker for backwards
 compatibility reasons.

 would the operations be read/update/insert or read/replace? the latter
 might be easier to implement; we (the users) could be asked to supply the
 filename exactly as it was named in the configuration, which would then
 be followed by the new data (replace).

 i admit this makes it an odd case where the haproxy config on disk doesn't
 match the running state, where a restart would revert back to values found
 on disk--but perhaps haproxy has already gone down that road with weights
 being settable over the socket and in the config file.

 I also remember that some pattern parsing errors were sent to stderr and
 will have to be disabled when fed this way. In summary, nothing terribly
 complex but nothing trivial either.


 is that because stderr is long gone after haproxy has daemonized? could
 you send parsing errors to syslog?

 have you ever considered implementing a ring buffer to log the way that
 varnish logs? it's easy to solve the 'never block on disk' problem by
 putting the buffer file in a tmpfs, and it would provide a place to send
 errors for these odd corner cases. i'm mostly thinking aloud here. i really
 do love that haproxy never touches the filesystem after it's up and running.


 Regards,
 Willy




Willy, have you given any cycles to adding support for add/remove of
acls via the socket interface?

I'm eagerly awaiting being able to inject/remove ip addresses into a
continuously running haproxy.



Re: clarification on peers and inclusion into 1.4 soon?

2012-04-24 Thread David Birdsong
On Mon, Apr 23, 2012 at 5:07 PM, Kevin Heatwole ke...@heatwoles.us wrote:
 On Apr 23, 2012, at 7:31 PM, David Birdsong wrote:
 ...
 - nginx is already in front of haproxy, but nginx is not the first
 listener, so it sees the IP addresses as HTTP headers too. the last
 time I checked nginx only blocks IP addresses from layer 4
 connections. any other blocking would require nginx to compare the IP
 addresses as strings or regexes which I want to avoid doing on every
 single request. if the list grows long, every request suffers. ip
 comparison on long lists of IP's is one area where haproxy is the
 clear winner

 I'm no nginx expert, but I use the geo keyword to quickly search for
 banned IPs:

great idea, this could be very simple to get running.


    geo $is_banned_ip {
        default             0;
        include /www/nginx/banned_ips.conf;
    }

 where the banned_ips.conf lists the IPs followed by a 1 as in:

 X.X.X.X/32 1;
 X.X.X.Y/32 1;

 Then, inside the location, I simply test $is_banned_ip and reject those 
 requests, as in:

       location / {
            if ($is_banned_ip) { return 401; }
            index  index.php index.html;
        }

 The geo keyword is primarily used for getting the geolocation of an IP, so I
 think you can have a pretty large list of IPs and still be very efficient.

 Kevin



 On Mon, Apr 23, 2012 at 2:48 PM, Kevin Heatwole ke...@heatwoles.us wrote:
 You might want to block the IPs before they get into haproxy.  Maybe put an 
 nginx reverse proxy in front of haproxy?  I use nginx to dynamically 
 block/allow HTTP requests by IP.   Another possibility, if you just need to 
 block a list of IPs would be to use a firewall/iptables in front of haproxy 
 to do the blocking.

 - nginx is already in front of haproxy, but nginx is not the first
 listener, so it sees the IP addresses as HTTP headers too. the last
 time I checked nginx only blocks IP addresses from layer 4
 connections. any other blocking would require nginx to compare the IP
 addresses as strings or regexes which I want to avoid doing on every
 single request. if the list grows long, every request suffers. ip
 comparison on long lists of IP's is one area where haproxy is the
 clear winner

 - iptables won't work either, iptables works on TCP/IP not HTTP

 i'd like to keep IP blocking in haproxy.

 On Apr 23, 2012, at 2:45 PM, David Birdsong wrote:

 Hi, I've got a situation where I need to update haproxy every 1-2
 mins to apprise it of a new list of ip addresses to tarpit.

 I've rigged up a fairly hacky pipeline to detect scrapers on our site
 based on entries found in X-Forwarded-For. To get around the fact that
 stick-table entries are currently only keyed off of protocols lower than
 http, I need to reload haproxy for every new IP address that I detect.
 It's ugly, but I've decided to reload haproxy on our site every 2 minutes.
 This means that all load balancing info is lost very frequently, maxconns
 per backend are reset, and during deploy time, when we rely on slowstart
 to warm our backends, we have to completely disable reloads of haproxy
 altogether. That opens us up to heavy scraping for ~1-3 hours per day
 during our code deploy, which makes it tough to differentiate between
 slowness induced by recent code changes and scrapers sucking up resources.

 I'd love to not reload haproxy and let it learn about IP's to block
 internally, but my understanding is that IP addresses found at the
 HTTP level will not work their way into stick tables for some time.

 Will peers help to maintain state inside of haproxy between graceful
 reloads? Will connection counts to backends be maintained? Are any
 stats back populated to the new process?

 Also, how is the stability of peer mode? It's going to take some
 arguing and hand-wringing to convince others in our organization to
 put a 1.5 version out in front of the site despite the fact that
 haproxy is generally the most stable piece of software in our stack,
 dev version or not. Are there any efforts to port peer mode into 1.4
 soon?

 Thanks again for one of the most useful, fast, and stable tools that
 the community has come to rely so heavily on.






RE: clarification on peers and inclusion into 1.4 soon?

2012-04-24 Thread Jens Dueholm Christensen (JEDC)
The map module is for comparing strings.
The geo module is for comparing IP's or multiple IP's in CIDR notation.

.. it took a while for me to figure this out.

Regards,
Jens Dueholm Christensen

-Original Message-
From: Aleksandar Lazic [mailto:al-hapr...@none.at] 
Sent: Tuesday, April 24, 2012 2:04 AM
To: haproxy@formilux.org
Subject: Re: clarification on peers and inclusion into 1.4 soon?

Dear David,

On 24-04-2012 01:31, David Birdsong wrote:
 On Mon, Apr 23, 2012 at 2:48 PM, Kevin Heatwole ke...@heatwoles.us 
 wrote:
 You might want to block the IPs before they get into haproxy.
 Maybe put an nginx reverse proxy in front of haproxy?
 I use nginx to dynamically block/allow HTTP requests by IP.
 Another possibility, if you just need to block a list of IPs would 
 be to
 use a firewall/iptables in front of haproxy to do the blocking.

  - nginx is already in front of haproxy, but nginx is not the first
 listener, so it sees the IP addresses as HTTP headers too. the last
 time I checked nginx only blocks IP addresses from layer 4
 connections. any other blocking would require nginx to compare the IP
 addresses as strings or regexes which I want to avoid doing on every
 single request. if the list grows long, every request suffers. ip
 comparison on long lists of IP's is one area where haproxy is the
 clear winner

Depending on the list size, maybe you can use the map module from nginx.

http://nginx.org/en/docs/http/ngx_http_map_module.html

The map module can also handle regex matches.

For example:
http://serverfault.com/questions/316541/check-several-user-agent-in-nginx
http://www.ruby-forum.com/topic/2440219
http://redant.com.au/blog/manage-ssl-redirection-in-nginx-using-maps-and-save-the-universe/

 - iptables won't work either, iptables works on TCP/IP not HTTP

Depending on your iptables setup, maybe you can use the string matching
module.

http://spamcleaner.org/en/misc/w00tw00t.html

 i'd like to keep IP blocking in haproxy.

ok

BR
Aleks



Re: clarification on peers and inclusion into 1.4 soon?

2012-04-24 Thread David Birdsong
On Mon, Apr 23, 2012 at 10:47 PM, Willy Tarreau w...@1wt.eu wrote:
 Hi David,

 On Mon, Apr 23, 2012 at 11:45:51AM -0700, David Birdsong wrote:
 Hi, I've got a situation where I need to update haproxy every 1-2
 mins to apprise it of a new list of ip addresses to tarpit.

 I've rigged up a fairly hacky pipeline to detect scrapers on our site
 based on entries found in X-Forwarded-For. To get around the fact that
 stick-table entries are currently only keyed off of protocols lower than
 http, I need to reload haproxy for every new IP address that I detect.
 It's ugly, but I've decided to reload haproxy on our site every 2 minutes.
 This means that all load balancing info is lost very frequently, maxconns
 per backend are reset, and during deploy time, when we rely on slowstart
 to warm our backends, we have to completely disable reloads of haproxy
 altogether. That opens us up to heavy scraping for ~1-3 hours per day
 during our code deploy, which makes it tough to differentiate between
 slowness induced by recent code changes and scrapers sucking up resources.

 I see, this is quite dirty.

 I'd love to not reload haproxy and let it learn about IP's to block
 internally, but my understanding is that IP addresses found at the
 HTTP level will not work their way into stick tables for some time.

 That's true for now and will possibly change in a few days, though
 the resulting code will have to be handled with some care !

 Will peers help to maintain state inside of haproxy between graceful
 reloads?

 No, because counters are not synchronized over peers, only stickiness
 information is (eg: server id).

 Will connection counts to backends be maintained? Are any
 stats back populated to the new process?

 Similarly, there is no such thing either.

 Also, how is the stability of peer mode? It's going to take some
 arguing and hand-wringing to convince others in our organization to
 put a 1.5 version out in front of the site despite the fact that
  haproxy is generally the most stable piece of software in our stack,
  dev version or not.

 well, I would see it two ways :
  - all fixes in stable are backported from -dev, so there is no bug
    that is fixed in stable but not in -dev

  - most new regressions come from what is implemented in -dev, so it
    regularly happens that we have some annoying bugs in -dev.

 Overall, -dev works fine when you wait a bit after a release, because
 I know that there are a number of users running it in production and
 for this reason if I spot problematic issues after a release, I emit
 a new one with these issues fixed. But I certainly understand why a
 number of people would feel uneasy with such a version in production,
  because precisely it's being actively developed and subject to bugs.

 Are there any efforts to port peer mode into 1.4 soon?

 No, there is no such plan and it would require a huge amount of changes
 that are 1.5-specific.

 However, there is something I've long wanted to be able to add, it's
 the ability to add/remove ACL patterns from the socket interface. I
  think it should exactly match your needs, and it should not be too
  complex, so we could backport it into 1.4.

 I think that in your config you have an ACL which loads its IP addresses
 from an external file, something like this :

    acl scrapper hdr_ip(x-forwarded-for) -f scrappers.txt

 The idea is that we should be able to add/remove entries in this ACL.
 I even planned to be able to re-load the whole file over the CLI so
 that it's easier to maintain coherent.

 I remember there was something difficult in doing it but I don't
 precisely remember what.

 Do you think that it would solve your issues ?


i'm not seeing my response that i swear i sent last night...

yes, this would solve our issues and would be very, very useful.

 Regards,
 Willy




Re: clarification on peers and inclusion into 1.4 soon?

2012-04-24 Thread David Birdsong
On Tue, Apr 24, 2012 at 12:21 PM, Willy Tarreau w...@1wt.eu wrote:

 Hi David,

 On Tue, Apr 24, 2012 at 11:46:52AM -0700, David Birdsong wrote:
  i'm not seeing my response that i swear i sent last night...

 I swear I didn't see it :-)


so strange, gmail doesn't even have a saved draft to recall from.




 yes, this would solve our issues and would be very, very useful.

 OK, I'll have to think about it. Now I remember what the hard part
 was, an ACL is a list of possible expressions, and each expression
 supports multiple pattern sources. On the socket we could only reload
 one pattern source of one type at a time, of course. So the complexity
 comes in naming what we want to reload. For instance :

  acl bad_guys src 0.0.0.0/8 127.0.0.0/8 224.0.0.0/3
  acl bad_guys hdr_ip(x-forwarded-for) 0.0.0.0/8 127.0.0.0/8 224.0.0.0/3
  acl bad_guys hdr_ip(x-forwarded-for) -f manual.lst -f automatic.lst

 Now when trying to reload the bad_guys ACL, in fact we'd like to reload
 one of the files. Probably we should find a way to name a specific
 expression (one ACL statement) that will have to be reloaded, I don't
 know. Or maybe at first we could reject reload requests for ACLs that
 have more than one statement.


wow, yeah this could get tough to implement without adding some sort of
name or keyword, which i'm sure is a deal-breaker for backwards
compatibility reasons.

would the operations be read/update/insert or read/replace? the latter
might be easier to implement; we (the users) could be asked to supply the
filename exactly as it was named in the configuration, which would then
be followed by the new data (replace).
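
(for illustration only, such a read/replace exchange against the stats
socket might look like this -- "reload acl" is an imagined command that did
not exist at the time, shown purely to make the idea concrete:)

    # hypothetical stats-socket session: name the file exactly as it
    # appears in the configuration, then feed the replacement patterns
    echo "reload acl automatic.lst" | socat stdio /var/run/haproxy.sock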

i admit this makes it an odd case where the haproxy config on disk doesn't
match the running state, where a restart would revert back to values found
on disk--but perhaps haproxy has already gone down that road with weights
being settable over the socket and in the config file.
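
(weights are indeed the existing precedent: they can be read and changed over
the stats socket at runtime, and a restart reverts to the values on disk.
assuming a backend named app with a server web1:)

    # 1.4 stats-socket commands; the configured weight returns
    # on the next restart/reload
    echo "get weight app/web1"    | socat stdio /var/run/haproxy.sock
    echo "set weight app/web1 50" | socat stdio /var/run/haproxy.sock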

 I also remember that some pattern parsing errors were sent to stderr and
 will have to be disabled when fed this way. In summary, nothing terribly
 complex but nothing trivial either.


is that because stderr is long gone after haproxy has daemonized? could you
send parsing errors to syslog?

have you ever considered implementing a ring buffer to log the way that
varnish logs? it's easy to solve the 'never block on disk' problem by
putting the buffer file in a tmpfs, and it would provide a place to send
errors for these odd corner cases. i'm mostly thinking aloud here. i really
do love that haproxy never touches the filesystem after it's up and running.


Regards,
 Willy




Re: clarification on peers and inclusion into 1.4 soon?

2012-04-23 Thread Kevin Heatwole
You might want to block the IPs before they get into haproxy.  Maybe put an 
nginx reverse proxy in front of haproxy?  I use nginx to dynamically 
block/allow HTTP requests by IP.   Another possibility, if you just need to 
block a list of IPs would be to use a firewall/iptables in front of haproxy to 
do the blocking.

On Apr 23, 2012, at 2:45 PM, David Birdsong wrote:

 Hi, I've got a situation where I need to update haproxy every 1-2
 mins to apprise it of a new list of ip addresses to tarpit.

 I've rigged up a fairly hacky pipeline to detect scrapers on our site
 based on entries found in X-Forwarded-For. To get around the fact that
 stick-table entries are currently only keyed off of protocols lower than
 http, I need to reload haproxy for every new IP address that I detect.
 It's ugly, but I've decided to reload haproxy on our site every 2 minutes.
 This means that all load balancing info is lost very frequently, maxconns
 per backend are reset, and during deploy time, when we rely on slowstart
 to warm our backends, we have to completely disable reloads of haproxy
 altogether. That opens us up to heavy scraping for ~1-3 hours per day
 during our code deploy, which makes it tough to differentiate between
 slowness induced by recent code changes and scrapers sucking up resources.
 
 I'd love to not reload haproxy and let it learn about IP's to block
 internally, but my understanding is that IP addresses found at the
 HTTP level will not work their way into stick tables for some time.
 
 Will peers help to maintain state inside of haproxy between graceful
 reloads? Will connection counts to backends be maintained? Are any
 stats back populated to the new process?
 
 Also, how is the stability of peer mode? It's going to take some
 arguing and hand-wringing to convince others in our organization to
 put a 1.5 version out in front of the site despite the fact that
 haproxy is generally the most stable piece of software in our stack,
 dev version or not. Are there any efforts to port peer mode into 1.4
 soon?
 
 Thanks again for one of the most useful, fast, and stable tools that
 the community has come to rely so heavily on.
 




Re: clarification on peers and inclusion into 1.4 soon?

2012-04-23 Thread David Birdsong
On Mon, Apr 23, 2012 at 2:48 PM, Kevin Heatwole ke...@heatwoles.us wrote:
 You might want to block the IPs before they get into haproxy.  Maybe put an 
 nginx reverse proxy in front of haproxy?  I use nginx to dynamically 
 block/allow HTTP requests by IP.   Another possibility, if you just need to 
 block a list of IPs would be to use a firewall/iptables in front of haproxy 
 to do the blocking.

 - nginx is already in front of haproxy, but nginx is not the first
listener, so it sees the IP addresses as HTTP headers too. the last
time I checked nginx only blocks IP addresses from layer 4
connections. any other blocking would require nginx to compare the IP
addresses as strings or regexes which I want to avoid doing on every
single request. if the list grows long, every request suffers. ip
comparison on long lists of IP's is one area where haproxy is the
clear winner

- iptables won't work either, iptables works on TCP/IP not HTTP

i'd like to keep IP blocking in haproxy.

 On Apr 23, 2012, at 2:45 PM, David Birdsong wrote:

 Hi, I've got a situation where I need to update haproxy every 1-2
 mins to apprise it of a new list of ip addresses to tarpit.

 I've rigged up a fairly hacky pipeline to detect scrapers on our site
 based on entries found in X-Forwarded-For. To get around the fact that
 stick-table entries are currently only keyed off of protocols lower than
 http, I need to reload haproxy for every new IP address that I detect.
 It's ugly, but I've decided to reload haproxy on our site every 2 minutes.
 This means that all load balancing info is lost very frequently, maxconns
 per backend are reset, and during deploy time, when we rely on slowstart
 to warm our backends, we have to completely disable reloads of haproxy
 altogether. That opens us up to heavy scraping for ~1-3 hours per day
 during our code deploy, which makes it tough to differentiate between
 slowness induced by recent code changes and scrapers sucking up resources.

 I'd love to not reload haproxy and let it learn about IP's to block
 internally, but my understanding is that IP addresses found at the
 HTTP level will not work their way into stick tables for some time.

 Will peers help to maintain state inside of haproxy between graceful
 reloads? Will connection counts to backends be maintained? Are any
 stats back populated to the new process?

 Also, how is the stability of peer mode? It's going to take some
 arguing and hand-wringing to convince others in our organization to
 put a 1.5 version out in front of the site despite the fact that
 haproxy is generally the most stable piece of software in our stack,
 dev version or not. Are there any efforts to port peer mode into 1.4
 soon?

 Thanks again for one of the most useful, fast, and stable tools that
 the community has come to rely so heavily on.





Re: clarification on peers and inclusion into 1.4 soon?

2012-04-23 Thread Aleksandar Lazic

Dear David,

On 24-04-2012 01:31, David Birdsong wrote:
On Mon, Apr 23, 2012 at 2:48 PM, Kevin Heatwole ke...@heatwoles.us 
wrote:

You might want to block the IPs before they get into haproxy.
Maybe put an nginx reverse proxy in front of haproxy?
I use nginx to dynamically block/allow HTTP requests by IP.
Another possibility, if you just need to block a list of IPs, would be to
use a firewall/iptables in front of haproxy to do the blocking.


 - nginx is already in front of haproxy, but nginx is not the first
listener, so it sees the IP addresses as HTTP headers too. the last
time I checked nginx only blocks IP addresses from layer 4
connections. any other blocking would require nginx to compare the IP
addresses as strings or regexes which I want to avoid doing on every
single request. if the list grows long, every request suffers. ip
comparison on long lists of IP's is one area where haproxy is the
clear winner


Depending on the list size, maybe you can use the map module from nginx.

http://nginx.org/en/docs/http/ngx_http_map_module.html

The map module can also handle regex matches.

For example:
http://serverfault.com/questions/316541/check-several-user-agent-in-nginx
http://www.ruby-forum.com/topic/2440219
http://redant.com.au/blog/manage-ssl-redirection-in-nginx-using-maps-and-save-the-universe/
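
A minimal sketch of that approach (the patterns here are hypothetical; map
blocks must live in the http context, and regex entries are checked in
order):

    http {
        # map the X-Forwarded-For header through a pattern list
        map $http_x_forwarded_for $is_bad_xff {
            default       0;
            ~^10\.        1;   # hypothetical patterns
            ~192\.0\.2\.  1;
        }

        server {
            location / {
                if ($is_bad_xff) { return 403; }
            }
        }
    }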


- iptables won't work either, iptables works on TCP/IP not HTTP


Depending on your iptables setup, maybe you can use the string matching
module.


http://spamcleaner.org/en/misc/w00tw00t.html
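
Roughly, a sketch of the idea from that page, using the xt_string match with
the scanner signature it discusses:

    # drop HTTP packets whose payload contains the scanner's marker,
    # matched with the Boyer-Moore algorithm
    iptables -I INPUT -p tcp --dport 80 -m string --algo bm \
             --string "w00tw00t.at.ISC.SANS" -j DROP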


i'd like to keep IP blocking in haproxy.


ok

BR
Aleks



Re: clarification on peers and inclusion into 1.4 soon?

2012-04-23 Thread Kevin Heatwole
On Apr 23, 2012, at 7:31 PM, David Birdsong wrote:
...
 - nginx is already in front of haproxy, but nginx is not the first
 listener, so it sees the IP addresses as HTTP headers too. the last
 time I checked nginx only blocks IP addresses from layer 4
 connections. any other blocking would require nginx to compare the IP
 addresses as strings or regexes which I want to avoid doing on every
 single request. if the list grows long, every request suffers. ip
 comparison on long lists of IP's is one area where haproxy is the
 clear winner

I'm no nginx expert, but I use the geo keyword to quickly search for banned IPs:

geo $is_banned_ip {
    default 0;
    include /www/nginx/banned_ips.conf;
}

where the banned_ips.conf lists the IPs followed by a 1 as in:

X.X.X.X/32 1;
X.X.X.Y/32 1;

Then, inside the location, I simply test $is_banned_ip and reject those 
requests, as in:

location / {
    if ($is_banned_ip) { return 401; }
    index index.php index.html;
}

The geo keyword is primarily used for getting the geolocation of an IP, so I
think you can have a pretty large list of IPs and still be very efficient.

Kevin



 On Mon, Apr 23, 2012 at 2:48 PM, Kevin Heatwole ke...@heatwoles.us wrote:
 You might want to block the IPs before they get into haproxy.  Maybe put an 
 nginx reverse proxy in front of haproxy?  I use nginx to dynamically 
 block/allow HTTP requests by IP.   Another possibility, if you just need to 
 block a list of IPs would be to use a firewall/iptables in front of haproxy 
 to do the blocking.
 
 - nginx is already in front of haproxy, but nginx is not the first
 listener, so it sees the IP addresses as HTTP headers too. the last
 time I checked nginx only blocks IP addresses from layer 4
 connections. any other blocking would require nginx to compare the IP
 addresses as strings or regexes which I want to avoid doing on every
 single request. if the list grows long, every request suffers. ip
 comparison on long lists of IP's is one area where haproxy is the
 clear winner
 
 - iptables won't work either, iptables works on TCP/IP not HTTP
 
 i'd like to keep IP blocking in haproxy.
 
 On Apr 23, 2012, at 2:45 PM, David Birdsong wrote:
 
 Hi, I've got a situation where I need to update haproxy every 1-2
 mins to apprise it of a new list of ip addresses to tarpit.

 I've rigged up a fairly hacky pipeline to detect scrapers on our site
 based on entries found in X-Forwarded-For. To get around the fact that
 stick-table entries are currently only keyed off of protocols lower than
 http, I need to reload haproxy for every new IP address that I detect.
 It's ugly, but I've decided to reload haproxy on our site every 2 minutes.
 This means that all load balancing info is lost very frequently, maxconns
 per backend are reset, and during deploy time, when we rely on slowstart
 to warm our backends, we have to completely disable reloads of haproxy
 altogether. That opens us up to heavy scraping for ~1-3 hours per day
 during our code deploy, which makes it tough to differentiate between
 slowness induced by recent code changes and scrapers sucking up resources.
 
 I'd love to not reload haproxy and let it learn about IP's to block
 internally, but my understanding is that IP addresses found at the
 HTTP level will not work their way into stick tables for some time.
 
 Will peers help to maintain state inside of haproxy between graceful
 reloads? Will connection counts to backends be maintained? Are any
 stats back populated to the new process?
 
 Also, how is the stability of peer mode? It's going to take some
 arguing and hand-wringing to convince others in our organization to
 put a 1.5 version out in front of the site despite the fact that
 haproxy is generally the most stable piece of software in our stack,
 dev version or not. Are there any efforts to port peer mode into 1.4
 soon?
 
 Thanks again for one of the most useful, fast, and stable tools that
 the community has come to rely so heavily on.
 
 




Re: clarification on peers and inclusion into 1.4 soon?

2012-04-23 Thread Willy Tarreau
Hi David,

On Mon, Apr 23, 2012 at 11:45:51AM -0700, David Birdsong wrote:
 Hi, I've got a situation where I need to update haproxy every 1-2
 mins to apprise it of a new list of ip addresses to tarpit.

 I've rigged up a fairly hacky pipeline to detect scrapers on our site
 based on entries found in X-Forwarded-For. To get around the fact that
 stick-table entries are currently only keyed off of protocols lower than
 http, I need to reload haproxy for every new IP address that I detect.
 It's ugly, but I've decided to reload haproxy on our site every 2 minutes.
 This means that all load balancing info is lost very frequently, maxconns
 per backend are reset, and during deploy time, when we rely on slowstart
 to warm our backends, we have to completely disable reloads of haproxy
 altogether. That opens us up to heavy scraping for ~1-3 hours per day
 during our code deploy, which makes it tough to differentiate between
 slowness induced by recent code changes and scrapers sucking up resources.

I see, this is quite dirty.

 I'd love to not reload haproxy and let it learn about IP's to block
 internally, but my understanding is that IP addresses found at the
 HTTP level will not work their way into stick tables for some time.

That's true for now and will possibly change in a few days, though
the resulting code will have to be handled with some care !

 Will peers help to maintain state inside of haproxy between graceful
 reloads?

No, because counters are not synchronized over peers, only stickiness
information is (eg: server id).
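
(For reference, a minimal peers setup looks roughly like this -- names are
hypothetical, and only the stick-table entries themselves travel between
peers:)

    peers lb_peers
        peer lb1 192.168.0.1:1024
        peer lb2 192.168.0.2:1024

    backend app
        # entries are pushed to the peer on reload; the counters
        # attached to them are not
        stick-table type ip size 200k expire 30m peers lb_peers
        stick on src
        server web1 10.0.0.1:80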

 Will connection counts to backends be maintained? Are any
 stats back populated to the new process?

Similarly, there is no such thing either.

 Also, how is the stability of peer mode? It's going to take some
 arguing and hand-wringing to convince others in our organization to
 put a 1.5 version out in front of the site despite the fact that
 haproxy is generally the most stable piece of software in our stack,
 dev version or not.

well, I would see it two ways :
  - all fixes in stable are backported from -dev, so there is no bug
that is fixed in stable but not in -dev

  - most new regressions come from what is implemented in -dev, so it
    regularly happens that we have some annoying bugs in -dev.

Overall, -dev works fine when you wait a bit after a release, because
I know that there are a number of users running it in production and
for this reason if I spot problematic issues after a release, I emit
a new one with these issues fixed. But I certainly understand why a
number of people would feel uneasy with such a version in production,
because precisely it's being actively developed and subject to bugs.

 Are there any efforts to port peer mode into 1.4 soon?

No, there is no such plan and it would require a huge amount of changes
that are 1.5-specific.

However, there is something I've long wanted to be able to add, it's
the ability to add/remove ACL patterns from the socket interface. I
think it should exactly match your needs, and it should not be too
complex, so we could backport it into 1.4.

I think that in your config you have an ACL which loads its IP addresses
from an external file, something like this :

acl scrapper hdr_ip(x-forwarded-for) -f scrappers.txt

The idea is that we should be able to add/remove entries in this ACL.
I even planned to be able to re-load the whole file over the CLI so
that it's easier to maintain coherent.
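
(To make that concrete, a minimal sketch of such a config -- names are
hypothetical; in 1.4 the blocking itself could be a plain "block" rule, or a
reqtarpit rule for actual tarpitting:)

    frontend www
        bind :80
        # one address or CIDR block per line in the file
        acl scrapper hdr_ip(x-forwarded-for) -f /etc/haproxy/scrappers.txt
        block if scrapper
        default_backend app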

I remember there was something difficult in doing it but I don't
precisely remember what.

Do you think that it would solve your issues ?

Regards,
Willy