Re: http load balancing with pf (apache access log)
Hi Bob,

Bob Beck wrote:
> * Marian Hettwer [EMAIL PROTECTED] [2007-01-29 09:49]:
>> Hi OpenBSD'lers,
>>
>> I'm about to use OpenBSD's pf(4) for load balancing some webservers. So far, everything is looking just perfect. Compared to pound, pf(4) is incredibly fast with little CPU and memory usage. So I'd say: That's great :)
>>
>> However, one thing is bothering me. Obviously, my apache access logs on those load balanced machines can only show the IP address of my load balancer, not the real remote IP of the request.
>
> Completely untrue. If you are doing an rdr, it will change the destination IP, not the source IP

That's true so far... however, I was told by Stuart that the connections go like this:

<quote>
requests go like this:
  origin -> balancer -> destination
replies like this:
  destination -> origin
but they need to go like this so they can be un-rdr'ed:
  destination -> balancer -> origin

I'm not certain whether it will help so I won't bother posting to misc@ now, but you could try adding a NAT rule in addition to the RDR.
</quote>

> Unless in *addition* to load balancing you are doing NAT.

I do, and it seems I have to. My boxes are dedicated servers with a standard network configuration. That means an official IP address, some default gateway, and off they go. However, I can't change the network configuration, as those boxes are rented servers with no possibility to mess around with the network config.

> I'm not using NAT, my load balancer looks like this:
>
> web2# more /etc/pf/webmail_servers
> 142.244.12.130
> 142.244.12.132
> 142.244.12.133
> 142.244.12.134
> 142.244.12.135
> 142.244.12.136
> 142.244.12.137
> 142.244.12.138
> 142.244.12.139
> 142.244.12.140
>
> pf.conf:
>
> table <webmail_servers> persist file "/etc/pf/webmail_servers"
> WEBMAIL_IP = "{129.128.98.89}"
> rdr pass on $ext_if proto tcp to $WEBMAIL_IP port 80 -> <webmail_servers> port 80 round-robin sticky-address
> rdr pass on $ext_if proto tcp to $WEBMAIL_IP port 443 -> <webmail_servers> port 443 round-robin sticky-address
>
> I get the real connection IPs in my apache log.

That looks interesting. I wonder why I need NAT to get the communication working... strange... How are your webmail servers configured (with regard to networking)?

Regards,
./Marian
Re: http load balancing with pf (apache access log)
Henning Brauer wrote:
> * Marian Hettwer [EMAIL PROTECTED] [2007-01-29 18:46]:
>> Ah... there we go. I can't set up the webservers with their default gateway pointing to my load balancer. The boxes are dedicated servers and I have no possibility to change the network settings. These are rented servers (dedicated boxes) at some cheap ISP and all they have is an official IP address. Changing the default gateway isn't possible... Sorry 'bout that.
>
> nothing you can do about it then. you get what you pay for...

My bad... time to look out for another ISP ;) It wasn't my decision to go with this cheap ISP (Strato); however, I'll have to live with it for the time being.

./Marian
Re: http load balancing with pf (apache access log)
Hi Stuart,

Stuart Henderson wrote:
> On 2007/01/29 16:21, Marian Hettwer wrote:
>> Is there any possible way to get the real IP addresses in my apache access log?
>
> For readers who didn't see the earlier posts about setting this up, they're here: http://marc.theaimsgroup.com/?l=openbsd-misc&m=116905272009036&w=2 - it's not the standard setup with PF sitting directly on the route between client and webserver.
>
> That's the drawback to this method: in order to get that information you'd need to rearrange the network so the balancer is in the IP route between the webservers and the end users so you can skip the NATs.
>
> If moving to a more... flexible... ISP isn't an option, you may be able to do something with tunneling. You need to decide which method will suck the least in your situation.

You're right. Both situations suck, but for now I'll have to go with that cheap ISP and therefore live with having a castrated access.log. I'll buy myself some security via mod_security on those remote apaches ;) (and of course, keep my fingers crossed that no bloody botnet tries to attack).

Cheers,
Marian
Re: http load balancing with pf (apache access log)
On Tue, Jan 30, 2007 at 09:09:46AM +0100, Marian Hettwer wrote:
| <quote>
| requests go like this:
|   origin -> balancer -> destination
|
| replies like this:
|   destination -> origin

This sounds a lot like what certain loadbalancers call DSR, or Direct Server Return. Basically, this is layer 2 NAT'ing. Here's how it works:

You configure the outside interface of the loadbalancer with a VIP, which you also configure on lo0 on your webservers. The loadbalancer receives a request on the VIP and selects one of the webservers as the destination (based on selection methods of varying intelligence). It now forwards the IP packet as-is to this webserver, changing only the destination MAC address in the Ethernet frame. This frame is picked up by the destination webserver (as it has the correct MAC address) and is acted upon by the IP layer (as the system has the VIP configured). The webserver processes the request and returns the answer directly to the origin, without going through the loadbalancer.

This can be beneficial in circumstances where your webservers do more outgoing bandwidth than incoming. Say you have a big document store (where documents are your MP3 collection or a big library of (large) PDFs or whatnot) that you wish to serve over HTTP. Many of these requests will fit in a 100 Mbit/s connection. Not quite as many answers fit in that same 100 Mbit/s going back to the original requestor. By aggregating 10 webservers' 100 Mbit/s you can fill a 1 Gbit/s link, with your loadbalancer and your webservers all at 100 Mbit/s.

This also gets you the IP address of the requestor in your weblogs.

It would be cool if pf could support DSR. Since I'm not a programmer, I'll shut up now because I won't be producing patches anytime soon.

Cheers,
Paul 'WEiRD' de Weerd
-- 
http://www.weirdnet.nl/
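The forwarding step described above can be sketched in a few lines of Python. This is only a toy model, not anything pf or a real loadbalancer exposes: the `Frame` type, the addresses, the MAC values and the plain round-robin choice are all made up for illustration, and the bandwidth figures assume the link speeds in the mail are meant in Mbit/s.

```python
from dataclasses import dataclass, replace
from itertools import cycle


@dataclass(frozen=True)
class Frame:
    """Just the fields that matter for the DSR description (hypothetical type)."""
    dst_mac: str
    src_ip: str
    dst_ip: str


VIP = "192.0.2.80"                      # made-up virtual IP, also on lo0 of each webserver
BALANCER_MAC = "00:00:5e:00:53:ff"      # made-up MAC addresses
BACKEND_MACS = ["00:00:5e:00:53:01", "00:00:5e:00:53:02"]

_next_backend = cycle(BACKEND_MACS)     # simplest possible selection method


def dsr_forward(frame: Frame) -> Frame:
    """Forward the packet as-is, changing only the destination MAC."""
    return replace(frame, dst_mac=next(_next_backend))


request = Frame(dst_mac=BALANCER_MAC, src_ip="198.51.100.7", dst_ip=VIP)
forwarded = dsr_forward(request)

# The webserver answers from the VIP straight to the address it saw,
# so the reply never touches the loadbalancer.
reply_src, reply_dst = forwarded.dst_ip, forwarded.src_ip

# Outgoing capacity when every webserver replies over its own link.
servers, per_server_mbit = 10, 100
aggregate_mbit = servers * per_server_mbit
```

Because the IP header is untouched, `forwarded.src_ip` is still the requestor's address, which is why it shows up in the weblogs.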
Re: http load balancing with pf (apache access log)
On Mon, 2007-01-29 at 09:54 -0700, Bob Beck wrote:
> I'm not using NAT, my load balancer looks like this:
>
> web2# more /etc/pf/webmail_servers
> (...)
>
> pf.conf:
>
> table <webmail_servers> persist file "/etc/pf/webmail_servers"
> WEBMAIL_IP = "{129.128.98.89}"
> rdr pass on $ext_if proto tcp to $WEBMAIL_IP port 80 -> <webmail_servers> port 80 round-robin sticky-address
> rdr pass on $ext_if proto tcp to $WEBMAIL_IP port 443 -> <webmail_servers> port 443 round-robin sticky-address

By the way, what do you use/recommend to manage the webserver pool? One test per minute (from cron, for instance) is too coarse an interval for many use cases, so what would be best in your opinion?

It's likely I'll need this in the near future, and this thread basically cut my investigation time by over 90% ;)

Regards,
Rui

-- 
+ No matter how much you do, you never do enough -- unknown
+ Whatever you do will be insignificant,
| but it is very important that you do it -- Gandhi
+ So let's do it...?
Re: http load balancing with pf (apache access log)
On 2007/01/30 13:06, Rui Miguel Silva Seabra wrote: By the way, what do you use/recommend in order to manage the webserver pool? hoststated.
Re: http load balancing with pf (apache access log)
On Tue, 30 Jan 2007 13:06:00 + Rui Miguel Silva Seabra [EMAIL PROTECTED] wrote:
> By the way, what do you use/recommend in order to manage the webserver pool? 1 test/min (in cron for instance) is too large a value for many use cases, so what would be best in your opinion?
>
> It's likely I'll need this for the near future and this thread basically cut my investigation time in over 90% ;)

Maybe hoststated can suit your needs. You will need to build it from source since it's not linked to the build right now. See http://spootnik.org/hoststated for more information.
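To make the idea of a pool check concrete, here is a rough Python sketch of a TCP connect test that could run every few seconds instead of once a minute from cron. It is not how hoststated works internally; the function names are invented for this example, and the only real external command referred to is `pfctl -t <table> -T replace`, whose argument list is merely built here, not executed.

```python
import socket


def is_up(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: treat the host as down.
        return False


def healthy_hosts(hosts, port, timeout=1.0):
    """Filter the pool down to the hosts that currently accept connections."""
    return [h for h in hosts if is_up(h, port, timeout)]


def pfctl_replace_cmd(table, hosts):
    """Build (but do not run) the pfctl call that swaps the table contents."""
    return ["pfctl", "-t", table, "-T", "replace", *hosts]
```

A caller would loop over `healthy_hosts(pool, 80)` with a short sleep and hand the result of `pfctl_replace_cmd("webmail_servers", alive)` to `subprocess.run` as root. A plain connect test says nothing about whether Apache actually serves pages, so a real checker would want an HTTP-level probe as well.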
Re: http load balancing with pf (apache access log)
On Mon, Jan 29, 2007 at 05:36:12PM +0100, Marian Hettwer wrote:
> Pierre-Yves Ritschard wrote:
>> On Mon, 29 Jan 2007 17:20:50 +0100 Marian Hettwer [EMAIL PROTECTED] wrote:
>>> Which would mean, I send a SYN to my load balancer, which forwards the SYN to one of my webservers, and the webserver would send a SYN-ACK back to me. But my machine obviously can't do anything with a SYN-ACK from an IP address it didn't even ask... The client would expect to get a SYN-ACK from the load balancer (which it asked...) understood?
>>
>> no you don't get it.
>
> I believe I do get it. But I left out an important piece of information about my load balancing setup. See below.
>
>> you set up your webservers with the load balancer as default gateway, then use rdr as I described in my previous mail. hence all the traffic goes through the load-balancer and real client ips are preserved.
>
> Ah... there we go. I can't set up the webservers with their default gateway pointing to my load balancer. The boxes are dedicated servers and I have no possibility to change the network settings. These are rented servers (dedicated boxes) at some cheap ISP and all they have is an official IP address. Changing the default gateway isn't possible... Sorry 'bout that.

I'm fairly sure that sufficient abuse of pf can get the webservers to send all replies to port 80/443 traffic back to your loadbalancer.

Of course, that's pf, and your webservers are Linux. But I would be surprised if something similar couldn't be arranged.

Joachim
Re: http load balancing with pf (apache access log)
On Tue, 2007-01-30 at 14:25 +0100, Pierre-Yves Ritschard wrote:
> On Tue, 30 Jan 2007 13:06:00 + Rui Miguel Silva Seabra [EMAIL PROTECTED] wrote:
>> By the way, what do you use/recommend in order to manage the webserver pool? 1 test/min (in cron for instance) is too large a value for many use cases, so what would be best in your opinion?
>>
>> It's likely I'll need this for the near future and this thread basically cut my investigation time in over 90% ;)
>
> Maybe hoststated can suit your needs. You will need to build it from source since it's not linked in right now. See http://spootnik.org/hoststated for more information

Promising. It does say that it's now part of the OpenBSD system, but since when? CURRENT? I can't seem to find it on the 4.0 CDs...

Rui
Re: http load balancing with pf (apache access log)
On Tue, 30 Jan 2007 15:20:42 + Rui Miguel Silva Seabra [EMAIL PROTECTED] wrote:
> On Tue, 2007-01-30 at 14:25 +0100, Pierre-Yves Ritschard wrote:
>> Maybe hoststated can suit your needs. You will need to build it from source since it's not linked in right now. See http://spootnik.org/hoststated for more information
>
> Promising, it does say that it's now part of the OpenBSD system, but since when? CURRENT? I can't seem to find it in the 4.0 CD's...
>
> Rui

Until hoststated is linked to the builds, you can follow the instructions I just put up at http://spootnik.org/hoststated#install .
Re: http load balancing with pf (apache access log)
On Tue, 2007-01-30 at 16:44 +0100, Pierre-Yves Ritschard wrote:
> On Tue, 30 Jan 2007 15:20:42 + Rui Miguel Silva Seabra [EMAIL PROTECTED] wrote:
>> Promising, it does say that it's now part of the OpenBSD system, but since when? CURRENT? I can't seem to find it in the 4.0 CD's...
>
> Pending the link of hoststated in the builds you can follow the instructions i just put up on http://spootnik.org/hoststated#install .

Yeah, thought so. Well, one more item for the compile VM :)

Thanks!
http load balancing with pf (apache access log)
Hi OpenBSD'lers,

I'm about to use OpenBSD's pf(4) for load balancing some webservers. So far, everything is looking just perfect. Compared to pound, pf(4) is incredibly fast with little CPU and memory usage. So I'd say: That's great :)

However, one thing is bothering me. Obviously, my apache access logs on those load balanced machines can only show the IP address of my load balancer, not the real remote IP of the request. This is, to my knowledge, due to the fact that pf(4) is working on the TCP layer and is doing NAT.

Is there any possible way to get the real IP addresses in my apache access log? I do need them for several reasons.

- I'd like to see who's actually accessing the website
- If there's some botnet attack, I usually use pf(4) to block the offending IPs for a specific time period. This can't be done if all I can see is the load balancer's IP address. That's by no means good, and I'm wondering whether this could be a no-go for using pf as a load balancer :-(
- web statistics look pretty bad too... Uh, see, there's only one user on our website *argh*

Okay... anybody with any usable suggestions? There's the X-Forwarded-For information in an HTTP header, which can be set by some software load balancers. However, those operate on the application layer, which pf doesn't... too bad.

best regards,
./Marian
Re: http load balancing with pf (apache access log)
On Mon, 29 Jan 2007 16:21:13 +0100 Marian Hettwer [EMAIL PROTECTED] wrote:
> However, one thing is bothering me. Obviously, my apache access logs on those load balanced machines can only show the IP address of my load balancer, not the real remote IP of the request.

Why are you rewriting the source address? A typical rule for redirecting web traffic would be:

    rdr on $ext0 from any to $www port 80 -> <webservers>

This rewrites the destination address, not the source. Your apache logs are the same as they would have been had the webservers been directly reachable.
Re: http load balancing with pf (apache access log)
Marian Hettwer wrote:
> Hi OpenBSD'lers,
>
> I'm about to use OpenBSD's pf(4) for load balancing some webservers. So far, everything is looking just perfect. Compared to pound, pf(4) is incredibly fast with little CPU and memory usage. So I'd say: That's great :)
>
> However, one thing is bothering me. Obviously, my apache access logs on those load balanced machines can only show the IP address of my load balancer, not the real remote IP of the request. This is, to my knowledge, due to the fact that pf(4) is working on the TCP layer and is doing NAT.
>
> Is there any possible way to get the real IP addresses in my apache access log? I do need them for several reasons.
>
> - I'd like to see who's actually accessing the website
> - If there's some botnet attack, I usually use pf(4) to block the offending IPs for a specific time period. This can't be done if all I can see is the load balancer's IP address. That's by no means good, and I'm wondering whether this could be a no-go for using pf as a load balancer :-(
> - web statistics look pretty bad too... Uh, see, there's only one user on our website *argh*
>
> Okay... anybody with any usable suggestions? There's the X-Forwarded-For information in an HTTP header, which can be set by some software load balancers. However, those operate on the application layer, which pf doesn't... too bad.

Uhmm... Why not use carp(4)? I think it will suit you well.

-- 
With best regards,
Gregory Edigarov
Re: http load balancing with pf (apache access log)
Marian Hettwer wrote:
> However, one thing is bothering me. Obviously, my apache access logs on those load balanced machines can only show the IP address of my load balancer, not the real remote IP of the request. This is, to my knowledge, due to the fact that pf(4) is working on the TCP layer and is doing NAT.
>
> Is there any possible way to get the real IP addresses in my apache access log?

I don't know what you did for that balancing, but surely you're doing it wrong. Take a look at the FAQ at http://www.openbsd.org/faq/pf/pools.html#incoming

rdr just changes the destination address of the packets, not the source address.
Re: http load balancing with pf (apache access log)
On Mon, 29 Jan 2007 17:20:50 +0100 Marian Hettwer [EMAIL PROTECTED] wrote:
> Which would mean, I send a SYN to my load balancer, which forwards the SYN to one of my webservers, and the webserver would send a SYN-ACK back to me. But my machine obviously can't do anything with a SYN-ACK from an IP address it didn't even ask... The client would expect to get a SYN-ACK from the load balancer (which it asked...) understood?

no you don't get it.

you set up your webservers with the load balancer as default gateway, then use rdr as I described in my previous mail.

hence all the traffic goes through the load-balancer and real client ips are preserved.
Re: http load balancing with pf (apache access log)
On 2007/01/29 16:21, Marian Hettwer wrote:
> Is there any possible way to get the real IP addresses in my apache access log?

For readers who didn't see the earlier posts about setting this up, they're here: http://marc.theaimsgroup.com/?l=openbsd-misc&m=116905272009036&w=2 - it's not the standard setup with PF sitting directly on the route between client and webserver.

That's the drawback to this method: in order to get that information you'd need to rearrange the network so the balancer is in the IP route between the webservers and the end users so you can skip the NATs.

If moving to a more... flexible... ISP isn't an option, you may be able to do something with tunneling. You need to decide which method will suck the least in your situation.
Re: http load balancing with pf (apache access log)
Hi Berk,

Berk D. Demir wrote:
> Marian Hettwer wrote:
>> However, one thing is bothering me. Obviously, my apache access logs on those load balanced machines can only show the IP address of my load balancer, not the real remote IP of the request. This is, to my knowledge, due to the fact that pf(4) is working on the TCP layer and is doing NAT.
>>
>> Is there any possible way to get the real IP addresses in my apache access log?
>
> I don't know what you did for that balancing but surely you're doing it wrong. Take a look at the FAQ at http://www.openbsd.org/faq/pf/pools.html#incoming
>
> rdr just changes the destination address of the packets, not the source address.

Well, what I did was actually this:

    ext_if="fxp0"
    web_servers = "{ 193.99.144.85, 66.135.208.93 }"
    #int_if=int0

    set skip on lo
    scrub in

    nat on $ext_if proto tcp from !($ext_if) to $web_servers port 80 -> ($ext_if)
    rdr on $ext_if proto tcp from any to any port 80 -> $web_servers \
        round-robin sticky-address

And it seems that I need NAT, otherwise the communication wouldn't work... see my emails from 18.01.2007.

cheers,
./Marian
Re: http load balancing with pf (apache access log)
Pierre-Yves Ritschard wrote:
> On Mon, 29 Jan 2007 17:34:51 +0100 Marian Hettwer [EMAIL PROTECTED] wrote:
>
> You could also do an ugly hack which would consist of attaching a second network to your servers and load balancers (provided they are in the same (v)?lan), like 172.16.1.0/24, and using that for contacting the real servers. Then you'll need to look up another routing table when being contacted on the 172.16.1.0/24 network (using pf + alternate routing tables in openbsd or iproute2 in linux). Otherwise you're stuck with nat.

Nah, can't do that... It looks like I'm stuck with NAT. And therefore stuck with the load balancer's IP address in my access.log, right?

too bad...

./Marian
Re: http load balancing with pf (apache access log)
* Marian Hettwer [EMAIL PROTECTED] [2007-01-29 09:49]:
> Hi OpenBSD'lers,
>
> I'm about to use OpenBSD's pf(4) for load balancing some webservers. So far, everything is looking just perfect. Compared to pound, pf(4) is incredibly fast with little CPU and memory usage. So I'd say: That's great :)
>
> However, one thing is bothering me. Obviously, my apache access logs on those load balanced machines can only show the IP address of my load balancer, not the real remote IP of the request.

Completely untrue. If you are doing an rdr, it will change the destination IP, not the source IP.

Unless in *addition* to load balancing you are doing NAT.

I'm not using NAT, my load balancer looks like this:

web2# more /etc/pf/webmail_servers
142.244.12.130
142.244.12.132
142.244.12.133
142.244.12.134
142.244.12.135
142.244.12.136
142.244.12.137
142.244.12.138
142.244.12.139
142.244.12.140

pf.conf:

table <webmail_servers> persist file "/etc/pf/webmail_servers"
WEBMAIL_IP = "{129.128.98.89}"
rdr pass on $ext_if proto tcp to $WEBMAIL_IP port 80 -> <webmail_servers> port 80 round-robin sticky-address
rdr pass on $ext_if proto tcp to $WEBMAIL_IP port 443 -> <webmail_servers> port 443 round-robin sticky-address

I get the real connection IPs in my apache log.
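As a rough illustration of what "round-robin sticky-address" means for backend selection, here is a small Python model. It is a simplification under stated assumptions, not pf's implementation: the class and method names are invented, and the `expire` call stands in for pf dropping its source-tracking entry once a client has no states left.

```python
from itertools import cycle


class StickyRoundRobin:
    """Toy model: rotate through the pool, but pin each client to one backend."""

    def __init__(self, backends):
        self._rotation = cycle(backends)   # round-robin over the pool
        self._pinned = {}                  # client address -> chosen backend

    def pick(self, client_ip):
        # A client seen before keeps its backend; a new one gets the next in turn.
        if client_ip not in self._pinned:
            self._pinned[client_ip] = next(self._rotation)
        return self._pinned[client_ip]

    def expire(self, client_ip):
        # Stand-in for the mapping going away; the next pick rotates again.
        self._pinned.pop(client_ip, None)


pool = StickyRoundRobin(["142.244.12.130", "142.244.12.132", "142.244.12.133"])
first = pool.pick("198.51.100.7")    # new client: first backend in the rotation
second = pool.pick("198.51.100.8")   # another new client: next backend
again = pool.pick("198.51.100.7")    # known client: same backend as before
```

The pinning is what keeps a webmail session on one server; without it every new connection from the same client could land on a different backend.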
Re: http load balancing with pf (apache access log)
Hi,

Pierre-Yves Ritschard wrote:
> On Mon, 29 Jan 2007 16:21:13 +0100 Marian Hettwer [EMAIL PROTECTED] wrote:
>> However, one thing is bothering me. Obviously, my apache access logs on those load balanced machines can only show the IP address of my load balancer, not the real remote IP of the request.
>
> Why are you rewriting the source address? A typical rule for redirecting web traffic would be:
>
>     rdr on $ext0 from any to $www port 80 -> <webservers>

That's true, but then the communication would look like this:

    client --> load balancer --> webserver
    webserver --> client

Which would mean, I send a SYN to my load balancer, which forwards the SYN to one of my webservers, and the webserver would send a SYN-ACK back to me. But my machine obviously can't do anything with a SYN-ACK from an IP address it didn't even ask... The client would expect to get a SYN-ACK from the load balancer (which it asked...) understood?

> This rewrites the destination address, not the source.

I know. But I have to use NAT...

> Your apache logs are the same as they would have been had you been directly reachable.

Would be the same, yip...

regards,
./Marian
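The SYN / SYN-ACK argument can be traced with a toy address-rewriting model in Python. Everything here is a made-up illustration (the addresses, the `Packet` tuple, the helper names); it only models which source and destination each party sees, not real TCP or pf state handling.

```python
from collections import namedtuple

Packet = namedtuple("Packet", ["src", "dst", "flags"])

CLIENT = "198.51.100.7"       # made-up addresses for illustration
BALANCER = "192.0.2.1"
WEBSERVER = "203.0.113.10"


def rdr(pkt, backend):
    """rdr rewrites only the destination address."""
    return pkt._replace(dst=backend)


def nat(pkt, new_src):
    """nat rewrites only the source address."""
    return pkt._replace(src=new_src)


def reply_to(pkt):
    """The webserver answers the source address it actually saw."""
    return Packet(src=pkt.dst, dst=pkt.src, flags="SYN-ACK")


def undo_on_balancer(reply, original_syn):
    """A reply passing back through the balancer has its translation reversed."""
    return Packet(src=original_syn.dst, dst=original_syn.src, flags=reply.flags)


def client_accepts(syn, reply):
    """The client only accepts a SYN-ACK from the address it sent its SYN to."""
    return reply.src == syn.dst and reply.dst == syn.src


def simulate(use_nat, reply_via_balancer):
    """Return (address the webserver logs, whether the handshake can complete)."""
    syn = Packet(CLIENT, BALANCER, "SYN")
    at_server = rdr(syn, WEBSERVER)
    if use_nat:
        at_server = nat(at_server, BALANCER)
        reply_via_balancer = True   # the reply is addressed to the balancer itself
    reply = reply_to(at_server)
    if reply_via_balancer:
        reply = undo_on_balancer(reply, syn)
    return at_server.src, client_accepts(syn, reply)


rdr_only_direct = simulate(use_nat=False, reply_via_balancer=False)
rdr_only_gateway = simulate(use_nat=False, reply_via_balancer=True)
rdr_plus_nat = simulate(use_nat=True, reply_via_balancer=False)
```

The three results line up with the thread: rdr alone with a direct return path logs the client address but the handshake fails, rdr with the balancer as default gateway logs the client and works, and rdr plus nat works but logs the balancer.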
Re: http load balancing with pf (apache access log)
On Mon, 29 Jan 2007 17:34:51 +0100 Marian Hettwer [EMAIL PROTECTED] wrote:
> Pierre-Yves Ritschard wrote:
>> On Mon, 29 Jan 2007 17:20:50 +0100 Marian Hettwer [EMAIL PROTECTED] wrote:
>>> Which would mean, I send a SYN to my load balancer, which forwards the SYN to one of my webservers, and the webserver would send a SYN-ACK back to me. But my machine obviously can't do anything with a SYN-ACK from an IP address it didn't even ask... The client would expect to get a SYN-ACK from the load balancer (which it asked...) understood?
>>
>> no you don't get it.
>>
>> you set up your webservers with IPs whose default gateway is the load-balancer, then use rdr, that's how it's done
>>
>> hence all the traffic goes through the load-balancer and real client ips are preserved.
>
> Ah... there we go. I can't set up the webservers with their default gateway pointing to my load balancer. The boxes are scattered dedicated servers and I have no possibility to change the network settings. These are rented servers (dedicated boxes) at some cheap ISP and all they have is an official IP address. Changing the default gateway isn't possible... Sorry 'bout that.
>
> ./Marian

You could also do an ugly hack which would consist of attaching a second network to your servers and load balancers (provided they are in the same (v)?lan), like 172.16.1.0/24, and using that for contacting the real servers. Then you'll need to look up another routing table when being contacted on the 172.16.1.0/24 network (using pf + alternate routing tables in openbsd or iproute2 in linux).

Otherwise you're stuck with nat.
Re: http load balancing with pf (apache access log)
Pierre-Yves Ritschard wrote:
> On Mon, 29 Jan 2007 17:20:50 +0100 Marian Hettwer [EMAIL PROTECTED] wrote:
>> Which would mean, I send a SYN to my load balancer, which forwards the SYN to one of my webservers, and the webserver would send a SYN-ACK back to me. But my machine obviously can't do anything with a SYN-ACK from an IP address it didn't even ask... The client would expect to get a SYN-ACK from the load balancer (which it asked...) understood?
>
> no you don't get it.

I believe I do get it. But I left out an important piece of information about my load balancing setup. See below.

> you set up your webservers with the load balancer as default gateway, then use rdr as I described in my previous mail.
>
> hence all the traffic goes through the load-balancer and real client ips are preserved.

Ah... there we go. I can't set up the webservers with their default gateway pointing to my load balancer. The boxes are dedicated servers and I have no possibility to change the network settings. These are rented servers (dedicated boxes) at some cheap ISP and all they have is an official IP address. Changing the default gateway isn't possible... Sorry 'bout that.

./Marian
Re: http load balancing with pf (apache access log)
* Marian Hettwer [EMAIL PROTECTED] [2007-01-29 18:46]:
> Pierre-Yves Ritschard wrote:
>> On Mon, 29 Jan 2007 17:20:50 +0100 Marian Hettwer [EMAIL PROTECTED] wrote:
>>> Which would mean, I send a SYN to my load balancer, which forwards the SYN to one of my webservers, and the webserver would send a SYN-ACK back to me. But my machine obviously can't do anything with a SYN-ACK from an IP address it didn't even ask... The client would expect to get a SYN-ACK from the load balancer (which it asked...) understood?
>>
>> no you don't get it.
>
> I believe I do get it. But I left out an important piece of information about my load balancing setup. See below.
>
>> you set up your webservers with the load balancer as default gateway, then use rdr as I described in my previous mail.
>>
>> hence all the traffic goes through the load-balancer and real client ips are preserved.
>
> Ah... there we go. I can't set up the webservers with their default gateway pointing to my load balancer. The boxes are dedicated servers and I have no possibility to change the network settings. These are rented servers (dedicated boxes) at some cheap ISP and all they have is an official IP address. Changing the default gateway isn't possible... Sorry 'bout that.

nothing you can do about it then. you get what you pay for...

-- 
Henning Brauer, [EMAIL PROTECTED], [EMAIL PROTECTED]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting - Hamburg & Amsterdam