Re: Fwd: Site running slow

2010-02-12 Thread Willy Tarreau
On Fri, Feb 12, 2010 at 04:17:21PM +0100, Peter Griffin wrote:
> Hi guys,
> Just an update... had the same problem and was ordered to remove haproxy and
> install LVS with CentOS.  When I went on the console I saw lots of conntrack
> messages and dropped-packet messages, so I'm not sure whether some tuning
> would have in fact solved the problem.

Yes indeed it would have solved it. I bet you haven't tuned it at all,
so it's tuned as a workstation with very low session counts. You
should definitely either remove any conntrack module or tune it appropriately
(meaning that you should set the conntrack_max value very high, several hundred
thousand, and the hash size to approximately 1/16 to 1/4 of conntrack_max). It's
useful to reduce the conntrack timeouts too, as most of the time they are
extremely high (e.g. 5 days for established sessions, 120 seconds for TIME_WAIT,
both of which are too large for moderate to high traffic sites).
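For reference, the tuning described above might look like this on a 2.6 kernel with nf_conntrack loaded (the values are illustrative only; size them to your own traffic, and note that older kernels use the ip_conntrack-prefixed names instead):

```
# /etc/sysctl.conf -- example conntrack sizing for a busy load balancer
net.netfilter.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_tcp_timeout_established = 3600
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
```

The hash size is a module parameter rather than a sysctl, so it is set at load time (again, 1/16 to 1/4 of conntrack_max), e.g. "options nf_conntrack hashsize=262144" in /etc/modprobe.d/, or echoed into /sys/module/nf_conntrack/parameters/hashsize at runtime.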

Regards,
Willy




Re: Fwd: Site running slow

2010-02-12 Thread Peter Griffin
Hi guys,
Just an update... had the same problem and was ordered to remove haproxy and
install LVS with CentOS.  When I went on the console I saw lots of conntrack
messages and dropped-packet messages, so I'm not sure whether some tuning
would have in fact solved the problem.

Just wanted to let you know that disabling SELinux did not do the trick.

Thank you for your help, though.

On 8 February 2010 19:12, Peter Griffin  wrote:

> Hi there,
> I put the LB back live and it's been going strong for some 5 hours.  I
> established connections and grepped ip addresses using the netstat -antpoe
> command to see whether connections were lingering on, and am happy to say
> that everything seems to be behaving normally.
>
> We haven't had loads of traffic though so the real test is the weekend
> thank you lots guys
>
>
> On 7 February 2010 11:37, Hank A. Paulson wrote:
>
>> I don't know if those will solve the problem (I doubt they will), but if
>> you put the machine back into the traffic stream - try to get a few outputs
>> if things are going badly:
>>
>> * stats output from haproxy (socket or web page, pref socket)
>> * netstat -antpoe output
>> * netstat -s output
>> * free -m output
>> * haproxy http logs
>> * iptables config output, if any
>> * be sure to have a tail -f /var/log/messages running before you start the
>> test to watch for conntrack and other messages
>>
>> That will provide clues to what may be the problem(s).
>> Others will probably have ideas of other things to look for/capture while
>> trying the configuration.
>>
>>
>> On 2/7/10 2:20 AM, Peter Griffin wrote:
>>
>>> Hi there,
>>> Ok I disabled selinux and increased check inter to 30s.  I enabled an
>>> http check of an .ashx file because ASP is critical to the operation of
>>> the site.  It was already there but I disabled it earlier because of the
>>> problems we were having:
>>> option httpchk HEAD /testip.ashx HTTP/1.1\r\nHost:\ www.oursite.com
>>> 
>>>
>>>
>>> With regards to free, I'm ashamed to say that yes I did go after the
>>> first line.
>>>
>>
>> It happens to people who claim to be very linux savvy, so don't worry
>> about it.
>>
>>
>>> I also did a yum upgrade but will postpone 1.4rc1 until I
>>> see how this change responds.  Will put the LB back online when the
>>> traffic is not that heavy as I cannot risk another outage and hence my
>>> job :)
>>>
>>> Will post a reply tomorrow afternoon.
>>>
>>> Thank you so much you've been great.
>>>
>>>
>>>
>>>
>>>
>>> On 7 February 2010 02:06, Hank A. Paulson wrote:
>>>
>>>You have selinux on, so it may be unhappy with some part of haproxy
>>>- the directory it uses, the socket listeners, etc. Turn it off (if
>>>you can) until you get everything working ok. Turning it off
>>>requires a reboot.
>>>
>>>To see if it is on:
>>># sestatus
>>>google for how to turn it off
>>>
>>>I would back off the check inter to 30s or so and make it an http
>>>check of a file that you know exists, if you can have any static
>>>files on your servers. This will allow you to see that haproxy is
>>>able to find that file, get a 200 response and verify that the
>>>server is up.
>>>
>>>Also, when you say "free mem going down to 45Mb" are you looking at
>>>the first line of "free" or the second line? Ignore the first line,
>>>it is designed to cause panic. eg:
>>>
>>>$ free -m
>>>             total       used       free     shared    buffers     cached
>>>Mem:         32244      32069        174          0          0      19578
>>>-/+ buffers/cache:      12490      19753
>>>Swap:         4095          0       4095
>>>
>>>OMG, I only have 174MB of my 32GB of memory available!?!
>>>- no, really 19.75 GB is still available.
>>>
>>>On your haproxy config, if you log errors separately then you can
>>>tail -f that error-only log and watch it as you start up haproxy.
>>>And why not do http logging if you are doing http mode? Maybe I am
>>>missing something.
>>>
>>>I would back off the check inter to 30s or so and make it an http
>>>check of a file that you know exists, if you can have any static
>>>files on your servers. This will allow you to see that haproxy is
>>>able to find that file, get a 200 response and verify that the
>>>server really is up and responding fully, not just opening a
>>>socket. If you can switch to 1.4rc1 then you get a lot more info
>>>about the health check/health status on the stats page and you can
>>>do set log-health-checks as an additional aid to troubleshooting.
>>>
>>>
>>>global
>>>log 127.0.0.1   local0
>>>log 127.0.0.1   local1 notice
>>>#log loghostlocal0 info
>>>option   log-separate-errors
>>>
>>>maxconn 4096
>>>chroot /var/lib/haproxy
>>>user haproxy
>>>group

Re: Fwd: Site running slow

2010-02-08 Thread Peter Griffin
Hi there,
I put the LB back live and it's been going strong for some 5 hours.  I
established connections and grepped ip addresses using the netstat -antpoe
command to see whether connections were lingering on, and am happy to say
that everything seems to be behaving normally.

We haven't had loads of traffic though so the real test is the weekend
thank you lots guys

On 7 February 2010 11:37, Hank A. Paulson wrote:

> I don't know if those will solve the problem (I doubt they will), but if
> you put the machine back into the traffic stream - try to get a few outputs
> if things are going badly:
>
> * stats output from haproxy (socket or web page, pref socket)
> * netstat -antpoe output
> * netstat -s output
> * free -m output
> * haproxy http logs
> * iptables config output, if any
> * be sure to have a tail -f /var/log/messages running before you start the
> test to watch for conntrack and other messages
>
> That will provide clues to what may be the problem(s).
> Others will probably have ideas of other things to look for/capture while
> trying the configuration.
>
>
> On 2/7/10 2:20 AM, Peter Griffin wrote:
>
>> Hi there,
>> Ok I disabled selinux and increased check inter to 30s.  I enabled an
>> http check of an .ashx file because ASP is critical to the operation of
>> the site.  It was already there but I disabled it earlier because of the
>> problems we were having:
>> option httpchk HEAD /testip.ashx HTTP/1.1\r\nHost:\ www.oursite.com
>> 
>>
>>
>> With regards to free, I'm ashamed to say that yes I did go after the
>> first line.
>>
>
> It happens to people who claim to be very linux savvy, so don't worry about
> it.
>
>
>> I also did a yum upgrade but will postpone 1.4rc1 until I
>> see how this change responds.  Will put the LB back online when the
>> traffic is not that heavy as I cannot risk another outage and hence my
>> job :)
>>
>> Will post a reply tomorrow afternoon.
>>
>> Thank you so much you've been great.
>>
>>
>>
>>
>>
>> On 7 February 2010 02:06, Hank A. Paulson wrote:
>>
>>You have selinux on, so it may be unhappy with some part of haproxy
>>- the directory it uses, the socket listeners, etc. Turn it off (if
>>you can) until you get everything working ok. Turning it off
>>requires a reboot.
>>
>>To see if it is on:
>># sestatus
>>google for how to turn it off
>>
>>I would back off the check inter to 30s or so and make it an http
>>check of a file that you know exists, if you can have any static
>>files on your servers. This will allow you to see that haproxy is
>>able to find that file, get a 200 response and verify that the
>>server is up.
>>
>>Also, when you say "free mem going down to 45Mb" are you looking at
>>the first line of "free" or the second line? Ignore the first line,
>>it is designed to cause panic. eg:
>>
>>$ free -m
>>             total       used       free     shared    buffers     cached
>>Mem:         32244      32069        174          0          0      19578
>>-/+ buffers/cache:      12490      19753
>>Swap:         4095          0       4095
>>
>>OMG, I only have 174MB of my 32GB of memory available!?!
>>- no, really 19.75 GB is still available.
>>
>>On your haproxy config, if you log errors separately then you can
>>tail -f that error-only log and watch it as you start up haproxy.
>>And why not do http logging if you are doing http mode? Maybe I am
>>missing something.
>>
>>I would back off the check inter to 30s or so and make it an http
>>check of a file that you know exists, if you can have any static
>>files on your servers. This will allow you to see that haproxy is
>>able to find that file, get a 200 response and verify that the
>>server really is up and responding fully, not just opening a
>>socket. If you can switch to 1.4rc1 then you get a lot more info
>>about the health check/health status on the stats page and you can
>>do set log-health-checks as an additional aid to troubleshooting.
>>
>>
>>global
>>log 127.0.0.1   local0
>>log 127.0.0.1   local1 notice
>>#log loghostlocal0 info
>>option   log-separate-errors
>>
>>maxconn 4096
>>chroot /var/lib/haproxy
>>user haproxy
>>group haproxy
>>daemon
>>#   debug
>>#quiet
>>
>>defaults
>>log global
>>modehttp
>>#   option  httplog
>>option  dontlognull
>>retries 3
>>option redispatch
>>maxconn 4096
>>contimeout  5s
>>clitimeout  30s
>>srvtimeout  30s
>>
>>
>>listen loadbalancer :80
>>mode http
>>balance roundrobin
>>option forwardfor except 10.0.1.50
>>

Re: Fwd: Site running slow

2010-02-07 Thread Hank A. Paulson
I don't know if those will solve the problem (I doubt they will), but if you 
put the machine back into the traffic stream - try to get a few outputs if 
things are going badly:


* stats output from haproxy (socket or web page, pref socket)
* netstat -antpoe output
* netstat -s output
* free -m output
* haproxy http logs
* iptables config output, if any
* be sure to have a tail -f /var/log/messages running before you start the 
test to watch for conntrack and other messages


That will provide clues to what may be the problem(s).
Others will probably have ideas of other things to look for/capture while 
trying the configuration.
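One way to grab those outputs in one shot is a small capture script along these lines (a sketch only; the haproxy stats socket path is an assumption, so match it to your config):

```shell
#!/bin/sh
# Capture a diagnostics snapshot; tools that aren't installed are skipped.
OUT=/tmp/lb-diag.$(date +%s)
mkdir -p "$OUT"

snap() {  # snap <filename> <command...>
    f=$1; shift
    command -v "$1" >/dev/null 2>&1 && "$@" > "$OUT/$f" 2>&1
}

snap netstat-antpoe.txt netstat -antpoe
snap netstat-s.txt      netstat -s
snap free-m.txt         free -m
snap iptables.txt       iptables-save
# haproxy stats socket path is an assumption -- match your config
command -v socat >/dev/null 2>&1 && \
    echo "show stat" | socat stdio /var/run/haproxy.sock > "$OUT/haproxy-stat.txt" 2>&1

echo "diagnostics saved under $OUT"
```

Run it a few times while the problem is happening, and keep the suggested tail -f /var/log/messages open in another terminal.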


On 2/7/10 2:20 AM, Peter Griffin wrote:

Hi there,
Ok I disabled selinux and increased check inter to 30s.  I enabled an
http check of an .ashx file because ASP is critical to the operation of
the site.  It was already there but I disabled it earlier because of the
problems we were having:
option httpchk HEAD /testip.ashx HTTP/1.1\r\nHost:\ www.oursite.com


With regards to free, I'm ashamed to say that yes I did go after the
first line.


It happens to people who claim to be very linux savvy, so don't worry about it.

I also did a yum upgrade but will postpone 1.4rc1 until I

see how this change responds.  Will put the LB back online when the
traffic is not that heavy as I cannot risk another outage and hence my
job :)

Will post a reply tomorrow afternoon.

Thank you so much you've been great.





On 7 February 2010 02:06, Hank A. Paulson <h...@spamproof.nospammail.net> wrote:

You have selinux on, so it may be unhappy with some part of haproxy
- the directory it uses, the socket listeners, etc. Turn it off (if
you can) until you get everything working ok. Turning it off
requires a reboot.

To see if it is on:
# sestatus
google for how to turn it off

I would back off the check inter to 30s or so and make it an http
check of a file that you know exists, if you can have any static
files on your servers. This will allow you to see that haproxy is
able to find that file, get a 200 response and verify that the
server is up.

Also, when you say "free mem going down to 45Mb" are you looking at
the first line of "free" or the second line? Ignore the first line,
it is designed to cause panic. eg:

$ free -m
             total       used       free     shared    buffers     cached
Mem:         32244      32069        174          0          0      19578
-/+ buffers/cache:      12490      19753
Swap:         4095          0       4095

OMG, I only have 174MB of my 32GB of memory available!?!
- no, really 19.75 GB is still available.

On your haproxy config, if you log errors separately then you can
tail -f that error-only log and watch it as you start up haproxy.
And why not do http logging if you are doing http mode? Maybe I am
missing something.

I would back off the check inter to 30s or so and make it an http
check of a file that you know exists, if you can have any static
files on your servers. This will allow you to see that haproxy is
able to find that file, get a 200 response and verify that the
server really is up and responding fully, not just opening a
socket. If you can switch to 1.4rc1 then you get a lot more info
about the health check/health status on the stats page and you can
do set log-health-checks as an additional aid to troubleshooting.


global
log 127.0.0.1   local0
log 127.0.0.1   local1 notice
#log loghostlocal0 info
option   log-separate-errors

maxconn 4096
chroot /var/lib/haproxy
user haproxy
group haproxy
daemon
#   debug
#quiet

defaults
log global
modehttp
#   option  httplog
option  dontlognull
retries 3
option redispatch
maxconn 4096
contimeout  5s
clitimeout  30s
srvtimeout  30s


listen loadbalancer :80
mode http
balance roundrobin
option forwardfor except 10.0.1.50
option httpclose
option httplog
option httpchk HEAD /favicon.ico

cookie SERVERID insert indirect nocache
server WEB01 10.0.1.108:80 cookie A check inter 30s
server WEB05 10.0.1.109:80 cookie B check inter 30s


listen statistics 10.0.1.50:8080 
stats enable
stats auth stats:stats
stats uri /

[BTW, Did you do a yum upgrade - not yum update after your install
of F12?, "yum update" misses certain kinds of packaging changes,
"yum upgrade" covers

Re: Fwd: Site running slow

2010-02-07 Thread Peter Griffin
Hi there,
Ok I disabled selinux and increased check inter to 30s.  I enabled an http
check of an .ashx file because ASP is critical to the operation of the
site.  It was already there but I disabled it earlier because of the
problems we were having:
option httpchk HEAD /testip.ashx HTTP/1.1\r\nHost:\ www.oursite.com

With regards to free, I'm ashamed to say that yes I did go after the first
line.  I also did a yum upgrade but will postpone 1.4rc1 until I see how
this change responds.  Will put the LB back online when the traffic is not
that heavy as I cannot risk another outage and hence my job :)

Will post a reply tomorrow afternoon.

Thank you so much you've been great.





On 7 February 2010 02:06, Hank A. Paulson wrote:

> You have selinux on, so it may be unhappy with some part of haproxy - the
> directory it uses, the socket listeners, etc. Turn it off (if you can) until
> you get everything working ok. Turning it off requires a reboot.
>
> To see if it is on:
> # sestatus
> google for how to turn it off
>
> I would back off the check inter to 30s or so and make it an http check of
> a file that you know exists, if you can have any static files on your
> servers. This will allow you to see that haproxy is able to find that file,
> get a 200 response and verify that the server is up.
>
> Also, when you say "free mem going down to 45Mb" are you looking at the
> first line of "free" or the second line? Ignore the first line, it is
> designed to cause panic. eg:
>
> $ free -m
>              total       used       free     shared    buffers     cached
> Mem:         32244      32069        174          0          0      19578
> -/+ buffers/cache:      12490      19753
> Swap:         4095          0       4095
>
> OMG, I only have 174MB of my 32GB of memory available!?!
> - no, really 19.75 GB is still available.
>
> On your haproxy config, if you log errors separately then you can tail -f
> that error-only log and watch it as you start up haproxy. And why not do
> http logging if you are doing http mode? Maybe I am missing something.
>
> I would back off the check inter to 30s or so and make it an http check of
> a file that you know exists, if you can have any static files on your
> servers. This will allow you to see that haproxy is able to find that file,
> get a 200 response and verify that the server really is up and responding
> fully, not just opening a socket. If you can switch to 1.4rc1 then you get
> a lot more info about the health check/health status on the stats page and
> you can do set log-health-checks as an additional aid to troubleshooting.
>
>
> global
>log 127.0.0.1   local0
>log 127.0.0.1   local1 notice
>#log loghostlocal0 info
>option   log-separate-errors
>
>maxconn 4096
>chroot /var/lib/haproxy
>user haproxy
>group haproxy
>daemon
> #   debug
>#quiet
>
> defaults
>log global
>modehttp
> #   option  httplog
>option  dontlognull
>retries 3
>option redispatch
>maxconn 4096
>contimeout  5s
>clitimeout  30s
>srvtimeout  30s
>
>
> listen loadbalancer :80
>mode http
>balance roundrobin
>option forwardfor except 10.0.1.50
>option httpclose
>option httplog
>option httpchk HEAD /favicon.ico
>
>cookie SERVERID insert indirect nocache
>server WEB01 10.0.1.108:80 cookie A check inter 30s
>server WEB05 10.0.1.109:80 cookie B check inter 30s
>
>
> listen statistics 10.0.1.50:8080
>stats enable
>stats auth stats:stats
>stats uri /
>
> [BTW, Did you do a yum upgrade - not yum update after your install of F12?,
> "yum update" misses certain kinds of packaging changes, "yum upgrade" covers
> all updates, even if the name of a package changes - yum upgrade should be
> the default used in yum examples - I ask because many people don't do this
> and there are many security fixes and other package bug fixes that have been
> posted]
>
>
> On 2/6/10 6:59 AM, Peter Griffin wrote:
>
>> Hi Will,
>> Yes X-Windows is installed, but the default init is runlevel 3 and I
>> have not started X for the past couple of days.  The video card is an
>> addon card so I rule out shared memory.
>>
>> With regards to eth1 I ran iptraf and can see that there is no traffic
>> on eth1 so I'd rule this out as well.  I thought about listening for
>> stunnel requests on eth1 10.0.1.51 and connecting to haproxy on
>> 10.0.1.50, but maybe this will cause more problems...
>> I had already ftp'd a file some 70MB to another machine on the same Vlan
>> and I did not see any problems whatsoever.  What I'm planning to do now
>> is to setup the LB in another environment with another 2 Web servers and
>> 1 DB server and stress the hell out of it.  Then I can also test the
>> network traff

Re: Fwd: Site running slow

2010-02-06 Thread Hank A. Paulson
You have selinux on, so it may be unhappy with some part of haproxy - the 
directory it uses, the socket listeners, etc. Turn it off (if you can) until 
you get everything working ok. Turning it off requires a reboot.


To see if it is on:
# sestatus
google for how to turn it off
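For the record, on Fedora "turning it off" is usually setenforce 0 for an immediate (until-reboot) switch to permissive mode, plus a persistent setting in the standard config file, roughly:

```
# /etc/selinux/config -- read at boot; "permissive" logs denials without blocking
SELINUX=disabled
SELINUXTYPE=targeted
```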

I would back off the check inter to 30s or so and make it an http check of a 
file that you know exists, if you can have any static files on your servers. 
This will allow you to see that haproxy is able to find that file, get a 200 
response and verify that the server is up.


Also, when you say "free mem going down to 45Mb" are you looking at the first 
line of "free" or the second line? Ignore the first line, it is designed to 
cause panic. eg:


$ free -m
             total       used       free     shared    buffers     cached
Mem:         32244      32069        174          0          0      19578
-/+ buffers/cache:      12490      19753
Swap:         4095          0       4095

OMG, I only have 174MB of my 32GB of memory available!?!
- no, really 19.75 GB is still available.

On your haproxy config, if you log errors separately then you can tail -f that 
error-only log and watch it as you start up haproxy. And why not do http 
logging if you are doing http mode? Maybe I am missing something.
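Note that log-separate-errors only raises the syslog level of problematic requests from info to err; it is the syslog daemon that actually splits them into their own file. A minimal rsyslog sketch (the filenames are examples):

```
# /etc/rsyslog.d/haproxy.conf -- route haproxy's facility, errors to their own file
local0.*      /var/log/haproxy.log
local0.err    /var/log/haproxy-errors.log
```

Then tail -f /var/log/haproxy-errors.log while starting haproxy.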


I would back off the check inter to 30s or so and make it an http check of a 
file that you know exists, if you can have any static files on your servers. 
This will allow you to see that haproxy is able to find that file, get a 200 
response and verify that the server really is up and responding fully, not
just opening a socket. If you can switch to 1.4rc1 then you get a lot more info
about the health check/health status on the stats page and you can do set
log-health-checks as an additional aid to troubleshooting.


global
log 127.0.0.1   local0
log 127.0.0.1   local1 notice
#log loghostlocal0 info
option   log-separate-errors
maxconn 4096
chroot /var/lib/haproxy
user haproxy
group haproxy
daemon
#   debug
#quiet

defaults
log global
modehttp
#   option  httplog
option  dontlognull
retries 3
option redispatch
maxconn 4096
contimeout  5s
clitimeout  30s
srvtimeout  30s

listen loadbalancer :80
mode http
balance roundrobin
option forwardfor except 10.0.1.50
option httpclose
option httplog
option httpchk HEAD /favicon.ico
cookie SERVERID insert indirect nocache
server WEB01 10.0.1.108:80 cookie A check inter 30s
server WEB05 10.0.1.109:80 cookie B check inter 30s

listen statistics 10.0.1.50:8080
stats enable
stats auth stats:stats
stats uri /

[BTW, Did you do a yum upgrade - not yum update after your install of F12?, 
"yum update" misses certain kinds of packaging changes, "yum upgrade" covers 
all updates, even if the name of a package changes - yum upgrade should be the 
default used in yum examples - I ask because many people don't do this and 
there are many security fixes and other package bug fixes that have been posted]


On 2/6/10 6:59 AM, Peter Griffin wrote:

Hi Will,
Yes X-Windows is installed, but the default init is runlevel 3 and I
have not started X for the past couple of days.  The video card is an
addon card so I rule out shared memory.

With regards to eth1 I ran iptraf and can see that there is no traffic
on eth1 so I'd rule this out as well.  I thought about listening for
stunnel requests on eth1 10.0.1.51 and connecting to haproxy on
10.0.1.50, but maybe this will cause more problems...
I had already ftp'd a file some 70MB to another machine on the same Vlan
and I did not see any problems whatsoever.  What I'm planning to do now
is to setup the LB in another environment with another 2 Web servers and
1 DB server and stress the hell out of it.  Then I can also test the
network traffic using Iperf.
Will report back in a few days, thank you once more.




On 6 February 2010 14:29, Willy Tarreau <w...@1wt.eu> wrote:

On Sat, Feb 06, 2010 at 01:16:00PM +0100, Peter Griffin wrote:
 > Both http & https.  Also both web servers started to take it in
turns to
 > report as DOWN but more frequently the second one than the first.
 >
 > I ran ethtool eth0 and can verify that it's full-duplex 1Gbps:

OK.

 > I'm attaching dmesg, I don't understand most of it.

well, it shows some video driver issues, which are unrelated (did you
start a graphics environment on your LB ?). It seems it's reserving
some memory (64 or 512MB, I don't understand well) for the video. I
hope it's not a card with shared memory, as the higher the resolution,
the lower the remaining memory bandwidth for normal work.

But I don't see any i

Re: Fwd: Site running slow

2010-02-06 Thread Peter Griffin
Hi Will,
Yes X-Windows is installed, but the default init is runlevel 3 and I have
not started X for the past couple of days.  The video card is an addon card
so I rule out shared memory.

With regards to eth1 I ran iptraf and can see that there is no traffic on
eth1 so I'd rule this out as well.  I thought about listening for stunnel
requests on eth1 10.0.1.51 and connecting to haproxy on 10.0.1.50, but maybe
this will cause more problems...

I had already ftp'd a file some 70MB to another machine on the same Vlan and
I did not see any problems whatsoever.  What I'm planning to do now is to
setup the LB in another environment with another 2 Web servers and 1 DB
server and stress the hell out of it.  Then I can also test the network
traffic using Iperf.

Will report back in a few days, thank you once more.





On 6 February 2010 14:29, Willy Tarreau  wrote:

> On Sat, Feb 06, 2010 at 01:16:00PM +0100, Peter Griffin wrote:
> > Both http & https.  Also both web servers started to take it in turns to
> > report as DOWN but more frequently the second one than the first.
> >
> > I ran ethtool eth0 and can verify that it's full-duplex 1Gbps:
>
> OK.
>
> > I'm attaching dmesg, I don't understand most of it.
>
> well, it shows some video driver issues, which are unrelated (did you
> start a graphics environment on your LB ?). It seems it's reserving
> some memory (64 or 512MB, I don't understand well) for the video. I
> hope it's not a card with shared memory, as the higher the resolution,
> the lower the remaining memory bandwidth for normal work.
>
> But I don't see any iptables related issue there, so that's fine.
>
> Stupid question, are you sure that your traffic passes via eth0 (the
> gig one) ? I'm asking, because eth1 is a cheap 100 Mbps realtek 8139,
> and if you got the routing wrong, it could explain a lot of networking
> issues !
>
> > I'll try to send a file
> > in both directions to saturate the link as you suggested.
>
> OK.
>
> When doing that, don't bench the disks, just the network. For that,
> create "sparse files", which are empty files for which the kernel
> produces zeroes on the fly, and send the files to /dev/null. E.g.
> with ftp :
>
> machine1$ dd if=/dev/null bs=1M count=0 seek=1024 of=1g.bin
>
> machine2$ ftp machine1
> > recv 1g.bin /dev/null
>
>
> Regards,
> Willy
>
>


Re: Fwd: Site running slow

2010-02-06 Thread Willy Tarreau
On Sat, Feb 06, 2010 at 01:16:00PM +0100, Peter Griffin wrote:
> Both http & https.  Also both web servers started to take it in turns to
> report as DOWN but more frequently the second one than the first.
> 
> I ran ethtool eth0 and can verify that it's full-duplex 1Gbps:

OK.

> I'm attaching dmesg, I don't understand most of it.

well, it shows some video driver issues, which are unrelated (did you
start a graphics environment on your LB ?). It seems it's reserving
some memory (64 or 512MB, I don't understand well) for the video. I
hope it's not a card with shared memory, as the higher the resolution,
the lower the remaining memory bandwidth for normal work.

But I don't see any iptables related issue there, so that's fine.

Stupid question, are you sure that your traffic passes via eth0 (the
gig one) ? I'm asking, because eth1 is a cheap 100 Mbps realtek 8139,
and if you got the routing wrong, it could explain a lot of networking
issues !

> I'll try to send a file
> in both directions to saturate the link as you suggested.

OK.

When doing that, don't bench the disks, just the network. For that,
create "sparse files", which are empty files for which the kernel
produces zeroes on the fly, and send the files to /dev/null. E.g.
with ftp :

machine1$ dd if=/dev/null bs=1M count=0 seek=1024 of=1g.bin

machine2$ ftp machine1
> recv 1g.bin /dev/null
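To sanity-check the sparse file before transferring it, compare its apparent size with the blocks actually allocated; a quick sketch (GNU coreutils assumed):

```shell
# Create a 1 GiB sparse file: no data written, just the size set via seek
dd if=/dev/null bs=1M count=0 seek=1024 of=/tmp/1g.bin 2>/dev/null
ls -l /tmp/1g.bin      # apparent size: 1073741824 bytes
du -k /tmp/1g.bin      # allocated size: close to 0 KB
```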


Regards,
Willy




Fwd: Fwd: Site running slow

2010-02-06 Thread Peter Griffin
I forgot to mention that yes this is a dedicated machine.


On 6 February 2010 12:47, Willy Tarreau  wrote:

> On Sat, Feb 06, 2010 at 12:27:10PM +0100, Peter Griffin wrote:
> > The minute I put the changes and made the loadbalancer active, external
> > users experienced serious downtime.  I tried accessing our site from an
> > external source and sure enough we were unbrowsable.  So I had to take
> > haproxy off again.  Ram was now stable at 750Mb free.
> >
> > At the time I had about 300 connections and only 10% were https.
>
> was it unbrowsable on HTTP too, or just HTTPS ?
>
> > At this point, could it be a defective nic?  Wrong kernel?  I'm running
> > Fedora 12.
>
> Very unlikely. Hmm would this haproxy run on a dedicated machine ?
> If so, can you check its connectivity ? At least run "ethtool eth0" and
> check that your link is correctly detected as full duplex. If you could
> try a file transfer in each direction to confirm that you can saturate
> the link, it'll be nice.
>
> If the machine is dedicated, it's possible that you have iptables loaded
> too and that it quickly fills its conntrack table. You'd see that in
> "dmesg".
>
> Willy
>
>


dmesg.rar
Description: Binary data


Re: Fwd: Site running slow

2010-02-06 Thread Peter Griffin
Both http & https.  Also both web servers started to take it in turns to
report as DOWN but more frequently the second one than the first.

I ran ethtool eth0 and can verify that it's full-duplex 1Gbps:
Settings for eth0:
Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbag
Wake-on: g
Current message level: 0x0001 (1)
Link detected: yes

I'm attaching dmesg, I don't understand most of it.  I'll try to send a file
in both directions to saturate the link as you suggested.

Thanks again.

On 6 February 2010 12:47, Willy Tarreau  wrote:

> On Sat, Feb 06, 2010 at 12:27:10PM +0100, Peter Griffin wrote:
> > The minute I put the changes and made the loadbalancer active, external
> > users experienced serious downtime.  I tried accessing our site from an
> > external source and sure enough we were unbrowsable.  So I had to take
> > haproxy off again.  Ram was now stable at 750Mb free.
> >
> > At the time I had about 300 connections and only 10% were https.
>
> was it unbrowsable on HTTP too, or just HTTPS ?
>
> > At this point, could it be a defective nic?  Wrong kernel?  I'm running
> > Fedora 12.
>
> Very unlikely. Hmm would this haproxy run on a dedicated machine ?
> If so, can you check its connectivity ? At least run "ethtool eth0" and
> check that your link is correctly detected as full duplex. If you could
> try a file transfer in each direction to confirm that you can saturate
> the link, it'll be nice.
>
> If the machine is dedicated, it's possible that you have iptables loaded
> too and that it quickly fills its conntrack table. You'd see that in
> "dmesg".
>
> Willy
>
>


dmesg.rar
Description: Binary data


Re: Fwd: Site running slow

2010-02-06 Thread Willy Tarreau
On Sat, Feb 06, 2010 at 12:27:10PM +0100, Peter Griffin wrote:
> The minute I put the changes and made the loadbalancer active, external
> users experienced serious downtime.  I tried accessing our site from an
> external source and sure enough we were unbrowsable.  So I had to take
> haproxy off again.  Ram was now stable at 750Mb free.
> 
> At the time I had about 300 connections and only 10% were https.

was it unbrowsable on HTTP too, or just HTTPS ?

> At this point, could it be a defective nic?  Wrong kernel?  I'm running
> Fedora 12.

Very unlikely. Hmm would this haproxy run on a dedicated machine ?
If so, can you check its connectivity ? At least run "ethtool eth0" and
check that your link is correctly detected as full duplex. If you could
try a file transfer in each direction to confirm that you can saturate
the link, it'll be nice.

If the machine is dedicated, it's possible that you have iptables loaded
too and that it quickly fills its conntrack table. You'd see that in
"dmesg".

Willy




Re: Fwd: Site running slow

2010-02-06 Thread Peter Griffin
The minute I put the changes and made the loadbalancer active, external
users experienced serious downtime.  I tried accessing our site from an
external source and sure enough we were unbrowsable.  So I had to take
haproxy off again.  Ram was now stable at 750Mb free.

At the time I had about 300 connections and only 10% were https.

At this point, could it be a defective nic?  Wrong kernel?  I'm running
Fedora 12.

On 6 February 2010 10:30, Willy Tarreau  wrote:

> On Sat, Feb 06, 2010 at 09:51:45AM +0100, Peter Griffin wrote:
> > Hi Will,
> > I didn't see my post in the archives and since this is a production site I
> > panicked.
> >
> > Thank you so much for your explanation, it's much clearer now.  I will make
> > the changes and report back how it went.  Do you think that I'd be better
> > off upgrading to 4Gb Ram or should 1Gb be enough?
>
> It only depends on the number of concurrent connections. On a finely tuned
> system, you can sustain slightly more than 2 connections through haproxy
> with 1 GB RAM. But stunnel will consume more per connection because of the
> SSL context, which is heavier. I don't think it's reasonable to go much
> higher than 2,000-3,000 concurrent connections on 1 GB RAM via
> stunnel+haproxy. Also, if you need that many SSL connections, you'll
> definitely want to set up a load-balanced SSL farm or you'll sooner or
> later run into trouble.
>
> Regards,
> Willy
>
>


Re: Fwd: Site running slow

2010-02-06 Thread Willy Tarreau
On Sat, Feb 06, 2010 at 09:51:45AM +0100, Peter Griffin wrote:
> Hi Will,
> I didn't see my post in the archives and since this is a production site I
> panicked.
> 
> Thank you so much for your explanation, it's much clearer now.  I will make
> the changes and report back how it went.  Do you think that I'd be better
> off upgrading to 4Gb Ram or should 1Gb be enough?

It only depends on the number of concurrent connections. On a finely tuned
system, you can sustain slightly more than 2 connections through haproxy
with 1 GB RAM. But stunnel will consume more per connection because of the
SSL context, which is heavier. I don't think it's reasonable to go much
higher than 2,000-3,000 concurrent connections on 1 GB RAM via stunnel+haproxy.
Also, if you need that many SSL connections, you'll definitely want to set up
a load-balanced SSL farm or you'll sooner or later run into trouble.
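As a back-of-envelope sanity check on the plain-haproxy side of that figure, assuming the default 16 kB buffer size and two buffers (request and response) per connection, and ignoring per-session bookkeeping and kernel socket buffers:

```shell
# 1 GB of RAM (in kB) divided by ~32 kB of buffer space per connection
echo $(( (1024 * 1024) / (16 * 2) ))   # prints 32768
```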

Regards,
Willy




Re: Fwd: Site running slow

2010-02-06 Thread Peter Griffin
Hi Will,
I didn't see my post in the archives and since this is a production site I
panicked.

Thank you so much for your explanation, it's much clearer now.  I will make
the changes and report back how it went.  Do you think that I'd be better
off upgrading to 4Gb Ram or should 1Gb be enough?

Cheers

On 6 February 2010 09:21, Willy Tarreau  wrote:

> Hi Peter,
>
> it's needless to resend your mail 3 times. Most people on this list aren't
> always available, but are generally helpful. Sometimes you just have to be
> a bit patient.
>
> I see no timeout in your stunnel configuration :
>
> > My stunnel.conf:
> > #setuid=stunnel
> > #setgid=proxy
> >
> > debug = 3
> > output = /var/log/stunnel.log
> >
> > socket=l:TCP_NODELAY=1
> > socket=r:TCP_NODELAY=1
> >
> > [https]
> > accept=10.0.1.50:443
> > connect=10.0.1.50:80  
> > TIMEOUTclose=0
> > xforwardedfor=yes
>
> That means that each time a visitor suddenly gets off the net,
> you end up with an ever-lasting connection. Those connections
> pile up, until stunnel cannot accept any more. It's very likely
> the problem you're observing. The problem is even amplified by
> the fact that apparently you're doing some NAT in front of stunnel,
> leaving almost no chance for out of state packets to trigger a
> possible RST. Please set the following ones to reasonable values
> (eg: 30-60 seconds): TIMEOUTbusy, TIMEOUTidle. This one should
> also be specified, even if less important since it's the same
> machine: TIMEOUTconnect.
>
> Also, even though that's unrelated, I suggest that you replace
> "option forwardfor" with "option forwardfor except 10.0.1.50" in
> your haproxy config. That way, connections coming from stunnel
> will have the stunnel's x-forwarded-for header last. And those
> coming from everywhere else will have haproxy's. That means that
> your server will always have the client's IP in the last header.
>
> Also, I don't know if the 45 Mbps you were talking about are full
> SSL traffic, but in this case you might want to be very careful
> about setting limits so that stunnel's connections do not go
> through the roof. I think the only way to do that with stunnel is
> by setting ulimit before launching it, though I'm not certain.
>
> Hoping this helps,
> Willy
>
>


Re: Fwd: Site running slow

2010-02-06 Thread Willy Tarreau
Hi Peter,

it's needless to resend your mail 3 times. Most people on this list aren't
always available, but are generally helpful. Sometimes you just have to be
a bit patient.

I see no timeout in your stunnel configuration :

> My stunnel.conf:
> #setuid=stunnel
> #setgid=proxy
> 
> debug = 3
> output = /var/log/stunnel.log
> 
> socket=l:TCP_NODELAY=1
> socket=r:TCP_NODELAY=1
> 
> [https]
> accept=10.0.1.50:443
> connect=10.0.1.50:80 
> TIMEOUTclose=0
> xforwardedfor=yes

That means that each time a visitor suddenly gets off the net,
you end up with an ever-lasting connection. Those connections
pile up, until stunnel cannot accept any more. It's very likely
the problem you're observing. The problem is even amplified by
the fact that apparently you're doing some NAT in front of stunnel,
leaving almost no chance for out of state packets to trigger a
possible RST. Please set the following ones to reasonable values
(eg: 30-60 seconds): TIMEOUTbusy, TIMEOUTidle. This one should
also be specified, even if less important since it's the same
machine: TIMEOUTconnect.
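Applied to the [https] section posted earlier, that would look something like the fragment below (the specific values are illustrative, not recommendations):

```ini
[https]
accept=10.0.1.50:443
connect=10.0.1.50:80
TIMEOUTconnect=10
TIMEOUTbusy=60
TIMEOUTidle=45
TIMEOUTclose=0
xforwardedfor=yes
```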

Also, even though that's unrelated, I suggest that you replace
"option forwardfor" with "option forwardfor except 10.0.1.50" in
your haproxy config. That way, connections coming from stunnel
will have the stunnel's x-forwarded-for header last. And those
coming from everywhere else will have haproxy's. That means that
your server will always have the client's IP in the last header.
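Relative to the haproxy.cfg posted in the original message, only the forwardfor line changes:

```ini
listen loadbalancer :80
mode http
balance roundrobin
option forwardfor except 10.0.1.50
option httpclose
```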

Also, I don't know if the 45 Mbps you were talking about are full
SSL traffic, but in this case you might want to be very careful
about setting limits so that stunnel's connections do not go
through the roof. I think the only way to do that with stunnel is
by setting ulimit before launching it, though I'm not certain.
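A minimal sketch of that approach (the descriptor limit and the config path are assumptions):

```shell
# Cap stunnel's file descriptors, and therefore its connection count,
# so it degrades gracefully instead of exhausting memory under load
ulimit -n 8192
stunnel /etc/stunnel/stunnel.conf
```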

Hoping this helps,
Willy




Fwd: Site running slow

2010-02-05 Thread Peter Griffin
Hi,
I realise this must have been a stupid question for you; I'm quite green to
Linux and haproxy, so I do not have debugging skills.  I would appreciate it
if someone could explain at least whether my hardware (all on one machine)
could be the problem, or whether this is just some TCP/IP tuning that needs
to be done on the kernel.

Sorry for being annoying.



-- Forwarded message --
From: Peter Griffin 
Date: 5 February 2010 22:02
Subject: Site running slow
To: haproxy@formilux.org


Hi,
I set up haproxy 1.3.22 with stunnel 4.22 + OpenSSL 0.9.7m, and it runs well
with a few internal users.  I run this setup on a P4 with 1 GB of RAM and,
with a few users, am left with about 750 MB of free RAM.

After deployment of the site, under heavy traffic I noticed free memory going
down to 45 MB, occasionally dipping lower before coming back to 45 MB.  The
site was brought to a standstill, and eventually I had to route traffic
straight to one webserver.

I noticed the following entries in stunnel:
2010.02.05 21:04:00 LOG3[4030:139813211576080]: SSL_read: Connection reset
by peer (104)
2010.02.05 21:05:25 LOG3[4030:139813211576080]: connect_wait: getsockopt:
Connection refused (111)
2010.02.05 21:10:40 LOG3[4030:139813211645712]: SSL_accept: Peer suddenly
disconnected
2010.02.05 21:12:11 LOG3[4030:139813211576080]: SSL_read: Connection reset
by peer (104)
2010.02.05 21:12:12 LOG3[4030:139813211576080]: SSL socket closed on
SSL_read with 7468 byte(s) in buffer
2010.02.05 21:12:12 LOG3[4030:139813211576080]: SSL socket closed on
SSL_read with 16384 byte(s) in buffer
2010.02.05 21:12:12 LOG3[4030:139813211576080]: SSL socket closed on
SSL_read with 16384 byte(s) in buffer
2010.02.05 21:12:15 LOG3[4030:139813211645712]: SSL_read: Connection reset
by peer (104)
2010.02.05 21:12:28 LOG3[4030:139813211576080]: SSL_accept: Peer suddenly
disconnected
2010.02.05 21:17:28 LOG3[4030:139813211576080]: SSL_read: Connection reset
by peer (104)
2010.02.05 21:17:32 LOG3[4030:139813211645712]: SSL_read: Connection reset
by peer (104)
2010.02.05 21:17:34 LOG3[4030:139813211576080]: SSL socket closed on
SSL_read with 2385 byte(s) in buffer
2010.02.05 21:17:38 LOG3[4030:139813211576080]: SSL socket closed on
SSL_read with 2385 byte(s) in buffer
My stunnel.conf:
#setuid=stunnel
#setgid=proxy

debug = 3
output = /var/log/stunnel.log

socket=l:TCP_NODELAY=1
socket=r:TCP_NODELAY=1

[https]
accept=10.0.1.50:443
connect=10.0.1.50:80 
TIMEOUTclose=0
xforwardedfor=yes

and haproxy.cfg
# this config needs haproxy-1.1.28 or haproxy-1.2.1

global
log 127.0.0.1   local0
log 127.0.0.1   local1 notice
#log loghostlocal0 info
maxconn 4096
chroot /var/lib/haproxy
user haproxy
group haproxy
daemon
#   debug
#quiet

defaults
log global
modehttp
#   option  httplog
option  dontlognull
retries 3
option redispatch
maxconn 4096
contimeout  5000
clitimeout  15
srvtimeout  3


listen loadbalancer :80
mode http
balance roundrobin
option forwardfor
option httpclose
cookie SERVERID insert indirect nocache
server WEB01 10.0.1.108:80 cookie A check inter 5000
server WEB05 10.0.1.109:80 cookie B check inter 5000


listen statistics 10.0.1.50:8080
stats enable
stats auth stats:stats
stats uri /

The clues, I think, are in stunnel's logs, in particular:
SSL_accept: Peer suddenly disconnected
&
SSL_read: Connection reset by peer (104)
Is there some setting I am missing in haproxy that could alleviate the
problem, or is it just a question of putting more physical RAM in?

Thanks in advance.