Re: Problems with load balancing on cloud servers

2011-09-12 Thread Willy Tarreau
Hi,

On Tue, Sep 13, 2011 at 11:02:26AM +0800, Liong Kok Foo wrote:
> Top for server 70 (load problem)
> top - 10:51:23 up 32 days, 22:21,  1 user,  load average: 3.09, 2.99, 2.50
> Tasks: 115 total,   3 running, 112 sleeping,   0 stopped,   0 zombie
> Cpu(s): 38.5%us, 11.0%sy,  0.0%ni, 48.2%id,  0.0%wa,  0.0%hi,  2.3%si,  0.0%st
> Mem:   2050000k total,  1049708k used,  1000292k free,   264264k buffers
> Swap:  1052248k total,  876k used,  1051372k free,   418272k cached
> 
> Sometimes server B's load will shoot up to 20 or more while server A (and 
> the rest) remain at around 5.
> 
> Would really appreciate any input on this matter.

When you look at the stats, you notice a much higher retransmit count on
this server than on the others. This almost always translates to
connectivity issues. And if there are connectivity issues, then the
server has more difficulty pushing responses out to the clients and
accumulates more concurrent processes than the other ones, leading to
higher load and memory usage.

You should run tests between this server and another one: transfer a
large file (500 MB) several times. You should reach about 1 Gbps (roughly
118 MB/s). Do this in both directions. Often you'll notice that one
direction is approximately OK while the other one is terrible. Do the same
with other reference servers that work well, and if you note that
communications with this server are the only ones affected, then ask the
provider to replace it. Sometimes it's just a cable issue. Sometimes it's
a switch port, sometimes it's a NIC. But those issues are quite common in
datacenters.
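
A rough sketch of such a test, assuming ssh access between the two machines.
The function names, the file path and the host name are placeholders, and
ssh encryption can itself cap the rate on a single-core VM, so treat the
number as a lower bound and compare it against a known-good server rather
than against the theoretical 118 MB/s:

```shell
# Convert a byte count and an elapsed time in seconds to MB/s (1 MB = 10^6 bytes).
throughput_mb_s() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { if (secs < 1) secs = 1; printf "%.1f\n", bytes / secs / 1000000 }'
}

# Push a local file to a peer and print the rate. The data is discarded on
# the far side, so only the network path (plus ssh overhead) is measured.
# date +%s has one-second resolution, hence the large file.
timed_push() {
    file=$1
    peer=$2
    size=$(wc -c < "$file")
    start=$(date +%s)
    ssh "$peer" 'cat > /dev/null' < "$file"
    end=$(date +%s)
    throughput_mb_s "$size" "$((end - start))"
}

# Example (hypothetical names):
#   dd if=/dev/zero of=/tmp/test500 bs=1M count=500
#   timed_push /tmp/test500 server69
```

Run the same function from the other machine to cover the opposite
direction, and repeat a few times, since a single run can be misleading.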

You can also look at the network statistics :

   $ netstat -s|grep retrans

I suspect that you'll notice more retransmits on this one than on the
other servers. Be careful, those counters are cumulative since the last
boot, so you have to take uptime into account.
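
One way to sidestep the uptime difference is to compare retransmits as a
share of all segments sent rather than as a raw count. A small sketch,
assuming the Linux net-tools wording of the `netstat -s` counters (the
exact spelling varies between versions, so the patterns are deliberately
loose); `retrans_pct` is just an illustrative name:

```shell
# Print retransmitted TCP segments as a percentage of segments sent.
# Reads `netstat -s` output on stdin.
retrans_pct() {
    awk '
        /segments sen[dt] out/ { sent = $1 }      # "send out" or "sent out"
        /segments retrans/     { retrans = $1 }   # "retransmited" or "retransmitted"
        END {
            if (sent > 0) printf "%.2f\n", 100 * retrans / sent
            else          print "n/a"
        }'
}

# Usage on each server:
#   netstat -s | retrans_pct
```

A server whose percentage is clearly above that of its neighbours is the
one to investigate first.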

Regards,
Willy




Re: Problems with load balancing on cloud servers

2011-09-12 Thread Baptiste
Hi Liong,

You can also play with vm.swappiness to keep your Ubuntu server from using its swap.
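
A minimal sketch of how to inspect and change it. The helper name is made
up for illustration, and the value 10 in the comments is only an example,
not a recommendation for this workload:

```shell
# Print the current swappiness; lower values make the kernel less eager to swap.
# The optional argument overrides the file to read (handy for testing).
show_swappiness() {
    f=${1:-/proc/sys/vm/swappiness}
    [ -r "$f" ] && printf 'vm.swappiness = %s\n' "$(cat "$f")"
}

# To change it at runtime (as root):
#   sysctl -w vm.swappiness=10
# To keep the setting across reboots, add this line to /etc/sysctl.conf:
#   vm.swappiness = 10
```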

cheers



Re: Problems with load balancing on cloud servers

2011-09-12 Thread Jerry Champlin
Liong:

Have you looked at average query response time statistics?  Perhaps this
could shed some light on your issue.  A graph of response time vs. current
requests for a given resource might help you narrow down the issue if it's
not a load balancer issue.  Alternatively, simply comparing requests
processed per second between the two servers would give you more conclusive
evidence of whether the load balancer is doing the wrong thing.
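
As a rough sketch of that comparison: if your haproxy stats page offers a
CSV export, the per-server session counters can be pulled out with awk.
This assumes the usual column order of that export (svname in column 2,
current sessions in column 5, total sessions in column 8); the function
name and the URL in the comment are placeholders:

```shell
# Reads HAProxy stats CSV on stdin and prints
# "server current_sessions total_sessions" for each real server,
# skipping the header and the FRONTEND/BACKEND summary rows.
per_server_sessions() {
    awk -F, '
        /^#/ { next }
        $2 != "FRONTEND" && $2 != "BACKEND" && NF >= 8 { print $2, $5, $8 }'
}

# Usage (hypothetical stats URL):
#   curl -s 'http://lb.example/haproxy?stats;csv' | per_server_sessions
```

Sampling this twice and dividing the change in total sessions by the
interval gives sessions per second for each server. Equal totals with a
much higher current-session count on one server would suggest the balancer
is distributing evenly and that server is simply slower to finish.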

-Jerry

Jerry Champlin
Absolute Performance Inc.
--
Enabling businesses to deliver critical applications at lower cost and
higher value to their customers.




Re: Problems with load balancing on cloud servers

2011-09-12 Thread Liong Kok Foo

Hi Jerry,

Appreciate the quick reply and pointing out this difference.

However, we are aware of some server having swap and some not. We have 
explored turning swap on or off but it doesn't solve the issue.


Thanks.

Liong Kok Foo







Re: Problems with load balancing on cloud servers

2011-09-12 Thread Jerry Champlin
Liong:

The only significant difference in the stats you posted is that the server
without a load problem has no swap configured and the other one does.  I
would not expect that to cause a problem but that's the only difference that
jumps off the page.

-Jerry

Jerry Champlin
Absolute Performance Inc.
--
Enabling businesses to deliver critical applications at lower cost and
higher value to their customers.


On Mon, Sep 12, 2011 at 9:02 PM, Liong Kok Foo wrote:

> Hi,
>
> We have been using haproxy for many years now and have implemented it in a
> few of our systems. However, we have been facing an odd problem, and we are
> not sure whether it is related to haproxy: one of the servers has a higher
> load than the others. It is the last server in the LB, but we tried moving it
> to second last and still see only this server showing high load. We also
> tried cloning the server from one of the existing servers that doesn't have
> this problem, but the clone still shows it.
>
> We have checked with the cloud provider to see if the odd server is hosted
> in a different cloud segment in their datacenter. It doesn't seem so from
> their reply.
>
> We have set up 5 server instances in the cloud hosted by Voxel. One is used
> as the haproxy server and the other 4 run Apache, serving the website.
>
> Specs of the cloud servers:
> 1 core CPU
> 2GB RAM
> Ubuntu 10.04 LTS
> Haproxy 1.3.25
>
> Check the screenshot for the haproxy stats. We allocated weights of
> 1:1:1:1. We just don't understand why there is extra load on this server.
>
> Top for server 69 (no problem):
> top - 10:51:08 up 145 days, 16:34,  1 user,  load average: 1.06, 1.22, 1.27
> Tasks: 117 total,   2 running, 115 sleeping,   0 stopped,   0 zombie
> Cpu(s): 28.8%us,  7.6%sy,  0.0%ni, 61.3%id,  0.0%wa,  0.0%hi,  2.1%si,  0.2%st
> Mem:   2050000k total,  1587608k used,   462392k free,   760100k buffers
> Swap:        0k total,        0k used,        0k free,   432848k cached
>
> Top for server 70 (load problem)
> top - 10:51:23 up 32 days, 22:21,  1 user,  load average: 3.09, 2.99, 2.50
> Tasks: 115 total,   3 running, 112 sleeping,   0 stopped,   0 zombie
> Cpu(s): 38.5%us, 11.0%sy,  0.0%ni, 48.2%id,  0.0%wa,  0.0%hi,  2.3%si,  0.0%st
> Mem:   2050000k total,  1049708k used,  1000292k free,   264264k buffers
> Swap:  1052248k total,  876k used,  1051372k free,   418272k cached
>
> Sometimes server B's load will shoot up to 20 or more while server A (and
> the rest) remain at around 5.
>
> Would really appreciate any input on this matter.
>
> Thanks.
>
> --
> Liong Kok Foo
>
>