On 07/13/2012 06:55 AM, Leandro Reox wrote:
Ok, here is the story: we deployed some in-house APIs in our OpenStack private cloud and were stress testing them, and we noticed that some requests were taking far too long. To rule out the API itself, we installed apache and lighttpd and even tried netcat, all on guest systems running Ubuntu 10.10 w/virtio. After going nuts modifying sysctl parameters to change the guest behavior, we realized that if we installed apache or lighttpd on the PHYSICAL host, the behavior was the same... That surprised us. When we run the same benchmark on a node without bonding, without bridging, and without any KVM or nova packages installed, with the same HW specs, the benchmark passes OK; but if we run the same tests on a spare nova node with everything installed + bonding + bridging, one that has never run a virtual guest machine, the test fails too. So, so far:

Tested on hosts with Ubuntu 10.10, 11.10 and 12.04

- Clean node without bonding, bridging, or KVM - just eth0 configured - PASS
- Spare node with bridging - PASS
- Spare node with just bonding (dynamic link aggr mode4) - PASS
- Spare node with nova + kvm + bonding + bridging - FAILS
- Spare node with nova + kvm - PASS

Is there a chance that with bridging + bonding + nova some module gets screwed up? I'll attach the tests below; you can see that a small number of requests take TOO LONG, around 3 seconds, and the extra time is in the "CONNECT" phase.
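
For reference, the ApacheBench report attached further down appears to come from an invocation along these lines (inferred from the 25000 requests at concurrency 5 it reports; the exact command and document path are not given in the thread):

ab -n 25000 -c 5 http://172.16.161.25/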

If I recall correctly, 3 seconds is the default initial TCP retransmission timeout (at least in older kernels - what is your load generator running?). Between that and your mention of the connect phase, my first guess (it is only a guess) would be that something is causing TCP SYNchronize segments to be dropped. If that is the case, it should show up in netstat -s statistics. Snap them on both client and server before the test is started and after the test is completed, and then run them through something like beforeafter ( ftp://ftp.cup.hp.com/dist/networking/tools ):

netstat -s > before.server
# run benchmark
netstat -s > after.server
beforeafter before.server after.server > delta.server
less delta.server

(As a sanity check, make certain that before.server and after.server have the same number of lines. The habit of Linux's netstat to avoid printing a statistic with a value of zero can, sometimes, confuse beforeafter if a stat appears in after that was not present in before.)
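
A quick way to do that sanity check, and then to spot the suspect counters once you have the delta (the grep pattern is only a suggestion; exact counter names vary between kernels):

wc -l before.server after.server
grep -i -E 'drop|retrans|listen' delta.server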

It might not be a bad idea to include ethtool -S statistics from each of the interfaces in that procedure as well.
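
The same before/after pattern works for the NIC counters; a sketch, assuming one of the bond slaves is named eth0 (substitute your real interface names and repeat for each one):

ethtool -S eth0 > before.eth0
# run benchmark
ethtool -S eth0 > after.eth0
beforeafter before.eth0 after.eth0 > delta.eth0

The physical slaves are the interesting ones here; ethtool -S may print little or nothing for the bond or bridge devices themselves.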

rick jones
Probably a good idea to mention the bonding mode you are using.
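
If in doubt, the bonding driver reports the active mode in its proc entry (assuming the bond interface is named bond0):

grep -i mode /proc/net/bonding/bond0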

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 172.16.161.25 (be patient)
Completed 2500 requests
Completed 5000 requests
Completed 7500 requests
Completed 10000 requests
Completed 12500 requests
Completed 15000 requests
Completed 17500 requests
Completed 20000 requests
Completed 22500 requests
Completed 25000 requests
Finished 25000 requests


Server Software:        Apache/2.2.16
Server Hostname:        172.16.161.25
Server Port:            80

Document Path:          /
Document Length:        177 bytes

Concurrency Level:      5
Time taken for tests:   7.493 seconds
Complete requests:      25000
Failed requests:        0
Write errors:           0
Total transferred:      11350000 bytes
HTML transferred:       4425000 bytes
Requests per second:    3336.53 [#/sec] (mean)
Time per request:       1.499 [ms] (mean)
Time per request:       0.300 [ms] (mean, across all concurrent requests)
Transfer rate:          1479.28 [Kbytes/sec] received

Connection Times (ms)
             min  mean[+/-sd] median   max
Connect:        0    1  46.6      0    3009
Processing:     0    1   5.7      0     277
Waiting:        0    0   4.6      0     277
Total:          0    1  46.9      1    3010

Percentage of the requests served within a certain time (ms)
 50%      1
 66%      1
 75%      1
 80%      1
 90%      1
 95%      1
 98%      1
 99%      1
100%   3010 (longest request)

Regards!