We have received 3-4 copies of this email at different times - not
sure what's going on.
Congratulations on reaching such a high scale in your testing. As you
probably know by now, scaling isn't a straightforward task and
involves much analysis and tuning.
If your CPU is 90% utilized at 10000 users, I don't see how you can
expect to get more throughput from this system. In fact, you will
probably find that beyond a certain point (say 6000 to 8000 users), you
will need more CPU per user as scalability drops.
In any case, at such high rates, there can be lots of issues and it is
difficult to predict what exactly you may be hitting.
Shanti
On 09/24/09 19:25, Mingfan Lu wrote:
I am using faban/oliophp to stress a machine (16 cores) as the web
server, with two other DB nodes (a master-slave cluster, the master
using a high-speed SATA disk and the slave using an SSD).
When the number of concurrent users scales through 9K, 10K, 11K, 12K,
13K, 14K, 15K, 16K, the throughput increases and then decreases.
9K *1810.323*
10K *1969.393*
11K *1859.053*
12K *1842.368*
13K *1849.213*
14K *1843.955*
It seems that there is a bottleneck here.
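
One way to read the figures above is to divide throughput by the number
of users, and to apply Little's Law (N = X * (R + Z)) to get the implied
average cycle time (response time plus think time) per user. The
following is a minimal Python sketch, not part of the original run. It
assumes the throughput figures are in operations per second, which the
message does not state, and the function names are made up for
illustration.

```python
# Throughput figures quoted in this message, keyed by concurrent users.
# ASSUMPTION: the units are operations per second (not stated in the message).
RUNS = {
    9000: 1810.323,
    10000: 1969.393,
    11000: 1859.053,
    12000: 1842.368,
    13000: 1849.213,
    14000: 1843.955,
}


def per_user_throughput(users, throughput):
    """Throughput contributed by each simulated user."""
    return throughput / users


def implied_cycle_time(users, throughput):
    """Average response time plus think time per user.

    Little's Law for a closed system: N = X * (R + Z), so R + Z = N / X.
    """
    return users / throughput


def peak_users(runs):
    """User count at which the measured throughput is highest."""
    return max(runs, key=runs.get)


def summarize(runs):
    """Return (users, throughput, per-user throughput, cycle time) rows."""
    return [
        (n, x, per_user_throughput(n, x), implied_cycle_time(n, x))
        for n, x in sorted(runs.items())
    ]


if __name__ == "__main__":
    print("users  throughput  per-user  cycle-time")
    for n, x, per_user, cycle in summarize(RUNS):
        print(f"{n:5d}  {x:10.3f}  {per_user:8.4f}  {cycle:10.3f}")
    print("peak throughput at", peak_users(RUNS), "users")
```

With these numbers, per-user throughput falls at every step, and the
implied cycle time grows from roughly 5 at 9K users to roughly 7.6 at
14K (in seconds, under the units assumption). That would be consistent
with requests queuing somewhere once the load passes 10K, though it does
not say where.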
For details, see the attached run.xml.
My ramp-up time is 300s, the steady-state time is 600s and the
ramp-down time is 60s.
The client startup settings:
Time between starts (ms) :1
Start simultaneously: No
Start agents in parallel: No
But my profiling data shows that CPU (peaking at about 80%~90% with
10000 concurrent users; softirq% is about 14% with *4tx and 4rx*
queues), network bandwidth (70% of 1Gb), memory usage and disk are not
the bottleneck. The Apache error log is clean, with no exceptions or
errors. I have also disabled static image serving (by disabling all
*img* tags in the HTML).
From the graphs at
http://docs.google.com/present/view?id=df7282nf_30x8gwmrch&autoStart=true
with 9K concurrent users the response time is fairly steady. With 10K,
there is a pulse lasting 600 sec (what happened?), after which the
response time drops to a very small value for the last 300 sec.
I want to know what causes the strange pulse when the number of
concurrent users reaches 10K.