Well, the IRQ issue is easy to identify here:
1. In "top", press "1" to display per-CPU details, and press "H" to display the
Traffic Server threads. By default, the list is sorted by CPU usage, descending.
You may find one CPU at full load even though no single TS thread is.

2. Run "cat /proc/interrupts", grep for your 10GE NIC, and check its IRQs. For
better performance, the IRQs should be spread across different CPUs.
You may find that all of the NIC's IRQs land on one CPU, which is the CPU with
full load, typically CPU0.
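
To see at a glance how the NIC's interrupts are distributed, something like the
following could help. It is only a sketch: nic_irq_per_cpu is a made-up helper
name, and it assumes the usual /proc/interrupts layout (a header row of CPU
names, then one row per IRQ with the per-CPU counts right after the IRQ number).

```shell
# Hypothetical helper: per-CPU interrupt totals for one NIC.
# $1 = NIC name (e.g. eth1), $2 = interrupts file (defaults to /proc/interrupts)
nic_irq_per_cpu() {
  awk -v nic="$1" '
    NR == 1 { ncpu = NF; next }      # header row: CPU0 CPU1 ...
    $0 ~ nic {                       # rows that mention the NIC name
      for (c = 0; c < ncpu; c++)
        total[c] += $(c + 2)         # field 1 is "IRQ:", counts start at field 2
    }
    END { for (c = 0; c < ncpu; c++) printf "CPU%d %d\n", c, total[c] }
  ' "${2:-/proc/interrupts}"
}
```

Run it as "nic_irq_per_cpu eth1". If nearly the whole count sits on one line
(for example CPU0), the IRQs are not balanced.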

Then set the smp_affinity for each IRQ. Here is a short script, not proven to
work (replace eth1 with your NIC name):

ncpu=$(grep -c ^processor /proc/cpuinfo); j=0
for i in $(grep eth1 /proc/interrupts | awk -F: '{print $1}'); do
  # bit (j % 32) of the top 32-bit word, then j / 32 all-zero lower words
  mask=$(printf '%X' $((1 << (j % 32))))
  k=$((j / 32)); while [ "$k" -gt 0 ]; do mask="$mask,00000000"; k=$((k - 1)); done
  echo "$mask" > /proc/irq/$i/smp_affinity
  j=$(( (j + 1) % ncpu ))
done
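
The mask format that the script above writes can be checked in isolation before
touching /proc. This is a minimal sketch, and cpu_mask is a hypothetical helper
name: it prints the hex mask that selects exactly one CPU, as comma-separated
32-bit words with the highest word first.

```shell
# Hypothetical helper: print the smp_affinity mask that selects a single CPU.
# $1 = zero-based CPU index
cpu_mask() {
  cpu=$1
  mask=$(printf '%x' $((1 << (cpu % 32))))   # bit inside the highest word
  words=$((cpu / 32))                         # number of all-zero lower words
  while [ "$words" -gt 0 ]; do
    mask="$mask,00000000"
    words=$((words - 1))
  done
  echo "$mask"
}
```

For example, "cpu_mask 5" prints 20 and "cpu_mask 33" prints 2,00000000. As
root, the result can be written with
"cpu_mask 5 > /proc/irq/<IRQ>/smp_affinity" and read back with cat to confirm
that the kernel accepted it. If the irqbalance daemon is running, it may rewrite
these values later.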


FYI

On 2013-3-22, at 6:23 AM, Igor Galić <[email protected]> wrote:

> This may be useful:
> 
> http://kerneltrap.org/mailarchive/linux-netdev/2010/4/15/6274814/thread
> 
> Hi Yongming,
> 
> I haven't changed the networking configuration, but I've also noticed that
> once the first core is at 100% utilization, the server no longer answers all
> ping requests and shows packet loss. This might be a sign that all network
> traffic is handled by the first core, mightn't it?
> 
> You can find a screenshot of the threading output of top here: 
> http://i.imgur.com/X3te2Ru.png
> 
> Best Regards
> Philip
> 
> 2013/3/21 Yongming Zhao <[email protected]>
> Well, given the high network traffic, have you balanced the 10GE NIC IRQs
> across multiple CPUs?
> 
> And can you show us the per-thread CPU usage in top?
> 
> thanks
> 
> On 2013-3-21, at 7:42 PM, Philip <[email protected]> wrote:
> 
> I've just upgraded to ATS 3.3.1-dev. The problem still is the same: 
> http://i.imgur.com/1pHWQy7.png
> 
> The load goes on one core. (The server is only running ATS)
> 
> 2013/3/21 Philip <[email protected]>
> Hi Igor,
> 
> I am using ATS 3.2.4, Debian 6 (Squeeze) and a 3.2.13 Kernel.
> 
> I was using the "traffic_line -r" command to see the number of origin 
> connections growing and htop/atop to see that only one core is 100% utilized. 
> I've already tested the following changes to the configuration:
> 
> proxy.config.accept_threads -> 0
> 
> proxy.config.exec_thread.autoconfig -> 0
> proxy.config.exec_thread.limit -> 120
> 
> They had no effect: there is still one core that becomes 100% utilized and
> turns out to be the bottleneck.
> 
> Best Regards
> Philip
> 
> 
> 2013/3/21 Igor Galić <[email protected]>
> Hi Philip,
> 
> Let's start with some simple data mining: 
> 
> which version of ATS are you running?
> What OS/Distro/version are you running it on?
> 
> Are you looking at stats_over_http's output to determine what's going on in 
> ATS?
> 
> -- i
> 
> I have noticed the following strange behavior: once the number of origin
> connections starts to increase and the proxying speed collapses, the first
> core is at 100% utilization while the others are not even close to that. It
> seems like the origin requests are handled by the first core only. Is this
> expected behavior that can be changed by editing the configuration, or is
> this a bug?
> 
> 
> 
> 2013/3/20 Philip <[email protected]>
> Hi,
> 
> I am running ATS on a pretty large server with two physical 6-core Xeon CPUs
> and 22 raw device disks. I want to use that server as a frontend for several
> file servers. It is currently configured to be in front of two file servers.
> The load on the ATS server is pretty low: about 1-4% disk utilization and
> 500 Mbps of outgoing traffic.
> 
> Once I direct the traffic of the third file server towards ATS something 
> strange happens:
> 
> - The number of origin connections increases continually.
> - Requests that hit ATS and are not cached are served really slowly to the
> client (about 35 kB/s), while requests that are served from the cache are
> blazingly fast.
> 
> The ATS server has a dedicated 10 Gbps port that is not maxed out, no CPU
> core is maxed out, there is no swapping, there are no error logs, and the
> origin servers are not heavily utilized. It feels like there are not enough
> workers to process the origin requests.
> 
> Is there anything I can do to check if my theory is right and a way to 
> increase the number of origin workers?
> 
> Best Regards
> Philip
> 
> 
> 
> 
> -- 
> Igor Galić
> 
> Tel: +43 (0) 664 886 22 883
> Mail: [email protected]
> URL: http://brainsware.org/
> GPG: 6880 4155 74BD FD7C B515  2EA5 4B1D 9E08 A097 C9AE
> 
