On 09/16/2010 08:09 PM, Pranav Desai wrote:
Done. Using 2 disks. Do you recommend a RAID config for better performance?

No, I recommend RAID only if you really, really care about redundancy / reliability. ATS will automatically "load balance" across any disks that you throw at it.


Here are my tcp mem parameters. And req/sec isn't of concern here so I
should be ok with the listen queue and backlogs. If you have any
particular setting in mind please let me know.

net.ipv4.tcp_mem = 1339776      1786368 2679552
net.ipv4.tcp_wmem = 4096        87380   8388608
net.ipv4.tcp_rmem = 4096        87380   8388608

These would only make a difference if there's latency between the client and the server (which is generally not the case in a "lab"). The above are for the autoscaling window sizes, I think, and they seem reasonable. There are settings in records.config to bump up the initial window size for the connections:

CONFIG proxy.config.net.sock_send_buffer_size_in INT 262144
CONFIG proxy.config.net.sock_recv_buffer_size_in INT 0
CONFIG proxy.config.net.sock_send_buffer_size_out INT 0
CONFIG proxy.config.net.sock_recv_buffer_size_out INT 0
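
As a rough sanity check on a value like 262144, the usual rule of thumb is that a socket buffer should hold about one bandwidth-delay product (link bandwidth times round-trip time). A minimal Python sketch of that arithmetic follows; the function name and the 100 Mbit/s, 20 ms figures are made up for illustration and are not measurements from this setup.

```python
def bdp_bytes(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: bits/s * seconds / 8."""
    # Dividing by 1000 converts ms to seconds, dividing by 8 converts bits to bytes.
    return bandwidth_bits_per_sec * rtt_ms // 8000


if __name__ == "__main__":
    # Hypothetical link: 100 Mbit/s with a 20 ms round trip.
    print(bdp_bytes(100_000_000, 20))
```

For that hypothetical link the result is 250000 bytes, so a 262144 byte buffer would roughly cover it; with near-zero latency, as in a lab, the product is tiny and the setting matters little.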


You definitely want to tune that mmap setting I mentioned earlier, and a few other interesting sysctls would be:

    net.ipv4.tcp_max_syn_backlog (set it pretty high)
    net.core.somaxconn
    net.ipv4.tcp_syncookies (enable it)
    net.ipv4.ip_local_port_range
    net.ipv4.tcp_ecn (probably want it disabled)
    net.ipv4.tcp_max_tw_buckets (increase for lots of sockets I think)
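
To see what a box currently uses for these, something like the following Python sketch reads them straight from /proc/sys (on Linux, a sysctl name maps to a path there with the dots turned into slashes). The helper names are made up for illustration; a key that does not exist on a given kernel, or cannot be read, comes back as None.

```python
from typing import Optional

# The sysctls listed above.
SYSCTLS = [
    "net.ipv4.tcp_max_syn_backlog",
    "net.core.somaxconn",
    "net.ipv4.tcp_syncookies",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_ecn",
    "net.ipv4.tcp_max_tw_buckets",
]


def sysctl_path(name: str) -> str:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return "/proc/sys/" + name.replace(".", "/")


def read_sysctl(name: str) -> Optional[str]:
    """Return the current value as text, or None if it cannot be read."""
    try:
        with open(sysctl_path(name)) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for name in SYSCTLS:
        print(f"{name} = {read_sysctl(name)}")
```

This only reads values; changing them still goes through sysctl -w or /etc/sysctl.conf as usual.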



I think I can reproduce it but under load, so it might be a bit
difficult to debug it especially with all the threads. I will try to
get to a simpler test case to reproduce it. Maybe I can run
traffic_server alone with a single network and io thread? How do you
guys normally debug it?


I think we should move the discussions related to this crasher problem to the [email protected] mailing list (information on how to subscribe to it is on http://trafficserver.apache.org). There's a wider crowd there that might be able to help as well; in particular, John Plevyak knows the cache better than anyone else on the planet.

That said, if you can reproduce it with restrictions like the ones you mention, that'll certainly help. Or just describe how to set up the environment and what "load" to send to it; that might also help. But the more you can limit the parameters / tests / time necessary to reproduce it, the better.

Thanks!

-- leif
