On 09/16/2010 08:09 PM, Pranav Desai wrote:
> Done. Using 2 disks. Do you recommend a RAID config for better performance?
No, I recommend RAID only if you really, really care about redundancy /
reliability. ATS will automatically "load balance" across any disks that
you throw at it.
> Here are my TCP mem parameters. And req/sec isn't of concern here, so I
> should be OK with the listen queue and backlogs. If you have any
> particular setting in mind, please let me know.
> net.ipv4.tcp_mem = 1339776 1786368 2679552
> net.ipv4.tcp_wmem = 4096 87380 8388608
> net.ipv4.tcp_rmem = 4096 87380 8388608
These would only make a difference if there's latency between the
client and the server (which is generally not the case in a "lab"). The
above are for the autoscaling window sizes I think, which seem
reasonable. There are settings in records.config to bump up the initial
window size for the connections:
CONFIG proxy.config.net.sock_send_buffer_size_in INT 262144
CONFIG proxy.config.net.sock_recv_buffer_size_in INT 0
CONFIG proxy.config.net.sock_send_buffer_size_out INT 0
CONFIG proxy.config.net.sock_recv_buffer_size_out INT 0
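To make the latency point concrete: a connection can only keep roughly
bandwidth x round-trip time bytes in flight, so the buffer it needs grows
with latency. Below is a rough Python sketch of that arithmetic. The link
speed and RTT values are made-up examples (assumptions, not measurements
from this setup), and bdp_bytes is just an illustrative helper name.

```python
def bdp_bytes(bandwidth_bits_per_sec, rtt_microseconds):
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    # bits/s * microseconds / 1e6 gives bits; / 8 gives bytes (integer math).
    return bandwidth_bits_per_sec * rtt_microseconds // 8_000_000

# Hypothetical example links (assumed values, not from this thread).
lab = bdp_bytes(1_000_000_000, 200)      # 1 Gbit/s, 0.2 ms RTT
wan = bdp_bytes(1_000_000_000, 80_000)   # 1 Gbit/s, 80 ms RTT

TCP_WMEM_MAX = 8388608  # third field of the net.ipv4.tcp_wmem line quoted above

for name, need in (("lab", lab), ("wan", wan)):
    verdict = "fits within" if need <= TCP_WMEM_MAX else "exceeds"
    print(f"{name}: BDP {need} bytes {verdict} tcp_wmem max {TCP_WMEM_MAX}")
```

With numbers like these the lab case needs only a few tens of kilobytes
per connection, which is why the buffer limits rarely matter there.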
You definitely want to tune that mmap setting I mentioned earlier, and a
few other interesting sysctls would be:
net.ipv4.tcp_max_syn_backlog (set it pretty high)
net.core.somaxconn
net.ipv4.tcp_syncookies (enable it)
net.ipv4.ip_local_port_range
net.ipv4.tcp_ecn (probably want it disabled)
net.ipv4.tcp_max_tw_buckets (increase for lots of sockets I think)
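Before changing any of these, it helps to record what the box currently
uses. A minimal Python sketch, assuming a Linux host where sysctl keys are
exposed as files under /proc/sys (dots in the key map to slashes in the
path); sysctl_path and read_sysctl are illustrative helper names, not part
of any tool:

```python
from pathlib import Path

# The sysctl keys listed above.
SYSCTLS = [
    "net.ipv4.tcp_max_syn_backlog",
    "net.core.somaxconn",
    "net.ipv4.tcp_syncookies",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_ecn",
    "net.ipv4.tcp_max_tw_buckets",
]

def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path(root) / name.replace(".", "/")

def read_sysctl(name, root="/proc/sys"):
    """Return the value as a list of fields, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().split()
    except OSError:
        return None

for name in SYSCTLS:
    value = read_sysctl(name)
    print(f"{name} = {' '.join(value) if value else '(not available)'}")
```

Saving this output before and after tuning makes it easier to tell which
change actually affected the test.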
> I think I can reproduce it, but only under load, so it might be a bit
> difficult to debug, especially with all the threads. I will try to
> get to a simpler test case to reproduce it. Maybe I can run
> traffic_server alone with a single network and I/O thread? How do you
> guys normally debug it?
I think we should move the discussions related to this crasher problem
to the [email protected] mailing list (information on how to
subscribe to it is on http://trafficserver.apache.org). There's a wider
crowd there that might be able to help as well, in particular John
Plevyak knows the cache better than anyone else on the planet.
That said, if you can reproduce it with restrictions like the ones you
mention, that'll certainly help. Or, just describe how to set up the
environment, and what "load" to send to it; that might also help. But
the more you can limit the parameters / tests / time necessary to
reproduce it, the better.
Thanks!
-- leif