Hi,

I'm having problems with ssh-agent when connecting to a large number of hosts (several hundred) at once. I'm using kanif (http://taktuk.gforge.inria.fr/kanif/), a very nice package that distributes ssh connections across the hosts you are connecting to (a fan-out approach, so the connections do not all come from one host). However, every host has to authenticate, so every connection has to wind its way back to the ssh-agent. The problem isn't isolated to kanif: I also see it with other utilities that rely on many concurrent connections to the ssh-agent.
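
Since the problem shows up with any tool that opens many concurrent agent connections, a kanif-independent test might look roughly like the sketch below. The helper name `hammer` and the count of 300 are made up for illustration, and it has not been confirmed that this triggers the same stall. It only starts many clients at once and reports how many exited non-zero. Note that `ssh-add -l` exits 1 when the agent holds no keys and 2 when it cannot contact the agent, so load at least one key first.

```shell
# Hypothetical helper: run N copies of a command in parallel and
# print how many of them exited with a non-zero status.
hammer() {
    n=$1
    shift
    tmp=$(mktemp -d) || return 1
    i=0
    while [ "$i" -lt "$n" ]; do
        # Each client runs in its own background subshell; a failure
        # leaves a marker file behind so it can be counted afterwards.
        ( "$@" >/dev/null 2>&1 || : > "$tmp/fail.$i" ) &
        i=$((i + 1))
    done
    wait    # block until every background client has finished
    failed=$(find "$tmp" -type f -name 'fail.*' | wc -l | tr -d ' ')
    rm -rf "$tmp"
    echo "$failed"
}

# Example (assumes a running agent and SSH_AUTH_SOCK set in the environment):
#   hammer 300 ssh-add -l
```

If the agent wedges, the call would be expected to hang in `wait` rather than return a count, which is itself a usable signal while watching the connection count from another terminal.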

When I run strace on the ssh-agent, things start out OK, then go sour and it starts spitting out:

    read(160, 0xbf8f300a, 1024) = -1 EAGAIN (Resource temporarily unavailable)
    read(160, 0xbf8f300a, 1024) = -1 EAGAIN (Resource temporarily unavailable)
    read(160, 0xbf8f300a, 1024) = -1 EAGAIN (Resource temporarily unavailable)

while pegging the CPU. Tracking the number of connections to the agent once every second with

    while true; do netstat -an | grep -c <agent socket name>; sleep 1; done

looks like:

    5
    5
    5
    35
    98
    154
    155
    200
    287
    287

At that point I kill the agent, but the count sticks at that value if I don't. It's not always 287; it varies. I've seen it as high as 447 connections at once, but it's usually in the 200 range.

I've tried different ssh-agents on different kernels and machines, and haven't found a combination that works. However, I did try it on a FreeBSD box, which did not have the problem. It seems to me that I'm hitting some kind of kernel limit (open file limit, perhaps?), but I've fiddled with various sysctl values with no good results.

Has anyone run across this, or have any further debugging suggestions?

--Bob
