I'm at a loss. Here's the situation. We have a couple of hundred Squid clients that are forced (using never_direct allow all) to go through a parent proxy array. Occasionally, with no discernible pattern, the squid process on a client uses all available CPU cycles, and tailing the access log shows that nothing is being served. Querying cachemgr.cgi (from a different box) or squidclient (on the client proxy itself) returns no results while the CPU usage is running wild; once the CPU usage drops, results are returned. Running squid -k debug (for the usual 2 seconds) and tailing cache.log seems to show ACLs being processed. These CPU spikes are often, but not always, accompanied by "TCP connection to [parent-proxy] failed" messages, and those messages sometimes appear in cache.log without any CPU spike.
There are two round-robin parent proxies, which peer with each other using ICP, and a third "master" parent (also peering via ICP). The master exists because some sites cannot handle a session whose requests arrive from changing IP addresses; it runs two squid processes, one per CPU. All three are on the same network switch. They are monitored with MRTG (http://mrtg.schoolaccess.net/squid/) and never seem to slow down. The connection-failed messages do not point at one parent more than the others. Most of the clients reach the internet over satellite, but some are on terrestrial links; the problem does not occur more often with either connection method.

I thought I had found the problem when I read about the half_closed_clients bug, but after patching, compiling and installing Squid 2.5-STABLE7 on a couple of sites, the problem persists (though perhaps less often). Setting half_closed_clients off in squid.conf unsurprisingly had no noticeable effect. The version I replaced for testing is Squid 2.5-STABLE4.

STABLE7 was compiled with:

    --bindir=/home/squid2/bin --sbindir=/home/squid2/bin --libexecdir=/home/squid2/bin --datadir=/home/squid2/etc --sysconfdir=/etc/squid --localstatedir=/home/squid2 --mandir=/usr/man --enable-ssl --enable-err-languages=English --disable-ident-lookups --with-pthreads --enable-storeio=ufs,aufs,diskd --enable-snmp --enable-async-io --with-aio

STABLE4 was compiled with:

    --bindir=/home/squid2/bin --sbindir=/home/squid2/bin --libexecdir=/home/squid2/bin --datadir=/home/squid2/etc --sysconfdir=/etc/squid --localstatedir=/home/squid2 --mandir=/usr/man --enable-xmalloc-statistics --enable-useragent-log --enable-referer-log --enable-err-languages=English --disable-ident-lookups --with-pthreads --enable-storeio=ufs,aufs,diskd

To keep this message from getting too long, I have posted the conf files at http://mrtg.schoolaccess.net/squidconf/. The -sparse files have all comments and blank lines stripped out.
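For anyone who doesn't want to fetch the files, the client-side forwarding setup described above boils down to roughly the following. This is only a sketch: the hostnames and ports are placeholders, not the real values (those are in the posted conf files), and the "master" parent is left out because how the clients select it isn't described here.

```
# Hypothetical sketch of the client-side squid.conf, NOT the real file.
# parent1/parent2.example.net are placeholder names; 3128 is the HTTP port
# and 3130 the ICP port, both assumed rather than copied from the real config.
cache_peer parent1.example.net parent 3128 3130 round-robin
cache_peer parent2.example.net parent 3128 3130 round-robin

# Never fetch directly from origin servers: every request goes to a parent.
never_direct allow all

# Tried during testing; made no noticeable difference.
half_closed_clients off
```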
What further tools can I use (on a Linux 2.2 kernel) to figure out what squid is doing when the CPU usage spikes? What information did I leave out that might be relevant?
