> Moving this to squid-dev due to increasingly propellerhead-like
> content... :)
>
> Looking over the code and some debugging output, it's pretty clear
> what's happening here.
>
> The carpSelectParent() function does the appropriate hashing of each
> URL+parent hash and the requisite ranking of the results. To determine
> whether or not the highest-hash-value parent is the parent that
> should, in fact, be returned, it uses peerHTTPOkay() as its test.
>
> The problem here is that peerHTTPOkay() only returns 0 if the peer in
> question has been marked DEAD; carpSelectParent() has no way of knowing
> if the peer is down unless squid has "officially" marked it DEAD.
>
> So, if the highest-ranked peer is a peer that is refusing connections
> but isn't marked DEAD yet, then peer_select tries to use it, and when
> it fails, falls back to ANY_PARENT - this actually shows up in the
> access.log, which I didn't realize when I initially sent this in. Once
> we've tried to hit the parent 10 times, we officially mark it DEAD,
> and then carpSelectParent() does the Right Thing.
>
> So, we have a couple of options here as far as how to resolve this:
>
> 1. Adjust PEER_TCP_MAGIC_COUNT from 10 to 1, so that a parent is
> marked DEAD after only one failure. This may be overly sensitive,
> however. Alternatively, carpSelectParent() can check peer->tcp_up and
> disqualify the peer if it's not equal to PEER_TCP_MAGIC_COUNT; this
> will have a similar effect without going through the overhead of
> actually marking the peer DEAD and then "reviving" it.

Patches went in recently to make that setting a squid.conf option.

Squid-3:
http://www.squid-cache.org/Versions/v3/HEAD/changesets/b9678.patch

Squid-2:
http://www.squid-cache.org/Versions/v2/HEAD/changesets/12208.patch
http://www.squid-cache.org/Versions/v2/HEAD/changesets/12209.patch

>
> 2. Somehow have carpSelectParent() return the entire sorted list of
> peers, so that if the top choice is found to be down, then
> peer_select() already knows where to go next...
>
> 3. Add some special-case code (I'm guessing this would be either in
> forward.c or peer_select.c) so that if a connection to a peer selected
> by carpSelectParent() fails, then increment a counter (which would be
> unique to that request) and call carpSelectParent() again. This
> counter can be used in carpPeerSelect to ignore the X highest-ranked
> entries. Once this peer gets officially declared DEAD, this becomes
> moot.
>
> Personally, I'm partial to #3, but other approaches are welcome :)
>

I'm partial to #2, though not for any particular reason.
Patches for either #2 or #3 are welcome.

Amos
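
For anyone picking this up, here is a rough sketch of how the per-request skip counter from #3 might look. It is a toy model and not Squid code: toy_peer, toy_combine() and toy_carp_select() are hypothetical names, the hash mix is only a stand-in for the real CARP combination, and a peer is modelled with nothing but a name, a hash and a dead flag. The skip argument plays the role of the per-request counter: 0 returns the top-ranked live peer, 1 the next one, and so on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified peer record; NOT Squid's real peer struct. */
typedef struct {
    const char *name;
    uint32_t hash;  /* precomputed hash of the peer name */
    int dead;       /* nonzero once the peer has been marked DEAD */
} toy_peer;

/* Stand-in mixing function for illustration only; it is not the
 * combination function CARP actually specifies. */
static uint32_t toy_combine(uint32_t url_hash, uint32_t peer_hash)
{
    uint32_t h = url_hash ^ peer_hash;
    h *= 2654435761u;
    return h ^ (h >> 16);
}

/* Ordering used for ranking: higher score first, lower index on ties. */
static int ranks_before(uint32_t sa, size_t ia, uint32_t sb, size_t ib)
{
    return sa > sb || (sa == sb && ia < ib);
}

/* Return the peer ranked (skip + 1)-th among peers not marked dead,
 * or NULL if fewer than skip + 1 such peers exist. */
static const toy_peer *toy_carp_select(const toy_peer *peers, size_t n,
                                       uint32_t url_hash, size_t skip)
{
    int have_prev = 0;
    uint32_t prev_score = 0;
    size_t prev_idx = 0;

    for (size_t round = 0; round <= skip; round++) {
        int found = 0;
        uint32_t best_score = 0;
        size_t best_idx = 0;

        for (size_t i = 0; i < n; i++) {
            if (peers[i].dead)
                continue;
            uint32_t s = toy_combine(url_hash, peers[i].hash);
            /* Ignore peers already picked in an earlier round. */
            if (have_prev && !ranks_before(prev_score, prev_idx, s, i))
                continue;
            if (!found || ranks_before(s, i, best_score, best_idx)) {
                found = 1;
                best_score = s;
                best_idx = i;
            }
        }
        if (!found)
            return NULL;  /* ran out of candidates */
        have_prev = 1;
        prev_score = best_score;
        prev_idx = best_idx;
    }
    return &peers[prev_idx];
}
```

In this model the caller would start with skip = 0 and bump it each time a connect to the returned peer fails, stopping when NULL comes back. Calling it with skip = 0, 1, 2, ... in turn also yields the full ranked list that #2 asks for, so the two options differ mainly in who holds the state.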