On 23/11/2011 11:11 p.m., Elie Merhej wrote:

 Hi,

I am currently facing a problem that I wasn't able to find a solution for in the mailing list or on the internet. My Squid dies for 30 seconds every hour, at the same exact time: the squid process is still running, but I lose my WCCP connectivity, the cache peers detect the Squid as a dead sibling, and it cannot serve any requests. The network connectivity of the server is not affected (a ping to the Squid's IP doesn't time out).

The problem doesn't start immediately after Squid is installed on the server (the server is dedicated to Squid).
It starts when the cache directories begin to fill up.
I started my setup with 10 cache directories; Squid starts having the problem when the cache directories are more than 50% full. When I reduce the number of cache directories (9, 8, ...), Squid works for a while and then the same problem returns:
cache_dir aufs /cache1/squid 90000 140 256
cache_dir aufs /cache2/squid 90000 140 256
cache_dir aufs /cache3/squid 90000 140 256
cache_dir aufs /cache4/squid 90000 140 256
cache_dir aufs /cache5/squid 90000 140 256
cache_dir aufs /cache6/squid 90000 140 256
cache_dir aufs /cache7/squid 90000 140 256
cache_dir aufs /cache8/squid 90000 140 256
cache_dir aufs /cache9/squid 90000 140 256
cache_dir aufs /cache10/squid 80000 140 256

I have 1 terabyte of storage.
Finally I created two cache directories (one on each HDD), but the problem persisted.

You have 2 HDDs? But, but, you have 10 cache_dirs.
We repeatedly say "one cache_dir per disk" or similar. In particular, one cache_dir per physical drive spindle (for "disks" made up of multiple physical spindles) wherever possible, with physical drives/spindles mounted separately to ensure the pairing. Squid performs a very unusual pattern of disk I/O which stresses disks down to the hardware controller level and makes this kind of detail critical for anything like good speed. Avoiding per-cache_dir object limits by adding more UFS-based dirs to one disk does not improve the situation.
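For a two-disk box like this, a minimal sketch of that layout might look as follows (the mount points and sizes below are illustrative, not a recommendation for this exact machine; size each cache_dir to well under the capacity of the partition it lives on):

```
# One cache_dir per physical disk, each disk on its own mount point
cache_dir aufs /cache1/squid 350000 140 256   # first HDD
cache_dir aufs /cache2/squid 480000 140 256   # second HDD
```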

That is a problem which will be affecting your Squid all the time, though, and possibly making the source of the pause worse.

From the description I believe it is garbage collection on the cache directories. The pauses can be visible when garbage collecting any cache over a few dozen GB. The Squid default "cache_swap_high" and "cache_swap_low" values are 5 apart (the minimum possible gap being 0). These are whole percentage points of the total cache size, erased from disk in a somewhat random-access style across the cache area. I did mention uncommon disk I/O patterns, right?
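To put rough numbers on that, here is a back-of-the-envelope sketch using the cache_dir sizes posted above (nine dirs of 90000 MB plus one of 80000 MB; the 5-point gap is the default spread between the two thresholds):

```shell
# Total configured cache: nine 90000 MB dirs plus one 80000 MB dir.
TOTAL_MB=$((9 * 90000 + 80000))
# With thresholds 5 percentage points apart, one garbage-collection
# pass may have to erase up to 5% of the whole cache from disk:
GC_MB=$((TOTAL_MB * 5 / 100))
echo "total=${TOTAL_MB} MB, one GC pass can remove up to ${GC_MB} MB"
```

Tens of gigabytes of small, scattered deletions in one pass is easily enough disk work to stall a busy proxy for many seconds.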

To be sure what it is, you can attach the "strace" tool to the Squid worker process (the second PID in current stable Squids) and see what it is doing. But given the hourly regularity and past experience with others on similar cache sizes, I'm almost certain it's the garbage collection.

Amos


Hi Amos,

Thank you for your fast reply,
I have 2 HDD (450GB and 600GB)
df -h shows that I have 357GB and 505GB available.
In my last test, my cache_swap settings were:
cache_swap_low 90
cache_swap_high 95

That is not good. For anything more than 10-20 GB I recommend setting them no more than 1 apart, possibly to the same value if that works. Squid has a light but CPU-intensive and possibly long garbage-removal cycle above cache_swap_low, and a much more aggressive but faster and less CPU-intensive removal above cache_swap_high. On large caches it is better, in terms of downtime, to go straight to the aggressive removal and clear disk space fast, despite the bandwidth cost of replacing any items the light removal would have kept.
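A sketch of the tighter thresholds described above (the exact values are illustrative; the point is the 1-point gap):

```
# Keep the thresholds close together so Squid skips the long light
# removal cycle and goes straight to the fast aggressive one.
cache_swap_low 94
cache_swap_high 95
```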


Amos
