Re: [squid-users] squid 3.5.27 does not respect cache_dir-size but uses 100% of partition and fails
Great! I am testing it with 4.1, since UFS and AUFS are great but... don't support SMP.

Eliezer

* another thread on the way to the list.

Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: elie...@ngtech.co.il

-----Original Message-----
From: Alex Rousskov [mailto:rouss...@measurement-factory.com]
Sent: Friday, July 13, 2018 3:24 AM
To: squid-users@lists.squid-cache.org
Cc: Eliezer Croitoru
Subject: Re: [squid-users] squid 3.5.27 does not respect cache_dir-size but uses 100% of partition and fails

On 07/12/2018 06:20 PM, Eliezer Croitoru wrote:
> From the docs:
> http://www.squid-cache.org/Versions/v4/cfgman/cache_swap_low.html
>
> I see that this is only for UFS/AUFS/diskd and not rock cache_dir.
> What about rock cache_dir?

Rock cache_dirs cannot overflow by design. Rock reserves a configured amount of disk space and uses nothing but that amount of disk space. Due to optimistic allocation by file systems, you can still run out of disk space if something else consumes space on the same partition, but the rock database itself cannot overflow.

Alex.

_______________________________________________
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] squid 3.5.27 does not respect cache_dir-size but uses 100% of partition and fails
On 07/12/2018 06:20 PM, Eliezer Croitoru wrote:
> From the docs:
> http://www.squid-cache.org/Versions/v4/cfgman/cache_swap_low.html
>
> I see that this is only for UFS/AUFS/diskd and not rock cache_dir.
> What about rock cache_dir?

Rock cache_dirs cannot overflow by design. Rock reserves a configured amount of disk space and uses nothing but that amount of disk space. Due to optimistic allocation by file systems, you can still run out of disk space if something else consumes space on the same partition, but the rock database itself cannot overflow.

Alex.
Re: [squid-users] squid 3.5.27 does not respect cache_dir-size but uses 100% of partition and fails
Hey Amos,

From the docs:
http://www.squid-cache.org/Versions/v4/cfgman/cache_swap_low.html

I see that this is only for UFS/AUFS/diskd and not rock cache_dir.
What about rock cache_dir?

Eliezer

Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: elie...@ngtech.co.il
Re: [squid-users] squid 3.5.27 does not respect cache_dir-size but uses 100% of partition and fails
On 13/07/18 04:16, Alex Rousskov wrote:
> On 07/12/2018 05:53 AM, pete dawgg wrote:
>
>> When there is no traffic squid seems to be cleaning up well enough:
>> overnight (no traffic) disk usage went down to 30GB (now it's at 50GB
>> again)
>
> This may be a sign that your Squid cannot keep up with the load. IIRC,
> AUFS uses lazy garbage collection so it is possible for the stream of
> new objects to outpace the stream of object deletion events, resulting
> in a gradually increasing cache size. Using even more aggressive
> cache_swap_high might help, but there is no good configuration solution
> to this UFS problem AFAIK.

FYI, to be more aggressive place the two limits closer together.

I made the removal rate grow in steps of the difference between the marks. A low of 60 and a high of 70 means there are 4 steps of 10 between 60% and 100% full cache - so Squid will be removing 4*200 objects/sec when the cache is 99.999% full. But a low of 90 and a high of 91 will remove 10*200 objects/sec at the same fullness.

Low numbers like 60, 70 etc. are only needed now if you have to push the removal rate past 2K objects/sec - e.g. low 60, high 61 will be removing 40*200 = 8K objects/sec.

If you know your peak traffic rate in req/sec you should be able to tune the purge rate to match that peak traffic rate. The speed at which traffic reaches that peak should inform what the gap is between the watermarks.

Amos
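[Editor's note: Amos's watermark arithmetic can be sketched in a few lines of Python. The 200 objects/sec base rate per step is the figure quoted in his message; the function itself is an illustration of the described behaviour, not actual Squid code.]

```python
import math

BASE_RATE = 200  # objects/sec removed per watermark step (figure quoted by Amos)

def purge_rate(swap_low, swap_high, percent_full):
    """Approximate purge rate (objects/sec) at a given cache fullness (%)."""
    if percent_full <= swap_low:
        return 0  # below the low watermark nothing is purged
    step = swap_high - swap_low
    # The rate grows by BASE_RATE for every step-sized band the cache
    # fullness sits above the low watermark.
    steps = math.ceil((percent_full - swap_low) / step)
    return steps * BASE_RATE

print(purge_rate(60, 70, 99.999))  # 4 steps of 10 -> 800 objects/sec
print(purge_rate(90, 91, 99.999))  # 10 steps of 1 -> 2000 objects/sec
print(purge_rate(60, 61, 99.999))  # 40 steps of 1 -> 8000 objects/sec
```

This reproduces all three of Amos's examples: a narrow gap between the watermarks multiplies the purge rate, so "aggressive" means close-together limits, not low ones.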
Re: [squid-users] squid 3.5.27 does not respect cache_dir-size but uses 100% of partition and fails
On 07/12/2018 05:53 AM, pete dawgg wrote:
> I have set workers 8 just recently; but the disk full error had
> definitely been occurring before.

AUFS cache_dirs are not compatible with SMP Squid. Removing workers was the right thing to do even if that incompatibility was not causing disk overflows.

> FATAL: Ipc::Mem::Segment::open failed to shm_open(/squid-cf__metadata.shm):
> (2) No such file or directory
> This error seems to occur when the disk is really full and squid is restarted.

Ah, then it could be a side effect of poor PID management (and associated shared resource locking) in Squid v3. You can probably ignore this error until you fix the restarts. FWIW, Squid v4 addressed those shortcomings.

> When there is no traffic squid seems to be cleaning up well enough:
> overnight (no traffic) disk usage went down to 30GB (now it's at 50GB
> again)

This may be a sign that your Squid cannot keep up with the load. IIRC, AUFS uses lazy garbage collection so it is possible for the stream of new objects to outpace the stream of object deletion events, resulting in a gradually increasing cache size. Using even more aggressive cache_swap_high might help, but there is no good configuration solution to this UFS problem AFAIK.

> There was another error i just fixed:
>> FATAL: Failed to open swap log /mnt/cache/squid/swap.state.new
> Not a permissions or diskspace problem, caused by workers 8.
> I have deactivated workers 8 and this error went away.

Yes, that error is one of the signs that AUFS cache_dirs are not SMP-aware.

Alex.
Re: [squid-users] squid 3.5.27 does not respect cache_dir-size but uses 100% of partition and fails
THX for your reply!

> Betreff: Re: [squid-users] squid 3.5.27 does not respect cache_dir-size but
> uses 100% of partition and fails
>
> On 07/11/2018 04:39 AM, pete dawgg wrote:
>
>> cache_dir aufs /mnt/cache/squid 75000 16 256
>
>> FATAL: Ipc::Mem::Segment::open failed to shm_open(/squid-cf__metadata.shm):
>> (2) No such file or directory
>
> If you are using a combination of an SMP-unaware disk cache (AUFS) with
> SMP features such as multiple workers or a shared memory cache, please
> note that this combination is not supported.

I have set workers 8 just recently; but the disk full error had definitely been occurring before.

> The FATAL message above is about a shared memory segment used for
> collapsed forwarding. IIRC, Squid v3 attempted to create those segments
> even if they were not needed, so I cannot tell for sure whether you are
> using an unsupported combination of SMP/non-SMP features.
>
> I can tell you that you cannot use a combination of collapsed
> forwarding, AUFS cache_dir, and multiple workers. Also, non-SMP
> collapsed forwarding was primarily tested with UFS cache_dirs.

I was not aware of that - I can deactivate the workers 8 setting again. "Collapsed forwarding" was not set intentionally. This error seems to occur when the disk is really full and squid is restarted.

> Unfortunately, I cannot answer your question regarding overflowing AUFS
> cache directories. One possibility is that Squid is not cleaning up old
> cache files fast enough. You already set cache_swap_low/cache_swap_high
> aggressively. Does Squid actively remove objects from the full disk
> cache when you start it up _without_ any traffic? If not, it could be a
> Squid bug. Unfortunately, nobody has worked on AUFS code for years
> (AFAIK) so it may be difficult to fix anything that might be broken there.

When there is no traffic squid seems to be cleaning up well enough: overnight (no traffic) disk usage went down to 30GB (now it's at 50GB again).

There was another error I just fixed:
> FATAL: Failed to open swap log /mnt/cache/squid/swap.state.new
Not a permissions or diskspace problem, caused by workers 8. I have deactivated workers 8 and this error went away.

THX for your input!
pete
Re: [squid-users] squid 3.5.27 does not respect cache_dir-size but uses 100% of partition and fails
On 07/11/2018 04:39 AM, pete dawgg wrote:

> cache_dir aufs /mnt/cache/squid 75000 16 256

> FATAL: Ipc::Mem::Segment::open failed to shm_open(/squid-cf__metadata.shm):
> (2) No such file or directory

If you are using a combination of an SMP-unaware disk cache (AUFS) with SMP features such as multiple workers or a shared memory cache, please note that this combination is not supported.

The FATAL message above is about a shared memory segment used for collapsed forwarding. IIRC, Squid v3 attempted to create those segments even if they were not needed, so I cannot tell for sure whether you are using an unsupported combination of SMP/non-SMP features.

I can tell you that you cannot use a combination of collapsed forwarding, AUFS cache_dir, and multiple workers. Also, non-SMP collapsed forwarding was primarily tested with UFS cache_dirs.

Unfortunately, I cannot answer your question regarding overflowing AUFS cache directories. One possibility is that Squid is not cleaning up old cache files fast enough. You already set cache_swap_low/cache_swap_high aggressively. Does Squid actively remove objects from the full disk cache when you start it up _without_ any traffic? If not, it could be a Squid bug. Unfortunately, nobody has worked on AUFS code for years (AFAIK) so it may be difficult to fix anything that might be broken there.

Cheers,
Alex.
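[Editor's note: the SMP-compatible alternative Alex alludes to is a rock cache_dir, which unlike AUFS can be shared between workers. A minimal hypothetical squid.conf sketch follows; the path, size, and worker count are illustrative, not taken from this thread.]

```
# Hypothetical SMP-friendly cache setup (rock instead of AUFS).
# Rock cache_dirs are shared by all workers and cannot overflow
# their configured size by design.
workers 2
cache_dir rock /mnt/cache/squid-rock 75000
cache_mem 256 MB
```

Note that rock stores have their own object-size constraints (see the cache_dir documentation for the Squid version in use), so a setup like this would need its max-size options checked against the large Windows Update objects discussed in this thread.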
Re: [squid-users] squid 3.5.27 does not respect cache_dir-size but uses 100% of partition and fails
On 11/07/18 22:39, pete dawgg wrote:
> Hello list,
>
> i run squid 3.5.27 with some special settings for windows updates as
> suggested here:
> https://wiki.squid-cache.org/ConfigExamples/Caching/WindowsUpdates
> It's been running almost trouble-free for some time, but for ~2 months
> the cache-partition has been filling up to 100% (space; inodes were OK)
> and squid then failed.

That implies that either your cache_dir size accounting is VERY badly broken, something else is filling the disk (eg failing to rotate swap.state journals), or disk purging is not able to keep up with the traffic flow.

> the cache-dir is on a 100GB ext2-partition and configured like this:

Hmm, a partition. What else is using the same physical disk? Squid puts such a random I/O pattern on cache disks that it is best not to use the actual physical drive for other things in parallel - they can slow Squid down, and conversely Squid can cause problems for other uses by flooding the disk controller queues.

> cache_dir aufs /mnt/cache/squid 75000 16 256

These numbers do matter for ext2 more than for other FS types. You need them to be large enough not to allocate too many inodes per directory. I would use "64 256" here, or even "128 256" for a bigger safety margin. (I *think* modern ext2 implementations have resolved the core issue, but that may be wrong and ext2 is old enough to be wary.)

> cache_swap_low 60
> cache_swap_high 75
> minimum_object_size 0 KB
> maximum_object_size 6000 MB

If you bumped this for the Win8 sizes mentioned in our wiki, the Win10 major updates have bumped sizes up again past 10GB. So you may need to increase this.

> some special settings for the windows updates:
> range_offset_limit 6000 MB

Add the ACLs necessary to restrict this to WU traffic. It is really hard on cache space**, so should not be allowed to just any traffic.

** What I mean by that is it may result in N parallel fetches of the entire object unless the collapsed forwarding feature is used. In regards to your situation: consider a 10GB WU object being fetched 10 times -> 10*10 GB of disk space required just to fetch. Which over-fills your available 45GB (60% of 75000 MB [cache_swap_low/100 * cache_dir]). And an 11th will overflow your whole disk.

> maximum_object_size 6000 MB
> quick_abort_min -1
> quick_abort_max -1
> quick_abort_pct -1
>
> when i restart squid with its initscript it sometimes expunges some stuff
> from the cache but then fails again after a short while:
> before restart:
> /dev/sdb2  99G  93G  863M  100%  /mnt/cache
> after restart:
> /dev/sdb2  99G  87G  7,4G   93%  /mnt/cache

How much of that /mnt/cache size is in /mnt/cache/squid? Is it one physical HDD spindle (versus a RAID drive)?

> there are two types of errors in cache.log:
> FATAL: Ipc::Mem::Segment::open failed to shm_open(/squid-cf__metadata.shm): (2) No such file or directory

The cf__metadata.shm error is quite bad - it means your collapsed forwarding is not working well. Which implies it is not preventing the disk overflow on parallel huge WU fetches.

Are you able to try the new Squid-4? There are some collapsed forwarding and cache management changes that may fix or allow better diagnosis of these particularly, and maybe your disk usage problem.

> FATAL: Failed to rename log file /mnt/cache/squid/swap.state.new to /mnt/cache/squid/swap.state

This is suspicious; how large are those swap files? Does your proxy have correct access permissions on them and the directories in their path? Both the Unix filesystem and SELinux / AppArmor / whatever your system uses for advanced access matter here.

Same things to check for the /dev/shm device and *.shm file access error above. But /dev/shm should be root things rather than Squid user access.

> What should i do to make squid work with windows updates reliably again?

Some other things you can check:

You can try to make the cache_swap_high/low be closer together and much larger (eg the default 90 and 95 values). Current 3.5 have fixed the bug which made smaller values necessary on some earlier installs.

If you can afford the delays it introduces to restart, you could run a full scan of the cached data (stop Squid, delete the swap.state* files, then restart Squid and wait) - you could do that with a copy of Squid not handling user traffic if necessary, but the running one cannot use the cache while that is happening.

Otherwise, have you tried purging the entire cache and starting Squid with a clean slate? That would be a lot faster for recovery than the above scan, but does have a bit more bandwidth spent short-term while re-filling the cache.

Amos
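[Editor's note: Amos's suggestion to scope range_offset_limit to Windows Update traffic can be sketched as below. The ACL name and domain list are illustrative; the WindowsUpdates wiki page referenced in this thread maintains the authoritative domain list.]

```
# Hypothetical squid.conf fragment: only allow Windows Update traffic
# to trigger full-object fetches on ranged requests, instead of
# applying range_offset_limit to all traffic.
acl wu dstdomain .windowsupdate.com .update.microsoft.com
range_offset_limit 6000 MB wu
```

Since Squid 3.2, range_offset_limit accepts an ACL list, so the expensive full-object-fetch behaviour can be confined to the traffic that actually benefits from it.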
[squid-users] squid 3.5.27 does not respect cache_dir-size but uses 100% of partition and fails
Hello list,

I run squid 3.5.27 with some special settings for windows updates as suggested here:
https://wiki.squid-cache.org/ConfigExamples/Caching/WindowsUpdates
It's been running almost trouble-free for some time, but for ~2 months the cache-partition has been filling up to 100% (space; inodes were OK) and squid then failed.

The cache-dir is on a 100GB ext2-partition and configured like this:

cache_dir aufs /mnt/cache/squid 75000 16 256
cache_swap_low 60
cache_swap_high 75
minimum_object_size 0 KB
maximum_object_size 6000 MB

Some special settings for the windows updates:

range_offset_limit 6000 MB
maximum_object_size 6000 MB
quick_abort_min -1
quick_abort_max -1
quick_abort_pct -1

When I restart squid with its initscript it sometimes expunges some stuff from the cache but then fails again after a short while:

before restart:
/dev/sdb2  99G  93G  863M  100%  /mnt/cache
after restart:
/dev/sdb2  99G  87G  7,4G   93%  /mnt/cache

There are two types of errors in cache.log:

FATAL: Ipc::Mem::Segment::open failed to shm_open(/squid-cf__metadata.shm): (2) No such file or directory
FATAL: Failed to rename log file /mnt/cache/squid/swap.state.new to /mnt/cache/squid/swap.state

What should I do to make squid work with windows updates reliably again?