Re: Another ZFS ARC memory question
Quoting Slawa Olhovchenkov s...@zxy.spb.ru (from Thu, 1 Mar 2012 18:28:26 +0400):

> On Tue, Feb 28, 2012 at 05:14:37AM +1100, Peter Jeremy wrote:
>>> * what is the community's advice for production machines running ZFS
>>> on FreeBSD, is manually limiting the ARC cache (to ensure that
>>> there's enough actually free memory to handle a spike in application
>>> memory usage) the best solution to this spike-in-memory-means-crash
>>> problem?
>>
>> Are you swapping onto a ZFS vdev? If so, change back to a raw (or
>> geom) device - swapping to ZFS is known to be problematic. If you
>
> I see the kernel get stuck when swapping to ZFS. Is this the only
> known problem?

This is a known problem. Don't use swap on a zpool. If you want fault
tolerance, use gmirror for the swap partitions instead (make sure the
swap partition ends _before_ the last sector of the disk in this case,
since gmirror stores its metadata in that last sector).

Bye, Alexander.

-- 
As of next Thursday, UNIX will be flushed in favor of TOPS-10. Please update your programs.
http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
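[Editorial note: a gmirror-backed swap setup along the lines Alexander describes might be sketched as follows. The device names (ada0p3, ada1p3) are illustrative assumptions, and this is an untested sketch rather than a recipe - adapt it to your own disks and partition layout.]

```sh
# Sketch only: mirror two dedicated swap partitions with gmirror.
# ada0p3 and ada1p3 are hypothetical partitions, each sized to end
# before the last sector of its disk (gmirror keeps metadata there).
gmirror load                          # load geom_mirror.ko if needed
gmirror label swap ada0p3 ada1p3      # create the mirror named "swap"
swapon /dev/mirror/swap               # enable swapping on the mirror

# To make the setup persist across reboots:
#   /boot/loader.conf:  geom_mirror_load="YES"
#   /etc/fstab:         /dev/mirror/swap  none  swap  sw  0  0
```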
Re: Another ZFS ARC memory question
On Fri, 2012-03-02 at 10:25 +0100, Alexander Leidinger wrote:
> Quoting Slawa Olhovchenkov s...@zxy.spb.ru (from Thu, 1 Mar 2012
> 18:28:26 +0400):
>> On Tue, Feb 28, 2012 at 05:14:37AM +1100, Peter Jeremy wrote:
>>>> * what is the community's advice for production machines running
>>>> ZFS on FreeBSD, is manually limiting the ARC cache (to ensure that
>>>> there's enough actually free memory to handle a spike in
>>>> application memory usage) the best solution to this
>>>> spike-in-memory-means-crash problem?
>>>
>>> Are you swapping onto a ZFS vdev?

We are not swapping onto a ZFS vdev (we've been down that road and know
it's a bad idea). Our issue is primarily with ARC cache eviction not
happening fast enough, or at all, when there is a spike in memory
usage, causing machines to hang.

We are presently working around it by limiting arc_max to 4G on our 24G
RAM production boxes (which seems like a massive waste of performance)
and by doing very careful/aggressive application-level management of
memory usage to ensure stability (limits.conf didn't work for us, so we
rolled our own). A better solution would be welcome, though, so that we
can utilise all the free memory we're presently keeping around as a
safety margin - ideally it would be used as ARC cache.

Two more questions, again wrt 8.2-RELEASE:

1. With arc_max limited to 4G, is it expected that we should see wired
   memory usage around 7-8G? I understand that the kernel has to use
   some memory, but really 3-4G of non-ARC data?

2. We have some development machines with only 3G of RAM. Previously
   they had no arc_max set and were left to tune themselves, and they
   were quite unstable. Now we've set arc_max to 256M but things have
   got worse: we've seen a big disk I/O performance hit (untarring a
   ports tarball now takes 20 minutes), wired memory usage is up around
   2.5GB, and the machines are swapping a lot and crashing more
   frequently.

What follows is arc_summary.pl output from one of the troubled dev
machines, showing the ARC using over 500% of the memory it should be.
Also, uname follows. My second question is: have there been fixes
between 8.2-RELEASE and 8.3-BETA1 or 9.0-RELEASE which solve this ARC
over-usage problem?

hybrid@node5:~$ ./arc_summary.pl

ZFS Subsystem Report				Fri Mar 2 09:55:00 2012

System Memory:
	 8.92%	264.89	MiB Active,	 6.43%	190.75	MiB Inact
	80.91%	  2.35	GiB Wired,	 1.97%	 58.46	MiB Cache
	 1.74%	 51.70	MiB Free,	 0.03%	864.00	KiB Gap

	Real Installed:			 3.00	GiB
	Real Available:		99.56%	 2.99	GiB
	Real Managed:		97.04%	 2.90	GiB

	Logical Total:			 3.00	GiB
	Logical Used:		90.20%	 2.71	GiB
	Logical Free:		 9.80%	300.91	MiB

Kernel Memory:				 1.08	GiB
	Data:			98.75%	 1.06	GiB
	Text:			 1.25%	13.76	MiB

Kernel Memory Map:			 2.83	GiB
	Size:			26.80%	775.56	MiB
	Free:			73.20%	 2.07	GiB
							Page:  1

ARC Summary: (THROTTLED)
	Storage pool Version:		15
	Filesystem Version:		4
	Memory Throttle Count:		53.77m

ARC Misc:
	Deleted:			1.99m
	Recycle Misses:			6.84m
	Mutex Misses:			6.96k
	Evict Skips:			6.96k

ARC Size:			552.16%	  1.38	GiB
	Target Size: (Adaptive)	100.00%	256.00	MiB
	Min Size (Hard Limit):	 36.23%	 92.75	MiB
	Max Size (High Water):	  2:1	256.00	MiB

ARC Size Breakdown:
	Recently Used Cache Size:	16.97%	239.90	MiB
	Frequently Used Cache Size:	83.03%	  1.15	GiB

ARC Hash Breakdown:
	Elements Max:			83.19k
	Elements Current:	84.72%	70.48k
	Collisions:			2.53m
	Chain Max:			9
	Chains:				18.94k
							Page:  2

ARC Efficiency:				126.65m
	Cache Hit Ratio:	95.07%	120.41m
	Cache Miss Ratio:	 4.93%	  6.24m
	Actual Hit Ratio:
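[Editorial note: as a quick sanity check on the figures above, the "ARC Size" percentage in the report is just the actual ARC size relative to the adaptive target. A minimal sketch, using the values from the report:]

```python
def arc_size_pct(actual_mib: float, target_mib: float) -> float:
    """Percentage of the ARC target size actually in use."""
    return 100.0 * actual_mib / target_mib

# 1.38 GiB actual vs. the 256 MiB adaptive target reported above:
print(round(arc_size_pct(1.38 * 1024, 256), 1))  # roughly 552, matching
                                                 # the reported 552.16%
```

In other words the ARC on this box is holding about five and a half times its target size, which is consistent with eviction not keeping up.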
Re: Another ZFS ARC memory question
On Tue, Feb 28, 2012 at 05:14:37AM +1100, Peter Jeremy wrote:
>> * what is the community's advice for production machines running ZFS
>> on FreeBSD, is manually limiting the ARC cache (to ensure that
>> there's enough actually free memory to handle a spike in application
>> memory usage) the best solution to this spike-in-memory-means-crash
>> problem?
>
> Are you swapping onto a ZFS vdev? If so, change back to a raw (or
> geom) device - swapping to ZFS is known to be problematic. If you

I see the kernel get stuck when swapping to ZFS. Is this the only known
problem?
Re: Another ZFS ARC memory question
On 2012-Feb-24 11:06:52 +, Luke Marsden luke-li...@hybrid-logic.co.uk wrote:
> We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines
> but have been having trouble with short spikes in application memory
> usage resulting in huge amounts of swapping, bringing the whole
> machine to its knees and crashing it hard. I suspect this is because
> when there is a sudden spike in memory usage the zfs arc reclaim
> thread is unable to free system memory fast enough.

There were a large number of fairly serious ZFS bugs that have been
fixed since 8.2-RELEASE and I would suggest you look at upgrading. That
said, I haven't seen the specific problem you are reporting.

> * is this a known problem?

I'm unaware of it specifically as it relates to ZFS. You don't mention
how big the memory usage spike is, but unless there is sufficient
free + cache available to cope with a usage spike then you will have
problems whether it's UFS or ZFS (though it's possibly worse with ZFS).
FreeBSD is known not to cope well with running out of memory.

> * what is the community's advice for production machines running ZFS
> on FreeBSD, is manually limiting the ARC cache (to ensure that there's
> enough actually free memory to handle a spike in application memory
> usage) the best solution to this spike-in-memory-means-crash problem?

Are you swapping onto a ZFS vdev? If so, change back to a raw (or geom)
device - swapping to ZFS is known to be problematic. If you have very
spiky memory requirements, increasing vm.v_cache_min and/or
vm.v_free_reserved might give you better results.

> * has FreeBSD 9.0 / ZFS v28 solved this problem?

The ZFS code is the same in 9.0 and 8.3. Since 8.3 is less of a jump,
I'd recommend that you try 8.3-prerelease on a test box and see how it
handles your load. Note that there's no need to upgrade your pools from
v15 to v28 unless you want the ZFS features - the actual ZFS code is
independent of pool version.
> * rather than setting a hard limit on the ARC cache size, is it
> possible to adjust the auto-tuning variables to leave more free memory
> for spiky memory situations? e.g. set the auto-tuning to make arc eat
> 80% of memory instead of ~95% like it is at present?

Memory spikes are absorbed by vm.v_cache_min and vm.v_free_reserved in
the first instance. The current vfs.zfs.arc_max default may be a bit
high for some workloads but at this point in time, you will need to
tune it manually.

-- 
Peter Jeremy
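[Editorial note: the knobs Peter mentions are ordinary sysctls. A sketch of inspecting and raising them might look like the following - the numeric values are purely illustrative assumptions, not recommendations; check your system's defaults and size them to your RAM:]

```sh
# Inspect the current thresholds (values are in pages, typically 4 KiB):
sysctl vm.v_cache_min vm.v_free_reserved

# Illustrative example only: raise them so more memory stays free or
# easily reclaimable ahead of a usage spike.
sysctl vm.v_cache_min=65536       # hypothetical: ~256 MiB of 4 KiB pages
sysctl vm.v_free_reserved=16384   # hypothetical: ~64 MiB of 4 KiB pages

# To persist across reboots, add the same lines to /etc/sysctl.conf:
#   vm.v_cache_min=65536
#   vm.v_free_reserved=16384
```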
Another ZFS ARC memory question
Hi all,

Just wanted to get your opinion on best practices for ZFS. We're
running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines but
have been having trouble with short spikes in application memory usage
resulting in huge amounts of swapping, bringing the whole machine to
its knees and crashing it hard. I suspect this is because when there is
a sudden spike in memory usage the zfs arc reclaim thread is unable to
free system memory fast enough.

This most recently happened yesterday, as you can see from the
following munin graphs:

http://hybrid-logic.co.uk/memory-day.png
http://hybrid-logic.co.uk/swap-day.png

Our response has been to start limiting the ZFS ARC cache to 4GB on our
production machines - trading performance for stability is fine with me
(and we have L2ARC on SSD so we still get good levels of caching).

My questions are:

* is this a known problem?

* what is the community's advice for production machines running ZFS on
  FreeBSD, is manually limiting the ARC cache (to ensure that there's
  enough actually free memory to handle a spike in application memory
  usage) the best solution to this spike-in-memory-means-crash problem?

* has FreeBSD 9.0 / ZFS v28 solved this problem?

* rather than setting a hard limit on the ARC cache size, is it
  possible to adjust the auto-tuning variables to leave more free
  memory for spiky memory situations? e.g. set the auto-tuning to make
  arc eat 80% of memory instead of ~95% like it is at present?

* could the arc reclaim thread be made to drop ARC pages with higher
  priority before the system starts swapping out application pages?

Thank you for any/all answers, and thank you for making FreeBSD
awesome :-)

Best Regards,
Luke Marsden

-- 
CTO, Hybrid Logic
+447791750420 | +1-415-449-1165 | www.hybrid-cluster.com
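[Editorial note: capping the ARC as described above is done with a boot-time loader tunable. A minimal sketch - the 4G figure mirrors the workaround mentioned in the thread and is an example, not a recommendation:]

```sh
# /boot/loader.conf - cap the ZFS ARC at boot (illustrative value).
# On FreeBSD 8.x, vfs.zfs.arc_max is a loader tunable, not settable
# at runtime; suffixes like G are accepted in loader.conf.
vfs.zfs.arc_max="4G"
```

After a reboot, `sysctl vfs.zfs.arc_max` should report the new limit.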