Re: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()
On Wed, Oct 15, 2014 at 11:56:33PM -0600, Justin T. Gibbs wrote:
> avg pointed out the rate limiting code in vm_pageout_scan() during
> discussion about PR 187594. While it certainly can contribute to the
> problems discussed in that PR, a bigger problem is that it can allow the
> OOM killer to be triggered even though there is plenty of reclaimable
> memory available in the system. Any load that can consume enough pages
> within the polling interval to hit the v_free_min threshold (e.g.
> multiple 'dd if=/dev/zero of=/file/on/zfs') can make this happen.
>
> The product I'm working on does not have swap configured and treats any
> OOM trigger as fatal, so it is very obvious when this happens. :-)
>
> I've tried several things to mitigate the problem. The first was to
> ignore rate limiting for pass 2. However, even though ZFS is guaranteed
> to receive some feedback prior to OOM being declared, my testing showed
> that a trivial load (a couple of dd operations) could still consume
> enough of the reclaimed space to leave the system below its target at
> the end of pass 2. After removing the rate limiting entirely, I've so
> far been unable to kill the system via a ZFS-induced load.
>
> I understand the motivation behind the rate limiting, but the current
> implementation seems too simplistic to be safe. The documentation for
> the Solaris slab allocator provides good motivation for their approach
> of using a "sliding average" to rein in temporary bursts of usage
> without unduly harming efficient service for the recorded steady-state
> memory demand. Regardless of the approach taken, I believe that the OOM
> killer must be a last resort and shouldn't be called when there are
> caches that can be culled.
>
> One other thing I've noticed in my testing with ZFS is that it needs
> feedback and a little time to react to memory pressure.
> Calling its lowmem handler just once isn't enough for it to limit
> in-flight writes so it can avoid reuse of pages that it just freed up.
> But it doesn't take too long to react (> 1 sec in the profiling I've
> done). Is there a way in vm_pageout_scan() that we can better record
> that progress is being made (pages were freed in the pass, even if
> some/all of them were consumed again) and allow more passes before the
> OOM killer is invoked in this case?
>
> —
> Justin

https://docs.freebsd.org/cgi/getmsg.cgi?fetch=103436+0+/usr/local/www/db/text/2014/freebsd-hackers/20141012.freebsd-hackers might have some relevance.

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()
On 16/10/2014 12:08, Steven Hartland wrote:
> Unfortunately ZFS doesn't prevent new in-flight writes until it hits
> zfs_dirty_data_max, so while what you're suggesting will help, if the
> writes come in quick enough I would expect it to still be able to
> outrun the pageout.

As I've mentioned, arc_memory_throttle() also plays a role in limiting the dirty data.

--
Andriy Gapon
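Steven's point, that a fast enough writer can outrun a per-pass reclaim cap, can be illustrated with a toy model. This is purely illustrative C with made-up names and numbers, not real ZFS or VM code:

```c
#include <assert.h>

/*
 * Toy model only (no relation to real ZFS or VM internals): a writer
 * that consumes 'demand' pages per interval races a pageout pass whose
 * reclaim is capped at 'cap' pages (cap < 0 means uncapped).  With a
 * cap below the demand, the free count collapses even though plenty of
 * reclaimable cache remains.
 */
struct toy_model {
	long free_pages;
	long cache_pages;	/* reclaimable cache */
};

/* Run one interval; returns the free page count afterwards. */
static long
toy_interval(struct toy_model *m, long demand, long cap)
{
	long take = demand;	/* try to refill what the writer took */

	if (cap >= 0 && take > cap)
		take = cap;
	if (take > m->cache_pages)
		take = m->cache_pages;
	m->cache_pages -= take;
	m->free_pages += take - demand;
	if (m->free_pages < 0)
		m->free_pages = 0;
	return (m->free_pages);
}
```

With, say, a demand of 500 pages per interval and a cap of 100, the free count hits zero within a few intervals while nearly all of the cache is still sitting there reclaimable, which is exactly the situation that makes the OOM trigger look spurious.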
Re: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()
Unfortunately ZFS doesn't prevent new in-flight writes until it hits zfs_dirty_data_max, so while what you're suggesting will help, if the writes come in quick enough I would expect it to still be able to outrun the pageout.

----- Original Message -----
From: "Justin T. Gibbs"
To:
Cc: ; "Andriy Gapon"
Sent: Thursday, October 16, 2014 6:56 AM
Subject: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()

avg pointed out the rate limiting code in vm_pageout_scan() during discussion about PR 187594. While it certainly can contribute to the problems discussed in that PR, a bigger problem is that it can allow the OOM killer to be triggered even though there is plenty of reclaimable memory available in the system. Any load that can consume enough pages within the polling interval to hit the v_free_min threshold (e.g. multiple 'dd if=/dev/zero of=/file/on/zfs') can make this happen.

The product I'm working on does not have swap configured and treats any OOM trigger as fatal, so it is very obvious when this happens. :-)

I've tried several things to mitigate the problem. The first was to ignore rate limiting for pass 2. However, even though ZFS is guaranteed to receive some feedback prior to OOM being declared, my testing showed that a trivial load (a couple of dd operations) could still consume enough of the reclaimed space to leave the system below its target at the end of pass 2. After removing the rate limiting entirely, I've so far been unable to kill the system via a ZFS-induced load.

I understand the motivation behind the rate limiting, but the current implementation seems too simplistic to be safe. The documentation for the Solaris slab allocator provides good motivation for their approach of using a "sliding average" to rein in temporary bursts of usage without unduly harming efficient service for the recorded steady-state memory demand.
Regardless of the approach taken, I believe that the OOM killer must be a last resort and shouldn't be called when there are caches that can be culled.

One other thing I've noticed in my testing with ZFS is that it needs feedback and a little time to react to memory pressure. Calling its lowmem handler just once isn't enough for it to limit in-flight writes so it can avoid reuse of pages that it just freed up. But it doesn't take too long to react (> 1 sec in the profiling I've done). Is there a way in vm_pageout_scan() that we can better record that progress is being made (pages were freed in the pass, even if some/all of them were consumed again) and allow more passes before the OOM killer is invoked in this case?

—
Justin
Re: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()
On 16/10/2014 08:56, Justin T. Gibbs wrote:
> avg pointed out the rate limiting code in vm_pageout_scan() during
> discussion about PR 187594. While it certainly can contribute to the
> problems discussed in that PR, a bigger problem is that it can allow the
> OOM killer to be triggered even though there is plenty of reclaimable
> memory available in the system. Any load that can consume enough pages
> within the polling interval to hit the v_free_min threshold (e.g.
> multiple 'dd if=/dev/zero of=/file/on/zfs') can make this happen.
>
> The product I'm working on does not have swap configured and treats any
> OOM trigger as fatal, so it is very obvious when this happens. :-)
>
> I've tried several things to mitigate the problem. The first was to
> ignore rate limiting for pass 2. However, even though ZFS is guaranteed
> to receive some feedback prior to OOM being declared, my testing showed
> that a trivial load (a couple of dd operations) could still consume
> enough of the reclaimed space to leave the system below its target at
> the end of pass 2. After removing the rate limiting entirely, I've so
> far been unable to kill the system via a ZFS-induced load.
>
> I understand the motivation behind the rate limiting, but the current
> implementation seems too simplistic to be safe. The documentation for
> the Solaris slab allocator provides good motivation for their approach
> of using a "sliding average" to rein in temporary bursts of usage
> without unduly harming efficient service for the recorded steady-state
> memory demand. Regardless of the approach taken, I believe that the OOM
> killer must be a last resort and shouldn't be called when there are
> caches that can be culled.

FWIW, I have this toy branch:

https://github.com/avg-I/freebsd/compare/experiment/uma-cache-trimming

Not all commits are relevant to the problem and some things are unfinished. Not sure if the changes would help your case either...
> One other thing I've noticed in my testing with ZFS is that it needs
> feedback and a little time to react to memory pressure. Calling its
> lowmem handler just once isn't enough for it to limit in-flight writes
> so it can avoid reuse of pages that it just freed up. But it doesn't
> take too long to react (> 1 sec in the profiling I've done).

I've been thinking about this and maybe we need to make arc_memory_throttle() more aggressive on FreeBSD. I can't say that I really follow the logic of that code, though.

> Is there a way in vm_pageout_scan() that we can better record that
> progress is being made (pages were freed in the pass, even if some/all
> of them were consumed again) and allow more passes before the OOM
> killer is invoked in this case?

--
Andriy Gapon
OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()
avg pointed out the rate limiting code in vm_pageout_scan() during discussion about PR 187594. While it certainly can contribute to the problems discussed in that PR, a bigger problem is that it can allow the OOM killer to be triggered even though there is plenty of reclaimable memory available in the system. Any load that can consume enough pages within the polling interval to hit the v_free_min threshold (e.g. multiple 'dd if=/dev/zero of=/file/on/zfs') can make this happen.

The product I'm working on does not have swap configured and treats any OOM trigger as fatal, so it is very obvious when this happens. :-)

I've tried several things to mitigate the problem. The first was to ignore rate limiting for pass 2. However, even though ZFS is guaranteed to receive some feedback prior to OOM being declared, my testing showed that a trivial load (a couple of dd operations) could still consume enough of the reclaimed space to leave the system below its target at the end of pass 2. After removing the rate limiting entirely, I've so far been unable to kill the system via a ZFS-induced load.

I understand the motivation behind the rate limiting, but the current implementation seems too simplistic to be safe. The documentation for the Solaris slab allocator provides good motivation for their approach of using a "sliding average" to rein in temporary bursts of usage without unduly harming efficient service for the recorded steady-state memory demand. Regardless of the approach taken, I believe that the OOM killer must be a last resort and shouldn't be called when there are caches that can be culled.

One other thing I've noticed in my testing with ZFS is that it needs feedback and a little time to react to memory pressure. Calling its lowmem handler just once isn't enough for it to limit in-flight writes so it can avoid reuse of pages that it just freed up. But it doesn't take too long to react (> 1 sec in the profiling I've done).
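For illustration, the "sliding average" idea might look roughly like this. This is a minimal sketch; the struct, function names, and the 1/8 smoothing weight are hypothetical, not taken from Solaris or FreeBSD:

```c
#include <assert.h>

/*
 * Illustrative sketch only: an exponentially-weighted moving average of
 * per-interval demand, in the spirit of the "sliding average" described
 * in the Solaris slab allocator documentation.  All names and the 1/8
 * weight are made up for this example.
 */
struct demand_avg {
	long avg;		/* smoothed pages consumed per interval */
};

/* Fold one interval's observed demand into the running average. */
static void
demand_update(struct demand_avg *da, long observed)
{
	/*
	 * New samples get 1/8 weight, so a one-interval burst moves the
	 * average only slightly, while sustained demand converges on it.
	 */
	da->avg += (observed - da->avg) / 8;
}

/*
 * Size the reclaim target from the smoothed demand rather than only the
 * instantaneous shortfall, so a steady consumer cannot eat the freed
 * pages before the next scan.
 */
static long
demand_reclaim_target(const struct demand_avg *da, long shortfall)
{
	return (shortfall > da->avg ? shortfall : da->avg);
}
```

The point is that the rate limit adapts to the recorded steady-state demand instead of being a fixed per-pass cap that a burst can trivially defeat.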
Is there a way in vm_pageout_scan() that we can better record that progress is being made (pages were freed in the pass, even if some/all of them were consumed again) and allow more passes before the OOM killer is invoked in this case?

—
Justin
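The kind of progress tracking asked about above might be sketched like this. Hypothetical names and pass limit throughout; this is not the actual vm_pageout_scan() logic:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative sketch only: treat "pages were freed this pass" as
 * progress, even if a concurrent load consumed them again, and only
 * fall through to the OOM killer after several consecutive passes with
 * no progress at all.  All names and the pass limit are made up.
 */
#define	MAX_NO_PROGRESS_PASSES	3

struct pass_stats {
	long pages_freed;	/* pages reclaimed during this pass */
	long free_count;	/* free pages at the end of the pass */
	long free_target;	/* analogue of v_free_target */
};

/* Decide whether to scan again or give up and invoke the OOM killer. */
static bool
pageout_try_again(const struct pass_stats *ps, int *no_progress)
{
	if (ps->free_count >= ps->free_target)
		return (false);		/* target met; no OOM needed */
	if (ps->pages_freed > 0) {
		*no_progress = 0;	/* caches are still yielding pages */
		return (true);
	}
	return (++*no_progress < MAX_NO_PROGRESS_PASSES);
}
```

Under this scheme a cache that keeps yielding pages, even ones that are immediately re-consumed, resets the no-progress counter, which also gives something like the ZFS lowmem handler the extra invocations and the roughly one second it needs to throttle in-flight writes.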