On Tue, Sep 03, 2019 at 06:01:06PM -0400, Mark Johnston wrote:
> > > have you considered running UMA_RECLAIM_TRIM periodically, even
> > > without memory pressure? I think that with such periodic trimming
> > > there would be less need to invoke vm_lowmem().
>
> Slawa and I talked about this in the past. His complaint is that a
> large cache can take a significant amount of time to trim, and it
Not only a large cache. I also see a large mbuf cache (10GB+ after peak
network activity) and many other zones. For example, on a live server
running stable/11, cache sizes of zones in MB:

   54  RADIX NODE
   55  zio_data_buf_131072
   73  zio_data_buf_12288
   77  zio_data_buf_98304
   93  socket
   99  tcpcb
 1072  mbuf
 1136  zio_buf_131072
 1443  zio_data_buf_1048576
17242  mbuf_jumbo_page

> manifests as a spike of CPU usage and contention on the zone lock. In
> particular, keg_drain() iterates over the list of free slabs with the
> keg lock held, and if many items were freed to the keg while
> trimming/draining, the list can be quite long. This can have effects
> outside the zone, for example if we are reclaiming items from zones used
> by other UMA zones, like the bucket or slab zones.
>
> Reclaiming cached items when there is no demand for free pages seems
> wrong to me. We historically had similar problems with the page daemon,
> which last year was changed to perform smaller reclamations at a greater
> frequency. I suspect a better approach for UMA would be to similarly
> increase reclaim frequency and reduce the number of items freed in one
> go.

My goals are these:

1. Memory sizes today are quite big: 64GB is the minimum, 256GB is not
   extraordinary. As a result, the amount of memory processed at a lowmem
   event can be very large compared to historical sizes.

2. Memory reclaiming is very expensive at the last stage. As a result,
   reclaiming some 10GB can take 10s or more.

3. Memory depletion can be very fast at current speeds (40Gbit network
   connectivity is 5GB/s). As a result (2+3), memory reclaiming at a
   lowmem event may be too slow to compensate for the depletion.

4. Many subsystems now try not to trigger lowmem through automatic memory
   depletion. Large unused memory in zone caches causes inefficient
   memory use (see above -- about 18GB of memory could be used as cache
   or in some other way, but currently it just sits in zone caches.
   lowmem is not triggered because all consumers try to keep sufficient
   free memory).

5.
   NUMA dramatizes the situation because (as I see it) memory can be
   allocated from dom1 and freed to dom0. As a result, the zone cache in
   dom0 grows and is not used. Currently the kernel is not fully
   NUMA-aware and needs much work here.

6. All of this can exhaust memory below vmd_free_reserved and slow down
   many operations in the kernel.

I see all of this.

> > > Also, I think that we would be able to retire (or re-purpose)
> > > lowmem_period. E.g., the trimming would be done every lowmem_period,
> > > but vm_lowmem() would not be throttled.
>
> Some of the vm_lowmem eventhandlers probably shouldn't be called each
> time the page daemon scans the inactive queue (every 0.1s under memory
> pressure). ufsdirhash_lowmem and mb_reclaim in particular don't seem
> like they need to be invoked very frequently. We could easily define
> multiple eventhandlers to differentiate between these cases, though.
>
> > > One example of the throttling of vm_lowmem being bad is its
> > > interaction with the ZFS ARC. When there is a spike in memory usage
> > > we want the ARC to adapt as quickly as possible. But at present the
> > > lowmem_period logic interferes with that.
> >
> > Some time ago, I sent Mark a patch that implements this logic,
> > specifically to make the ARC and mbuf subsystems cooperate.
> >
> > The main problem I see with this work is very slow vm_page_free().
> > Maybe it is faster now...
>
> How did you determine this?
This was your guess:

======
> 	while ((slab = SLIST_FIRST(&freeslabs)) != NULL) {
> 		SLIST_REMOVE(&freeslabs, slab, uma_slab, us_hlink);
> 		keg_free_slab(keg, slab, keg->uk_ipers);
> 	}
>
> 2019 Feb  2 19:49:54.800524364 zio_data_buf_1048576  1032605 cache_reclaim limit 100 dom 0 nitems 1672 imin 298
> 2019 Feb  2 19:49:54.800524364 zio_data_buf_1048576  1033736 cache_reclaim recla 149 dom 0 nitems 1672 imin 298
> 2019 Feb  2 19:49:54.802524468 zio_data_buf_1048576  3119710 cache_reclaim limit 100 dom 1 nitems 1 imin 0
> 2019 Feb  2 19:49:54.802524468 zio_data_buf_1048576  3127550 keg_drain2
> 2019 Feb  2 19:49:54.803524487 zio_data_buf_1048576  4444219 keg_drain3
> 2019 Feb  2 19:49:54.838524634 zio_data_buf_1048576 39553705 keg_drain4
> 2019 Feb  2 19:49:54.838524634 zio_data_buf_1048576 39565323 zone_reclaim:return
>
> 35109.486 us for the last loop, 149 items freed.

35ms to free 149MB (38144 4KB pages), so roughly 1us per page. That does
seem like a lot, but freeing a page (vm_page_free(m)) is much more
expensive than freeing an item to UMA (i.e., uma_zfree()). Most of that
time will be spent in _kmem_unback().
======

> You are on stable/12 I believe, so r350374 might help if you do not
> already have it.

I have not tried that yet.

> I guess the vm_page_free() calls are coming from the UMA trimmer?

Indirectly, from keg_drain().

_______________________________________________
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"