Pawel Jakub Dawidek, et al.,

First, I am describing a moving target, and maybe I am off target for you...
The memory consumption reported by the slab allocator functions is not really correct. Normally, even when memory is freed it stays cached until a KM_SLEEP allocation fails, and only then is it reclaimed and re-allocated. Is this your memory leak? So memory tends to show up as more and more allocated and never decreasing past a certain point, IMO. The assumption is that a KM_SLEEP allocation, once it returns, always has memory.

Two standard hash tables are allocated with KM_NOSLEEP in arc.c and (dbuf.c?), inside their local buf_init() routines, and on failure they retry at 1/2 the size. So my first memory-consumption suggestion is to drop to 1/4 on failure, and/or even to start with 1/2 or 1/4 of the default hash table size. Minimally, generating a message saying what size the hash tables are might tell you something. I am assuming that your smaller page size or disk blocks might allocate larger hash tables than wanted. (A rough sketch is below.)

Then go through the functions, identify all of the KM_SLEEP allocations, and pre-allocate a working-set number of items via one of the slab functions. Then change all of the KM_SLEEPs to KM_NOSLEEPs and return failures on memory allocation. You can then retry the allocations at a later time or return ENOMEM. This will keep the system responsive even when low memory has occurred. (Sketches of both steps are also below.) Worst case, the poorly managed FS is hopefully an /opt-based FS that can be offlined instead of panicking the whole box. Also, I am assuming that FS memory allocations are less important than some other internal / kernel objects. You might find that you can't change some of the KM_SLEEPs to KM_NOSLEEP, but doing most of them will probably delay your problem until you have possibly leaked more memory. This isn't that hard and will remove, or hopefully significantly delay, your panics.

On the other hand, if I remember correctly, VOP_INACTIVE ties into the DNLC, and that is a much more complicated fix which would require a code walk-through of your current dev code.
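To make the hash table point concrete, here is a rough sketch, from memory and not your actual tree (names like buf_hash_table, ht_mask and the physmem-based sizing follow the arc.c buf_init() layout as I remember it), of the existing retry-at-half-size loop plus the two tweaks I am suggesting - start at a fraction of the computed size, and report what you ended up with:

    void
    buf_init(void)
    {
            uint64_t hsize = 1ULL << 12;

            /* size the table against physical memory, as it does today */
            while (hsize * 65536 < physmem * PAGESIZE)
                    hsize <<= 1;

            /* suggested: start at 1/4 (or 1/2) of the default size */
            hsize >>= 2;

    retry:
            buf_hash_table.ht_mask = hsize - 1;
            buf_hash_table.ht_table =
                kmem_zalloc(hsize * sizeof (void *), KM_NOSLEEP);
            if (buf_hash_table.ht_table == NULL) {
                    ASSERT(hsize > (1ULL << 8));
                    hsize >>= 1;    /* existing behaviour: halve and retry */
                    goto retry;
            }

            /* suggested: at least say how big the table ended up */
            cmn_err(CE_NOTE, "ZFS buf hash table: %llu entries",
                (u_longlong_t)hsize);
    }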
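For the pre-allocation step, one cheap way to warm a kmem cache is to allocate a working set's worth of objects up front and free them straight back, so they sit in the cache's magazines/slabs rather than going back to the VM system; later KM_NOSLEEP allocations then mostly hit the cache. WORKING_SET and the cache argument below are made-up names, just to show the shape:

    #define WORKING_SET     128     /* pick per cache, from observed load */

    static void
    cache_prealloc(kmem_cache_t *cache)
    {
            void *warm[WORKING_SET];
            int i;

            /* allocate a working set while memory is still plentiful... */
            for (i = 0; i < WORKING_SET; i++)
                    warm[i] = kmem_cache_alloc(cache, KM_SLEEP);

            /* ...and free it back, leaving the objects cached in the slab layer */
            for (i = 0; i < WORKING_SET; i++)
                    kmem_cache_free(cache, warm[i]);
    }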
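And the SLEEP-to-NOSLEEP change itself is mostly mechanical; the pattern is the one below (thing_t and thing_cache are stand-ins for whichever object you are converting), with the caller either retrying later or pushing ENOMEM back up instead of blocking in kmem until the box panics:

    static int
    thing_get(thing_t **tp)
    {
            thing_t *t;

            /* was: t = kmem_cache_alloc(thing_cache, KM_SLEEP); */
            t = kmem_cache_alloc(thing_cache, KM_NOSLEEP);
            if (t == NULL)
                    return (ENOMEM);        /* or queue a retry for later */

            *tp = t;
            return (0);
    }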
Mitchell Erblich
------------------

Pawel Jakub Dawidek wrote:
> ZFS works really stably on FreeBSD, but my biggest problem is how to
> control ZFS memory usage. I have no idea how to leash that beast.
>
> FreeBSD has a backpressure mechanism. I can register my function so it
> will be called when there are memory problems, which I do. I am using
> it for the ARC layer.
> Even with this in place, under heavy load the kernel panics, because
> memory with KM_SLEEP cannot be allocated.
>
> Here are some statistics of memory usage when the panic occurs:
>
> zfs_znode_cache:  356 * 11547  =   4110732 bytes
> zil_lwb_cache:    176 * 43     =      7568 bytes
> arc_buf_t:         20 * 7060   =    141200 bytes
> arc_buf_hdr_t:    188 * 7060   =   1327280 bytes
> dnode_t:          756 * 162311 = 122707116 bytes !!
> dmu_buf_impl_t:   332 * 18649  =   6191468 bytes
> other:                            14432256 bytes (regular kmem_alloc())
>
> There is 1GB of RAM, 320MB of which is for the kernel. 1/3 of kernel
> memory is configured as ARC's maximum.
> When it panics, debugger statistics show that around 2/3 of this is
> actually allocated, probably due to memory fragmentation.
>
> The most important part is probably dnode_t, as it looks like it
> doesn't obey any limits. Maybe it is a bug in my port and I'm leaking
> them somehow? On the other hand, when I unload the ZFS kernel module,
> FreeBSD's kernel reports memory leaks - they exist, but are much, much
> smaller.
>
> There are also quite a lot of znodes, which I'd also like to be able
> to free, and I'm not sure how. In Solaris a vnode's life ends in the
> VOP_INACTIVE() routine, but the znode is kept around.
> In FreeBSD, VOP_INACTIVE() means "put the vnode onto the free vnodes
> list", and when we want to use this vnode for a different file system
> VOP_RECLAIM() is called; VOP_RECLAIM() would be a good place to free
> the znode as well, if possible.
>
> Any ideas how to fix it?
>
> --
> Pawel Jakub Dawidek                       http://www.wheel.pl
> pjd at FreeBSD.org                         http://www.FreeBSD.org
> FreeBSD committer                         Am I Evil? Yes, I Am!