sorry Pawel Jakub Dawidek, I don't read what I type.. :-)
Mitchell Eeerblich
------------------

Erblichs wrote:
>
> Aawel Kakub Dawidek, et al,
>
> First, I am describing a moving target and maybe I
> am off target for you...
>
> The memory consumption output via slab allocator
> functions is not really correct.
>
> Normally, even when memory is freed it is cached until
> a SLEEP memory allocation fails, and then it is
> re-allocated. Is this your memory leak? So, memory
> tends to go up as more and more is allocated and never
> decreases from a point, IMO.
>
> The assumption is that memory will be allocated if SLEEP
> is called and then returns..
>
> Two standard hash table allocs via NOSLEEP in arc.c
> and ?dbuf.c?, within their local buf_init()s, retry with
> 1/2 values.
>
> So, my first memory consumption suggestion is to decrease
> to 1/4s on failure, and/or even to start with 1/2 or 1/4
> the size of the default hash tables. At a minimum,
> generating a message as to what size the hash tables are
> might tell you something.. I am assuming that your smaller
> page size or disk blocks might alloc larger hash tables
> than wanted.
>
> Then go thru the functions and identify all of the
> SLEEP allocs and pre-alloc a working set number of
> items via one of the slab functions.
>
> Then go and change all of the SLEEPs to NOSLEEPs and
> return failures on memory allocs. You can then retry
> the allocs at a later time or return ENOMEM. This will
> keep the system responsive even when low memory has
> occurred. Worst case scenario, hopefully a poorly managed
> FS is an /opt based FS that can be offlined instead of
> panicking the whole box.
>
> Also, I am assuming that FS memory allocs are less
> important than some other internal / kernel objects.
>
> You might find that you can't change some of the SLEEPs
> to NOSLEEP, but doing most will probably delay your
> problem until you have possibly leaked more mem.
>
> This isn't that hard and will remove or hopefully
> significantly delay your panics..
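To make the hash table suggestion above concrete, here is a rough userland sketch of the shrink-and-retry idea. It is not the code from arc.c; alloc_nosleep(), fail_above, struct hash_table and hash_table_init() are all made-up names for illustration, and calloc() merely stands in for a non-sleeping kernel allocation. A shift of 1 gives the halving retry, a shift of 2 the proposed quartering.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a KM_NOSLEEP-style allocation: it returns
 * NULL instead of blocking. fail_above simulates memory pressure. */
static size_t fail_above = SIZE_MAX;

static void *alloc_nosleep(size_t bytes)
{
    if (bytes > fail_above)
        return NULL;
    return calloc(1, bytes);
}

/* Illustrative bucket array, not the real ARC/dbuf hash table layout. */
struct hash_table {
    void **buckets;
    size_t nbuckets;
};

/*
 * Try to allocate `want` buckets without sleeping. On failure, shrink the
 * request by `shift` bits (1 = halve, 2 = quarter) and retry, giving up
 * once the next size would drop below `min_buckets`.
 * Returns 0 on success, EINVAL on bad arguments, ENOMEM otherwise.
 */
static int hash_table_init(struct hash_table *ht, size_t want,
                           size_t min_buckets, unsigned shift)
{
    size_t n = want;

    ht->buckets = NULL;
    ht->nbuckets = 0;
    if (want == 0 || shift == 0 || shift >= 8 * sizeof(size_t))
        return EINVAL;

    for (;;) {
        /* Skip sizes whose byte count would overflow size_t. */
        if (n <= SIZE_MAX / sizeof(void *))
            ht->buckets = alloc_nosleep(n * sizeof(void *));
        if (ht->buckets != NULL) {
            ht->nbuckets = n;   /* the size worth logging at init time */
            return 0;
        }
        if ((n >> shift) < min_buckets || (n >> shift) == 0)
            return ENOMEM;      /* caller may retry later or fail the op */
        n >>= shift;
    }
}
```

With fail_above set to 512 * sizeof(void *), a request for 4096 buckets settles at 512 when halving and at 256 when quartering, so quartering gets to a working size in fewer attempts at the cost of a smaller table.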
>
> On the other hand, if I remember correctly, VOP_INACTIVE
> ties to the DNLC and this is a much more complicated fix
> and would require a code walk thru your current dev code.
>
> Mitchell Erblich
> ------------------
>
> Pawel Jakub Dawidek wrote:
> >
> > ZFS works really stable on FreeBSD, but my biggest problem is how to
> > control ZFS memory usage. I've no idea how to leash that beast.
> >
> > FreeBSD has a backpressure mechanism. I can register my function so it
> > will be called when there are memory problems, which I do. I am using
> > it for the ARC layer.
> > Even with this in place, under heavy load the kernel panics, because
> > memory with KM_SLEEP cannot be allocated.
> >
> > Here are some statistics of memory usage when the panic occurs:
> >
> > zfs_znode_cache: 356 * 11547  = 4110732 bytes
> > zil_lwb_cache:   176 * 43     = 7568 bytes
> > arc_buf_t:       20  * 7060   = 141200 bytes
> > arc_buf_hdr_t:   188 * 7060   = 1327280 bytes
> > dnode_t:         756 * 162311 = 122707116 bytes !!
> > dmu_buf_impl_t:  332 * 18649  = 6191468 bytes
> > other:           14432256 bytes (regular kmem_alloc())
> >
> > There is 1GB of RAM, 320MB is for the kernel. 1/3 of kernel memory is
> > configured as ARC's maximum.
> > When it panics, debugger statistics show that around 2/3 of this is
> > actually allocated, but the allocation still fails, probably due to
> > memory fragmentation.
> >
> > The most important part is probably dnode_t, as it looks like it
> > doesn't obey any limits. Maybe it is a bug in my port and I'm leaking
> > them somehow?
> > On the other hand, when I unload the ZFS kernel module, FreeBSD's
> > kernel reports any memory leaks - they exist, but are much, much
> > smaller.
> >
> > There are also quite a lot of znodes, which I'd also like to be able
> > to free, and I'm not sure how. In Solaris a vnode's life ends in the
> > VOP_INACTIVE() routine, but the znode is kept around.
> > In FreeBSD VOP_INACTIVE() means "put the vnode onto the free vnodes
> > list", and when we want to use this vnode for a different file system
> > VOP_RECLAIM() is called, so VOP_RECLAIM() would be a good place to
> > free the znode as well, if possible.
> >
> > Any ideas how to fix it?
> >
> > --
> > Pawel Jakub Dawidek                       http://www.wheel.pl
> > pjd at FreeBSD.org                        http://www.FreeBSD.org
> > FreeBSD committer                         Am I Evil? Yes, I Am!
> >
> > _______________________________________________
> > zfs-code mailing list
> > zfs-code at opensolaris.org
> > http://opensolaris.org/mailman/listinfo/zfs-code
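
For anyone who wants to check the arithmetic in the quoted statistics, here is a small C sketch that recomputes them. The sizes and counts are copied from Pawel's message; the struct and function names (cache_stat, cache_bytes, total_bytes, largest_cache) are made up for this example and are not part of ZFS or FreeBSD.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the quoted table: object size in bytes times object count. */
struct cache_stat {
    const char *name;
    uint64_t obj_size;
    uint64_t count;
};

/* Figures as posted in the thread at the time of the panic. */
static const struct cache_stat panic_stats[] = {
    { "zfs_znode_cache", 356, 11547 },
    { "zil_lwb_cache",   176, 43 },
    { "arc_buf_t",       20,  7060 },
    { "arc_buf_hdr_t",   188, 7060 },
    { "dnode_t",         756, 162311 },
    { "dmu_buf_impl_t",  332, 18649 },
};
#define PANIC_STATS_N (sizeof(panic_stats) / sizeof(panic_stats[0]))

/* The "other" line: memory from regular kmem_alloc() calls. */
#define OTHER_BYTES UINT64_C(14432256)

/* Bytes held by a single cache. */
static uint64_t cache_bytes(const struct cache_stat *s)
{
    return s->obj_size * s->count;
}

/* Sum of all caches plus the uncategorised allocations. */
static uint64_t total_bytes(const struct cache_stat *stats, size_t n,
                            uint64_t other)
{
    uint64_t sum = other;
    for (size_t i = 0; i < n; i++)
        sum += cache_bytes(&stats[i]);
    return sum;
}

/* Cache holding the most bytes; NULL when the table is empty. */
static const struct cache_stat *largest_cache(const struct cache_stat *stats,
                                              size_t n)
{
    const struct cache_stat *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (best == NULL || cache_bytes(&stats[i]) > cache_bytes(best))
            best = &stats[i];
    return best;
}
```

The listed items add up to 148917620 bytes, of which dnode_t alone is 122707116 bytes, roughly 82%, which fits the suspicion that dnodes are the part not obeying any limit.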