Sorry, Pawel Jakub Dawidek,
I don't read what I type. :-)
Mitchell Erblich
------------------
Erblichs wrote:
>
> Aawel Kakub Dawidek, et al,
>
> First, I am describing a moving target and maybe I
> am off target for you...
>
> The memory consumption output via slab allocator
> functions is not really correct.
>
> Normally, even when memory is freed it stays cached until
> a SLEEP memory allocation fails, and only then is it
> re-allocated. Is this your memory leak? So memory use
> tends to go up as more and more is allocated, and never
> decreases below a certain point, IMO.
>
> The assumption is that when a SLEEP allocation is called
> and returns, the memory has been allocated.
>
> Two standard hash table allocs use NOSLEEP in arc.c
> and ?dbuf.c?; within their local buf_init()s they retry
> with 1/2 the size on failure.
>
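The retry idea above can be sketched in userland C. This is only an illustration under stated assumptions, not the code in arc.c: try_zalloc(), sim_alloc_limit and hash_table_alloc() are hypothetical names, and calloc() stands in for a kernel NOSLEEP allocation. The shift argument selects halving (1) or quartering (2), and the chosen size is logged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical knob: requests above this size fail, simulating NOSLEEP failure. */
static size_t sim_alloc_limit = SIZE_MAX;

/* Stand-in for a non-blocking zeroed alloc: returns NULL instead of sleeping. */
static void *
try_zalloc(size_t size)
{
	if (size > sim_alloc_limit)
		return (NULL);
	return (calloc(1, size));
}

/*
 * Try to allocate 'want' buckets. On failure shrink the request by
 * 'shift' bits (1 = half, 2 = quarter) and retry, never going below
 * 'min' buckets. Stores the bucket count actually obtained in *got.
 */
static void **
hash_table_alloc(uint64_t want, uint64_t min, unsigned shift, uint64_t *got)
{
	uint64_t n;

	assert(min > 0 && shift > 0 && shift < 64);
	for (n = want; n >= min; n >>= shift) {
		void **table = try_zalloc((size_t)n * sizeof (void *));

		if (table != NULL) {
			/* Report the size so an oversized table is visible. */
			fprintf(stderr,
			    "hash table: %llu buckets (asked for %llu)\n",
			    (unsigned long long)n, (unsigned long long)want);
			*got = n;
			return (table);
		}
	}
	*got = 0;
	return (NULL);
}
```

Calling hash_table_alloc(1 << 16, 256, 2, &got) with room for only 1024 pointers tries 65536, 16384 and 4096 buckets before settling on 1024.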
> So, my first memory consumption suggestion is to decrease
> to 1/4 on failure, and/or even to start with 1/2 or 1/4
> the size of the default hash tables. At a minimum,
> generating a message saying what size the hash tables are
> might tell you something. I am assuming that your smaller
> page size or disk blocks might alloc larger hash tables
> than wanted.
>
> Then go thru the functions and identify all of the
> SLEEP allocs and pre-alloc a working set number of
> items via one of the slab functions.
>
> Then go and change all of the SLEEPs to NOSLEEPs and
> return failures on memory allocs. You can then retry
> the allocs at a later time or return ENOMEM. This will
> keep the system responsive even when low memory has
> occurred. Worst case, hopefully a poorly managed
> FS is an /opt based FS that can be offlined instead of
> panicking the whole box.
>
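A minimal userland sketch of the two steps above (pre-allocate a working set, then hand items out without ever blocking) might look like the following. It is not the Solaris slab API: struct obj_pool and the pool_*() functions are hypothetical names, and a real port would put this on top of the kernel's own cache functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size pool holding a pre-allocated working set. */
struct obj_pool {
	void	**free_items;	/* stack of items ready to hand out */
	size_t	nfree;		/* items currently on the stack */
	size_t	cap;		/* working-set size */
};

/* Release every item still in the pool, then the pool's stack. */
static void
pool_fini(struct obj_pool *p)
{
	while (p->nfree > 0)
		free(p->free_items[--p->nfree]);
	free(p->free_items);
	p->free_items = NULL;
	p->cap = 0;
}

/* Pre-allocate 'working_set' items up front, when failure is cheap. */
static int
pool_init(struct obj_pool *p, size_t item_size, size_t working_set)
{
	assert(item_size > 0 && working_set > 0);
	p->nfree = 0;
	p->cap = working_set;
	p->free_items = calloc(working_set, sizeof (void *));
	if (p->free_items == NULL)
		return (ENOMEM);
	while (p->nfree < working_set) {
		void *item = calloc(1, item_size);

		if (item == NULL) {
			pool_fini(p);
			return (ENOMEM);
		}
		p->free_items[p->nfree++] = item;
	}
	return (0);
}

/* NOSLEEP-style get: never blocks, reports ENOMEM when exhausted. */
static int
pool_get(struct obj_pool *p, void **out)
{
	if (p->nfree == 0)
		return (ENOMEM);	/* caller retries later or fails the op */
	*out = p->free_items[--p->nfree];
	return (0);
}

/* Return an item to the pool for reuse. */
static void
pool_put(struct obj_pool *p, void *item)
{
	assert(p->nfree < p->cap);
	p->free_items[p->nfree++] = item;
}
```

The caller of pool_get() decides what ENOMEM means for it: retry later, or fail the operation back to the user instead of sleeping with the system already short of memory.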
> Also, I am assuming that FSs memory allocs are less
> important than some other internal / kernel object.
>
> You might find that you can't change some of the SLEEPs
> to NOSLEEP, but doing most will probably delay your
> problem until you have possibly leaked more mem.
>
> This isn't that hard and will remove or hopefully
> significantly delay your panics..
>
> On the other hand, if I remember correctly, VOP_INACTIVE
> ties to the DNLC and this is a much more complicated fix
> and would require a code walk thru your current dev code.
>
> Mitchell Erblich
> ------------------
>
>
>
>
> Pawel Jakub Dawidek wrote:
> >
> > ZFS works really stable on FreeBSD, but my biggest problem is how to
> > control ZFS memory usage. I've no idea how to leash that beast.
> >
> > FreeBSD has a backpressure mechanism. I can register my function so it
> > will be called when there are memory problems, which I do. I'm using it
> > for the ARC layer.
> > Even with this in place under heavy load the kernel panics, because
> > memory with KM_SLEEP cannot be allocated.
> >
> > Here are some statistics of memory usage when the panic occurs:
> >
> > zfs_znode_cache: 356 * 11547 = 4110732 bytes
> > zil_lwb_cache: 176 * 43 = 7568 bytes
> > arc_buf_t: 20 * 7060 = 141200 bytes
> > arc_buf_hdr_t: 188 * 7060 = 1327280 bytes
> > dnode_t: 756 * 162311 = 122707116 bytes !!
> > dmu_buf_impl_t: 332 * 18649 = 6191468 bytes
> > other: 14432256 bytes (regular kmem_alloc())
> >
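To put the figures above side by side, here is a small C sketch that recomputes each line as item size times item count and adds them up. The struct and function names are hypothetical; the numbers are copied from the list above. The total comes to 148917620 bytes (about 142 MiB), and dnode_t alone accounts for roughly 82% of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One line of the statistics: object size in bytes and live object count. */
struct cache_stat {
	const char	*name;
	uint64_t	item_size;
	uint64_t	count;
};

/* Values copied from the statistics quoted above. */
static const struct cache_stat zfs_stats[] = {
	{ "zfs_znode_cache",	356,	11547 },
	{ "zil_lwb_cache",	176,	43 },
	{ "arc_buf_t",		20,	7060 },
	{ "arc_buf_hdr_t",	188,	7060 },
	{ "dnode_t",		756,	162311 },
	{ "dmu_buf_impl_t",	332,	18649 },
};

/* The "other" line: regular kmem_alloc() usage, already in bytes. */
#define	OTHER_BYTES	14432256ULL

/* Bytes held by one cache. */
static uint64_t
cache_bytes(const struct cache_stat *s)
{
	return (s->item_size * s->count);
}

/* Bytes held by all listed caches plus the "other" allocations. */
static uint64_t
total_bytes(void)
{
	uint64_t sum = OTHER_BYTES;
	size_t i;

	for (i = 0; i < sizeof (zfs_stats) / sizeof (zfs_stats[0]); i++)
		sum += cache_bytes(&zfs_stats[i]);
	return (sum);
}
```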
> > There is 1GB of RAM, 320MB is for the kernel. 1/3 of kernel memory is
> > configured as ARC's maximum.
> > When it panics, debugger statistics show that around 2/3 of this is
> > actually allocated, so the panic is probably due to memory fragmentation.
> >
> > The most important part is probably dnode_t, as it looks like it doesn't
> > obey any limits. Maybe it is a bug in my port and I'm leaking them somehow?
> > On the other hand when I unload ZFS kernel module, FreeBSD's kernel
> > reports any memory leaks - they exist, but are much, much smaller.
> >
> > There are also quite a lot of znodes, which I'd also like to be able to
> > free, and I'm not sure how. In Solaris a vnode's life ends in the
> > VOP_INACTIVE() routine, but the znode is kept around. In FreeBSD
> > VOP_INACTIVE() means "puts the vnode onto free vnodes list", and when we
> > want to use this vnode for a different file system VOP_RECLAIM() is
> > called, so VOP_RECLAIM() would be a good place to free the znode as
> > well, if possible.
> >
> > Any ideas how to fix it?
> >
> > --
> > Pawel Jakub Dawidek http://www.wheel.pl
> > pjd at FreeBSD.org http://www.FreeBSD.org
> > FreeBSD committer Am I Evil? Yes, I Am!
> >
> > _______________________________________________
> > zfs-code mailing list
> > zfs-code at opensolaris.org
> > http://opensolaris.org/mailman/listinfo/zfs-code