Pawel Jakub Dawidek, et al.,

        First, I am describing a moving target and maybe I
        am off target for you...

        The memory consumption reported via the slab allocator
        functions is not really accurate.

        Normally, even when memory is freed it stays cached until
        a SLEEP memory allocation fails, and only then is it
        re-allocated. Is this your memory leak? So, memory
        tends to go up as more and more is allocated and never
        decreases past a certain point, IMO.

        The assumption is that memory has been allocated whenever
        a SLEEP allocation returns.

        Two standard hash tables are allocated via NOSLEEPs in
        arc.c and ?dbuf.c?; their local buf_init()s retry with
        1/2 the size on failure.

        So, my first memory consumption fix would be to decrease
        to 1/4s on failure, and/or even to start with 1/2 or 1/4
        the size of the default hash tables. At a minimum,
        generating a message that says what size the hash tables
        are might tell you something. I am assuming that your
        smaller page size or disk blocks might alloc larger hash
        tables than wanted.
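
        A rough userspace sketch of that shrink-and-retry idea,
        not the actual arc.c code: ht_alloc_fn stands in for a
        NOSLEEP allocation, and hash_table_init(), limited_alloc()
        and the shift parameter are made-up names. A shift of 1
        halves on each failure, a shift of 2 drops to 1/4, and the
        chosen size is printed so you can see what you ended up
        with.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a NOSLEEP allocation: may return NULL. */
typedef void *(*ht_alloc_fn)(size_t size);

struct hash_table {
	void		**buckets;
	uint64_t	nbuckets;
};

/*
 * Illustrative allocator that simulates memory pressure: it refuses
 * any request larger than alloc_limit bytes and zeroes what it returns.
 */
static size_t alloc_limit = SIZE_MAX;

static void *
limited_alloc(size_t size)
{
	return (size > alloc_limit ? NULL : calloc(1, size));
}

/*
 * Try to allocate `want` buckets without sleeping.  On failure shrink
 * the request by 2^shift and retry, never going below min_buckets.
 * Returns 0 on success, -1 if even the smallest size failed.
 */
static int
hash_table_init(struct hash_table *ht, uint64_t want, uint64_t min_buckets,
    unsigned shift, ht_alloc_fn alloc)
{
	assert(want != 0 && min_buckets != 0);
	assert(shift >= 1 && shift < 64);

	for (uint64_t n = want; n >= min_buckets; n >>= shift) {
		void **b = alloc((size_t)n * sizeof (void *));

		if (b != NULL) {
			ht->buckets = b;
			ht->nbuckets = n;
			/* Report the size actually obtained. */
			fprintf(stderr,
			    "hash table: %llu buckets (wanted %llu)\n",
			    (unsigned long long)n, (unsigned long long)want);
			return (0);
		}
	}
	ht->buckets = NULL;
	ht->nbuckets = 0;
	return (-1);
}
```

        Starting with a smaller `want` is the same call with a
        different argument, so both suggestions fit one routine.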

        Then go thru the functions and identify all of the
        SLEEP allocs and pre-alloc a working set number of
        items via one of the slab functions.
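
        To make the pre-alloc idea concrete, here is a minimal
        userspace sketch of a fixed working set kept on a free
        list. It is not a real slab interface: prealloc_pool and
        the pool_*() functions are invented for illustration,
        malloc() stands in for the one up-front allocation, and
        there is no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Free items are chained through their own first bytes. */
struct pool_item {
	struct pool_item *next;
};

struct prealloc_pool {
	struct pool_item *free_list;	/* items ready to hand out */
	void		 *backing;	/* the single up-front allocation */
	size_t		  item_size;
	size_t		  nfree;
};

/* Reserve `count` items of at least `item_size` bytes each, once. */
static int
pool_init(struct prealloc_pool *p, size_t item_size, size_t count)
{
	assert(count > 0);
	if (item_size < sizeof (struct pool_item))
		item_size = sizeof (struct pool_item);
	/* Round up so every item stays pointer-aligned. */
	item_size = (item_size + sizeof (void *) - 1) &
	    ~(sizeof (void *) - 1);

	p->backing = malloc(item_size * count);
	if (p->backing == NULL)
		return (-1);
	p->free_list = NULL;
	p->item_size = item_size;
	p->nfree = 0;
	for (size_t i = 0; i < count; i++) {
		struct pool_item *it =
		    (struct pool_item *)((char *)p->backing + i * item_size);

		it->next = p->free_list;
		p->free_list = it;
		p->nfree++;
	}
	return (0);
}

/* Never blocks: returns NULL once the working set is exhausted. */
static void *
pool_get(struct prealloc_pool *p)
{
	struct pool_item *it = p->free_list;

	if (it == NULL)
		return (NULL);
	p->free_list = it->next;
	p->nfree--;
	return (it);
}

/* Return an item obtained from pool_get() to the working set. */
static void
pool_put(struct prealloc_pool *p, void *obj)
{
	struct pool_item *it = obj;

	it->next = p->free_list;
	p->free_list = it;
	p->nfree++;
}

static void
pool_fini(struct prealloc_pool *p)
{
	free(p->backing);
	p->backing = NULL;
	p->free_list = NULL;
	p->nfree = 0;
}
```

        The point is that pool_get() can fail but cannot sleep, so
        the caller decides what to do when the set runs dry.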

        Then go and change all of the SLEEPs to NOSLEEPs and
        return failures on memory allocs. You can then retry
        the allocs at a later time or return ENOMEM. This will
        keep the system responsive even when low memory has
        occurred. Worst case scenario, a poorly managed FS is
        hopefully a /opt based FS that can be offlined instead
        of panicking the whole box.
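
        A hedged sketch of what such a converted call site could
        look like, in userspace C. nosleep_alloc_fn stands in for
        a NOSLEEP allocation, flaky_alloc() only simulates
        failures, and buf_alloc_nosleep() / buf_alloc_retry() are
        made-up names, not functions from the ZFS source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a NOSLEEP allocation: may return NULL. */
typedef void *(*nosleep_alloc_fn)(size_t size);

/* Illustrative allocator that fails the next fail_count calls. */
static int fail_count;

static void *
flaky_alloc(size_t size)
{
	if (fail_count > 0) {
		fail_count--;
		return (NULL);
	}
	return (malloc(size));
}

/* One non-blocking attempt: 0 on success, ENOMEM on failure. */
static int
buf_alloc_nosleep(void **out, size_t size, nosleep_alloc_fn alloc)
{
	void *p = alloc(size);

	*out = p;
	return (p == NULL ? ENOMEM : 0);
}

/*
 * Caller-side policy: try up to max_tries times, then hand ENOMEM
 * back up the stack instead of blocking or panicking.  A real caller
 * would wait or release cached memory between attempts.
 */
static int
buf_alloc_retry(void **out, size_t size, int max_tries,
    nosleep_alloc_fn alloc)
{
	assert(max_tries > 0);
	for (int attempt = 0; attempt < max_tries; attempt++) {
		if (buf_alloc_nosleep(out, size, alloc) == 0)
			return (0);
	}
	*out = NULL;
	return (ENOMEM);
}
```

        Every caller now has to check the return value, which is
        most of the work in a conversion like this.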

        Also, I am assuming that FSs memory allocs are less
        important than some other internal / kernel object.

        You might find that you can't change some of the SLEEPs
        to NOSLEEP, but doing most will probably delay your
        problem until you have possibly leaked more mem.

        This isn't that hard and will remove or hopefully
        significantly delay your panics.

        On the other hand, if I remember correctly, VOP_INACTIVE
        ties to the DNLC and this is a much more complicated fix
        and would require a code walk thru your current dev code.

        Mitchell Erblich
        ------------------

        
        

Pawel Jakub Dawidek wrote:
> 
> ZFS works really stable on FreeBSD, but my biggest problem is how to
> control ZFS memory usage. I've no idea how to leash that beast.
> 
> FreeBSD has a backpressure mechanism. I can register my function so it
> will be called when there are memory problems, which I do. I am using it
> for ARC layer.
> Even with this in place under heavy load the kernel panics, because
> memory with KM_SLEEP cannot be allocated.
> 
> Here are some statistics of memory usage when the panic occurs:
> 
> zfs_znode_cache: 356 * 11547 = 4110732 bytes
>   zil_lwb_cache: 176 * 43 = 7568 bytes
>       arc_buf_t:  20 * 7060 = 141200 bytes
>   arc_buf_hdr_t: 188 * 7060 = 1327280 bytes
>         dnode_t: 756 * 162311 = 122707116 bytes !!
>  dmu_buf_impl_t: 332 * 18649 = 6191468 bytes
>           other: 14432256 bytes (regular kmem_alloc())
> 
> There is 1GB of RAM, 320MB is for the kernel. 1/3 of kernel memory is
> configured as ARC's maximum.
> When it panics, debugger statistics show that there is around 2/3 of
> this actually allocated, but probably due to memory fragmentation.
> 
> The most important part is dnode_t probably as it looks it doesn't obey
> any limits. Maybe it is a bug in my port and I'm leaking them somehow?
> On the other hand when I unload ZFS kernel module, FreeBSD's kernel
> reports any memory leaks - they exist, but are much, much smaller.
> 
> There is also quite a lot of znodes, which I'd also like to be able to
> free and not sure how. In Solaris vnode's life end in VOP_INACTIVE()
> routine, but znode is kept around. In FreeBSD VOP_INACTIVE() means "puts
> the vnode onto free vnodes list" and when we want to use this vnode for
> different file system VOP_RECLAIM() is called and VOP_RECLAIM() will be
> a good place to free znode as well, if possible.
> 
> Any ideas how to fix it?
> 
> --
> Pawel Jakub Dawidek                       http://www.wheel.pl
> pjd at FreeBSD.org                           http://www.FreeBSD.org
> FreeBSD committer                         Am I Evil? Yes, I Am!
> 
> _______________________________________________
> zfs-code mailing list
> zfs-code at opensolaris.org
> http://opensolaris.org/mailman/listinfo/zfs-code
