Sorry, Pawel Jakub Dawidek,

        I don't read what I type.. :-)

        Mitchell Erblich
        ------------------

Erblichs wrote:
> 
> Pawel Jakub Dawidek, et al,
> 
>         First, I am describing a moving target and maybe I
>         am off target for you...
> 
>         The memory consumption reported by the slab allocator
>         functions is not really accurate.
> 
>         Normally, even after memory is freed it stays cached until
>         a SLEEP memory allocation fails, and only then is it
>         reclaimed and re-allocated. Is this your memory leak? So,
>         memory tends to show up as more and more allocated, never
>         decreasing past a point, IMO.
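A minimal sketch of that caching behavior, assuming the Solaris kmem
interfaces; "demo_cache" and the object size are hypothetical, for
illustration only:

    #include <sys/types.h>
    #include <sys/kmem.h>

    static kmem_cache_t *demo_cache;    /* hypothetical cache */

    static void
    demo_cache_behavior(void)
    {
            void *obj;

            demo_cache = kmem_cache_create("demo_cache", 64, 0,
                NULL, NULL, NULL, NULL, NULL, 0);
            obj = kmem_cache_alloc(demo_cache, KM_SLEEP);
            kmem_cache_free(demo_cache, obj);
            /*
             * The object is not returned to the page pool here; it
             * sits in the cache's magazines, so allocated-memory
             * statistics only grow until kmem_reap() drains the
             * caches under allocation pressure.
             */
    }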
> 
>         The assumption is that if a SLEEP allocation is made, it
>         returns only once the memory has been allocated.
> 
>         Two standard hash tables are allocated via NOSLEEP in
>         arc.c and (I believe) dbuf.c, within their local
>         buf_init() functions, retrying with 1/2 the size on
>         failure.
> 
>         So, my first memory consumption suggestion is to drop to
>         1/4 the size on failure, and/or to start with 1/2 or 1/4
>         of the default hash table size. Minimally, generating a
>         message saying what size the hash tables ended up might
>         tell you something (see the sketch just below). I am
>         assuming that your smaller page size or disk blocks might
>         allocate larger hash tables than wanted.
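A sketch of that retry loop, modeled loosely on buf_init() in arc.c
with the suggested changes folded in; the sizing heuristic and names
are approximations, not the exact source:

    #include <sys/types.h>
    #include <sys/kmem.h>
    #include <sys/cmn_err.h>
    #include <sys/debug.h>

    static void *
    buf_hash_init_sketch(uint64_t physmem_bytes, uint64_t *hsizep)
    {
            uint64_t hsize = 1ULL << 12;
            void **ht_table;

            /* Size the table from physical memory, as the real code does. */
            while (hsize * 65536 < physmem_bytes)
                    hsize <<= 1;
            hsize >>= 1;            /* suggested: start at 1/2 the default */
    retry:
            ht_table = kmem_zalloc(hsize * sizeof (void *), KM_NOSLEEP);
            if (ht_table == NULL) {
                    ASSERT(hsize > (1ULL << 8));
                    hsize >>= 2;    /* suggested: retry at 1/4 (source uses 1/2) */
                    goto retry;
            }
            /* Suggested: report what size we ended up with. */
            cmn_err(CE_NOTE, "buf hash table: %llu entries",
                (u_longlong_t)hsize);
            *hsizep = hsize;
            return (ht_table);
    }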
> 
>         Then go through the functions, identify all of the SLEEP
>         allocations, and pre-allocate a working-set number of
>         items via one of the slab functions, as sketched below.
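For example, a hypothetical helper (not from the sources) that warms
a cache at init time, while memory is still plentiful: allocate the
working set with SLEEP, then release the objects back so the cache's
magazines hold them for later NOSLEEP consumers:

    #include <sys/kmem.h>

    static void
    prealloc_working_set(kmem_cache_t *cp, int count)
    {
            void **objs;
            int i;

            objs = kmem_alloc(count * sizeof (void *), KM_SLEEP);
            for (i = 0; i < count; i++)
                    objs[i] = kmem_cache_alloc(cp, KM_SLEEP);
            /* Freeing returns the objects to the cache, not the system. */
            for (i = 0; i < count; i++)
                    kmem_cache_free(cp, objs[i]);
            kmem_free(objs, count * sizeof (void *));
    }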
> 
>         Then go and change all of the SLEEPs to NOSLEEPs and
>         return failures on memory allocations. You can then retry
>         the allocations at a later time or return ENOMEM. This
>         will keep the system responsive even when memory is low.
>         In the worst case, a poorly managed FS is hopefully an
>         /opt-based FS that can be offlined instead of panicking
>         the whole box.
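The transformation would look roughly like this; alloc_object_sketch()
and some_cache are illustrative stand-ins, not the real allocation
sites:

    #include <sys/kmem.h>
    #include <sys/errno.h>

    extern kmem_cache_t *some_cache;    /* stands in for a real cache */

    static int
    alloc_object_sketch(void **objp)
    {
            void *obj;

            /* was: obj = kmem_cache_alloc(some_cache, KM_SLEEP); */
            obj = kmem_cache_alloc(some_cache, KM_NOSLEEP);
            if (obj == NULL)
                    return (ENOMEM);    /* caller can retry later */
            *objp = obj;
            return (0);
    }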
> 
>         Also, I am assuming that FS memory allocations are less
>         important than some other internal kernel objects.
> 
>         You might find that you can't change some of the SLEEPs
>         to NOSLEEP, but converting most of them will probably
>         delay your problem, even if you have leaked more memory
>         by then.
> 
>         This isn't that hard, and it will remove or at least
>         significantly delay your panics.
> 
>         On the other hand, if I remember correctly, VOP_INACTIVE
>         ties into the DNLC; that is a much more complicated fix
>         and would require a walk-through of your current dev
>         code.
> 
>         Mitchell Erblich
>         ------------------
> 
> 
> 
> 
> Pawel Jakub Dawidek wrote:
> >
> > ZFS works really stably on FreeBSD, but my biggest problem is how to
> > control ZFS memory usage. I've no idea how to leash that beast.
> >
> > FreeBSD has a backpressure mechanism: I can register a function that
> > will be called when there are memory problems, which I do. I'm using
> > it for the ARC layer.
> > Even with this in place, under heavy load the kernel panics, because
> > memory with KM_SLEEP cannot be allocated.
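A sketch of such a registration, assuming FreeBSD's vm_lowmem
eventhandler; the handler body and arc_reclaim() are assumptions
about the port, not its actual code:

    #include <sys/param.h>
    #include <sys/eventhandler.h>

    extern void arc_reclaim(void);      /* assumed ARC shrink entry point */

    static eventhandler_tag zfs_lowmem_tag;

    /* Called by the VM when the system is short on memory. */
    static void
    zfs_lowmem(void *arg, int flags)
    {
            arc_reclaim();
    }

    static void
    zfs_lowmem_register(void)
    {
            zfs_lowmem_tag = EVENTHANDLER_REGISTER(vm_lowmem,
                zfs_lowmem, NULL, EVENTHANDLER_PRI_FIRST);
    }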
> >
> > Here are some statistics of memory usage when the panic occurs:
> >
> > zfs_znode_cache: 356 * 11547 = 4110732 bytes
> >   zil_lwb_cache: 176 * 43 = 7568 bytes
> >       arc_buf_t:  20 * 7060 = 141200 bytes
> >   arc_buf_hdr_t: 188 * 7060 = 1327280 bytes
> >         dnode_t: 756 * 162311 = 122707116 bytes !!
> >  dmu_buf_impl_t: 332 * 18649 = 6191468 bytes
> >           other: 14432256 bytes (regular kmem_alloc())
> >
> > There is 1GB of RAM, 320MB of which is for the kernel. 1/3 of kernel
> > memory is configured as the ARC's maximum.
> > When it panics, debugger statistics show that only around 2/3 of this
> > is actually allocated, so the failure is probably due to memory
> > fragmentation.
> >
> > The most important part is probably dnode_t, as it looks like it
> > doesn't obey any limits. Maybe it is a bug in my port and I'm leaking
> > them somehow? On the other hand, when I unload the ZFS kernel module,
> > FreeBSD's kernel reports memory leaks - they exist, but are much,
> > much smaller.
> >
> > There are also quite a lot of znodes, which I'd also like to be able
> > to free, but I'm not sure how. In Solaris a vnode's life ends in the
> > VOP_INACTIVE() routine, but the znode is kept around. In FreeBSD,
> > VOP_INACTIVE() means "put the vnode onto the free vnodes list"; when
> > we want to use the vnode for a different file system, VOP_RECLAIM()
> > is called, and VOP_RECLAIM() would be a good place to free the znode
> > as well, if possible.
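A sketch of that idea against FreeBSD's reclaim interface;
zfs_znode_free() and the v_data usage here are assumptions about the
port's internals:

    #include <sys/param.h>
    #include <sys/vnode.h>

    extern void zfs_znode_free(void *zp);  /* assumed port-internal helper */

    static int
    zfs_freebsd_reclaim(struct vop_reclaim_args *ap)
    {
            struct vnode *vp = ap->a_vp;

            /* Free the znode together with the vnode being recycled. */
            if (vp->v_data != NULL) {
                    zfs_znode_free(vp->v_data);
                    vp->v_data = NULL;
            }
            return (0);
    }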
> >
> > Any ideas how to fix it?
> >
> > --
> > Pawel Jakub Dawidek                       http://www.wheel.pl
> > pjd at FreeBSD.org                           http://www.FreeBSD.org
> > FreeBSD committer                         Am I Evil? Yes, I Am!
> >
