On Wed, Jan 10, 2007 at 12:39:57AM +0000, Ricardo Correia wrote:
> Hi,
>
> I'm not sure how to control the ARC on the ZFS port to FUSE.
>
> In the alpha1 release, for testing, I simply set the zfs_arc_max and
> zfs_arc_min variables to 80 MB and 64 MB (respectively) to prevent the ARC
> from growing unboundedly.
>
> However, I'm having a problem. A simple run of the following script will
> cause zfs-fuse memory usage to grow almost indefinitely:
>
> for i in `seq 1 100000`;
> do
>     touch /pool/testdir/$i
> done
>
> The problem seems to be that vnodes are getting allocated and never freed.
>
> From what I understand, and from what I read in the previous thread about a
> similar issue that Pawel was having, this is what happens in Solaris (and in
> zfs-fuse, by extension):
>
> 1) When VN_RELE() is called and vp->v_count reaches 1, VOP_INACTIVE() is
>    called.
> 2) VOP_INACTIVE() calls zfs_inactive(), which calls zfs_zinactive().
> 3) zfs_zinactive() calls dmu_buf_rele().
> 4) ??
> 5) znode_pageout_func() calls zfs_znode_free(), which finally frees the
>    vnode.
>
> As for step 4, Mark Maybee mentioned:
>
> "Note that the db_immediate_evict == 0 means that you
> will probably *not* see a callback to the pageout function immediately.
> This is the general case. We hold onto the znode (and related memory)
> until the associated disk blocks are evicted from the cache (arc). The
> cache is likely to hold onto that data until either:
>  - we encounter memory shortage, and so reduce the cache size
>  - we read new data into the cache, and evict this data to
>    make space for it."
>
> So even if I have a "not very big" cache, there can be a lot of alloc'ed
> vnodes which consume a lot more memory!
> Of course, if the ARC would somehow take that memory into account when
> checking zfs_arc_max, it would be easier to tune it.
I'm sorry to say it, but I'm happy to see you have the same problem :)
Maybe it will be easier to solve now. This is one of the last problems
I have left; I've spent a lot of time trying to solve it, but no luck.

In FreeBSD, a subsystem can register a vm_lowmem hook, which basically
means "call me if there is no free memory, so I may be able to free
something". When someone allocates memory with the M_WAITOK flag (the
equivalent of Solaris' KM_SLEEP) and there is no free memory in the
system, the allocator calls the registered vm_lowmem hooks, begging
for memory.

I register such a hook in my port, so I'm called when there is no free
memory. Then I immediately wake up the arc_reclaim_thread thread. It
almost works, but it is not reliable: under high load it is possible
that it won't free enough memory (most of the time it is a
malloc(131072) that is being called at that point). I am able to
reproduce it with 'iozone -a'.

> So, the better question is: is the ARC even helpful for a FUSE
> filesystem? I mean, the Linux kernel is already caching file data,
> even for FUSE filesystems.

I'm not sure if ZFS can work without the ARC, but I'd suggest not
turning it off. I don't know what algorithm Linux uses for caching,
but the ARC is really nice and efficient.

> PS: I'm also considering opening the vdevs (the underlying block
> devices or files) with O_DIRECT, otherwise the kernel will also
> cache them. Does it make sense?

I'd guess it makes a lot of sense; cache duplication is not a good
thing.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                        http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!