Ricardo Correia wrote:
> Hi,
>
> I'm not sure how to control the ARC on the ZFS port to FUSE.
>
> In the alpha1 release, for testing, I simply set the zfs_arc_max and
> zfs_arc_min variables to 80 MB and 64 MB (respectively) to prevent the ARC
> from growing unboundedly.
>
> However, I'm having a problem. A simple run of the following script will
> cause zfs-fuse memory usage to grow almost indefinitely:
>
> for i in `seq 1 100000`;
> do
>     touch /pool/testdir/$i
> done
>
> The problem seems to be that vnodes are getting allocated and never freed.
>
> From what I understand, and from what I read in the previous thread about a
> similar issue that Pawel was having, this is what happens in Solaris (and in
> zfs-fuse, by extension):
>
> 1) When VN_RELE() is called and vp->v_count reaches 1, VOP_INACTIVE() is
>    called.
> 2) VOP_INACTIVE() calls zfs_inactive(), which calls zfs_zinactive().
> 3) zfs_zinactive() calls dmu_buf_rele().
> 4) ??
> 5) znode_pageout_func() calls zfs_znode_free(), which finally frees the
>    vnode.
>
> As for step 4, Mark Maybee mentioned:
>
> "Note that db_immediate_evict == 0 means that you will probably *not* see
> a callback to the pageout function immediately. This is the general case.
> We hold onto the znode (and related memory) until the associated disk
> blocks are evicted from the cache (ARC). The cache is likely to hold onto
> that data until either:
>   - we encounter memory shortage, and so reduce the cache size
>   - we read new data into the cache, and evict this data to make space
>     for it."
>
> So even if I have a "not very big" cache, there can be a lot of allocated
> vnodes which consume a lot more memory!
>
> Of course, if the ARC would somehow take that memory into account when
> checking zfs_arc_max, it would be easier to tune it.

It's not just the vnodes; there are also znodes, dnodes and dbufs to
consider. The bottom line is that 64 MB of vnode-related ARC data can tie
up 192 MB of other memory. We are in the process of making the ARC more
aware of these extra overheads.
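To make the lifecycle above concrete, here is a simplified sketch of the
release path, using the Solaris-style names (locking details and error
handling omitted; this is an illustration, not the verbatim kernel code):

    /* Step 1: drop a hold; the last holder triggers VOP_INACTIVE(). */
    void
    vn_rele(vnode_t *vp)
    {
        mutex_enter(&vp->v_lock);
        if (vp->v_count == 1) {
            mutex_exit(&vp->v_lock);
            /*
             * Steps 2-3: VOP_INACTIVE() dispatches to zfs_inactive(),
             * which calls zfs_zinactive(), which drops the znode's hold
             * on its backing dbuf via dmu_buf_rele().
             */
            VOP_INACTIVE(vp, CRED());
        } else {
            vp->v_count--;
            mutex_exit(&vp->v_lock);
        }
    }

    /*
     * Steps 4-5: with db_immediate_evict == 0, nothing further happens
     * here. The znode (and its vnode) are freed only when the ARC later
     * evicts the backing dbuf and the eviction callback fires:
     *
     *     znode_pageout_func() -> zfs_znode_free() -> vnode freed
     *
     * which is why the touch loop above accumulates vnodes faster than
     * they are released.
     */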
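As an aside, the zfs_arc_max/zfs_arc_min limits mentioned at the top are
plain byte-valued globals in the ZFS sources, so a port can clamp them
before the ARC is initialized. A minimal sketch of the alpha1 tuning (the
helper name is hypothetical, not actual zfs-fuse code):

    extern uint64_t zfs_arc_max;    /* upper bound on ARC size, in bytes */
    extern uint64_t zfs_arc_min;    /* lower bound on ARC size, in bytes */

    static void
    tune_arc_for_fuse(void)         /* hypothetical helper */
    {
        zfs_arc_max = 80ULL << 20;  /* 80 MB, as in the alpha1 test */
        zfs_arc_min = 64ULL << 20;  /* 64 MB */
    }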
> So, the better question is: is the ARC even helpful for a FUSE filesystem?
> I mean, the Linux kernel is already caching file data, even for FUSE
> filesystems.
>
> Maybe I should try to disable the ARC? Or set it to a very small maximum
> size?
>
> Or should I have a thread that monitors memory usage and calls
> arc_kmem_reclaim() when it reaches a certain point? If this is the case, I
> don't know how to determine what counts as excessive memory usage, since
> we could be running on a computer with as little or as much memory as...
> well, Linux can handle.

This is done on Solaris: we have a thread that monitors the free memory
available on the machine and tries to keep the ARC usage in check. In your
FUSE implementation you may want to, at a minimum, track the size of the
vnode/znode/dnode/dbuf cache usage.

In general, this is just a hard problem. You want to keep as much data in
the ARC as possible for best performance, while not impacting applications
that need memory. The better the integration you can get between the ARC
and the VM system, the happier you will be. In the long term, Sun will
likely migrate the ARC functionality completely into the VM system.

-Mark
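P.S. If you experiment with the monitor-thread idea, the shape of it might
look something like the sketch below. The free-memory probe and the reclaim
threshold are assumptions for illustration (and I'm assuming
arc_kmem_reclaim() takes no arguments); the hard part, as you note, is
picking a policy, and whatever threshold you choose also has to account for
the vnode/znode/dnode/dbuf overhead discussed above.

    #include <pthread.h>    /* start this with pthread_create() at mount */
    #include <stdint.h>
    #include <unistd.h>

    extern void arc_kmem_reclaim(void);     /* from the ported ARC code */

    static void *
    arc_monitor(void *arg)
    {
        long pagesz = sysconf(_SC_PAGESIZE);
        uint64_t physmem = (uint64_t)sysconf(_SC_PHYS_PAGES) * pagesz;

        for (;;) {
            uint64_t freemem =
                (uint64_t)sysconf(_SC_AVPHYS_PAGES) * pagesz;

            /*
             * Hypothetical policy: shrink the ARC whenever free memory
             * on the host falls below 1/16 of physical memory.
             */
            if (freemem < physmem / 16)
                arc_kmem_reclaim();

            sleep(1);       /* poll once a second */
        }
        return (NULL);
    }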