On Wed, Jan 10, 2007 at 12:39:57AM +0000, Ricardo Correia wrote:
> Hi,
> 
> I'm not sure how to control the ARC on the ZFS port to FUSE.
> 
> In the alpha1 release, for testing, I simply set the zfs_arc_max and
> zfs_arc_min variables to 80 MB and 64 MB (respectively) to prevent the ARC
> from growing without bound.
> 
> However, I'm having a problem. A simple run of the following script will
> cause zfs-fuse memory usage to grow almost indefinitely:
> 
> for i in `seq 1 100000`;
> do
>  touch /pool/testdir/$i
> done
> 
> The problem seems to be that vnodes are getting allocated and never freed.
> 
> From what I understand, and from what I read in the previous thread about a 
> similar issue that Pawel was having, this is what happens in Solaris (and in 
> zfs-fuse, by extension):
> 
> 1) When VN_RELE() is called while vp->v_count is 1 (the last hold),
> VOP_INACTIVE() is called.
> 2) VOP_INACTIVE() calls zfs_inactive() which calls zfs_zinactive().
> 3) zfs_zinactive() calls dmu_buf_rele()
> 4) ??
> 5) znode_pageout_func() calls zfs_znode_free() which finally frees the vnode.
> 
> As for step 4, Mark Maybee mentioned:
> 
> "Note that the db_immediate_evict == 0 means that you
> will probably *not* see a callback to the pageout function immediately.
> This is the general case.  We hold onto the znode (and related memory)
> until the associated disk blocks are evicted from the cache (arc).  The
> cache is likely to hold onto that data until either:
>         - we encounter memory shortage, and so reduce the cache size
>         - we read new data into the cache, and evict this data to
>           make space for it."
> 
> So even if I have a "not very big" cache, there can be a lot of allocated
> vnodes which consume a lot more memory!
> Of course, if the ARC somehow took that memory into account when checking
> zfs_arc_max, it would be easier to tune.

I'm sorry to say this, but I'm happy to see you have the same problem :)
Maybe it will be easier to solve it.

This is one of the last problems I still have. I've spent a lot of time
trying to solve it, but with no luck.

In FreeBSD, a subsystem can register a vm_lowmem hook, which basically
means "call me if there is no free memory, so I may be able to free
something". When someone allocates memory with the M_WAITOK flag (the
equivalent of Solaris' KM_SLEEP) and there is no free memory in the
system, the allocator calls the registered vm_lowmem hooks, begging for
memory.

I register the hook in my port, so I get called when there is no free
memory. The hook then immediately wakes up the arc_reclaim_thread thread.

It almost works, but it is not reliable. Under high load it may not free
enough memory (in most of those cases the allocation in progress is
malloc(131072)). I am able to reproduce it with 'iozone -a'.
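
The hook mechanism can be modeled in userland roughly as follows. This is a sketch, not FreeBSD's EVENTHANDLER interface: register_lowmem_hook, toy_alloc, toy_cache and the 1 MiB budget are all hypothetical. One assumed design choice differs from the port described above: here the hook shrinks the cache synchronously before returning, whereas a hook that only wakes arc_reclaim_thread leaves it to the scheduler whether memory is actually free by the time the allocator retries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland model of a low-memory hook registry. */
typedef void (*lowmem_hook_t)(void *arg);

#define MAX_HOOKS 8

static struct {
    lowmem_hook_t fn;
    void *arg;
} hooks[MAX_HOOKS];
static int nhooks;

static size_t mem_limit = 1048576;  /* pretend the system has 1 MiB */
static size_t mem_used;             /* bytes currently charged */

static int register_lowmem_hook(lowmem_hook_t fn, void *arg)
{
    if (nhooks == MAX_HOOKS)
        return -1;
    hooks[nhooks].fn = fn;
    hooks[nhooks].arg = arg;
    nhooks++;
    return 0;
}

/* Analogue of a sleeping allocation: on shortage, call every registered
 * hook, then retry once. Returns 0 on success, -1 if still short. */
static int toy_alloc(size_t n)
{
    if (mem_used + n > mem_limit) {
        for (int i = 0; i < nhooks; i++)
            hooks[i].fn(hooks[i].arg);
    }
    if (mem_used + n > mem_limit)
        return -1;      /* hooks did not free enough */
    mem_used += n;
    return 0;
}

/* A cache that gives memory back down to a target size when asked. */
typedef struct {
    size_t bytes;   /* bytes the cache currently holds */
    size_t target;  /* size to shrink to under memory pressure */
} toy_cache;

/* Shrinks synchronously, so the memory is free before toy_alloc retries. */
static void cache_lowmem(void *arg)
{
    toy_cache *c = arg;
    if (c->bytes > c->target) {
        assert(mem_used >= c->bytes - c->target);
        mem_used -= c->bytes - c->target;
        c->bytes = c->target;
    }
}
```

For example, with the cache filling the whole 1 MiB, a 131072-byte toy_alloc() fails while no hook is registered; after register_lowmem_hook(cache_lowmem, &cache) the same call succeeds because the cache drops to its target first.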

> So, the better question is: is the ARC even helpful for a FUSE filesystem?
> I mean, the Linux kernel is already caching file data, even for FUSE 
> filesystems.

I'm not sure ZFS can work without the ARC, but I'd suggest not turning it
off. I don't know what algorithm Linux uses for caching, but the ARC is
really nice and efficient.

> PS: I'm also considering opening the vdevs (the underlying block devices or 
> files) with O_DIRECT, otherwise the kernel will also cache them. Does it make 
> sense?

I'd guess it makes a lot of sense; cache duplication is not a good
thing.
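
For reference, opening a vdev with O_DIRECT on Linux could look like the minimal sketch below. The names vdev_open_direct and vdev_read are hypothetical, not zfs-fuse functions, and the 4096-byte alignment is an assumption; real code should use the device's logical block size. The sketch falls back to buffered I/O when the filesystem refuses O_DIRECT, either at open() or at the first read.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT in glibc's <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifdef O_DIRECT
#define VDEV_DIRECT_FLAG O_DIRECT
#else
#define VDEV_DIRECT_FLAG 0     /* platform without O_DIRECT */
#endif

#define VDEV_ALIGN 4096        /* assumed alignment for buffer, length, offset */

/* Open a backing file or device bypassing the page cache if allowed;
 * *is_direct reports which mode was obtained. Returns -1 on real errors. */
static int vdev_open_direct(const char *path, int *is_direct)
{
    int fd = open(path, O_RDWR | VDEV_DIRECT_FLAG);
    if (fd >= 0) {
        *is_direct = (VDEV_DIRECT_FLAG != 0);
        return fd;
    }
    if (errno != EINVAL)       /* e.g. ENOENT: not an O_DIRECT problem */
        return -1;
    *is_direct = 0;
    return open(path, O_RDWR);
}

/* Aligned read; if the filesystem rejects the direct I/O itself, drop
 * O_DIRECT (Linux allows changing it with F_SETFL) and retry buffered. */
static ssize_t vdev_read(int fd, void *buf, size_t len, off_t off)
{
    assert((uintptr_t)buf % VDEV_ALIGN == 0);
    assert(len % VDEV_ALIGN == 0 && off % VDEV_ALIGN == 0);
    ssize_t n = pread(fd, buf, len, off);
    if (n < 0 && errno == EINVAL && VDEV_DIRECT_FLAG != 0) {
        int fl = fcntl(fd, F_GETFL);
        if (fl >= 0 && fcntl(fd, F_SETFL, fl & ~VDEV_DIRECT_FLAG) == 0)
            n = pread(fd, buf, len, off);
    }
    return n;
}
```

O_DIRECT requires the buffer address, length and file offset to be suitably aligned, so callers would allocate I/O buffers with posix_memalign(). Some filesystems (for example tmpfs on older kernels) refuse O_DIRECT outright, which is why the fallback path exists.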

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd at FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!