On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:

Sun has opened internal CR 6859997.  It is now in Dispatched state at High 
priority.

CR 6859997 has been accepted and is actively being worked on. The following info has been added to that CR:

This is a problem with the ZFS file prefetch code (zfetch) in dmu_zfetch.c.  
The test script provided by the submitter (thanks Bob!) does no file 
prefetching the second time through each file.  This problem exists in ZFS in 
Solaris 10, Nevada, and OpenSolaris.

This test script creates 3000 files each 8M long so the amount of data (24G) is 
greater than the amount of memory (16G on a Thumper). With the default 
blocksize of 128k, each of the 3000 files has 64 blocks.  The first time 
through, zfetch ramps up a single prefetch stream normally.  But the second 
time through, dmu_zfetch() calls dmu_zfetch_find() which thinks that the data 
has already been prefetched so no additional prefetching is started.
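
To make the failure mode above concrete, here is a rough toy model.  It is 
not the dmu_zfetch.c logic: ToyStream, find() and read_pass() are 
hypothetical names, and the block count is an arbitrary small number.  It 
only shows how a stream that remembers "already prefetched" suppresses all 
prefetching on a second pass, even if the data has since left the cache.

```python
NBLOCKS = 8  # arbitrary small block count, chosen only for illustration


class ToyStream:
    """Hypothetical stand-in for a prefetch stream (not the real zfetch struct)."""

    def __init__(self):
        # Blocks [0, prefetched_upto) are believed to be prefetched already.
        self.prefetched_upto = 0

    def find(self, block):
        # A hit means "this block was already prefetched, nothing to do".
        return block < self.prefetched_upto


def read_pass(stream, nblocks):
    """Read blocks 0..nblocks-1 in order; return how many prefetches were issued."""
    issued = 0
    for block in range(nblocks):
        if not stream.find(block):
            stream.prefetched_upto = block + 1
            issued += 1
    return issued


stream = ToyStream()
first = read_pass(stream, NBLOCKS)
# Assume the cached data is evicted here; the stream is never told about it.
second = read_pass(stream, NBLOCKS)
print(first, second)
```

In this toy model the second pass issues no prefetches at all, because the 
stream state still claims the whole file is covered.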

This problem is not seen with 500 files each 48M in length (still 24G of data).  In that 
case there's still only one prefetch stream but it is reclaimed when one of the requested 
offsets is not found.  The reason it is not found is that the stream "strided" the 
first time through after reaching the zfetch cap, which is 256 blocks.  Files with no 
more than 256 blocks don't require a stride.  So this problem will only be seen when the 
data from a file with no more than 256 blocks is accessed after being tossed from the ARC.
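
The arithmetic behind the two workloads can be checked with a short sketch.  
The constants are the figures quoted above; describe() and its field names 
are made up for illustration, and sizes are treated as binary units (an 
assumption).

```python
BLOCK_SIZE_KIB = 128     # default blocksize quoted above
ZFETCH_CAP_BLOCKS = 256  # zfetch cap quoted above
MEMORY_GIB = 16          # Thumper memory quoted above


def describe(num_files, file_size_mib):
    """Summarize one workload: blocks per file, total data, and cap/memory checks."""
    blocks_per_file = file_size_mib * 1024 // BLOCK_SIZE_KIB
    total_gib = num_files * file_size_mib / 1024
    return {
        "blocks_per_file": blocks_per_file,
        "total_gib": total_gib,
        "exceeds_memory": total_gib > MEMORY_GIB,
        "needs_stride": blocks_per_file > ZFETCH_CAP_BLOCKS,
    }


small = describe(3000, 8)   # the failing case: 3000 files of 8M
large = describe(500, 48)   # the unaffected case: 500 files of 48M
print(small)
print(large)
```

Both workloads exceed memory, but only the 48M files go past the 256-block 
cap, which matches the observation that only they force a stride.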

The fix for this problem may be more feedback between the ARC and the zfetch 
code.  Or it may make sense to restart the prefetch stream after some time has 
passed, or perhaps whenever there's a miss on a block that was expected to have 
already been prefetched.

On a Thumper running Nevada build 118, the first pass of this test takes 2 
minutes 50 seconds and the second pass takes 5 minutes 22 seconds.  If 
dmu_zfetch_find() is modified to restart the prefetch stream when the requested 
offset is 0 and more than 2 seconds have passed since the stream was last 
accessed, then the time needed for the second pass is reduced to 2 minutes 24 
seconds.
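
The restart condition described above can be written down as a small 
predicate.  This is only a sketch of the condition as stated in the CR text, 
not the actual change to dmu_zfetch_find(): Stream, should_restart() and the 
use of floating-point seconds are all assumptions made for illustration.

```python
RESTART_AFTER_SECONDS = 2.0  # idle threshold quoted in the CR text


class Stream:
    """Hypothetical stand-in for one prefetch stream."""

    def __init__(self, last_access):
        self.last_access = last_access  # time of last access, in seconds


def should_restart(stream, offset, now):
    """True when the read starts at offset 0 and the stream has been idle
    for more than RESTART_AFTER_SECONDS."""
    idle = now - stream.last_access
    return offset == 0 and idle > RESTART_AFTER_SECONDS


stream = Stream(last_access=10.0)
print(should_restart(stream, 0, 13.0))    # idle 3 s, offset 0
print(should_restart(stream, 0, 11.0))    # idle 1 s, offset 0
print(should_restart(stream, 128, 13.0))  # idle 3 s, nonzero offset
```

Only the first call meets both conditions, so only a re-read from the start 
of a file after an idle period would restart the stream.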

Additional investigation is currently taking place to determine if another 
solution makes more sense.  And more testing will be needed to see what effect 
this change has on other prefetch patterns.

6412053 is a related CR which mentions that the zfetch code may not be issuing 
I/O at a sufficient pace.  This behavior is also seen on a Thumper running the 
test script in CR 6859997 since, even when prefetch is ramping up as expected, 
less than half of the available I/O bandwidth is being used.  However, more 
aggressive file prefetching could increase memory pressure, as described in CRs 
6258102 and 6469558.


-- Rich
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
