On Mon, Jul 20, 2009 at 7:52 PM, Bob Friesenhahn wrote:
Sun has opened internal CR 6859997. It is now in Dispatched state at High priority.
CR 6859997 has been accepted and is actively being worked on. The following info has been added to that CR:
This is a problem with the ZFS file prefetch code (zfetch) in dmu_zfetch.c. The test script provided by the submitter (thanks Bob!) does no file prefetching the second time through each file. This problem exists in ZFS in Solaris 10, Nevada, and OpenSolaris.

The test script creates 3000 files, each 8M long, so the amount of data (24G) is greater than the amount of memory (16G on a Thumper). With the default blocksize of 128k, each of the 3000 files has 63 blocks. The first time through, zfetch ramps up a single prefetch stream normally. But the second time through, dmu_zfetch() calls dmu_zfetch_find(), which thinks that the data has already been prefetched, so no additional prefetching is started.

This problem is not seen with 500 files each 48M in length (still 24G of data). In that case there's still only one prefetch stream, but it is reclaimed when one of the requested offsets is not found. The reason it is not found is that the stream "strided" the first time through after reaching the zfetch cap, which is 256 blocks. Files with no more than 256 blocks don't require a stride. So this problem will only be seen when the data from a file with no more than 256 blocks is accessed after being tossed from the ARC.

The fix for this problem may be more feedback between the ARC and the zfetch code. Or it may make sense to restart the prefetch stream after some time has passed, or perhaps whenever there's a miss on a block that was expected to have already been prefetched.

On a Thumper running Nevada build 118, the first pass of this test takes 2 minutes 50 seconds and the second pass takes 5 minutes 22 seconds. If dmu_zfetch_find() is modified to restart the prefetch stream when the requested offset is 0 and more than 2 seconds have passed since the stream was last accessed, then the time needed for the second pass is reduced to 2 minutes 24 seconds.

Additional investigation is currently taking place to determine if another solution makes more sense.
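
To make the stride condition and the experimental restart rule described above concrete, here is a minimal Python sketch. It is not the dmu_zfetch.c code: the names Stream, blocks_in_file, needs_stride, and should_restart are hypothetical, and only the constants (128k blocksize, 256-block cap, 2-second threshold) are taken from the description above. It also assumes exact binary sizes, under which an 8M file comes to 64 blocks rather than the 63 quoted; both are well under the cap.

```python
from dataclasses import dataclass

BLOCK_SIZE = 128 * 1024     # default blocksize cited in the report
ZFETCH_CAP_BLOCKS = 256     # zfetch cap cited in the report
RESTART_AFTER_SECS = 2.0    # idle threshold used in the experimental change


def blocks_in_file(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to hold size_bytes (ceiling division)."""
    return -(-size_bytes // block_size)


def needs_stride(size_bytes: int) -> bool:
    """True when the file has more blocks than the zfetch cap."""
    return blocks_in_file(size_bytes) > ZFETCH_CAP_BLOCKS


@dataclass
class Stream:
    """Hypothetical stand-in for one prefetch stream."""
    last_access: float  # time the stream was last used, in seconds


def should_restart(stream: Stream, offset: int, now: float) -> bool:
    """Restart only for a read at offset 0 after more than 2 idle seconds."""
    return offset == 0 and (now - stream.last_access) > RESTART_AFTER_SECS


if __name__ == "__main__":
    MB = 1024 * 1024
    print(needs_stride(8 * MB))    # small file: stays under the cap
    print(needs_stride(48 * MB))   # large file: exceeds the cap
    print(should_restart(Stream(last_access=10.0), 0, 12.5))
```

The sketch only models the decision itself. Whether offset 0 plus an idle timer is the right trigger for other access patterns is exactly the open question.
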
More testing will also be needed to see what effect this change has on other prefetch patterns.

6412053 is a related CR which mentions that the zfetch code may not be issuing I/O at a sufficient pace. This behavior is also seen on a Thumper running the test script in CR 6859997 since, even when prefetch is ramping up as expected, less than half of the available I/O bandwidth is being used. However, more aggressive file prefetching could increase memory pressure, as described in CRs 6258102 and 6469558.

-- Rich
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss