2012-01-09 0:29, John Martin wrote:
On 01/08/12 11:30, Jim Klimov wrote:
However for smaller servers, such as home NASes which have
about one user overall, pre-reading and caching files even
for a single use might be an objective per se - just to let
the hard-disks spin down. Say, if I sit down to watch a
movie from my NAS, it is likely that for 90 or 120 minutes
there will be no other IO initiated by me. The movie file
can be pre-read in a few seconds, and then most of the
storage system can go to sleep.
I can't consider such home-NAS usage uncommon, because I am
my own example user - so I see this pattern often ;)
Isn't this just a more extreme case of prediction?
Probably is, and this is probably not a task for only ZFS,
but for logic outside it. There are some requirements
that ZFS should meet, in order for this to work, though.
In addition to the file system knowing there will only
be one client reading 90-120 minutes of (HD?) video
that will fit in the memory of a small(er) server,
now the hard drive power management code also knows there
won't be another access for 90-120 minutes so it is OK
to spin down the hard drive(s).
Well, in the original post I did suggest that the prediction
logic might go into scripting or some other user-level tool.
And it should, really, to keep the kernel clean and slim.
The "predictor" might be as simple as a DTrace file access
monitor, which would "cat" or "tar" files into /dev/null.
I.e. if it detected access to "*.(avi|mkv|wmv)", then it
should cat the file. If it detected "*.(mp3|ogg|jpg)" it
should tar the parent directory. Might be dumb and still
sufficiently efficient ;)
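
A minimal Python sketch of the "dumb" predictor described above. It
assumes that some monitor (a DTrace script, for example) already
hands it the path of each accessed file; that hook is not shown.
The names read_and_discard, files_to_prefetch and prefetch are
made up for illustration, and the directory case reads only the
regular files directly inside the parent directory, whereas "tar"
would also descend into subdirectories.

```python
import os
import re

# Extension groups from the message above; everything else is an assumption.
WHOLE_FILE = re.compile(r".*\.(avi|mkv|wmv)$", re.IGNORECASE)
WHOLE_DIR = re.compile(r".*\.(mp3|ogg|jpg)$", re.IGNORECASE)

CHUNK = 1 << 20  # read in 1 MiB pieces and throw them away


def read_and_discard(path):
    """Rough equivalent of `cat path > /dev/null`; returns bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(CHUNK)
            if not block:
                return total
            total += len(block)


def files_to_prefetch(accessed_path):
    """Return the files to pre-read for one observed file access."""
    if WHOLE_FILE.match(accessed_path):
        # Movie-like file: pre-read just this file.
        return [accessed_path]
    if WHOLE_DIR.match(accessed_path):
        # Music/photo file: pre-read its siblings too (non-recursive).
        parent = os.path.dirname(accessed_path) or "."
        return sorted(
            os.path.join(parent, name)
            for name in os.listdir(parent)
            if os.path.isfile(os.path.join(parent, name))
        )
    return []


def prefetch(accessed_path):
    """Pre-read everything selected for this access; returns total bytes."""
    return sum(read_and_discard(p) for p in files_to_prefetch(accessed_path))
```

Whether the data read this way actually stays in cache is exactly
the "guarantee" question; the script itself cannot enforce it.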
However, for such use cases this tool would need some
"guarantees" from ZFS. One would be that the read-ahead
data will find its way into caches and won't be evicted
for no reason (when there's no other RAM pressure).
This means that the tool should be able to read all the
data and metadata required by ZFS, so that no more disk
access is required if it's all in cache.
It might require a tunable in ZFS for home-NAS users
which would disable current "no-caching" for detected
streaming reads: we need the opposite of that behavior.
Another part is HDD power-management, which reportedly
works in Solaris, allowing disks to spin down when there
has been no access for some time. Probably there is a syscall
to do this on-demand as well...
On a side note, for home-NASes or other not-heavily-used
storage servers, it would be wonderful to be able to cache
small writes into ZIL devices, if present, and not flush
them onto the main pool until some megabyte limit is
reached (i.e. ZIL is full), or a pool export/import event
occurs. This would allow main disk arrays to remain idle
for a long time while small sporadic writes initiated by
the OS (logs, atimes, web-browser cache files, whatever)
are persistently stored in the ZIL. Essentially, this would
be like setting TXG-commit times to practical infinity, and
actually committing based on byte-count limits. One possible
difference would be that larger writes are not streamed to
the pool disks at once, but are also stored in the dedicated
ZIL.
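
To make the proposed policy concrete, here is a toy Python model of
"commit on byte count, not on time". It is not how ZFS works today
and uses no ZFS interfaces; the class DeferredWriteLog and its
fields are hypothetical names, and the log device is just a list
in memory. It only illustrates when the main pool disks would
have to wake up.

```python
class DeferredWriteLog:
    """Toy model: writes pile up on a log device until a byte limit is hit."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.pending = []        # writes held on the (hypothetical) log device
        self.pending_bytes = 0
        self.pool_flushes = 0    # times the main pool disks had to spin up

    def write(self, data):
        """Record a write; flush to the pool only once the limit is reached."""
        self.pending.append(data)
        self.pending_bytes += len(data)
        if self.pending_bytes >= self.limit_bytes:
            self.flush()

    def flush(self):
        """Commit pending writes to the pool (also what an export would do)."""
        if self.pending:
            self.pool_flushes += 1
        self.pending.clear()
        self.pending_bytes = 0
```

With a limit of 100 bytes, nine 10-byte writes leave the pool
untouched, and the tenth triggers a single flush.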
zfs-discuss mailing list