Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-11 Thread Jim Klimov

2012-01-11 1:26, Jim Klimov wrote:

To follow on the subject of VDEV caching, even if
only of metadata, in oi_148a, I have found the
disabling entry in /etc/system of the LiveUSB:

set zfs:zfs_vdev_cache_size=0


Now that I have the cache turned on and my scrub
continues, cache efficiency so far happens to be
75%. Not bad for a feature turned off by default:

# kstat -p zfs:0:vdev_cache_stats
zfs:0:vdev_cache_stats:class misc
zfs:0:vdev_cache_stats:crtime 60.67302806
zfs:0:vdev_cache_stats:delegations 22619
zfs:0:vdev_cache_stats:hits 32989
zfs:0:vdev_cache_stats:misses 10676
zfs:0:vdev_cache_stats:snaptime 39898.161717983

//Jim


And at this moment I can guess the caching effect
becomes incredible (at least for a feature disabled
and dismissed as useless/harmful) - if I read the
numbers correctly, a 99+% cache hit ratio with
just VDEV prereads:

# kstat -p zfs:0:vdev_cache_stats
zfs:0:vdev_cache_stats:class misc
zfs:0:vdev_cache_stats:crtime   60.67302806
zfs:0:vdev_cache_stats:delegations  23398
zfs:0:vdev_cache_stats:hits 1309308
zfs:0:vdev_cache_stats:misses   11592
zfs:0:vdev_cache_stats:snaptime 89207.679698161

True, the task (scrubbing) is metadata-intensive :)
Still, for the future: when beginning a scrub, the
system might auto-tune, or at least suggest enabling,
the VDEV prefetch, perhaps with larger strokes...

BTW, what does the delegations field mean? ;)


--


+----------------------------------------------------+
| Климов Евгений,                         Jim Klimov |
| технический директор                           CTO |
| ЗАО ЦОС и ВТ                             JSC COSHT |
|                                                    |
| +7-903-7705859 (cellular)  mailto:jimkli...@cos.ru |
| CC: ad...@cos.ru, jimkli...@gmail.com              |
+----------------------------------------------------+
| ()  ascii ribbon campaign - against html mail      |
| /\  - against microsoft attachments                |
+----------------------------------------------------+


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-10 Thread Jim Klimov

To follow on the subject of VDEV caching, even if
only of metadata, in oi_148a, I have found the
disabling entry in /etc/system of the LiveUSB:

set zfs:zfs_vdev_cache_size=0


Now that I have the cache turned on and my scrub
continues, cache efficiency so far happens to be
75%. Not bad for a feature turned off by default:

# kstat -p zfs:0:vdev_cache_stats
zfs:0:vdev_cache_stats:class misc
zfs:0:vdev_cache_stats:crtime   60.67302806
zfs:0:vdev_cache_stats:delegations  22619
zfs:0:vdev_cache_stats:hits 32989
zfs:0:vdev_cache_stats:misses   10676
zfs:0:vdev_cache_stats:snaptime 39898.161717983

//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-09 Thread John Martin

On 01/08/12 20:10, Jim Klimov wrote:


Is it true or false that: ZFS might skip the cache and
go to disks for streaming reads?


I don't believe this was ever suggested.  Instead, the
question is: if data is not already in the file system
cache and a large read is made from disk, should the
file system put this data into the cache?

BTW, I chose the term "streaming" to be a subset of
"sequential", where the access pattern is sequential but
at what appear to be artificial time intervals.
The suggested pre-read of the entire file would
be a simple sequential read done as quickly
as the hardware allows.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-09 Thread Jim Klimov

Thanks for the replies, some more questions follow.

Your answers below seem to contradict each other somewhat.
Is it true that:
1) VDEV cache before b70 used to contain a full copy
   of prefetched disk contents,

2) VDEV cache since b70 analyzes the prefetched sectors
   and only keeps metadata blocks,

3) VDEV cache since b148 is disabled by default?

So in fact currently we only have file-level intelligent
prefetching?

On my older systems I ran kstat -p zfs:0:vdev_cache_stats
and saw hit/miss ratios ranging from 30% to 70%. On the oi_148a
box I do indeed see all zeros.

While I do understand the implications of VDEV-caching lots
of disks on systems with inadequate RAM, I tend to find this
feature useful on smaller systems - like home NASes. It is
essentially free in terms of mechanical seeks, as well as
in RAM (what is 60-100MB for a small box at home?), and any
nonzero hit ratio that speeds up the system seems justifiable ;)

I've tried playing with the options on my oi_148a LiveUSB
repair boot, and got varying results:

The VDEV cache is indeed disabled by default, but can be
enabled. My system is scrubbing now, so it got a few cache
hits (about 10%) right away.

root@openindiana:~# echo zfs_vdev_cache_size/W0t1000 | mdb -kw
zfs_vdev_cache_size:0   =   0x989680

root@openindiana:~# kstat -p zfs:0:vdev_cache_stats
zfs:0:vdev_cache_stats:class misc
zfs:0:vdev_cache_stats:crtime   65.042318652
zfs:0:vdev_cache_stats:delegations  72
zfs:0:vdev_cache_stats:hits 11
zfs:0:vdev_cache_stats:misses   158
zfs:0:vdev_cache_stats:snaptime 114232.782154249

However, trying to increase the prefetch size hung my system
almost immediately (in a couple of seconds). I'm away from
it now, so I'll ask for a photo of the console screen :)

root@openindiana:~# echo zfs_vdev_cache_max/W0t16384 | mdb -kw
zfs_vdev_cache_max: 0x4000  =   0x4000
root@openindiana:~# echo zfs_vdev_cache_bshift/W0t20 | mdb -kw
zfs_vdev_cache_bshift:  0x10=   0x14
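
For reference, zfs_vdev_cache_bshift is a log2 value: small reads that go through the vdev cache are inflated to 1 << bshift bytes, so the default of 16 means 64KB strokes and the 20 tried above means 1MB. A quick check of the arithmetic:

```shell
# zfs_vdev_cache_bshift is log2 of the inflated read size:
# the default 16 gives 64KB strokes; 20 would give 1MB.
for bshift in 16 20; do
  printf 'bshift=%d -> %d bytes\n' "$bshift" "$(( 1 << bshift ))"
done
# prints:
# bshift=16 -> 65536 bytes
# bshift=20 -> 1048576 bytes
```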


So there are deeper questions:
1) As of Illumos bug #175 (as well as OpenSolaris b148 and
   if known - Solaris 11), is the vdev prefetch feature
   *removed* from the codebase (it is not removed as of
   oi_148a; what about others?), or disabled by default
   (i.e. the limit is preset to 0, tune it yourself)?

2) If it is only disabled, are there solid plans to remove
   it, or can we vote to keep it for those interested? :)

3) If the feature is present and gets enabled, how would
   VDEV prefetch play along with file prefetch, again? ;)

4) Is there some tuneable (after b70) to enable prefetching
   and keeping of user-data as well (not only metadata)?
   Perhaps only so that I could test it with my use-patterns
   to make sure that caching generic sectors is useless for
   me, and I really should revert to caching only metadata?

5) Would it make sense to increase zfs_vdev_cache_bshift?
   For example, when I tried to set it to 20 and prefetch
   a whole 1MB of data, why would that cause the system
   to die? Can it increase cache hit ratios (if it works)?

6) Does the VDEV cache keep ZFS blocks or disk sectors?
   For example, on my 4k disks the blocks are 4k, even
   though there are a few hundred bytes worth of data in
   metadata blocks and 3+KB of slack space.

7) Modern HDDs often have 32-64MB of DRAM cache onboard.
   Is there any reason to match VDEV cache size with that
   in any way (1:1, 2:1, etc)?

Thanks again,
//Jim Klimov


2012-01-09 6:06, Richard Elling wrote:

On Jan 8, 2012, at 5:10 PM, Jim Klimov wrote:

2012-01-09 4:14, Richard Elling wrote:

On Jan 7, 2012, at 8:59 AM, Jim Klimov wrote:


I wonder if it is possible (currently or in the future as an RFE)
to tell ZFS to automatically read-ahead some files and cache them
in RAM and/or L2ARC?


See discussions on the ZFS intelligent prefetch algorithm. I think Ben 
Rockwood's
description is the best general description:
http://www.cuddletech.com/blog/pivot/entry.php?id=1040

And a more engineer-focused description is at:
http://www.solarisinternals.com/wiki/index.php/ZFS_Performance#Intelligent_prefetch
  -- richard


Thanks for the pointers. While I've seen those articles
(in fact, one of the two non-spam comments in Ben's
blog was mine), rehashing the basics is always useful ;)

Still, how does VDEV prefetch play along with File-level
Prefetch?


Trick question… it doesn't. vdev prefetching is disabled in opensolaris b148, 
illumos,
and Solaris 11 releases. The benefits of having the vdev cache for large 
numbers of
disks does not appear to justify the cost. See
http://wesunsolve.net/bugid/id/6684116
https://www.illumos.org/issues/175


For example, if ZFS prefetched 64K from disk
at the SPA level, and those sectors luckily happen to
contain next blocks of a streaming-read file, would
the file-level prefetch take the data from RAM cache
or still request them from the disk?


As of b70, vdev_cache only contains metadata. See

Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-09 Thread John Martin

On 01/08/12 10:15, John Martin wrote:


I believe Joerg Moellenkamp published a discussion
several years ago on how the L1ARC attempts to deal with the pollution
of the cache by large streaming reads, but I don't have
a bookmark handy (nor the knowledge of whether the
behavior is still accurate).


http://www.c0t0d0s0.org/archives/5329-Some-insight-into-the-read-cache-of-ZFS-or-The-ARC.html
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-09 Thread Jim Klimov

2012-01-09 18:15, John Martin wrote:

On 01/08/12 20:10, Jim Klimov wrote:


Is it true or false that: ZFS might skip the cache and
go to disks for streaming reads?

  (The more I think
 about it, the more senseless this sentence seems, and
 I might have just mistaken it with ZIL writes of bulk
 data).


I don't believe this was ever suggested. Instead, the
question is: if data is not already in the file system
cache and a large read is made from disk, should the
file system put this data into the cache?


Hmmm... perhaps THIS is what I mistook it for...

Thus the correct version of the question goes like this:
is it true or false that some large reads from disk can
be deemed by ZFS as too big and rare to cache in ARC?
If yes, what conditions are checked to mark a read as
such? Can this behavior be disabled in order to try and
cache every read (further subject to normal eviction
due to MRU/MFU/memory pressure and other considerations)?

Thanks again,
//Jim Klimov


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-08 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Jim Klimov
 
 I wonder if it is possible (currently or in the future as an RFE)
 to tell ZFS to automatically read-ahead some files and cache them
 in RAM and/or L2ARC?
 
 One use-case would be for Home-NAS setups where multimedia (video
 files or catalogs of images/music) are viewed from a ZFS box. For
 example, if a user wants to watch a film, or listen to a playlist
 of MP3's, or push photos to a wall display (photo frame, etc.),
 the storage box should read-ahead all required data from HDDs
 and save it in ARC/L2ARC. Then the HDDs can spin down for hours
 while the pre-fetched gigabytes of data are used by consumers
 from the cache. End-users get peace, quiet and less electricity
 used while they enjoy their multimedia entertainment ;)

This whole subject is important and useful - and not unique to ZFS.  The
whole question is, how can the system predict which things are going to be
requested next?

In the case of a video - there's a big file which is likely to be read
sequentially.  I don't know how far readahead currently will read ahead, but
it is surely only smart enough to stay within a single file.  If the
readahead buffer starts to get low, and the disks have been spun down, I
don't know how low the buffer gets before it will trigger more readahead.
But at least in the case of streaming video files, there's a very realistic
possibility that something like the existing readahead can do what you want.

In the case of your MP3 collection...  Probably the only thing you can do is
to write a script which will simply go read all the files you predict will
be read soon.  The key here is the prediction - There's no way ZFS or
solaris, or any other OS in the present day is going to intelligently
predict which files you'll be requesting soon.  But you, the user, who knows
your usage patterns, might be able to make these predictions and request to
cache them.  The request is simply - telling the system to start reading
those files now.  So it's very easy to cache, as long as you know what to
cache.
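
A minimal sketch of such a cache-warming script, with a throwaway demo directory standing in for a real media tree (the paths and extensions are placeholders, not anything from this thread):

```shell
# Cache-warming sketch: read each media file once so its data, and the
# metadata needed to reach it, land in the ARC. The demo directory and
# files below are stand-ins for a real media tree.
MEDIA_DIR=$(mktemp -d)
echo "fake audio" > "$MEDIA_DIR/track01.mp3"
echo "fake video" > "$MEDIA_DIR/movie.avi"
find "$MEDIA_DIR" -type f \( -name '*.mp3' -o -name '*.avi' \) \
    -exec cat {} + > /dev/null
echo "pre-read done"
# prints: pre-read done
```

On a real NAS you would point MEDIA_DIR at the directory you expect to use soon, e.g. via cron shortly before the usual viewing time.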

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-08 Thread John Martin

On 01/08/12 09:30, Edward Ned Harvey wrote:


In the case of your MP3 collection...  Probably the only thing you can do is
to write a script which will simply go read all the files you predict will
be read soon.  The key here is the prediction - There's no way ZFS or
solaris, or any other OS in the present day is going to intelligently
predict which files you'll be requesting soon.



The other prediction is whether the blocks will be reused.
If the blocks of a streaming read are only used once, then
it may be wasteful for a file system to allow these blocks
to be placed in the cache.  If a file system purposely
chooses to not cache streaming reads, manually scheduling a
pre-read of particular files may simply cause the file to be read
from disk twice: on the manual pre-read and when it is read again
by the actual application.

I believe Joerg Moellenkamp published a discussion
several years ago on how the L1ARC attempts to deal with the pollution
of the cache by large streaming reads, but I don't have
a bookmark handy (nor the knowledge of whether the
behavior is still accurate).
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-08 Thread Jim Klimov

2012-01-08 19:15, John Martin wrote:

On 01/08/12 09:30, Edward Ned Harvey wrote:


In the case of your MP3 collection... Probably the only thing you can
do is
to write a script which will simply go read all the files you predict
will
be read soon. The key here is the prediction - There's no way ZFS or
solaris, or any other OS in the present day is going to intelligently
predict which files you'll be requesting soon.



The other prediction is whether the blocks will be reused.
If the blocks of a streaming read are only used once, then
it may be wasteful for a file system to allow these blocks
to be placed in the cache. If a file system purposely
chooses to not cache streaming reads, manually scheduling a
pre-read of particular files may simply cause the file to be read
from disk twice: on the manual pre-read and when it is read again
by the actual application.

I believe Joerg Moellenkamp published a discussion
several years ago on how the L1ARC attempts to deal with the pollution
of the cache by large streaming reads, but I don't have
a bookmark handy (nor the knowledge of whether the
behavior is still accurate).


Well, this point is valid for intensively-used servers - but
then such blocks might just get evicted from the caches by
newer and/or more-frequently-used blocks.

However for smaller servers, such as home NASes which have
about one user overall, pre-reading and caching files even
for a single use might be an objective per se - just to let
the hard-disks spin down. Say, if I sit down to watch a
movie from my NAS, it is likely that for 90 or 120 minutes
there will be no other IO initiated by me. The movie file
can be pre-read in a few seconds, and then most of the
storage system can go to sleep.

//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-08 Thread John Martin

On 01/08/12 11:30, Jim Klimov wrote:


However for smaller servers, such as home NASes which have
about one user overall, pre-reading and caching files even
for a single use might be an objective per se - just to let
the hard-disks spin down. Say, if I sit down to watch a
movie from my NAS, it is likely that for 90 or 120 minutes
there will be no other IO initiated by me. The movie file
can be pre-read in a few seconds, and then most of the
storage system can go to sleep.


Isn't this just a more extreme case of prediction?
In addition to the file system knowing there will only
be one client reading 90-120 minutes of (HD?) video
that will fit in the memory of a small(er) server,
now the hard drive power management code also knows there
won't be another access for 90-120 minutes so it is OK
to spin down the hard drive(s).
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-08 Thread Jim Klimov

2012-01-09 0:29, John Martin wrote:

On 01/08/12 11:30, Jim Klimov wrote:


However for smaller servers, such as home NASes which have
about one user overall, pre-reading and caching files even
for a single use might be an objective per se - just to let
the hard-disks spin down. Say, if I sit down to watch a
movie from my NAS, it is likely that for 90 or 120 minutes
there will be no other IO initiated by me. The movie file
can be pre-read in a few seconds, and then most of the
storage system can go to sleep.


I don't find such home-NAS usage uncommon, because I am
my own example user - so I see this pattern often ;)




Isn't this just a more extreme case of prediction?


Probably is, and this is probably not a task for only ZFS,
but for logic outside it. There are some requirements
that ZFS should meet, in order for this to work, though.
Details follow...


In addition to the file system knowing there will only
be one client reading 90-120 minutes of (HD?) video
that will fit in the memory of a small(er) server,
now the hard drive power management code also knows there
won't be another access for 90-120 minutes so it is OK
to spin down the hard drive(s).


Well, in the original post I did suggest that the prediction
logic might go into scripting or some other user-level tool.
And it should, really, to keep the kernel clean and slim.

The predictor might be as simple as a DTrace file access
monitor, which would cat or tar files into /dev/null.
I.e. if it detected access to *.(avi|mkv|wmv), then it
should cat the file. If it detected *.(mp3|ogg|jpg) it
should tar the parent directory. Might be dumb and still
sufficiently efficient ;)
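
A sketch of that extension dispatch (the paths are hypothetical, and the DTrace-based predictor that would feed it is left out); it only prints the action it would take:

```shell
# Extension dispatch from the idea above: cat whole video files,
# pre-read the whole parent directory for music/photos via tar.
# Paths are hypothetical; this prints the command rather than running it.
classify() {
  case "$1" in
    *.avi|*.mkv|*.wmv) echo "cat $1 > /dev/null" ;;
    *.mp3|*.ogg|*.jpg) echo "tar cf - -C $(dirname -- "$1") . > /dev/null" ;;
    *)                 echo "ignore $1" ;;
  esac
}
classify /tank/video/movie.mkv
classify /tank/music/album/track01.mp3
# prints:
# cat /tank/video/movie.mkv > /dev/null
# tar cf - -C /tank/music/album . > /dev/null
```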

However, for such usecases this tool would need some
guarantees from ZFS. One would be that the read-ahead
data will find its way into caches and won't be evicted
for no reason (when there's no other RAM pressure).
This means that the tool should be able to read all the
data and metadata required by ZFS, so that no more disk
access is required if it's all in cache.
It might require a tunable in ZFS for home-NAS users
which would disable current no-caching for detected
streaming reads: we need the opposite of that behavior.

Another part is HDD power-management, which reportedly
works in Solaris, allowing disks to spin down when there
was no access for some time. Probably there is a syscall
to do this on-demand as well...

On a side note, for home-NASes or other not-heavily-used
storage servers, it would be wonderful to be able to cache
small writes into ZIL devices, if present, and not flush
them onto the main pool until some megabyte limit is
reached (i.e. ZIL is full), or a pool export/import event
occurs. This would allow the main disk arrays to remain
idle for a long time while small sporadic writes initiated
by the OS (logs, atimes, web-browser cache files, whatever)
are persistently stored in the ZIL. Essentially, this
would be like setting
TXG-commit times to practical infinity, and actually
commit based on bytecount limits. One possible difference
would be not-streaming larger writes to pool disks at once,
but also storing them in dedicated ZIL.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-08 Thread Richard Elling
On Jan 7, 2012, at 8:59 AM, Jim Klimov wrote:

 I wonder if it is possible (currently or in the future as an RFE)
 to tell ZFS to automatically read-ahead some files and cache them
 in RAM and/or L2ARC?

See discussions on the ZFS intelligent prefetch algorithm. I think Ben 
Rockwood's
description is the best general description:
http://www.cuddletech.com/blog/pivot/entry.php?id=1040

And a more engineer-focused description is at:
http://www.solarisinternals.com/wiki/index.php/ZFS_Performance#Intelligent_prefetch
 -- richard


 
 One use-case would be for Home-NAS setups where multimedia (video
 files or catalogs of images/music) are viewed from a ZFS box. For
 example, if a user wants to watch a film, or listen to a playlist
 of MP3's, or push photos to a wall display (photo frame, etc.),
 the storage box should read-ahead all required data from HDDs
 and save it in ARC/L2ARC. Then the HDDs can spin down for hours
 while the pre-fetched gigabytes of data are used by consumers
 from the cache. End-users get peace, quiet and less electricity
 used while they enjoy their multimedia entertainment ;)
 
 Is it possible? If not, how hard would it be to implement?
 
 In terms of scripting, would it suffice to detect reads (i.e.
 with DTrace) and read the files to /dev/null to get them cached
 along with all required metadata (so that mechanical HDDs are
 not required for reads afterwards)?
 
 Thanks,
 //Jim Klimov
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 

ZFS and performance consulting
http://www.RichardElling.com
illumos meetup, Jan 10, 2012, Menlo Park, CA
http://www.meetup.com/illumos-User-Group/events/41665962/ 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-08 Thread Jim Klimov

2012-01-09 4:14, Richard Elling wrote:

On Jan 7, 2012, at 8:59 AM, Jim Klimov wrote:


I wonder if it is possible (currently or in the future as an RFE)
to tell ZFS to automatically read-ahead some files and cache them
in RAM and/or L2ARC?


See discussions on the ZFS intelligent prefetch algorithm. I think Ben 
Rockwood's
description is the best general description:
http://www.cuddletech.com/blog/pivot/entry.php?id=1040

And a more engineer-focused description is at:
http://www.solarisinternals.com/wiki/index.php/ZFS_Performance#Intelligent_prefetch
  -- richard


Thanks for the pointers. While I've seen those articles
(in fact, one of the two non-spam comments in Ben's
blog was mine), rehashing the basics is always useful ;)

Still, how does VDEV prefetch play along with File-level
Prefetch? For example, if ZFS prefetched 64K from disk
at the SPA level, and those sectors luckily happen to
contain next blocks of a streaming-read file, would
the file-level prefetch take the data from RAM cache
or still request them from the disk?

In what cases would it make sense to increase the
zfs_vdev_cache_size? Does it apply to all disks
combined, or to each disk (or even slice/partition)
separately?

In fact, this reading got me thinking that I might have
a fundamental misunderstanding lately; hence a couple
of new yes-no questions arose:

Is it true or false that: ZFS might skip the cache and
go to disks for streaming reads? (The more I think
about it, the more senseless this sentence seems, and
I might have just mistaken it with ZIL writes of bulk
data).

Is it true or false that: ARC might evict cached blocks
based on age (without new reads or other processes
requiring the RAM space)?

And I guess the generic answer to my original question
regarding intelligent pre-fetching of whole files is
that this should be done by scripts outside ZFS itself,
and that the read-prefetch as well as ARC/L2ARC is all
in place already. So if no other IOs occur, the disks
may spin down... if only not for those nasty writes
that may sporadically occur and which I'd love to see
pushed out to dedicated ZILs ;)

Thanks,
//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-08 Thread Richard Elling
On Jan 8, 2012, at 5:10 PM, Jim Klimov wrote:
 2012-01-09 4:14, Richard Elling wrote:
 On Jan 7, 2012, at 8:59 AM, Jim Klimov wrote:
 
 I wonder if it is possible (currently or in the future as an RFE)
 to tell ZFS to automatically read-ahead some files and cache them
 in RAM and/or L2ARC?
 
 See discussions on the ZFS intelligent prefetch algorithm. I think Ben 
 Rockwood's
 description is the best general description:
 http://www.cuddletech.com/blog/pivot/entry.php?id=1040
 
 And a more engineer-focused description is at:
 http://www.solarisinternals.com/wiki/index.php/ZFS_Performance#Intelligent_prefetch
  -- richard
 
 Thanks for the pointers. While I've seen those articles
 (in fact, one of the two non-spam comments in Ben's
 blog was mine), rehashing the basics is always useful ;)
 
 Still, how does VDEV prefetch play along with File-level
 Prefetch?

Trick question… it doesn't. vdev prefetching is disabled in opensolaris b148, 
illumos,
and Solaris 11 releases. The benefits of having the vdev cache for large 
numbers of 
disks does not appear to justify the cost. See
http://wesunsolve.net/bugid/id/6684116
https://www.illumos.org/issues/175

 For example, if ZFS prefetched 64K from disk
 at the SPA level, and those sectors luckily happen to
 contain next blocks of a streaming-read file, would
 the file-level prefetch take the data from RAM cache
 or still request them from the disk?

As of b70, vdev_cache only contains metadata. See 
http://wesunsolve.net/bugid/id/6437054

 In what cases would it make sense to increase the
 zfs_vdev_cache_size? Does it apply to all disks
 combined, or to each disk (or even slice/partition)
 separately?

It applies to each leaf vdev.

 
 In fact, this reading got me thinking that I might have
 a fundamental misunderstanding lately; hence a couple
 of new yes-no questions arose:
 
 Is it true or false that: ZFS might skip the cache and
 go to disks for streaming reads? (The more I think
 about it, the more senseless this sentence seems, and
 I might have just mistaken it with ZIL writes of bulk
 data).

Unless the primarycache parameter is set to none, reads 
will look in the ARC first.

 
 Is it true or false that: ARC might evict cached blocks
 based on age (without new reads or other processes
 requiring the RAM space)?

False. Evictions occur when needed.

NB, I'm not sure of the status of the Solaris 11 ARC no-grow issue.
As that code is not open sourced, and we know that Oracle rewrote
some of the ARC code, all bets are off.

 And I guess the generic answer to my original question
 regarding intelligent pre-fetching of whole files is
 that this should be done by scripts outside ZFS itself,
 and that the read-prefetch as well as ARC/L2ARC is all
 in place already. So if no other IOs occur, the disks
 may spin down... if only not for those nasty writes
 that may sporadically occur and which I'd love to see
 pushed out to dedicated ZILs ;)

I've setup external prefetching for specific use cases.  Spin-down 
is another can of worms…
 -- richard

-- 

ZFS and performance consulting
http://www.RichardElling.com
illumos meetup, Jan 10, 2012, Menlo Park, CA
http://www.meetup.com/illumos-User-Group/events/41665962/ 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs read-ahead and L2ARC

2012-01-07 Thread Jim Klimov

I wonder if it is possible (currently or in the future as an RFE)
to tell ZFS to automatically read-ahead some files and cache them
in RAM and/or L2ARC?

One use-case would be for Home-NAS setups where multimedia (video
files or catalogs of images/music) are viewed from a ZFS box. For
example, if a user wants to watch a film, or listen to a playlist
of MP3's, or push photos to a wall display (photo frame, etc.),
the storage box should read-ahead all required data from HDDs
and save it in ARC/L2ARC. Then the HDDs can spin down for hours
while the pre-fetched gigabytes of data are used by consumers
from the cache. End-users get peace, quiet and less electricity
used while they enjoy their multimedia entertainment ;)

Is it possible? If not, how hard would it be to implement?

In terms of scripting, would it suffice to detect reads (i.e.
with DTrace) and read the files to /dev/null to get them cached
along with all required metadata (so that mechanical HDDs are
not required for reads afterwards)?

Thanks,
//Jim Klimov
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss