On 2012-08-24 14:39, Jim Klimov wrote:
> The idea of dedicated metadata devices (likely SSDs) for ZFS
> has been generically discussed a number of times on this list,
> but I don't think I've seen a final proposal that someone would
> take up for implementation (as a public source code, at least).
OK, I am not a ZFS dev and have barely even looked at the code, but it
seems to me that this could be dealt with more easily and more
efficiently by modifying the current L2ARC code to make a persistent
cache device, and adding the preference mechanism somebody has already
suggested (e.g. prefer metadata, or prefer specific types of metadata).
My reasoning is as follows:
1) As metadata is already available on the main pool devices, there is
no need to make this data redundant. It is there for acceleration. In
the event of a failure, it can just be read directly from the pool, and
there is no need to write the data twice (as would happen with a mirrored
'metaxel') or waste the space. This is only my opinion, but it makes
sense to me. The other option, for me, would be to make it the main
storage area for metadata, storing metadata on the main pool devices
only when that is needed to reach the required number of copies. E.g. if
you need 2 metadata copies but have only one metaxel, store one on there
and one in the pool. If you need 2 copies and there are 2 metaxels, store
them on the metaxels, no pool storage needed.
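The second option in point 1 amounts to a simple placement rule: fill the metaxels first, one copy per device, and put any remaining copies in the pool. A minimal sketch, assuming hypothetical names (`place_metadata_copies`, the `"pool"` target label) that do not come from ZFS:

```python
from typing import List


def place_metadata_copies(copies: int, metaxels: List[str], pool: str = "pool") -> List[str]:
    """Return the target of each copy of one metadata block (hypothetical rule).

    Each metaxel holds at most one copy; copies the metaxels cannot
    hold fall back to the main pool devices.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # One copy per metaxel, never two on the same device (slice is a new list).
    targets = metaxels[:copies]
    # Whatever the metaxels cannot hold goes to the main pool.
    targets += [pool] * (copies - len(targets))
    return targets


if __name__ == "__main__":
    print(place_metadata_copies(2, ["metaxel0"]))              # one metaxel: one copy in the pool
    print(place_metadata_copies(2, ["metaxel0", "metaxel1"]))  # two metaxels: no pool storage
```

With no metaxels at all, every copy lands in the pool, which matches the behaviour without dedicated metadata devices.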
2) Persistent cache devices and cache policies would bring more
benefits to the system overall than adding this metaxel: no warming of
the cache (besides reading in what is stored there on import/boot, so
let's say accelerated warming) and finer control over what to store in
the cache. The cache devices could then be tuned on a per-dataset basis
(and possibly per cache device, so that certain data types prefer the
cache device with the best performance profile for them) to suit your
own unique situation. Possibly even a "keep this dataset in cache at all
times" setting would be useful for less frequently accessed but
time-critical data (so no more loops cat'ing files to /dev/null to keep
data in cache).
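The policies in point 2 (prefer metadata, keep a dataset in cache at all times) could be modelled roughly as below. This is a hypothetical, simplified LRU sketch of the policy only, not how the L2ARC actually feeds or evicts blocks; `PolicyCache`, `Block` and the `pinned` set are illustrative names.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Block:
    dataset: str
    block_id: int
    is_metadata: bool


class PolicyCache:
    """Hypothetical LRU cache with per-dataset pinning and a metadata preference."""

    def __init__(self, capacity: int, pinned: Optional[Set[str]] = None):
        self.capacity = capacity                # maximum number of cached blocks
        self.pinned = set(pinned or ())         # datasets kept in cache at all times
        self.entries: "OrderedDict[Block, None]" = OrderedDict()

    def access(self, block: Block) -> bool:
        """Record an access; return True on a cache hit."""
        if block in self.entries:
            self.entries.move_to_end(block)     # mark as most recently used
            return True
        self.entries[block] = None
        self._evict()
        return False

    def _victim(self) -> Optional[Block]:
        # Scan from least to most recently used: evict unpinned data first,
        # and only then unpinned metadata. Pinned datasets are never chosen.
        for want_metadata in (False, True):
            for blk in self.entries:
                if blk.dataset not in self.pinned and blk.is_metadata == want_metadata:
                    return blk
        return None

    def _evict(self) -> None:
        while len(self.entries) > self.capacity:
            victim = self._victim()
            if victim is None:                  # everything left is pinned
                break
            del self.entries[victim]


if __name__ == "__main__":
    cache = PolicyCache(capacity=2, pinned={"db"})
    for blk in (Block("tank", 1, True), Block("tank", 2, False), Block("db", 3, False)):
        cache.access(blk)
    print(list(cache.entries))  # the unpinned data block was evicted first
```

A data block inserted into a cache already full of metadata and pinned blocks is evicted immediately, which is the intended effect of preferring metadata; if pinned datasets alone exceed the capacity, the sketch simply stops evicting.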
3) This would provide, IMHO, the building blocks for a combined
cache/log device. This would basically go as follows: you set up, say, a
pair of persistent cache devices. You then tell ZFS that these can be
used for ZIL blocks, with something like the copies attribute to tell it
to ensure redundancy. So it basically builds a ZIL device from blocks
within the cache as it needs them. It would not be as fast as a dedicated
log device, but would make more efficient use of the devices.
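Point 3 can be sketched as follows, assuming each cache device lends whole blocks to the ZIL and that the `copies` replicas of each log block must land on distinct devices. `CacheDevice`, `SharedLog` and the block-count accounting are hypothetical; real cache and log devices expose nothing like this.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CacheDevice:
    name: str
    total_blocks: int
    log_blocks: int = 0  # blocks currently lent to the ZIL (hypothetical accounting)

    @property
    def cache_blocks(self) -> int:
        # Whatever is not lent to the ZIL remains usable as read cache.
        return self.total_blocks - self.log_blocks


class SharedLog:
    """Hypothetical ZIL built on demand from blocks inside cache devices."""

    def __init__(self, devices: List[CacheDevice], copies: int = 2):
        if copies > len(devices):
            raise ValueError("copies cannot exceed the number of cache devices")
        self.devices = devices
        self.copies = copies
        self.next_id = 0
        self.placement: Dict[int, List[CacheDevice]] = {}

    def allocate_log_block(self) -> int:
        # Pick the `copies` devices with the most cache space left,
        # so each replica lands on a different device.
        candidates = sorted(
            (d for d in self.devices if d.cache_blocks > 0),
            key=lambda d: d.cache_blocks,
            reverse=True,
        )[: self.copies]
        if len(candidates) < self.copies:
            raise RuntimeError("not enough cache devices with free space")
        for dev in candidates:
            dev.log_blocks += 1  # taken out of the read cache
        block_id = self.next_id
        self.next_id += 1
        self.placement[block_id] = candidates
        return block_id

    def free_log_block(self, block_id: int) -> None:
        # Once the data is committed to the main pool, the log block is no
        # longer needed and its space returns to the read cache.
        for dev in self.placement.pop(block_id):
            dev.log_blocks -= 1
```

The design choice here is that log space is borrowed from the cache only while it is in use, which is where the efficiency gain over a dedicated, mostly idle log device would come from.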
Point 3 would be for future development, but I believe the benefits of
cache persistence and policies are enough to make them a priority. I
believe it would cover what the metaxel is trying to do and more.
The other, simpler, option I could see is a flag which tells ZFS "Keep
metadata in the cache", which ensures all metadata (where possible) is
stored in ARC/L2ARC at all times, and possibly forces it to be read in
zfs-discuss mailing list