On Dec 5, 2012, at 5:41 AM, Jim Klimov <jimkli...@cos.ru> wrote:
> On 2012-12-05 04:11, Richard Elling wrote:
>> On Nov 29, 2012, at 1:56 AM, Jim Klimov <jimkli...@cos.ru> wrote:
>>> I've heard a claim that ZFS relies too much on RAM caching, but
>>> implements no sort of priorities (indeed, I've seen no knobs to
>>> tune those) - so that if the storage box receives many different
>>> types of IO requests with different "administrative weights" in
>>> the view of admins, it cannot really throttle some IOs to boost
>>> others, when such IOs have to hit the pool's spindles.
>> Caching has nothing to do with QoS in this context. *All* modern
>> filesystems cache to RAM, otherwise they are unusable.
> Yes, I get that. However, many systems get away with less RAM
> than recommended for ZFS rigs (like the ZFS SA with a couple
> hundred GB as the starting option), and make their compromises
> elsewhere. They have to anyway, and they get different results,
> perhaps even better suited to certain narrow or big niches.
This is nothing more than a specious argument. They have small
caches, so their performance is not as good as those with larger
caches. This is like saying you need a smaller CPU cache because
larger CPU caches get full.
> Whatever the aggregate result, this difference does lead to
> some differing features that The Others' marketing trumpets
> praise as the advantage :) - like this ability to mark some
> IO traffic as of higher priority than other traffics, in one
> case (which is now also an Oracle product line, apparently)...
> Actually, this question stems from a discussion at a seminar
> I've recently attended - which praised ZFS but pointed out its
> weaknesses against some other players on the market, so we are
> not unaware of those.
>>> For example, I might want to have corporate webshop-related
>>> databases and appservers to be the fastest storage citizens,
>>> then some corporate CRM and email, then various lower priority
>>> zones and VMs, and at the bottom of the list - backups.
>> Please read the papers on the ARC and how it deals with MFU and
>> MRU cache types. You can adjust these policies using the primarycache
>> and secondarycache properties at the dataset level.
> I've read up on that, and don't quite see how much these help
> if there is pressure on RAM so that cache entries expire...
> Meaning, if I want certain datasets to remain cached as long
> as possible (i.e. serve a website or DB from RAM, not HDD), at
> the expense of other datasets that might see higher usage but
> have lower business priority - how do I do that? Or, perhaps,
> add (L2)ARC share, reservation and/or quota concepts to
> certain datasets which I explicitly want to throttle up or down?
MRU evictions take precedence over MFU evictions. If the data is
not in MFU, then it is, by definition, not being frequently used.
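You can watch this split directly. A sketch assuming an illumos/Solaris
box where the ZFS kstats are exposed (statistic names follow the
open-source arc.c):

```shell
# Show how ARC hits divide between the recency (MRU) and frequency
# (MFU) lists; the *_ghost_hits counters track recently evicted
# entries, which the ARC uses to adapt the split between the lists.
kstat -p zfs:0:arcstats | egrep 'mru|mfu'
```

If mfu_hits dominates, your frequently used working set is staying
resident and should be surviving eviction pressure.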
> At most, now I can mark the lower-priority datasets' data or
> even metadata as not cached in ARC or L2ARC. It is on-off; there
> seem to be no finer steps, like QoS tags [0-7] or something similar.
> BTW, as a short side question: is it true or false that if I
> set primarycache=metadata, then the ZFS ARC won't cache any
> "userdata", and thus it won't appear in (expire into) L2ARC?
> So the real setting is that I can cache data+meta in RAM, and
> only meta in SSD? Not the other way around (meta in RAM but
> both data+meta in SSD)?
That is correct, by my reading of the code.
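In terms of the two properties, that asymmetry looks like this (dataset
names here are hypothetical):

```shell
# Possible: cache data+metadata in RAM, but only metadata on SSD.
zfs set primarycache=all tank/webshop
zfs set secondarycache=metadata tank/webshop

# Not possible the other way around: with primarycache=metadata, user
# data never enters the ARC, and since the L2ARC is fed only by blocks
# being evicted from the ARC, no user data can reach the SSD either,
# regardless of the secondarycache setting.
zfs set primarycache=metadata tank/backups
zfs set secondarycache=all tank/backups   # user data still won't land here

zfs get primarycache,secondarycache tank/webshop tank/backups
```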
>>> AFAIK, now such requests would hit the ARC, then the disks if
>>> needed - in no particular order. Well, can the order be made
>>> "particular" with current ZFS architecture, i.e. by setting
>>> some datasets to have a certain NICEness or another priority
>> ZFS has a priority-based I/O scheduler that works at the DMU level.
>> However, there is no system call interface in UNIX that transfers
>> priority or QoS information (e.g. via read() or write()) into the file
>> system VFS interface. So the granularity of priority control is by
>> zone or dataset.
> I do not think I've seen mention of priority controls per dataset,
> at least not in generic ZFS. Actually, that was part of my question
> above. And while throttling or resource shares between higher level
> software components (zones, VMs) might have similar effect, this is
> not something really controlled and enforced by the storage layer.
The priority scheduler is by type of I/O request. For example, sync
requests have priority over async requests. Reads and writes have
priority over scrubbing, and so on. The inter-dataset scheduling is
done at the zone level.
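One place this ordering is visible from the command line (pool name
hypothetical): start a scrub and compare its rate with and without a
competing application workload; the scrub rate drops because scrub I/O
yields to ordinary reads and writes:

```shell
zpool scrub tank
zpool status tank        # scrub progress and current rate
zpool iostat -v tank 5   # per-vdev bandwidth while clients run
```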
There is more work being done in this area, but it is still in the
research stage.
zfs-discuss mailing list