On 01/22/2013 11:22 PM, Jim Klimov wrote:
> On 2013-01-22 23:03, Sašo Kiselkov wrote:
>> On 01/22/2013 10:45 PM, Jim Klimov wrote:
>>> On 2013-01-22 14:29, Darren J Moffat wrote:
>>>> Preallocated ZVOLs - for swap/dump.
>>>
>>> Or is it also supported to disable COW for such datasets, so that
>>> the preallocated swap/dump zvols might remain contiguous on the
>>> faster tracks of the drive (i.e. like a dedicated partition, but
>>> with benefits of ZFS checksums and maybe compression)?
>>
>> I highly doubt it, as it breaks one of the fundamental design principles
>> behind ZFS (always maintain transactional consistency). Also,
>> contiguousness and compression are fundamentally at odds (contiguousness
>> requires each block to remain the same length regardless of contents,
>> compression varies block length depending on the entropy of the
>> contents).
>
> Well, dump and swap devices are kind of special in that they need
> verifiable storage (i.e. detectable to have no bit-errors) but not
> really consistency as in sudden-power-off transaction protection.

I get your point, but I would argue that if you are willing to
preallocate storage for these, then putting dump/swap on an iSCSI LUN
as opposed to having it locally is kind of pointless anyway. Since
they are rarely used, having them "thin provisioned" is probably
better in an iSCSI environment than wasting valuable network-storage
resources on something you hardly ever need.
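To illustrate the difference (the pool/dataset names below are just
examples, but the flags are standard zfs(1M)):

  # regular zvol: fat-provisioned by default, the full 4G is promised
  # to the volume up front via its refreservation
  zfs create -V 4G tank/swap0

  # sparse ("thin-provisioned") zvol: -s skips the reservation, so
  # pool space is only consumed as blocks are actually written
  zfs create -s -V 4G tank/swap1

  # both report the same 4G volsize; only the reservation differs
  zfs get volsize,refreservation tank/swap0 tank/swap1

Either way the consumer sees a 4G device; the only difference is how
much pool space is guaranteed to back it.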
> Both have a lifetime span of a single system uptime - like L2ARC,
> for example - and will be reused anew afterwards - after a reboot,
> a power-surge, or a kernel panic.

For the record, the L2ARC is not transactionally consistent. It uses a
completely different allocation strategy from the main pool
(essentially a simple rotor). Besides, if you plan to shred your dump
contents after reboot anyway, why fat-provision them? I can understand
swap, but dump?

> So while metadata used to address the swap ZVOL contents may and
> should be subject to common ZFS transactions and COW and so on,
> and jump around the disk along with rewrites of blocks, the ZVOL
> userdata itself may as well occupy the same positions on the disk,
> I think, rewriting older stuff. With mirroring likely in place as
> well as checksums, there are other ways than COW to ensure that
> the swap (at least some component thereof) contains what it should,
> even with intermittent errors of some component devices.

You misunderstand: the transactional integrity in ZFS isn't just there
to protect the data you put in, it's also meant to protect ZFS'
internal structure (i.e. the metadata). This includes the layout of
your zvols (which are also just another dataset). I understand that
you want to view this kind of fat-provisioned zvol as a simple
contiguous container block, but it is probably more hassle to
implement than it's worth.

> Likewise, swap/dump breed of zvols shouldn't really have snapshots,
> especially not automatic ones (and the installer should take care
> of this at least for the two zvols it creates) ;)

If you are talking about the standard OpenSolaris-style boot
environments, then yes, this is taken into account. Your BE lives
under rpool/ROOT, while swap and dump are rpool/swap and rpool/dump
respectively (both thin-provisioned, since they are rarely needed).

> Compression for swap is an interesting matter... for example, how
> should it be accounted? As dynamic expansion and/or shrinking of
> available swap space (or just of space needed to store it)?

Since compression occurs way below the dataset layer, a zvol's
capacity doesn't change with compression, even though the amount of
space it actually uses in the pool can. A zvol's capacity pertains to
its logical attributes, i.e. most importantly the maximum byte offset
within it that is accessible to an application (in this case, swap).
How the underlying blocks are actually stored and how much space they
take up is up to the lower layers.
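This is easy to demonstrate on any test pool (dataset names made up
again, the properties are the standard ones):

  # turn on compression - volsize, the capacity the consumer sees,
  # stays exactly where it was
  zfs set compression=on tank/swap0
  zfs get volsize tank/swap0

  # the in-pool accounting, on the other hand, moves with the data
  zfs get used,referenced,compressratio tank/swap0

And while we're at it, the auto-snapshot point above comes down to a
simple user property, which the time-slider/auto-snapshot services
check before snapshotting a dataset (that's what the installer should
be setting on swap/dump, IIRC):

  zfs set com.sun:auto-snapshot=false tank/swap0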
> If the latter, and we still intend to preallocate and guarantee
> that the swap has its administratively predefined amount of
> gigabytes, compressed blocks can be aligned on those starting
> locations as if they were not compressed. In effect this would
> just decrease the bandwidth requirements, maybe.

But you forget that a compressed block's physical size fundamentally
depends on its contents. That's why compressed zvols still appear the
same size as before; what changes is how much space they occupy on the
underlying pool.

> For dump this might be just a bulky compressed write from start
> to however much it needs, within the preallocated psize limits...

I hope you now understand the distinction between the logical size of
a zvol and its actual in-pool size. We can't tie one to the other,
since that would result in unpredictable behavior for the application
(write one set of data, get capacity X; write another set, get
capacity Y - how would you determine in advance how much fits? You
can't).

Cheers,
--
Saso

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss