Re: [zfs-discuss] RFE: Un-dedup for unique blocks

Jim Klimov Tue, 22 Jan 2013 17:19:20 -0800

The discussion has suddenly become hot and interesting - albeit quite
diverged from the original topic ;)


First of all, as a disclaimer: when I earlier proposed such changes
to datasets for swap (and maybe dump) use, I explicitly proposed that
this be a new dataset type - distinct from the zvol, fs and snapshot types
we have today. Granted, this distinction was lost in today's exchange
of words, but it is still an important one - especially since it means
that while basic ZFS (or rather ZPOOL) rules are maintained, the dataset
rules might be redefined ;)

I'll try to reply to a few points below, snipping a lot of older text.

>> Well, dump and swap devices are kind of special in that they need
>> verifiable storage (i.e. detectable to have no bit-errors) but not
>> really consistency as in sudden-power-off transaction protection.


I get your point, but I would argue that if you are willing to
preallocate storage for these, then putting dump/swap on an iSCSI LUN as
opposed to having it locally is kind of pointless anyway. Since they are
used rarely, having them "thin provisioned" is probably better in an
iSCSI environment than wasting valuable network-storage resources on
something you rarely need.


I am not sure what in my post led you to think that I meant iSCSI
or otherwise networked storage to keep swap and dump. Some servers
have local disks, you know - and in networked storage environments
the local disks are only used to keep the OS image, swap and dump ;)

Besides, if you plan to shred your dump contents after
reboot anyway, why fat-provision them? I can understand swap, but dump?


Guarantee that the space is there... Given the recent mishaps
with dumping (i.e. the dump context is quite stripped down compared to
normal kernel operation, so multithreading broke somehow) I guess that
pre-provisioned sequential areas might also reduce some risks...
though likely not - random metadata would still have to get into
the pool.

You don't understand, the transactional integrity in ZFS isn't just to
protect the data you put in, it's also meant to protect ZFS' internal
structure (i.e. the metadata). This includes the layout of your zvols
(which are also just another dataset). I understand that you want to
view this kind of fat-provisioned zvol as a simple contiguous
container block, but it is probably more hassle to implement than it's
worth.


I'd argue that transactional integrity in ZFS primarily protects
metadata, so that there is a tree of always-current block pointers.
There is this octopus of a block-pointer tree whose leaf nodes
point to data blocks - but only as DVAs and checksums, basically.
Nothing really requires data to be or not be COWed and stored at
a different location than the previous version of the block at
the same logical offset for the data consumers (FS users, zvol
users), except that we want that data to be readable even after
a catastrophic pool close (system crash, poweroff, etc.).
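
To illustrate the distinction drawn above, here is a toy Python sketch
of a leaf block pointer (an address plus a checksum) updated either by
copy-on-write or by in-place overwrite. It is only a model under stated
assumptions: ToyPool, BlockPointer and the integer "dva" are hypothetical
names invented for illustration, SHA-256 merely stands in for whatever
checksum the pool uses, and none of this reflects actual ZFS code.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BlockPointer:
    """Toy leaf block pointer: where the data lives plus its checksum."""
    dva: int          # hypothetical stand-in for a device virtual address
    checksum: bytes   # checksum of the data stored at that address


class ToyPool:
    """Illustrative model only; not how ZFS is actually implemented."""

    def __init__(self):
        self.disk = {}        # dva -> data bytes
        self.next_free = 0    # naive bump allocator

    def _alloc(self):
        dva = self.next_free
        self.next_free += 1
        return dva

    def write_cow(self, old_bp, data):
        """Copy-on-write: data goes to a fresh address, the old block stays intact."""
        dva = self._alloc()
        self.disk[dva] = data
        return BlockPointer(dva, hashlib.sha256(data).digest())

    def write_in_place(self, old_bp, data):
        """In-place overwrite: same address, only the checksum changes."""
        self.disk[old_bp.dva] = data
        return BlockPointer(old_bp.dva, hashlib.sha256(data).digest())

    def read(self, bp):
        """Return the data behind a pointer, verifying its checksum first."""
        data = self.disk[bp.dva]
        if hashlib.sha256(data).digest() != bp.checksum:
            raise IOError("checksum mismatch")
        return data
```

In this model, a pointer obtained before a write_cow() call still reads
the old data, while a pointer obtained before a write_in_place() call
fails verification afterwards. That loss of the previous version only
matters if something needs it after a crash.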

We don't (AFAIK) have such a requirement for swap. If the pool
which contained swap kicked the bucket, we probably have a
larger problem whose solution will likely involve reboot and thus
recycling of all swap data.

And for single-device errors with (contiguous) preallocated
unrelocatable swap, we can protect with mirrors and checksums
(used upon read, within this same uptime that wrote the bits).
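
As a rough sketch of the "mirrors and checksums upon read" idea, the
snippet below returns the first mirror copy whose checksum matches the
one recorded at write time. The function name mirrored_read is
hypothetical, and CRC32 is used only as a convenient stand-in; it is not
a claim about which checksum a real pool would apply.

```python
import zlib


def mirrored_read(copies, expected_crc):
    """Return the first mirror copy whose checksum matches; raise if none does."""
    for data in copies:
        if zlib.crc32(data) == expected_crc:
            return data
    raise IOError("all mirror copies failed checksum verification")


# Example: the first copy has a flipped byte, the second one is intact.
good = b"swap page contents"
crc_at_write_time = zlib.crc32(good)
print(mirrored_read([b"swap page c0ntents", good], crc_at_write_time))
```

The checksum only needs to survive for the current uptime, since the
swap contents are discarded on reboot anyway.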

Likewise, swap/dump breed of zvols shouldn't really have snapshots,
especially not automatic ones (and the installer should take care
of this at least for the two zvols it creates) ;)


If you are talking about the standard opensolaris-style
boot-environments, then yes, this is taken into account. Your BE lives
under rpool/ROOT, while swap and dump are rpool/swap and rpool/dump
respectively (both thin-provisioned, since they are rarely needed).


I meant the attribute for zfs-auto-snapshots service, i.e.:
rpool/swap  com.sun:auto-snapshot  false                  local

As I wrote, I'd argue that for "new" swap (and maybe dump) datasets
the snapshot action should not even be implemented.
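
For existing zvols, the attribute shown above can be set with the usual
"zfs set property=value dataset" form. The Python helper below is a
small sketch that only builds the argument list and prints it; the
function name auto_snapshot_off_cmd is hypothetical, and actually
running the command (e.g. via subprocess) is left to the reader.

```python
def auto_snapshot_off_cmd(dataset):
    """Build the argv for disabling auto-snapshots on one dataset."""
    return ["zfs", "set", "com.sun:auto-snapshot=false", dataset]


# Print the commands for the two zvols the installer creates.
for ds in ("rpool/swap", "rpool/dump"):
    print(" ".join(auto_snapshot_off_cmd(ds)))
```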

Compression for swap is an interesting matter... for example, how
should it be accounted? As dynamic expansion and/or shrinking of
available swap space (or just of space needed to store it)?


Since compression occurs way below the dataset layer, your zvol capacity
doesn't change with compression, even though how much space it actually
uses in the pool can. A zvol's capacity pertains to its logical
attributes, i.e. most importantly the maximum byte offset within it
accessible to an application (in this case, swap). How the underlying
blocks are actually stored and how much space they take up is up to the
lower layers.

...

But you forget that a compressed block's physical size fundamentally
depends on its contents. That's why compressed zvols still appear the
same size as before. What changes is how much space they occupy on the
underlying pool.
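
The point about logical versus physical size can be shown with a small
Python sketch. It assumes a hypothetical 8 KiB block size and uses zlib
purely as a stand-in compressor (not a claim about which algorithm a
pool would use): the logical size is fixed by the number of blocks,
while the physical size depends on what the blocks contain.

```python
import os
import zlib

BLOCK_SIZE = 8192  # hypothetical volume block size, for illustration only


def physical_size(block):
    """Bytes needed to store one block after compression (zlib as a stand-in)."""
    compressed = zlib.compress(block)
    # Keep the block uncompressed if compression does not help.
    return min(len(compressed), len(block))


blocks = [bytes(BLOCK_SIZE), os.urandom(BLOCK_SIZE)]  # all zeros vs. random data
logical = len(blocks) * BLOCK_SIZE                    # what the consumer sees
physical = sum(physical_size(b) for b in blocks)      # what the pool would store
print(logical, physical)
```

The zero-filled block shrinks to a handful of bytes, while the random
block stays at its full size, yet the consumer sees the same two blocks
of addressable space either way.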


I won't argue with this, as it is perfectly correct for zvols and
undefined for the mythical new dataset type ;)

However, regarding dump and size prediction - when I created dump
zvols manually and fed them to dumpadm, it would complain that the
device was too small. Then at some point it accepted the given size,
even though it was some value unrelated to the system RAM size or anything.
So I guess the system also does some guessing in this case?..
If so, preallocating as many bytes as it thinks minimally required
and then allowing compression to stuff more data in, might help to
actually save the larger dumps in cases where the system (dumpadm)
made a wrong guess.

//Jim


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
