On 01/22/2013 11:22 PM, Jim Klimov wrote:
> On 2013-01-22 23:03, Sašo Kiselkov wrote:
>> On 01/22/2013 10:45 PM, Jim Klimov wrote:
>>> On 2013-01-22 14:29, Darren J Moffat wrote:
>>>> Preallocated ZVOLs - for swap/dump.
>>>
>>> Or is it also supported to disable COW for such datasets, so that
>>> the preallocated swap/dump zvols might remain contiguous on the
>>> faster tracks of the drive (i.e. like a dedicated partition, but
>>> with benefits of ZFS checksums and maybe compression)?
>>
>> I highly doubt it, as it breaks one of the fundamental design principles
>> behind ZFS (always maintain transactional consistency). Also,
>> contiguousness and compression are fundamentally at odds (contiguousness
>> requires each block to remain the same length regardless of contents,
>> compression varies block length depending on the entropy of the
>> contents).
>
> Well, dump and swap devices are kind of special in that they need
> verifiable storage (i.e. detectable to have no bit-errors) but not
> really consistency as in sudden-power-off transaction protection.

I get your point, but I would argue that if you are willing to
preallocate storage for these, then putting dump/swap on an iSCSI LUN
as opposed to having it locally is kind of pointless anyway. Since
they are rarely used, having them "thin provisioned" is probably
better in an iSCSI environment than wasting valuable network-storage
resources on something you hardly ever need.
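To illustrate the difference (the pool/dataset names below are just
examples, but the flags are standard zfs(1M)):

  # regular zvol: fat-provisioned by default, the full 4G is promised
  # to the volume up front via its refreservation
  zfs create -V 4G tank/swap0

  # sparse ("thin-provisioned") zvol: -s skips the reservation, so
  # pool space is only consumed as blocks are actually written
  zfs create -s -V 4G tank/swap1

  # both report the same 4G volsize; only the reservation differs
  zfs get volsize,refreservation tank/swap0 tank/swap1

Either way the consumer sees a 4G device; the only difference is how
much pool space is guaranteed to back it.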
> Both have a lifetime span of a single system uptime - like L2ARC,
> for example - and will be reused anew afterwards - after a reboot,
> a power-surge, or a kernel panic.

For the record, the L2ARC is not transactionally consistent. It uses a
completely different allocation strategy from the main pool
(essentially a simple rotor). Besides, if you plan to shred your dump
contents after reboot anyway, why fat-provision them? I can understand
swap, but dump?

> So while metadata used to address the swap ZVOL contents may and
> should be subject to common ZFS transactions and COW and so on,
> and jump around the disk along with rewrites of blocks, the ZVOL
> userdata itself may as well occupy the same positions on the disk,
> I think, rewriting older stuff. With mirroring likely in place as
> well as checksums, there are other ways than COW to ensure that
> the swap (at least some component thereof) contains what it should,
> even with intermittent errors of some component devices.

You misunderstand: the transactional integrity in ZFS isn't just there
to protect the data you put in, it's also meant to protect ZFS'
internal structure (i.e. the metadata). This includes the layout of
your zvols (which are also just another dataset). I understand that
you want to view this kind of fat-provisioned zvol as a simple
contiguous container block, but it is probably more hassle to
implement than it's worth.

> Likewise, swap/dump breed of zvols shouldn't really have snapshots,
> especially not automatic ones (and the installer should take care
> of this at least for the two zvols it creates) ;)

If you are talking about the standard OpenSolaris-style boot
environments, then yes, this is taken into account. Your BE lives
under rpool/ROOT, while swap and dump are rpool/swap and rpool/dump
respectively (both thin-provisioned, since they are rarely needed).

> Compression for swap is an interesting matter... for example, how
> should it be accounted? As dynamic expansion and/or shrinking of
> available swap space (or just of space needed to store it)?

Since compression occurs way below the dataset layer, a zvol's
capacity doesn't change with compression, even though the amount of
space it actually uses in the pool can. A zvol's capacity pertains to
its logical attributes, i.e. most importantly the maximum byte offset
within it that is accessible to an application (in this case, swap).
How the underlying blocks are actually stored and how much space they
take up is up to the lower layers.
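This is easy to demonstrate on any test pool (dataset names made up
again, the properties are the standard ones):

  # turn on compression - volsize, the capacity the consumer sees,
  # stays exactly where it was
  zfs set compression=on tank/swap0
  zfs get volsize tank/swap0

  # the in-pool accounting, on the other hand, moves with the data
  zfs get used,referenced,compressratio tank/swap0

And while we're at it, the auto-snapshot point above comes down to a
simple user property, which the time-slider/auto-snapshot services
check before snapshotting a dataset (that's what the installer should
be setting on swap/dump, IIRC):

  zfs set com.sun:auto-snapshot=false tank/swap0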
> If the latter, and we still intend to preallocate and guarantee
> that the swap has its administratively predefined amount of
> gigabytes, compressed blocks can be aligned on those starting
> locations as if they were not compressed. In effect this would
> just decrease the bandwidth requirements, maybe.

But you forget that a compressed block's physical size fundamentally
depends on its contents. That's why compressed zvols still appear the
same size as before; what changes is how much space they occupy on the
underlying pool.

> For dump this might be just a bulky compressed write from start
> to however much it needs, within the preallocated psize limits...

I hope you now understand the distinction between the logical size of
a zvol and its actual in-pool size. We can't tie one to the other,
since that would result in unpredictable behavior for the application
(write one set of data, get capacity X; write another set, get
capacity Y - how would you determine in advance how much fits? You
can't).

Cheers,
--
Saso

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss