Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On 11/17/10 12:04, Miles Nordin wrote:
> black-box crypto is snake oil at any level, IMNSHO.

Absolutely.

Congrats again on finishing your project, but every other disk
encryption framework I've seen taken remotely seriously has a detailed
paper describing the algorithm, not just a list of features and a
configuration guide.  It should be a requirement for anything treated
as more than a toy.  I might have missed yours, or maybe it's coming
soon.

In particular, the mechanism by which dedup-friendly block IVs are
chosen based on the plaintext needs public scrutiny.  Knowing Darren,
it's very likely that he got it right, but in crypto all the details
matter, and if a spec detailed enough to allow for interoperability
isn't available, it's safest to assume that some of the details are
wrong.

- Bill
Re: [zfs-discuss] ZFS - Sudden decrease in write performance
Thanks for pointing me towards that site!  Saying that txg_synctime_ms
controls ZFS's "breathing" is how I was thinking about it -- great way
to describe it!  Unfortunately, setting txg_synctime_ms to 1000 or even
1 didn't make an improvement.

I tried adding disable-ohci=true to the GRUB boot menu via SSH and the
box didn't come back from its reboot, so I'm not going to be able to do
much more tonight (I'm working remotely).

I do notice that when the ARC size reaches capacity, that's when things
slow down.  Also, it never appears to drop after I kill the I/O.  If I
stop all I/O, arcstat shows every number except arcsz drop.  Should
arcsz drop at all?

On Mon, Nov 15, 2010 at 7:27 PM, Khushil Dep <khushil@gmail.com> wrote:
> That controls zfs breathing.  I'm on a phone writing this, so I hope
> you won't mind me pointing you to
> listware.net/201005/opensolaris-zfs/115564-zfs-discuss-small-stalls-slowing-down-rsync-from-holding-network-saturation-every-5-seconds.html
>
> On 16 Nov 2010 00:20, Louis Carreiro <carreir...@gmail.com> wrote:
>> Almost!  It seems like it held out a bit further than last time.  Now
>> arcsz hits 2G (matching 'c'), but it still drops off.  It started at
>> 5.6 GB/min and fell off to less than 700 MB/min.  A snippet of my
>> arcstat.pl output looks like the following:
>>
>>     Time      read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz   c
>>     19:14:31   14K   283      1   283    2     0    0   283    1     2G  2G
>>     19:14:32   45K   120      0   102    0    18    0   120    0     2G  2G
>>     19:14:33    9K   228      2   213    2    15    0   223    2     2G  2G
>>     19:14:34   14K   285      2   274    2    11    0   285    2     2G  2G
>>     19:14:35   14K   294      1   276    2    18    0   294    1     2G  2G
>>
>> The above is what it looks like when my speed falls off.  Is
>> txg_synctime_ms something I can tweak, or is what you suggested a
>> normal value?  I've read a few articles that have mentioned values
>> lower than 12288 ms.
>>
>> On Mon, Nov 15, 2010 at 6:35 PM, Khushil Dep <khushil@gmail.com> wrote:
>>> Set your txg_synct...
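A minimal sketch of how a tunable like this is usually adjusted on
OpenSolaris-era kernels, assuming the variable in your build is named
zfs_txg_synctime_ms (verify it exists before poking it):

    # inspect the current value (decimal); if the symbol isn't found,
    # your build uses a different name
    echo "zfs_txg_synctime_ms/D" | mdb -k

    # change it on the live kernel (0t prefix = decimal; use /Z instead
    # of /W if the variable is 64-bit in your build)
    echo "zfs_txg_synctime_ms/W 0t1000" | mdb -kw

    # or make it persistent across reboots via /etc/system:
    # set zfs:zfs_txg_synctime_ms = 1000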
Re: [zfs-discuss] possible zfs recv bug?
I verified that this bug exists in OpenSolaris as well.

The problem is that we can't destroy the old filesystem "a" (which has
been renamed to rec2/recv-2176-1 in this case).  We can't destroy it
because it has a child, "b".  We need to rename "b" to be under the new
"a"; however, we are not renaming it, which is the root cause of the
problem.

This code in recv_incremental_replication() should detect that we
should rename "b":

    if ((stream_parent_fromsnap_guid != 0 &&
        parent_fromsnap_guid != 0 &&
        stream_parent_fromsnap_guid != parent_fromsnap_guid) || ...

But this will not trigger, because we have already destroyed the
snapshots of b's parent (the old "a", now rec2/recv-2176-1), so
parent_fromsnap_guid will be 0.

I believe the fix for bug 6921421 introduced this code in build 135; it
used to read:

    if ((stream_parent_fromsnap_guid != 0 &&
        stream_parent_fromsnap_guid != parent_fromsnap_guid) || ...

So we will have to investigate and see why the
"parent_fromsnap_guid != 0" check is now needed.

--matt

On Tue, Nov 23, 2010 at 6:16 AM, James Van Artsdalen
<james-opensola...@jrv.org> wrote:
> I am seeing a zfs recv bug on FreeBSD and am wondering if someone
> could test this in the Solaris code.  If it fails there, then I guess
> a bug report against Solaris is needed.
>
> This is a perverse case of filesystem renaming between snapshots.
>
> kraken:/root# cat zt
> zpool create rec1 da3
> zpool create rec2 da4
> zfs create rec1/a
> zfs create rec1/a/b
> zfs snapshot -r rec1@s1
> zfs send -R rec1@s1 | zfs recv -dvuF rec2
> zfs rename rec1/a/b rec1/c
> zfs destroy -r rec1/a
> zfs create rec1/a
> zfs rename rec1/c rec1/a/b
> # if the rename target is anything other than rec1/a/b the zfs recv result is right
> zfs snapshot -r rec1@s2
> zfs send -R -I @s1 rec1@s2 | zfs recv -dvuF rec2
>
> kraken:/root# sh -x zt
> + zpool create rec1 da3
> + zpool create rec2 da4
> + zfs create rec1/a
> + zfs create rec1/a/b
> + zfs snapshot -r rec1@s1
> + zfs send -R rec1@s1
> + zfs recv -dvuF rec2
> receiving full stream of rec1@s1 into rec2@s1
> received 47.4KB stream in 2 seconds (23.7KB/sec)
> receiving full stream of rec1/a@s1 into rec2/a@s1
> received 47.9KB stream in 1 seconds (47.9KB/sec)
> receiving full stream of rec1/a/b@s1 into rec2/a/b@s1
> received 46.3KB stream in 1 seconds (46.3KB/sec)
> + zfs rename rec1/a/b rec1/c
> + zfs destroy -r rec1/a
> + zfs create rec1/a
> + zfs rename rec1/c rec1/a/b
> + zfs snapshot -r rec1@s2
> + zfs send -R -I @s1 rec1@s2
> + zfs recv -dvuF rec2
> attempting destroy rec2/a@s1
> success
> attempting destroy rec2/a
> failed - trying rename rec2/a to rec2/recv-2176-1
> local fs rec2/a/b new parent not found
> cannot open 'rec2/a/b': dataset does not exist
> another pass:
> attempting destroy rec2/recv-2176-1
> failed (0)
> receiving incremental stream of rec1@s2 into rec2@s2
> received 10.8KB stream in 2 seconds (5.41KB/sec)
> receiving full stream of rec1/a@s2 into rec2/a@s2
> received 47.9KB stream in 1 seconds (47.9KB/sec)
> receiving incremental stream of rec1/a/b@s2 into rec2/recv-2176-1/b@s2
> received 312B stream in 2 seconds (156B/sec)
> local fs rec2/a does not have fromsnap (s1 in stream); must have been deleted locally; ignoring
> attempting destroy rec2/recv-2176-1
> failed (0)
>
> kraken:/root# zfs list | grep rec1
> rec1                 238K  1.78T  32K  /rec1
> rec1/a                63K  1.78T  32K  /rec1/a
> rec1/a/b              31K  1.78T  31K  /rec1/a/b
> kraken:/root# zfs list | grep rec2
> rec2                 293K  1.78T  32K  /rec2
> rec2/a                32K  1.78T  32K  /rec2/a
> rec2/recv-2176-1      64K  1.78T  32K  /rec2/recv-2176-1
> rec2/recv-2176-1/b    32K  1.78T  31K  /rec2/recv-2176-1/b
> kraken:/root#
Re: [zfs-discuss] ZFS snapshot limit?
On 2010-12-01 15:19, Menno Lageman wrote:
> f...@ll wrote:
>> Hi,
>> I must send a zfs snapshot from one server to another.  The snapshot
>> is 130 GB in size.  My question: does zfs have any limit on the size
>> of a send?
>
> If you are sending the snapshot to another zpool (i.e. using
> 'zfs send | zfs recv') then no, there is no limit.  If, however, you
> send the snapshot to a file on the other system (i.e.
> 'zfs send > somefile') then you are limited by what the file system
> you are creating the file on supports.
>
> Menno

Hi,

In my situation it's the first option: I send the snapshot to another
server using zfs send | zfs recv.  The problem is that once the send
completes, after a reboot the receiving zpool has errors or ends up in
the FAULTED state.  The first server is physical; the second is a
virtual machine running under XenServer 5.6.

f...@ll
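A minimal sketch of the pool-to-pool transfer being described, assuming
ssh connectivity between the two hosts; the pool, dataset, snapshot,
and host names here are illustrative:

    # send a snapshot from the local pool straight into a remote pool
    zfs snapshot tank/data@xfer
    zfs send tank/data@xfer | ssh backuphost zfs recv -F backup/data

    # check the receiving pool's health before relying on it (or rebooting)
    ssh backuphost zpool status -v backup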
[zfs-discuss] ZFS imported into GRUB
Hi,

Following our new strategy with regard to Oracle code, we (the GRUB
maintainers) have decided to grant an exception to our usual policy and
import ZFS code from grub-extras into official GRUB.

Our usual policy is to require copyright assignment for all new code,
so that the FSF can use it to defend users' freedom in court.  If that's
not possible, we require at least a disclaimer asserting authorship
(i.e. that no copyright infringement has been committed).  The purpose
of this, as always, is ensuring that GRUB is a legally safe codebase.

The ZFS code that has been imported into GRUB derives from the
OpenSolaris version of GRUB Legacy.  On one hand, this code was released
to the public under the terms of the GNU GPL.  On the other, binary
releases of Solaris included this modified GRUB, and as a result
Oracle/Sun is bound by the GPL.  We believe these two factors give us
very strong reassurance that: a) Oracle owns the copyright to this code,
and b) Oracle is licensing it under the GPL, and therefore it is
completely safe to use this code in GRUB.

We hope this code import will foster collaboration on ZFS support for
GRUB.  Our understanding is that the next version of Solaris will ship
with GRUB 2, and so we expect the whole OpenSolaris ecosystem to make
this move as well.  We encourage downstream distributors to anticipate
this by preparing their transition from the old, legacy version of GRUB
(0.97), which is no longer supported by GRUB developers.

Finally, a word about patents.  Software patents are terribly harmful to
free software, and to IT in general.  We believe they should be
abolished.  However, until that happens, we need to take measures to
protect our users.  We recognize it is practically impossible for end
users to achieve a situation where they're completely safe from patent
infringement (even if they pay so-called patent taxes to specific
companies).  However, we encourage our users to make careful choices
when importing technology that is designed in a closed-door development
model (rather than in the community), because it is prone to be heavily
patented.

This is the reason why, when we (the GNU project) developed the GPL, we
included certain provisions in it to ensure a patent holder can't
benefit from the freedoms we gave them and at the same time use patents
to undermine those freedoms for others.  Thanks to this, and because
Oracle is bound by the terms of the GNU GPL when it comes to GRUB, we
believe this renders patents covering ZFS basically harmless to GRUB
users.  If the patents covering ZFS are held by Oracle, they can't use
them against GRUB users; and if they're held by other parties, the GPL
provisions prevent Oracle from paying a tax only for themselves, so they
will have to fight alongside the community instead of betraying it.

Let this serve as yet another example of why so-called permissive
licenses aren't always a guarantee that the code covered by them can be
used freely.  If you intend for your code to be free for all users,
always use the latest version of the GPL.

-- 
Robert Millan
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On Wed, Nov 17, 2010 at 01:58:06PM -0800, Bill Sommerfeld wrote:
> On 11/17/10 12:04, Miles Nordin wrote:
>> black-box crypto is snake oil at any level, IMNSHO.
>
> Absolutely.

As Darren said, much of the design has been discussed in public and
reviewed by cryptographers.  It'd be nicer if we had a detailed paper
though.

> Congrats again on finishing your project, but every other disk
> encryption framework I've seen taken remotely seriously has a detailed
> paper describing the algorithm, not just a list of features and a
> configuration guide.  It should be a requirement for anything treated
> as more than a toy.  I might have missed yours, or maybe it's coming
> soon.
>
> In particular, the mechanism by which dedup-friendly block IVs are
> chosen based on the plaintext needs public scrutiny.  Knowing Darren,
> it's very likely that he got it right, but in crypto all the details
> matter, and if a spec detailed enough to allow for interoperability
> isn't available, it's safest to assume that some of the details are
> wrong.

Dedup + crypto does have security implications.  Specifically: it
facilitates traffic analysis, and then known- and even chosen-plaintext
attacks (if there were any practical such attacks on the cipher).

For example, IIUC, the ratio of dedup vs. non-dedup blocks + analysis of
dnodes and their data sizes (in blocks) + per-dnode dedup ratios can
probably be used to identify OS images, which would then help mount
known-plaintext attacks.  For a mailstore you'd be able to distinguish
mail sent or kept by a single local user vs. mail sent to and kept by
more than one local user, and by sending mail you could help mount
chosen-plaintext attacks.  And so on.

My advice would be to not bother encrypting OS images, and if you
encrypt only documents, then dedup is likely of less or no interest to
you -- in general, you may not want to bother with dedup + crypto.
However, it is fantastic that crypto and dedup can work together.

Nico
--
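A minimal sketch of the conservative configuration that advice points
at, using Solaris 11 Express property syntax; the dataset names are
illustrative:

    # encrypted documents dataset, leaving dedup at its default (off)
    zfs create -o encryption=on tank/documents

    # if you want dedup somewhere, keep it on an unencrypted dataset
    zfs create -o dedup=on tank/distros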
Re: [zfs-discuss] Booting fails with `Can not read the pool label' error
Hi Cindy,

> I haven't seen this in a while, but I wonder if you just need to set
> the bootfs property on your new root pool and/or reapply the boot
> blocks.
>
> Can you import this pool booting from a LiveCD and review the bootfs
> property value?  I would also install the boot blocks on the rpool2
> disk, and check the grub entries in /rpool2/boot/grub/menu.lst.

I've now repeated everything with snv_151a and it worked out of the box
on the Sun Fire V880, and (on the second try) also on my Blade 1500: it
seems the first time round I had the devalias for the second IDE disk
wrong:

    /p...@1e,60/i...@d/d...@0,1

instead of

    /p...@1e,60/i...@d/d...@1,0

I'm now happily running snv_151a on both machines (and still using Xsun
on the Blade 1500, so it's still usable as a desktop :-)

	Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
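For the archives, a rough sketch of the two recovery steps suggested
above; the pool, boot environment, and device names are placeholders
(this is the SPARC variant, hence installboot):

    # from the LiveCD, import the pool and point bootfs at the root dataset
    zpool import -f rpool2
    zpool set bootfs=rpool2/ROOT/snv_151a rpool2

    # reinstall the ZFS boot block on the root disk's slice
    installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk \
        /dev/rdsk/c1t1d0s0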
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On 17/11/2010 21:58, Bill Sommerfeld wrote:
> In particular, the mechanism by which dedup-friendly block IVs are
> chosen based on the plaintext needs public scrutiny.  Knowing Darren,
> it's very likely that he got it right, but in crypto all the details
> matter, and if a spec detailed enough to allow for interoperability
> isn't available, it's safest to assume that some of the details are
> wrong.

That is described here:

http://blogs.sun.com/darren/entry/zfs_encryption_what_is_on

If dedup=on for the dataset, the per-block IVs are generated
differently: they are generated by taking an HMAC-SHA256 of the
plaintext and using the left-most 96 bits of that as the IV.  The key
used for the HMAC-SHA256 is different from the one used by AES for the
data encryption, but is stored (wrapped) in the same keychain entry;
just like the data encryption key, a new one is generated when doing a
'zfs key -K dataset'.  Obviously we couldn't calculate this IV when
doing a read, so it has to be stored.

This was also suggested independently by other well-known people
involved in encrypted filesystems while it was discussed on a public
forum (most of that thread was cross-posted to zfs-crypto-discuss).

-- 
Darren J Moffat
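A rough illustration of that construction (not the actual ZFS code; the
key value and block file below are stand-ins): HMAC-SHA256 over the
block plaintext, truncated to the left-most 96 bits.

    # IVKEY is a hex-encoded key standing in for the wrapped IV key;
    # block.bin stands in for one block of plaintext
    IVKEY=000102030405060708090a0b0c0d0e0f
    openssl dgst -sha256 -mac HMAC -macopt hexkey:$IVKEY -binary block.bin \
        | xxd -p -c 32 | cut -c1-24    # 24 hex chars = 96-bit IV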
Re: [zfs-discuss] zfs send receive problem/questions
Hi Don,

I'm no snapshot expert, but I think you will have to remove the previous
receiving-side snapshots, at least.

I created a file system hierarchy that includes a lower-level snapshot,
created a recursive snapshot of that hierarchy, and sent it over to a
backup pool.  Then I did the same steps again.  See the example below.

You can see from my example that this process fails if I don't remove
the existing snapshots first.  And, because I didn't remove the original
recursive snapshots on the sending side, the snapshots become nested.

I'm sure someone else has better advice.  I had an example of sending
root pool snapshots on the ZFS troubleshooting wiki, but it was removed,
so I will try to restore that example.

Thanks,

Cindy

# zfs list -r tank/home
NAME                          USED  AVAIL  REFER  MOUNTPOINT
tank/home                    1.12M  66.9G    25K  /tank/home
tank/home@snap2                  0      -    25K  -
tank/home/anne                280K  66.9G   280K  /tank/home/anne
tank/home/anne@snap2             0      -   280K  -
tank/home/bob                 280K  66.9G   280K  /tank/home/bob
tank/home/bob@snap2              0      -   280K  -
tank/home/cindys              561K  66.9G   281K  /tank/home/cindys
tank/home/cindys@snap2           0      -   281K  -
tank/home/cindys/dir1         280K  66.9G   280K  /tank/home/cindys/dir1
tank/home/cindys/dir1@snap1      0      -   280K  -
tank/home/cindys/dir1@snap2      0      -   280K  -
# zfs send -R tank/home@snap2 | zfs recv -d bpool
# zfs list -r bpool/home
NAME                           USED  AVAIL  REFER  MOUNTPOINT
bpool/home                    1.12M  33.2G    25K  /bpool/home
bpool/home@snap2                  0      -    25K  -
bpool/home/anne                280K  33.2G   280K  /bpool/home/anne
bpool/home/anne@snap2             0      -   280K  -
bpool/home/bob                 280K  33.2G   280K  /bpool/home/bob
bpool/home/bob@snap2              0      -   280K  -
bpool/home/cindys              561K  33.2G   281K  /bpool/home/cindys
bpool/home/cindys@snap2           0      -   281K  -
bpool/home/cindys/dir1         280K  33.2G   280K  /bpool/home/cindys/dir1
bpool/home/cindys/dir1@snap1      0      -   280K  -
bpool/home/cindys/dir1@snap2      0      -   280K  -
# zfs snapshot -r tank/home@snap3
# zfs send -R tank/home@snap3 | zfs recv -dF bpool
cannot receive new filesystem stream: destination has snapshots (eg. bpool/home@snap2)
must destroy them to overwrite it
# zfs destroy -r bpool/home@snap2
# zfs destroy bpool/home/cindys/dir1@snap1
# zfs send -R tank/home@snap3 | zfs recv -dF bpool
# zfs list -r bpool
NAME                           USED  AVAIL  REFER  MOUNTPOINT
bpool                         1.35M  33.2G    23K  /bpool
bpool/home                    1.16M  33.2G    25K  /bpool/home
bpool/home@snap2                  0      -    25K  -
bpool/home@snap3                  0      -    25K  -
bpool/home/anne                280K  33.2G   280K  /bpool/home/anne
bpool/home/anne@snap2             0      -   280K  -
bpool/home/anne@snap3             0      -   280K  -
bpool/home/bob                 280K  33.2G   280K  /bpool/home/bob
bpool/home/bob@snap2              0      -   280K  -
bpool/home/bob@snap3              0      -   280K  -
bpool/home/cindys              582K  33.2G   281K  /bpool/home/cindys
bpool/home/cindys@snap2           0      -   281K  -
bpool/home/cindys@snap3           0      -   281K  -
bpool/home/cindys/dir1         280K  33.2G   280K  /bpool/home/cindys/dir1
bpool/home/cindys/dir1@snap1      0      -   280K  -
bpool/home/cindys/dir1@snap2      0      -   280K  -
bpool/home/cindys/dir1@snap3      0      -   280K  -

On 12/01/10 11:30, Don Jackson wrote:
> Hello,
>
> I am attempting to move a bunch of zfs filesystems from one pool to
> another.  Mostly this is working fine, but one collection of file
> systems is causing me problems, and repeated re-reading of man zfs
> and the ZFS Administrators Guide is not helping.  I would really
> appreciate some help/advice.
>
> Here is the scenario.  I have a nested (hierarchy) of zfs file
> systems.  Some of the deeper fs are snapshotted.
> All this exists on the source zpool.
>
> First I recursively snapshotted the whole subtree:
>
> zfs snapshot -r naspool@xfer-11292010
>
> Here is a subset of the source zpool:
>
> # zfs list -r naspool
> NAME                                USED  AVAIL  REFER  MOUNTPOINT
> naspool                            1.74T  42.4G  37.4K  /naspool
> naspool@xfer-11292010                  0      -  37.4K  -
> naspool/openbsd                     113G  42.4G  23.3G  /naspool/openbsd
> naspool/openbsd@xfer-11292010          0      -  23.3G  -
> naspool/openbsd/4.4                21.6G  42.4G  2.33G  /naspool/openbsd/4.4
> naspool/openbsd/4.4@xfer-11292010      0      -  2.33G  -
> naspool/openbsd/4.4/ports           592M  42.4G   200M  /naspool/openbsd/4.4/ports
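For follow-up transfers in a hierarchy like this, a rough sketch of the
incremental variant that avoids having to destroy receiving-side
snapshots on every pass; the destination pool name and second snapshot
name are illustrative:

    # one-time full replication of the hierarchy
    zfs snapshot -r naspool@xfer-11292010
    zfs send -R naspool@xfer-11292010 | zfs recv -duF newpool

    # later passes send only the delta between the two recursive snapshots
    zfs snapshot -r naspool@xfer-12012010
    zfs send -R -I @xfer-11292010 naspool@xfer-12012010 | zfs recv -du newpool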
Re: [zfs-discuss] ashift and vdevs
>>>>> "dm" == David Magda <dma...@ee.ryerson.ca> writes:

    dm> The other thing is that with the growth of SSDs, if more OS
    dm> vendors support dynamic sectors, SSD makers can have different
    dm> values for the sector size

okay, but if the size of whatever you're talking about is a multiple of
512, we don't actually need (or, probably, want!) any SCSI sector-size
monkeying around.  Just establish a minimum write size in the
filesystem, and always write multiple aligned 512-byte sectors at once
instead.  The 520-byte sectors you mentioned can't be accommodated this
way, but for 4 kByte it seems fine.

    dm> to allow for performance changes as the technology evolves.
    dm> Currently everything is hard-coded,

XFS is hardcoded.  NTFS has a settable block size.  ZFS has ashift
(almost).  The ZFS slog is apparently hardcoded, though.  So, two of
those four are not hardcoded, and the two hardcoded ones are hardcoded
to 4 kByte.

    dm> Until you're in a virtualized environment.  I believe that in
    dm> the combination of NetApp and VMware, a 64K alignment is best
    dm> practice, last I heard.  Similarly with the various stripe
    dm> widths available on traditional RAID arrays, it could be
    dm> advantageous for the OS/FS to know it.

There is another setting in XFS for RAID stripe size, but I don't know
what it does.  It's separate from the (unsettable) XFS block size
setting.  So... this 64 kByte thing might not be the same thing as what
we're talking about so far... though in terms of aligning partitions
it's the same, I guess.
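A small sketch of checking what a pool actually ended up with (the pool
name here is illustrative): the cached vdev configuration that zdb
prints includes the ashift of each top-level vdev.

    # ashift: 9 means 512-byte alignment, 12 means 4 KiB
    zdb -C tank | grep ashift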
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
Also, when the IV is stored you can more easily look for accidental IV
re-use, and if you can find hash collisions then you can even cause IV
re-use (if you can write to the filesystem in question).  For GCM, IV
re-use is rather fatal (for CCM it's bad, but IIRC not fatal), so I'd
not use GCM with dedup either.

Nico
--
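A minimal sketch of making that choice explicit at dataset creation
time (the dataset name is illustrative; aes-256-ccm is one of the
encryption property values in Solaris 11 Express):

    # prefer a CCM mode if the dataset will also have dedup enabled
    zfs create -o encryption=aes-256-ccm -o dedup=on tank/shared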
Re: [zfs-discuss] Recover data from detached ZFS mirror
Thank you, it is exactly what I needed.  Trying to compile this thing on
a SPARC system :)