Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express

2010-12-02 Thread Bill Sommerfeld

On 11/17/10 12:04, Miles Nordin wrote:

black-box crypto is snake oil at any level, IMNSHO.


Absolutely.


Congrats again on finishing your project, but every other disk
encryption framework I've seen taken remotely seriously has a detailed
paper describing the algorithm, not just a list of features and a
configuration guide.  It should be a requirement for anything treated
as more than a toy.  I might have missed yours, or maybe it's coming
soon.


In particular, the mechanism by which dedup-friendly block IVs are 
chosen based on the plaintext needs public scrutiny.  Knowing Darren, 
it's very likely that he got it right, but in crypto, all the details 
matter and if a spec detailed enough to allow for interoperability isn't 
available, it's safest to assume that some of the details are wrong.


- Bill
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS - Sudden decrease in write performance

2010-12-02 Thread Louis Carreiro
Thanks for pointing me towards that site! Saying that txg_synctime_ms
controls ZFS's breathing is exactly how I was thinking about it. Great way to
describe it! Unfortunately, setting txg_synctime_ms to 1000 or even 1 didn't
make an improvement.

I tried adding disable-ohci=true to the GRUB boot menu via SSH and it
didn't come back from its reboot, so I'm not going to be able to do much
more tonight (I'm working remotely).

I do notice that when the ARC size reaches capacity, that's when things slow
down. It also never appears to drop after I kill the IO. If I stop all IO,
arcstat shows every number but arcsz dropping. Should arcsz drop at all?
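
(A rough sketch of how that tunable can be inspected and adjusted. This
assumes it is exposed as zfs_txg_synctime_ms in the zfs kernel module on this
build; the value 1000, the pool name and the menu.lst path are only examples,
not a recommendation:)

# current value, in milliseconds (variable name may differ per build)
echo "zfs_txg_synctime_ms/D" | mdb -k

# change it on the live system; reverts at reboot
echo "zfs_txg_synctime_ms/W0t1000" | mdb -kw

# make the change persist across reboots
echo "set zfs:zfs_txg_synctime_ms = 1000" >> /etc/system

# for the disable-ohci experiment, the usual approach is to append the boot
# property to the kernel$ line in /rpool/boot/grub/menu.lst, e.g.:
#   kernel$ /platform/i86pc/kernel/$ISADIR/unix -B disable-ohci=true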

On Mon, Nov 15, 2010 at 7:27 PM, Khushil Dep khushil@gmail.com wrote:

 That controls ZFS's breathing. I'm on a phone writing this, so I hope you
 won't mind me pointing you to
 listware.net/201005/opensolaris-zfs/115564-zfs-discuss-small-stalls-slowing-down-rsync-from-holding-network-saturation-every-5-seconds.html

 On 16 Nov 2010 00:20, Louis Carreiro carreir...@gmail.com wrote:
 Almost! It seems like it held out a bit further than last time. Now arcsz
 hits 2G (matching 'c'). But it still drops off. It started at 5.6GB/min and
 fell off to less than 700MB/min.

 A snippet of my arcstat.pl output looks like the following:

     Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
 19:14:31   14K   283      1   283    2     0    0   283    1     2G    2G
 19:14:32   45K   120      0   102    0    18    0   120    0     2G    2G
 19:14:33    9K   228      2   213    2    15    0   223    2     2G    2G
 19:14:34   14K   285      2   274    2    11    0   285    2     2G    2G
 19:14:35   14K   294      1   276    2    18    0   294    1     2G    2G

 The above is what it looks like when my speed falls off. Is txg_synctime_ms
 something I can tweak, or is what you suggested a normal value? I've read a
 few articles that have mentioned values lower than 12288 ms.



 On Mon, Nov 15, 2010 at 6:35 PM, Khushil Dep khushil@gmail.com
 wrote:
 
  Set your txg_synct...


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] possible zfs recv bug?

2010-12-02 Thread Matthew Ahrens
I verified that this bug exists in OpenSolaris as well.  The problem is that
we can't destroy the old filesystem a (which has been renamed to
rec2/recv-2176-1
in this case).  We can't destroy it because it has a child, b.  We need to
rename b to be under the new a.  However, we are not renaming it, which
is the root cause of the problem.

This code in recv_incremental_replication() should detect that we should
rename b:

if ((stream_parent_fromsnap_guid != 0 &&
    parent_fromsnap_guid != 0 &&
    stream_parent_fromsnap_guid != parent_fromsnap_guid) ||
...

But this will not trigger because we have already destroyed the snapshots of
b's parent (the old a, now rec2/recv-2176-1), so parent_fromsnap_guid
will be 0.  I believe that the fix for bug 6921421 introduced this code in
build 135; it used to read:

if ((stream_parent_fromsnap_guid != 0 &&
    stream_parent_fromsnap_guid != parent_fromsnap_guid) ||
...

So we will have to investigate and see why the parent_fromsnap_guid != 0
check is now needed.

--matt

On Tue, Nov 23, 2010 at 6:16 AM, James Van Artsdalen 
james-opensola...@jrv.org wrote:

 I am seeing a zfs recv bug on FreeBSD and am wondering if someone could
 test this in the Solaris code.  If it fails there then I guess a bug report
 into Solaris is needed.

 This is a perverse case of filesystem renaming between snapshots.

 kraken:/root# cat zt

 zpool create rec1 da3
 zpool create rec2 da4

 zfs create rec1/a
 zfs create rec1/a/b

 zfs snapshot -r r...@s1
 zfs send -R r...@s1 | zfs recv -dvuF rec2

 zfs rename rec1/a/b rec1/c
 zfs destroy -r rec1/a
 zfs create rec1/a
 zfs rename rec1/c rec1/a/b # if the rename target is anything other than
 rec1/a/b the zfs recv result is right

 zfs snapshot -r r...@s2
 zfs send -R -I @s1 r...@s2 | zfs recv -dvuF rec2
 kraken:/root# sh -x zt
 + zpool create rec1 da3
 + zpool create rec2 da4
 + zfs create rec1/a
 + zfs create rec1/a/b
 + zfs snapshot -r r...@s1
 + zfs send -R r...@s1
 + zfs recv -dvuF rec2
 receiving full stream of r...@s1 into r...@s1
 received 47.4KB stream in 2 seconds (23.7KB/sec)
 receiving full stream of rec1/a...@s1 into rec2/a...@s1
 received 47.9KB stream in 1 seconds (47.9KB/sec)
 receiving full stream of rec1/a/b...@s1 into rec2/a/b...@s1
 received 46.3KB stream in 1 seconds (46.3KB/sec)
 + zfs rename rec1/a/b rec1/c
 + zfs destroy -r rec1/a
 + zfs create rec1/a
 + zfs rename rec1/c rec1/a/b
 + zfs snapshot -r r...@s2
 + zfs send -R -I @s1 r...@s2
 + zfs recv -dvuF rec2
 attempting destroy rec2/a...@s1
 success
 attempting destroy rec2/a
 failed - trying rename rec2/a to rec2/recv-2176-1
 local fs rec2/a/b new parent not found
 cannot open 'rec2/a/b': dataset does not exist
 another pass:
 attempting destroy rec2/recv-2176-1
 failed (0)
 receiving incremental stream of r...@s2 into r...@s2
 received 10.8KB stream in 2 seconds (5.41KB/sec)
 receiving full stream of rec1/a...@s2 into rec2/a...@s2
 received 47.9KB stream in 1 seconds (47.9KB/sec)
 receiving incremental stream of rec1/a/b...@s2 into rec2/recv-2176-1/b...@s2
 received 312B stream in 2 seconds (156B/sec)
 local fs rec2/a does not have fromsnap (s1 in stream); must have been
 deleted locally; ignoring
 attempting destroy rec2/recv-2176-1
 failed (0)
 kraken:/root# zfs list | grep rec1
 rec1                 238K  1.78T    32K  /rec1
 rec1/a                63K  1.78T    32K  /rec1/a
 rec1/a/b              31K  1.78T    31K  /rec1/a/b
 kraken:/root# zfs list | grep rec2
 rec2                 293K  1.78T    32K  /rec2
 rec2/a                32K  1.78T    32K  /rec2/a
 rec2/recv-2176-1      64K  1.78T    32K  /rec2/recv-2176-1
 rec2/recv-2176-1/b    32K  1.78T    31K  /rec2/recv-2176-1/b
 kraken:/root#
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS snapshot limit?

2010-12-02 Thread f...@ll

On 2010-12-01 15:19, Menno Lageman wrote:

f...@ll wrote:

Hi,

I need to send a ZFS snapshot from one server to another. The snapshot is
130GB in size. My question: does ZFS have any limit on the size of a send?


If you are sending the snapshot to another zpool (i.e. using 'zfs send 
| zfs recv') then no, there is no limit. If, however, you send the 
snapshot to a file on the other system (i.e. 'zfs send > somefile'), 
then you are limited by what the file system you are creating the file 
on supports.


Menno
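
(A minimal sketch of the two forms described above; the pool, snapshot, host
and file names are placeholders:)

# pool to pool over ssh: no size limit imposed by ZFS itself
zfs send tank/data@snap1 | ssh backuphost zfs recv -d backup

# send to a file: limited by what the target file system supports
zfs send tank/data@snap1 > /export/dumps/data-snap1.zstream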

Hi,

In my situation it is the first option: I send the snapshot to another server 
using zfs send | zfs recv, and I have a problem when the data send is 
completed: after a reboot the zpool has errors or is in the FAULTED state.
The first server is physical; the second is a virtual machine running under 
XenServer 5.6.


f...@ll
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS imported into GRUB

2010-12-02 Thread Robert Millan
Hi,

Following our new strategy with regard to Oracle code, we (GRUB
maintainers) have decided to grant an exception to our usual policy and
import ZFS code from grub-extras into official GRUB.

Our usual policy is to require copyright assignment for all new code, so
that the FSF can use it to defend users' freedom in court.  If that's not
possible, we at least require a disclaimer asserting authorship (i.e. that no
copyright infringement has been committed).  The purpose of this, as
always, is to ensure that GRUB is a legally safe codebase.

The ZFS code that has been imported into GRUB derives from the
OpenSolaris version of GRUB Legacy.  On one hand, this code was released
to the public under the terms of the GNU GPL.  On the other, binary
releases of Solaris included this modified GRUB, and as a result
Oracle/Sun is bound by the GPL.

We believe that these two factors give us very strong reassurance that:

a) Oracle owns the copyright to this code
and
b) Oracle is licensing it under GPL

and therefore it is completely safe to use this in GRUB.

We hope that this code import will foster collaboration on
ZFS support for GRUB.  Our understanding is that the next version of
Solaris will ship with GRUB 2, and so we expect the whole OpenSolaris
ecosystem to make this move as well.  We encourage downstream distributors
to anticipate this by preparing their transition from the old, legacy
version of GRUB (0.97), which is no longer supported by the GRUB developers.


Finally, a word about patents.  Software patents are terribly harmful to
free software, and to IT in general.  We believe they should be
abolished.  However, until that happens, we need to take measures to
protect our users.  We recognize it is practically impossible for end
users to achieve a situation where they're completely safe from patent
infringement (even if they pay so-called patent taxes to specific
companies).

However, we encourage our users to make careful choices when importing
technology that is designed behind closed doors (rather than in the
community), because such technology is prone to be heavily patented.

This is the reason why, when we (the GNU project) developed the GPL, we
included certain provisions in it to ensure a patent holder can't
benefit from the freedoms we gave them and at the same time use patents
to undermine these freedoms for others.

Thanks to this, and due to the fact that Oracle is bound by the terms
of the GNU GPL when it comes to GRUB, we believe this renders patents
covering ZFS basically harmless to GRUB users.  If the patents
covering ZFS are held by Oracle, they can't use them against GRUB
users, and if they're held by other parties, the GPL provisions will
prevent Oracle from paying a patent tax only for themselves, so they
will have to fight alongside the community instead of betraying it.

Let this serve as yet another example of why so-called permissive
licenses aren't always a guarantee that the code covered by them can be
used freely.  If you intend for your code to be free for all users,
always use the latest version of the GPL.

-- 
Robert Millan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express

2010-12-02 Thread Nicolas Williams
On Wed, Nov 17, 2010 at 01:58:06PM -0800, Bill Sommerfeld wrote:
 On 11/17/10 12:04, Miles Nordin wrote:
 black-box crypto is snake oil at any level, IMNSHO.
 
 Absolutely.

As Darren said, much of the design has been discussed in public, and
reviewed by cryptographers.  It'd be nicer if we had a detailed paper
though.

 Congrats again on finishing your project, but every other disk
 encryption framework I've seen taken remotely seriously has a detailed
 paper describing the algorithm, not just a list of features and a
 configuration guide.  It should be a requirement for anything treated
 as more than a toy.  I might have missed yours, or maybe it's coming
 soon.
 
 In particular, the mechanism by which dedup-friendly block IVs are
 chosen based on the plaintext needs public scrutiny.  Knowing
 Darren, it's very likely that he got it right, but in crypto, all
 the details matter and if a spec detailed enough to allow for
 interoperability isn't available, it's safest to assume that some of
 the details are wrong.

Dedup + crypto does have security implications.  Specifically: it
facilitates traffic analysis, and then known- and even
chosen-plaintext attacks (if there were any practical such attacks on
the cipher).

For example, IIUC, the ratio of dedup vs.  non-dedup blocks + analysis
of dnodes and their data sizes (in blocks) + per-dnode dedup ratios can
probably be used to identify OS images, which would then help mount
known-plaintext attacks.  For a mailstore you'd be able to distinguish
mail sent or kept by a single local user vs. mail sent to and kept by
more than one local user, and by sending mail you could help mount
chosen-plaintext attacks.  And so on.

My advice would be to not bother encrypting OS images, and if you
encrypt only documents, then dedup is likely of little or no interest to
you -- in general, you may not want to bother with dedup + crypto.
However, it is fantastic that crypto and dedup can work together.
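
(A sketch of that advice with hypothetical dataset names: enable encryption
at creation time and simply leave dedup at its default of off. The command
will prompt for a passphrase unless another keysource is given:)

# encrypted dataset for documents; dedup is not enabled
zfs create -o encryption=on rpool/export/docs

# confirm the resulting property values
zfs get encryption,dedup rpool/export/docs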

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Booting fails with `Can not read the pool label' error

2010-12-02 Thread Rainer Orth
Hi Cindy,

 I haven't seen this in a while, but I wonder if you just need to set the
 bootfs property on your new root pool and/or reapply the boot blocks.

 Can you import this pool after booting from a LiveCD and review the
 bootfs property value? I would also install the boot blocks on the
 rpool2 disk.

 I would also check the GRUB entries in /rpool2/boot/grub/menu.lst.
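
(A rough sketch of those checks on a SPARC system; the pool, boot environment,
and disk names below are placeholders, not the poster's actual configuration:)

# which dataset will the pool boot from?
zpool get bootfs rpool2

# set it if needed (dataset name is an example)
zpool set bootfs=rpool2/ROOT/snv_151a rpool2

# reinstall the ZFS boot block on the root disk slice (device is an example)
installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0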

I've now repeated everything with snv_151a and it worked out of the box
on the Sun Fire V880, and (on second try) also on my Blade 1500: it
seems the first time round I had the devalias for the second IDE disk
wrong:

/p...@1e,60/i...@d/d...@0,1 instead of /p...@1e,60/i...@d/d...@1,0

I'm now happily running snv_151a on both machines (and still using
Xsun on the Blade 1500, so it's still usable as a desktop :-)

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express

2010-12-02 Thread Darren J Moffat

On 17/11/2010 21:58, Bill Sommerfeld wrote:

In particular, the mechanism by which dedup-friendly block IVs are
chosen based on the plaintext needs public scrutiny. Knowing Darren,
it's very likely that he got it right, but in crypto, all the details
matter and if a spec detailed enough to allow for interoperability isn't
available, it's safest to assume that some of the details are wrong.


That is described here:

http://blogs.sun.com/darren/entry/zfs_encryption_what_is_on

If dedup=on for the dataset, the per-block IVs are generated 
differently.  They are generated by taking an HMAC-SHA256 of the 
plaintext and using the leftmost 96 bits of that as the IV.  The key 
used for the HMAC-SHA256 is different from the one used by AES for the 
data encryption, but it is stored (wrapped) in the same keychain entry; 
just like the data encryption key, a new one is generated when doing a 
'zfs key -K dataset'.  Obviously we couldn't calculate this IV when 
doing a read, so it has to be stored.
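
(A rough illustration of that construction using openssl, not the actual ZFS
code: the key string and block file are placeholders, and real ZFS uses a
separate randomly generated, wrapped HMAC key rather than a passphrase:)

# HMAC-SHA256 the block's plaintext with the separate IV key,
# then keep the leftmost 96 bits (12 bytes) as the IV
openssl dgst -sha256 -hmac "example-iv-key" -binary block.bin | head -c 12 | xxd -p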


This was also suggested independently by other well known people 
involved in encrypted filesystems while it was discussed on a public 
forum (most of that thread was cross posted to zfs-crypto-discuss).


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send receive problem/questions

2010-12-02 Thread Cindy Swearingen

Hi Don,

I'm no snapshot expert, but I think you will have to remove the previous
receiving-side snapshots, at least.

I created a file system hierarchy that includes a lower-level snapshot,
created a recursive snapshot of that hierarchy, and sent it over to
a backup pool. Then I did the same steps again. See the example below.

You can see from my example that this process fails if I don't remove
the existing snapshots first. And because I didn't remove the original
recursive snapshots on the sending side, the snapshots become nested. 
I'm sure someone else has better advice.


I had an example of sending root pool snapshots on the ZFS 
troubleshooting wiki but it was removed so I will try to restore that 
example.

Thanks,

Cindy

# zfs list -r tank/home
NAME                          USED  AVAIL  REFER  MOUNTPOINT
tank/home                    1.12M  66.9G    25K  /tank/home
tank/h...@snap2                  0      -    25K  -
tank/home/anne                280K  66.9G   280K  /tank/home/anne
tank/home/a...@snap2             0      -   280K  -
tank/home/bob                 280K  66.9G   280K  /tank/home/bob
tank/home/b...@snap2             0      -   280K  -
tank/home/cindys              561K  66.9G   281K  /tank/home/cindys
tank/home/cin...@snap2           0      -   281K  -
tank/home/cindys/dir1         280K  66.9G   280K  /tank/home/cindys/dir1
tank/home/cindys/d...@snap1      0      -   280K  -
tank/home/cindys/d...@snap2      0      -   280K  -
# zfs send -R tank/h...@snap2 | zfs recv -d bpool
# zfs list -r bpool/home
NAME                           USED  AVAIL  REFER  MOUNTPOINT
bpool/home                    1.12M  33.2G    25K  /bpool/home
bpool/h...@snap2                  0      -    25K  -
bpool/home/anne                280K  33.2G   280K  /bpool/home/anne
bpool/home/a...@snap2             0      -   280K  -
bpool/home/bob                 280K  33.2G   280K  /bpool/home/bob
bpool/home/b...@snap2             0      -   280K  -
bpool/home/cindys              561K  33.2G   281K  /bpool/home/cindys
bpool/home/cin...@snap2           0      -   281K  -
bpool/home/cindys/dir1         280K  33.2G   280K  /bpool/home/cindys/dir1
bpool/home/cindys/d...@snap1      0      -   280K  -
bpool/home/cindys/d...@snap2      0      -   280K  -
# zfs snapshot -r tank/h...@snap3
# zfs send -R tank/h...@snap3 | zfs recv -dF bpool
cannot receive new filesystem stream: destination has snapshots (eg. bpool/h...@snap2)
must destroy them to overwrite it
# zfs destroy -r bpool/h...@snap2
# zfs destroy bpool/home/cindys/d...@snap1
# zfs send -R tank/h...@snap3 | zfs recv -dF bpool
# zfs list -r bpool
NAME                           USED  AVAIL  REFER  MOUNTPOINT
bpool                         1.35M  33.2G    23K  /bpool
bpool/home                    1.16M  33.2G    25K  /bpool/home
bpool/h...@snap2                  0      -    25K  -
bpool/h...@snap3                  0      -    25K  -
bpool/home/anne                280K  33.2G   280K  /bpool/home/anne
bpool/home/a...@snap2             0      -   280K  -
bpool/home/a...@snap3             0      -   280K  -
bpool/home/bob                 280K  33.2G   280K  /bpool/home/bob
bpool/home/b...@snap2             0      -   280K  -
bpool/home/b...@snap3             0      -   280K  -
bpool/home/cindys              582K  33.2G   281K  /bpool/home/cindys
bpool/home/cin...@snap2           0      -   281K  -
bpool/home/cin...@snap3           0      -   281K  -
bpool/home/cindys/dir1         280K  33.2G   280K  /bpool/home/cindys/dir1
bpool/home/cindys/d...@snap1      0      -   280K  -
bpool/home/cindys/d...@snap2      0      -   280K  -
bpool/home/cindys/d...@snap3      0      -   280K  -


On 12/01/10 11:30, Don Jackson wrote:
Hello, 


I am attempting to move a bunch of zfs filesystems from one pool to another.

Mostly this is working fine, but one collection of file systems is causing me problems, 
and repeated re-reading of man zfs and the ZFS Administrators Guide is not 
helping.  I would really appreciate some help/advice.

Here is the scenario.
I have a nested hierarchy of ZFS file systems.
Some of the deeper file systems are snapshotted.
All of this exists on the source zpool.
First I recursively snapshotted the whole subtree:

   zfs snapshot -r nasp...@xfer-11292010 


Here is a subset of the source zpool:

# zfs list -r naspool
NAME                                 USED  AVAIL  REFER  MOUNTPOINT
naspool                             1.74T  42.4G  37.4K  /naspool
nasp...@xfer-11292010                   0      -  37.4K  -
naspool/openbsd                      113G  42.4G  23.3G  /naspool/openbsd
naspool/open...@xfer-11292010           0      -  23.3G  -
naspool/openbsd/4.4                 21.6G  42.4G  2.33G  /naspool/openbsd/4.4
naspool/openbsd/4...@xfer-11292010      0      -  2.33G  -
naspool/openbsd/4.4/ports            592M  42.4G   200M  /naspool/openbsd/4.4/ports

Re: [zfs-discuss] ashift and vdevs

2010-12-02 Thread Miles Nordin
 dm == David Magda dma...@ee.ryerson.ca writes:

dm The other thing is that with the growth of SSDs, if more OS
dm vendors support dynamic sectors, SSD makers can have
dm different values for the sector size 

okay, but if the size of whatever you're talking about is a multiple
of 512, we don't actually need (or, probably, want!) any SCSI sector
size monkeying around.  Just establish a minimum write size in the
filesystem, and always write multiple aligned 512-byte sectors at once
instead.

the 520-byte sectors you mentioned can't be accommodated this way, but
for 4kByte it seems fine.

dm to allow for performance
dm changes as the technology evolves.  Currently everything is
dm hard-coded,

XFS is hardcoded.  NTFS has settable block size.  ZFS has ashift
(almost).  ZFS slog is apparently hardcoded though.  so, two of those
four are not hardcoded, and the two hardcoded ones are hardcoded to
4kByte.
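
(For anyone following along, a hedged way to see what ashift a pool's vdevs
were actually created with; the pool name is a placeholder:)

# dump the cached pool configuration and look at the per-vdev ashift
# (ashift=9 means 512-byte sectors, ashift=12 means 4 kByte)
zdb -C tank | grep ashift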

dm Until you're in a virtualized environment. I believe that in
dm the combination of NetApp and VMware, a 64K alignment is best
dm practice last I head. Similarly with the various stripe widths
dm available on traditional RAID arrays, it could be advantageous
dm for the OS/FS to know it.

There is another setting in XFS for RAID stripe size, but I don't know
what it does.  It's separate from the (unsettable) XFS block size
setting.  so...this 64kByte thing might not be the same thing as what
we're talking about so far...though in terms of aligning partitions
it's the same, I guess.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express

2010-12-02 Thread Nicolas Williams
Also, when the IV is stored you can more easily look for accidental IV
re-use, and if you can find hash collisions, then you can even cause IV
re-use (if you can write to the filesystem in question).  For GCM, IV
re-use is rather fatal (for CCM it's bad, but IIRC not fatal), so I'd
not use GCM with dedup either.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recover data from detached ZFS mirror

2010-12-02 Thread maciej kaminski
Thank you, it is exactly what I needed. Trying to compile this thing on a SPARC 
system :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss