Re: [zfs-discuss] encfs on top of zfs

2012-08-02 Thread Richard Elling

On Jul 31, 2012, at 8:05 PM, opensolarisisdeadlongliveopensolaris wrote:

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Richard Elling
 
 I believe what you meant to say was "dedup with HDDs sux". If you had
 used fast SSDs instead of HDDs, you would find dedup to be quite fast.
  -- richard
 
 Yes, but this is a linear scale.  

No, it is definitely NOT a linear scale. Study Amdahl's law a little more 
carefully.

 Suppose an SSD without dedup is 100x faster than a HDD without dedup.  And 
 suppose dedup slows down a system by a factor of 10x.  Now your SSD with 
 dedup is only 10x faster than the HDD without dedup.  So "quite fast" is a
 relative term.

Of course it is.

  The SSD with dedup is still faster than the HDD without dedup, but it's also 
 slower than the SSD without dedup.

duh. With dedup you are trading IOPS for space. In general, HDDs have lots of
space and terrible IOPS. SSDs have less space, but more IOPS. Obviously, as you
point out, the best solution is lots of space and lots of IOPS.

 The extent of fibbing I'm doing is thusly:  In reality, an SSD is about 
 equally fast with HDD for sequential operations, and about 100x faster for 
 random IO.  It just so happens that the dedup performance hit is almost 
 purely random IO, so it's right in the sweet spot of what SSD's handle well.  

In the vast majority of modern systems, there are no sequential I/O workloads. 
That is a myth 
propagated by people who still think HDDs can be fast.

 You can't use an overly simplified linear model like I described above - In 
 reality, there's a grain of truth in what Richard said, and also a grain of 
 truth in what I said.  The real truth is somewhere in between what he said 
 and what I said.

But closer to my truth :-)

 No, the SSD will not perform as well with dedup as it does without dedup.  
 But the "suppose dedup slows down by 10x" that I described above is not
 accurate.  Depending on what you're doing, dedup might slow down an HDD by 
 20x, and it might only slow down SSD by 4x doing the same work load.  Highly 
 variable, and highly dependent on the specifics of your workload.

You are making the assumption that the system is not bandwidth limited. This is
a good assumption for the HDD case, because the media bandwidth is much less
than the interconnect bandwidth. For SSDs, this assumption is not necessarily
true. There are SSDs that are bandwidth constrained on the interconnect, and in
those cases, your model fails.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] encfs on top of zfs

2012-07-31 Thread opensolarisisdeadlongliveopensolaris
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Tristan Klocke
 
 I want to switch to ZFS, but still want to encrypt my data. Native Encryption
 for ZFS was added in ZFS Pool Version Number 30, but I'm using ZFS on
 FreeBSD with Version 28. My question is how would encfs (fuse encryption)
 affect zfs specific features like data Integrity and deduplication?

Data integrity:  ZFS + encfs will work great together.

Dedup:  First of all, I don't recommend using dedup under any circumstance.  
Not that it's unstable or anything, just that the performance is so horrible, 
it's never worth while.  But particularly with encrypted data, you're 
guaranteed to have no duplicate data anyway, so it would be a pure waste.  
Don't do it.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] encfs on top of zfs

2012-07-31 Thread Ray Arachelian
On 07/31/2012 09:46 AM, opensolarisisdeadlongliveopensolaris wrote:
 Dedup: First of all, I don't recommend using dedup under any
 circumstance. Not that it's unstable or anything, just that the
 performance is so horrible, it's never worth while. But particularly
 with encrypted data, you're guaranteed to have no duplicate data
 anyway, so it would be a pure waste. Don't do it.
 ___ zfs-discuss mailing
 list zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

One thing you can do is enable dedup when you copy all your data from
one zpool to another, then, when you're done, disable dedup.  It will no
longer waste a ton of memory, and your new volume will have a high dedup
ratio. (Obviously anything you add after you turn dedup off won't be
deduped.)  You can keep the old pool as a backup, or wipe it or whatever
and later on do the same operation in the other direction.
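
Roughly, that migration looks something like this (a sketch only; pool and
dataset names here are made up):

    # enable dedup on the destination before copying
    zfs set dedup=on newpool
    # copy the data over, e.g. via a recursive snapshot and send/recv
    zfs snapshot -r oldpool/data@migrate
    zfs send -R oldpool/data@migrate | zfs recv -d newpool
    # once the copy is done, stop deduping new writes
    zfs set dedup=off newpool
    # see how much was actually saved
    zpool get dedupratio newpool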

Anyone know if zfs send | zfs recv will maintain the deduped files after
this?  Maybe once deduped you can wipe the old pool, then use send|recv
to get a deduped backup?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] encfs on top of zfs

2012-07-31 Thread Nigel W
On Tue, Jul 31, 2012 at 9:36 AM, Ray Arachelian r...@arachelian.com wrote:
 On 07/31/2012 09:46 AM, opensolarisisdeadlongliveopensolaris wrote:
 Dedup: First of all, I don't recommend using dedup under any
 circumstance. Not that it's unstable or anything, just that the
 performance is so horrible, it's never worth while. But particularly
 with encrypted data, you're guaranteed to have no duplicate data
 anyway, so it would be a pure waste. Don't do it.
 ___ zfs-discuss mailing
 list zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 One thing you can do is enable dedup when you copy all your data from
 one zpool to another, then, when you're done, disable dedup.  It will no
 longer waste a ton of memory, and your new volume will have a high dedup
 ratio. (Obviously anything you add after you turn dedup off won't be
 deduped.)  You can keep the old pool as a backup, or wipe it or whatever
 and later on do the same operation in the other direction.

Once something is written deduped you will always use the memory when
you want to read any files that were written when dedup was enabled,
so you do not save any memory unless you do not normally access most
of your data.

Also don't let the system crash :D or try to delete too much from the
deduped dataset :D (including snapshots or the dataset itself), because
then you have to reload all (or most) of the DDT in order to delete the
files.  This gets a lot of people in trouble (including me at $work
:|) because you need to have the RAM available at all times to load
most of it (75%, to grab a number out of the air) in case the server
crashes. Otherwise you are stuck with a machine trying to verify its
filesystem for hours. I have one test system that has 4 GB of RAM and
2 TB of deduped data; when it crashed (panic, power failure, etc.) it
would take 8-12 hours to boot up again.  It now has 1 TB of data and
will boot in about 5 minutes or so.


 Anyone know if zfs send | zfs recv will maintain the deduped files after
 this?  Maybe once deduped you can wipe the old pool, then use send|recv
 to get a deduped backup?

No, zfs send | zfs recv will not dedup new data unless dedup is on;
however, the existing files remain deduped until the snapshot that is
retaining them is removed.


From practical work, dedup is only worth it if you have a 10x dedup
ratio or a small deduped dataset, because in practice at $work we have
found that 1 TB of deduped data requires 6-8 GB of RAM (using
multi-hundred-GB files, so 128K record sizes; with smaller record sizes
the RAM numbers balloon).  Which means that even at a low dedup ratio of
3x it is still cheaper to just buy more disks than to find a system that
can hold enough RAM to dedup 4-5 TB of data (64-128 GB of RAM). Of
course, keep in mind that we at $work only have practical knowledge of
systems up to 4 TB of deduped data.  We are planning some 45+ TB and
plan on using compression=gzip-9 and dedup=off.
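
As a back-of-the-envelope check (a rough sketch; the ~320 bytes per
in-core DDT entry is a commonly quoted estimate, not a number from this
thread, and real-world figures like the 6-8 GB per TB above include
extra overhead), the record-size effect is easy to see:

    # entries = data / recordsize ; RAM ~= entries * bytes_per_entry  (bash arithmetic)
    echo "$(( (2**40 / (128 * 2**10)) * 320 / 2**20 )) MiB"  # 1 TiB at 128K records -> 2560 MiB
    echo "$(( (2**40 / (  8 * 2**10)) * 320 / 2**20 )) MiB"  # 1 TiB at 8K records   -> 40960 MiB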

As far as the OP is concerned, unless you have a dataset that will
dedup well, don't bother with it; use compression instead (don't use
both compression and dedup, because you will shrink the average record
size and balloon the memory usage).  As mentioned by Ned Harvey, dedup
of encrypted data is probably completely useless (it depends on the
chaining mode; duplicate data in the encrypted dataset may be
encrypted the same way, allowing for dedup of the encrypted data
stream).  However, if the encryption is done below zfs, like what GELI
would be doing by giving the encrypted block device to zfs, then the
use of dedup reverts back to the standard question: are you going to
have enough duplicate data to get a ratio that is high enough to be
worth the RAM requirements?
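
One way to answer that question before committing to dedup is zdb's
dedup simulation (a sketch; "tank" is a placeholder pool name holding a
representative copy of the data):

    # simulate the DDT and print the expected dedup ratio,
    # without actually enabling dedup on the pool
    zdb -S tank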

As I reread the OP, to make sure that the OP's question is actually
answered: use of encryption will not affect zfs integrity beyond the
normal issues associated with zfs integrity and disk encryption. That
is, missing encryption keys are like a failed disk (if below zfs) or
like losing any other file on a zfs filesystem (if above zfs).  Zfs
integrity (checksumming and the transactions) is not affected by the
underlying block device (if encryption is below zfs) or by the contents
of a file on the filesystem (if encryption is above zfs).  The raid
features of zfs also work no differently; 'n' encrypted block devices
are no different than 'n' plain hard drives as far as ZFS is concerned.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] encfs on top of zfs

2012-07-31 Thread Richard Elling

On Jul 31, 2012, at 10:07 AM, Nigel W wrote:

 On Tue, Jul 31, 2012 at 9:36 AM, Ray Arachelian r...@arachelian.com wrote:
 On 07/31/2012 09:46 AM, opensolarisisdeadlongliveopensolaris wrote:
 Dedup: First of all, I don't recommend using dedup under any
 circumstance. Not that it's unstable or anything, just that the
 performance is so horrible, it's never worth while. But particularly
 with encrypted data, you're guaranteed to have no duplicate data
 anyway, so it would be a pure waste. Don't do it.
 ___ zfs-discuss mailing
 list zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 One thing you can do is enable dedup when you copy all your data from
 one zpool to another, then, when you're done, disable dedup.  It will no
 longer waste a ton of memory, and your new volume will have a high dedup
 ratio. (Obviously anything you add after you turn dedup off won't be
 deduped.)  You can keep the old pool as a backup, or wipe it or whatever
 and later on do the same operation in the other direction.
 
 Once something is written deduped you will always use the memory when
 you want to read any files that were written when dedup was enabled,
 so you do not save any memory unless you do not normally access most
 of your data.
 
 Also don't let the system crash :D or try to delete too much from the
 deduped dataset :D (including snapshots or the dataset itself) because
 then you have to reload all (most) of the DDT in order to delete the
 files.  This gets a lot of people in trouble (including me at $work
 :|) because you need to have the ram available at all times to load
 the most (75% to grab a number out of the air) in case the server
 crashes. Otherwise you are stuck with a machine trying to verify its
 filesystem for hours. I have one test system that has 4 GB of RAM and
 2 TB of deduped data, when it crashes (panic, powerfailure, etc) it
 would take 8-12 hours to boot up again.  It now has 1TB of data and
 will boot in about 5 minutes or so.

I believe what you meant to say was "dedup with HDDs sux". If you had
used fast SSDs instead of HDDs, you would find dedup to be quite fast.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] encfs on top of zfs

2012-07-31 Thread Robert Milkowski
 Once something is written deduped you will always use the memory when
 you want to read any files that were written when dedup was enabled, so
 you do not save any memory unless you do not normally access most of
 your data.

For reads you don't need the DDT. Also, in Solaris 11 (not in Illumos,
unfortunately, AFAIK) on reads the in-memory ARC will also stay deduped (so
if 10 logical blocks are deduped to 1 and you read all 10 logical copies,
only one block in ARC will be allocated). If there are no further
modifications and you only read deduped data, then apart from the disk space
savings there can be a very nice improvement in performance as well (less
i/o, more ram for caching, etc.).


 
 As far as the OP is concerned, unless you have a dataset that will
 dedup well don't bother with it, use compression instead (don't use
 both compression and dedup because you will shrink the average record
 size and balloon the memory usage).

Can you expand a little bit more here?
Dedup+compression works pretty well actually (not counting standard
problems with current dedup - compression or not).
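
(For what it's worth, the two can simply be enabled together and the
outcome measured afterwards; a sketch, with "tank/data" as a placeholder
dataset:)

    zfs set compression=gzip-9 tank/data
    zfs set dedup=on tank/data
    # after writing data, check what each feature bought you
    zfs get compressratio tank/data
    zpool get dedupratio tank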


-- 
Robert Milkowski
http://milek.blogspot.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] encfs on top of zfs

2012-07-31 Thread Patrick Heinson
Hi,

I use GELI with ZFS all the time. Works fine for me so far.


Am 31.07.12 21:54, schrieb Robert Milkowski:
 Once something is written deduped you will always use the memory when
 you want to read any files that were written when dedup was enabled, so
 you do not save any memory unless you do not normally access most of
 your data.
 
 For reads you don't need ddt. Also in Solaris 11 (not in Illumos
 unfortunately AFAIK) on reads the in-memory ARC will also stay deduped (so
 if 10x logical blocks are deduped to 1 and you read all 10 logical copies,
 only one block in arc will be allocated). If there are no further
 modifications and you only read dedupped data, apart from disk space
 savings, there can be very nice improvement in performance as well (less
 i/o, more ram for caching, etc.).
 
 

 As far as the OP is concerned, unless you have a dataset that will
 dedup well don't bother with it, use compression instead (don't use
 both compression and dedup because you will shrink the average record
 size and balloon the memory usage).
 
 Can you expand a little bit more here?
 Dedup+compression works pretty well actually (not counting standard
 problems with current dedup - compression or not).
 
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] encfs on top of zfs

2012-07-31 Thread opensolarisisdeadlongliveopensolaris
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Ray Arachelian
 
 One thing you can do is enable dedup when you copy all your data from
 one zpool to another, then, when you're done, disable dedup.  It will no
 longer waste a ton of memory, and your new volume will have a high dedup
 ratio. 

That's not correct.  It sounds like you are mistakenly believing the DDT gets 
held in memory, but actually, it's held on disk and since it gets used so much, 
large portions of it will likely be in ARC/L2ARC.  Unfortunately, after you 
dedup a pool and disable dedup, the DDT will still get used frequently, and 
still take just as much memory most likely.  But that's not the main concern 
anyway - The main concern is things like snapshot destroy (or simply rm) which 
need to unlink blocks.  This requires decrementing the refcount, which requires 
finding and writing the DDT entry, which means a flurry of essentially small 
random IO.  So the memory & performance with dedup disabled is just as bad, as
long as you previously had dedup enabled for a significant percentage of your
pool.
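
To get a feel for how big that DDT actually is (and how much of it has to be
dragged through ARC during a large destroy), zdb can report it. A sketch,
with "tank" as a placeholder pool name:

    # DDT summary: entry counts plus on-disk and in-core sizes
    zdb -D tank

If the pool has a dedicated cache device (zpool add tank cache <ssd>), much of
that random DDT I/O lands on the SSD instead of the spinning disks.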


 Anyone know if zfs send | zfs recv will maintain the deduped files after
 this?  Maybe once deduped you can wipe the old pool, then use send|recv
 to get a deduped backup?

You can enable the dedup property on a receiving pool, and then the data 
received will be dedup'd.  The behavior is dependent on the properties of the 
receiving pool.
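
Something along these lines (a sketch; pool and dataset names are
placeholders):

    # dedup, like compression, is applied according to the
    # receiving dataset's properties
    zfs set dedup=on backuppool
    zfs snapshot tank/data@backup
    zfs send tank/data@backup | zfs recv backuppool/data
    zpool get dedupratio backuppool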

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] encfs on top of zfs

2012-07-31 Thread opensolarisisdeadlongliveopensolaris
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Richard Elling
  
 I believe what you meant to say was "dedup with HDDs sux". If you had
 used fast SSDs instead of HDDs, you would find dedup to be quite fast.
  -- richard

Yes, but this is a linear scale.  Suppose an SSD without dedup is 100x faster 
than a HDD without dedup.  And suppose dedup slows down a system by a factor of 
10x.  Now your SSD with dedup is only 10x faster than the HDD without dedup.  
So "quite fast" is a relative term.  The SSD with dedup is still faster than
the HDD without dedup, but it's also slower than the SSD without dedup.

The extent of fibbing I'm doing is thusly:  In reality, an SSD is about equally 
fast with HDD for sequential operations, and about 100x faster for random IO.  
It just so happens that the dedup performance hit is almost purely random IO, 
so it's right in the sweet spot of what SSD's handle well.  You can't use an 
overly simplified linear model like I described above - In reality, there's a 
grain of truth in what Richard said, and also a grain of truth in what I said.  
The real truth is somewhere in between what he said and what I said.

No, the SSD will not perform as well with dedup as it does without dedup.  But 
the "suppose dedup slows down by 10x" that I described above is not accurate.
Depending on what you're doing, dedup might slow down an HDD by 20x, and it 
might only slow down SSD by 4x doing the same work load.  Highly variable, and 
highly dependent on the specifics of your workload.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] encfs on top of zfs

2012-07-30 Thread Tristan Klocke
Dear ZFS-Users,

I want to switch to ZFS, but still want to encrypt my data. Native
Encryption for ZFS was added in ZFS Pool Version Number 30
(http://en.wikipedia.org/wiki/ZFS#Release_history),
but I'm using ZFS on FreeBSD with Version 28. My question is how would
encfs (fuse encryption) affect zfs specific features like data Integrity
and deduplication?

Regards

Tristan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] encfs on top of zfs

2012-07-30 Thread Freddie Cash
On Mon, Jul 30, 2012 at 5:20 AM, Tristan Klocke
tristan.klo...@googlemail.com wrote:
 I want to switch to ZFS, but still want to encrypt my data. Native
 Encryption for ZFS was added in ZFS Pool Version Number 30, but I'm using
 ZFS on FreeBSD with Version 28. My question is how would encfs (fuse
 encryption) affect zfs specific features like data Integrity and
 deduplication?

If you are using FreeBSD, why not use GELI to provide the block
devices used for the ZFS vdevs?  That's the standard way to get
encryption and ZFS working on FreeBSD.
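
The usual pattern is something like the following (a sketch only; device
names, sector size, and key file paths are placeholders, see geli(8) for
the real options):

    # initialize and attach an encrypted provider on each disk
    geli init -s 4096 -K /root/da1.key /dev/da1
    geli attach -k /root/da1.key /dev/da1
    geli init -s 4096 -K /root/da2.key /dev/da2
    geli attach -k /root/da2.key /dev/da2
    # build the pool on the .eli providers
    zpool create tank mirror /dev/da1.eli /dev/da2.eli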

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss