Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-25 Thread Edward Ned Harvey
> From: Roy Sigurd Karlsbakk [mailto:r...@karlsbakk.net]
> 
> > For now, the rule of thumb is 3G ram for every 1TB of unique data,
> > including
> > snapshots and vdev's.
> 
> 3 gigs? Last I checked it was a little more than 1GB, perhaps 2 if you
> have small files.

http://opensolaris.org/jive/thread.jspa?threadID=131761

The true answer is "it varies" depending on things like block size, so whether 
you say 1G or 3G, despite sounding like a big difference, it's in the noise.  
We're only talking "rule of thumb" here, based on vague (very vague) and 
widely variable estimates of your personal usage characteristics.

It's just a rule of thumb, and slightly over 1G ~= slightly under 3G in this 
context.
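To see how much block size alone moves the estimate, here is a rough sketch 
(assuming ~300 bytes per DDT entry, a commonly cited ballpark rather than an 
exact figure):

```python
# Rough sketch: the dedup table needs one entry per unique block, so the
# RAM estimate scales with block count.  The 300-byte entry size is an
# assumed ballpark, not an exact ZFS constant.
TIB = 2**40
ENTRY_BYTES = 300  # assumed per-entry overhead

def ddt_ram_per_tib(block_size):
    """Estimated DDT size (GiB) per TiB of unique data."""
    entries = TIB // block_size
    return entries * ENTRY_BYTES / 2**30

for bs in (8 * 2**10, 32 * 2**10, 128 * 2**10):
    print(f"{bs // 2**10:4d}K blocks: {ddt_ram_per_tib(bs):6.2f} GiB per TiB")
```

With 128K blocks this lands near 2.3G per TiB; with 8K blocks it balloons to 
around 37G, which is why the rule of thumb swings so widely.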

Hence, the comment:

> After a system is running, I don't know how/if you can measure current
> mem usage, to gauge the results of your own predictions.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-25 Thread Scott Meilicke
When I do the calculations, assuming 300 bytes per block to be conservative, 
with 128K blocks, I get 2.34G of cache (RAM, L2ARC) per terabyte of deduped 
data. But block size is dynamic, so you will need more than this.
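The arithmetic above can be reproduced directly (the 300-byte figure is the 
conservative assumption stated, not a measured value):

```python
# Reproducing the estimate: one DDT entry per unique block, at an
# assumed 300 bytes per entry, with fixed 128K blocks.
ENTRY_BYTES = 300          # conservative assumed per-entry cost
BLOCK_SIZE = 128 * 2**10   # 128K records
TIB = 2**40

entries_per_tib = TIB // BLOCK_SIZE          # 8,388,608 unique blocks
ddt_bytes = entries_per_tib * ENTRY_BYTES
print(f"{ddt_bytes / 2**30:.2f} GiB of cache per TiB")  # prints 2.34 GiB of cache per TiB
```

Smaller (dynamic) blocks mean more entries, hence more cache than this.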

Scott
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-25 Thread Roy Sigurd Karlsbakk
> > For de-duplication to perform well you need to be able to fit the
> > de-
> > dup table in memory. Is a good rule-of-thumb for needed RAM
> > Size=(pool
> > capacity/avg block size)*270 bytes? Or perhaps it's
> > Size/expected_dedup_ratio?
> 
> For now, the rule of thumb is 3G ram for every 1TB of unique data,
> including
> snapshots and vdev's.
> 
> After a system is running, I don't know how/if you can measure current
> mem usage, to gauge the results of your own predictions.

3 gigs? Last I checked it was a little more than 1GB, perhaps 2 if you have 
small files.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. 
It is an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases, adequate and relevant synonyms exist 
in Norwegian.


Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-25 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Brad Stone
> 
> For de-duplication to perform well you need to be able to fit the de-
> dup table in memory. Is a good rule-of-thumb for needed RAM  Size=(pool
> capacity/avg block size)*270 bytes? Or perhaps it's
> Size/expected_dedup_ratio?

For now, the rule of thumb is 3G ram for every 1TB of unique data, including
snapshots and vdev's.

After a system is running, I don't know how/if you can measure current mem
usage, to gauge the results of your own predictions.



Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-24 Thread Brad Stone
For de-duplication to perform well you need to be able to fit the de-dup table 
in memory. Is a good rule-of-thumb for needed RAM  Size=(pool capacity/avg 
block size)*270 bytes? Or perhaps it's Size/expected_dedup_ratio?

And if you limit de-dup to certain datasets in the pool, how would this
calculation change?
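One way to read that formula, with both refinements folded in, is the sketch 
below (the 270-byte entry size and the example numbers are the assumptions 
from the question, not confirmed figures). If dedup is limited to certain 
datasets, only the capacity of those datasets should be counted:

```python
# Sketch of the proposed rule of thumb.  DDT entries exist per *unique*
# block, so dividing by the expected dedup ratio is a plausible refinement.
# 270 bytes/entry is the figure proposed above, not a confirmed constant.

def ddt_ram_bytes(deduped_capacity, avg_block_size, entry_bytes=270,
                  expected_dedup_ratio=1.0):
    """Estimate DDT size for the data actually subject to dedup.

    deduped_capacity: bytes stored in datasets with dedup=on (only those
    datasets, if dedup is limited to part of the pool).
    """
    unique_blocks = deduped_capacity / avg_block_size / expected_dedup_ratio
    return unique_blocks * entry_bytes

# Hypothetical example: 10 TiB of deduped data, 64K average blocks, 2x dedup:
est = ddt_ram_bytes(10 * 2**40, 64 * 2**10, expected_dedup_ratio=2.0)
print(f"{est / 2**30:.1f} GiB")  # prints 21.1 GiB
```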


Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-23 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Peter Taps
> 
> The dedup property is set on a filesystem, not on the pool.
> 
> However, the dedup ratio is reported on the pool and not on the
> filesystem.

As with most other ZFS concepts, the core functionality of ZFS is
implemented in zpool.  Hence, zpool is up to what ... version 25 or so now?
Think of ZFS (the POSIX filesystem) as just an interface which tightly
integrates the zpool features.  ZFS is only up to what, version 4 now?

Perfect example:  

If you create a zvol on Linux and format it ext3/4 instead of ZFS, you can
still snapshot it, and I believe you can even "zfs send" and receive it.  And
so on.  The core functionality is mostly present.  But if you want to access
the snapshot, you have to create a mountpoint and mount the snapshot zvol
there read-only.  It's not automatic.  It's barely any better than the crappy
"snapshot" concept Linux has in LVM.  If you want good automatic snapshot
creation, seamless mounting, and automatic mounting, then you need the ZFS
filesystem on top of the zpool.  Cuz the ZFS filesystem knows about that
underlying zpool feature, and makes it a convenient and easy experience.  ;-)



Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-23 Thread Scott Meilicke
Hi Peter,

Dedupe is pool-wide. File systems can opt in or out of dedupe, so if multiple 
file systems are set to dedupe, then they all benefit from using the same pool 
of deduped blocks. In this way, if two files share some of the same blocks, 
even if they are in different file systems, they will dedupe.

I am not sure why reporting is not done at the file system level. It may be an 
accounting issue, i.e. which file system owns the dedupe blocks. But it seems 
some fair estimate could be made. Maybe the overhead to keep a file system 
updated with these stats is too high?

-Scott


Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-23 Thread zfs user

I believe it goes something like this -

ZFS filesystems with dedupe turned on can be thought of as hippie/socialist 
filesystems, wanting to "share", etc.  Filesystems with dedupe turned off are 
a grey Randian landscape where sharing blocks between files is seen as a 
weakness/defect. They all live together in a zpool, let's call it "San 
Francisco"...


The hippies store their shared blocks together in a communal store at the pool 
level, and everything works pretty well until one of the hippie filesystems 
wants to pull a large number of its blocks out of the communal store; then 
all hell breaks loose, and the grey Randians laugh at the hippies and their 
chaos, but it is a joyless laughter.


That is the technical explanation, someone else may have a better explanation 
in layman's terms.


On 9/23/10 3:36 PM, Peter Taps wrote:

Folks,

I am a bit confused on the dedup relationship between the filesystem and its 
pool.

The dedup property is set on a filesystem, not on the pool.

However, the dedup ratio is reported on the pool and not on the filesystem.

Why is it this way?

Thank you in advance for your help.

Regards,
Peter



Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-23 Thread Darren J Moffat

On 09/23/10 15:36, Peter Taps wrote:

I am a bit confused on the dedup relationship between the filesystem and its 
pool.

The dedup property is set on a filesystem, not on the pool.


Dedup is a pool-wide concept; blocks from multiple filesystems
may be deduplicated.


However, the dedup ratio is reported on the pool and not on the filesystem.


The dedup property is on the dataset (filesystem | ZVOL) so that
you can opt in or out on a per-dataset basis.  If you have one or two
datasets that you know will never contain duplicate data, don't
enable dedup on those.  For example:

zpool create tank 

zfs set dedup=on tank
zfs create tank/1
zfs create tank/1/1
zfs create tank/2
zfs create -o dedup=off tank/2/2
zfs create tank/2/2/3

In this case all datasets in the pool will participate in deduplication
with the exception of tank/2/2 and its descendants.

--
Darren J Moffat


[zfs-discuss] Dedup relationship between pool and filesystem

2010-09-23 Thread Peter Taps
Folks,

I am a bit confused on the dedup relationship between the filesystem and its 
pool.

The dedup property is set on a filesystem, not on the pool.

However, the dedup ratio is reported on the pool and not on the filesystem.

Why is it this way?

Thank you in advance for your help.

Regards,
Peter