Re: [zfs-discuss] Dedup memory overhead

2010-02-04 Thread Mertol Ozyoney
Sorry for the late answer. 

Approximately, it's 150 bytes per individual block, so increasing the
blocksize is a good idea. 
Also, when the L1 and L2 ARC are not enough, the system will start making disk
IOPS, and RAID-Z is not very effective for random IOPS, so it's likely that when
your DRAM is not enough your performance will suffer. 
You may choose to use RAID 10, which is a lot better on random loads.
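
As a rough illustration of what that per-block overhead adds up to (assumed
numbers only, not a measurement; adjust the pool size and block size to your
own setup):

# Back-of-the-envelope DDT sizing, assuming ~150 bytes of in-core overhead
# per unique block; pool size and average block size are placeholders.
POOL_BYTES=$(( 17 * 1024 ** 4 / 10 ))   # ~1,7 TB of data
BLOCK_SIZE=$(( 128 * 1024 ))            # assumed 128K average block size
PER_ENTRY=150                           # approximate bytes per DDT entry

BLOCKS=$(( POOL_BYTES / BLOCK_SIZE ))
echo "unique blocks: $BLOCKS"
echo "approx DDT overhead: $(( BLOCKS * PER_ENTRY / 1024 / 1024 )) MB"

With 128K blocks that comes out around 2 GB; with 4K blocks it is closer to
60 GB, which is why the blocksize matters so much.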
Mertol 




Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email mertol.ozyo...@sun.com



-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of erik.ableson
Sent: Thursday, January 21, 2010 6:05 PM
To: zfs-discuss
Subject: [zfs-discuss] Dedup memory overhead

Hi all,

I'm going to be trying out some tests using b130 for dedup on a server with
about 1,7Tb of useable storage (14x146 in two raidz vdevs of 7 disks).  What
I'm trying to get a handle on is how to estimate the memory overhead
required for dedup on that amount of storage.  From what I gather, the dedup
hash keys are held in ARC and L2ARC and as such are in competition for the
available memory.

So the question is how much memory or L2ARC would be necessary to ensure
that I'm never going back to disk to read out the hash keys. Better yet
would be some kind of algorithm for calculating the overhead. eg - averaged
block size of 4K = a hash key for every 4k stored and a hash occupies 256
bits. An associated question is then how does the ARC handle competition
between hash keys and regular ARC functions?

Based on these estimations, I think that I should be able to calculate the
following:
1,7             TB
1740,8          GB
1782579,2       MB
1825361100,8    KB
4               average block size (KB)
456340275,2     blocks
256             hash key size (bits)
1,16823E+11     hash key overhead (bits)
14602888806,4   hash key overhead (bytes)
14260633,6      hash key overhead (KB)
13926,4         hash key overhead (MB)
13,6            hash key overhead (GB)

Of course the big question on this will be the average block size - or
better yet - to be able to analyze an existing datastore to see just how
many blocks it uses and what is the current distribution of different block
sizes. I'm currently playing around with zdb with mixed success  on
extracting this kind of data. That's also a worst case scenario since it's
counting really small blocks and using 100% of available storage - highly
unlikely. 

# zdb -ddbb siovale/iphone
Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0,
flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  57.0K    64K   77.34  DMU dnode
         1    1    16K     1K  1.50K     1K  100.00  ZFS master node
         2    1    16K    512  1.50K    512  100.00  ZFS delete queue
         3    2    16K    16K  18.0K    32K  100.00  ZFS directory
         4    3    16K   128K   408M   408M  100.00  ZFS plain file
         5    1    16K    16K  3.00K    16K  100.00  FUID table
         6    1    16K     4K  4.50K     4K  100.00  ZFS plain file
         7    1    16K  6.50K  6.50K  6.50K  100.00  ZFS plain file
         8    3    16K   128K   952M   952M  100.00  ZFS plain file
         9    3    16K   128K   912M   912M  100.00  ZFS plain file
        10    3    16K   128K   695M   695M  100.00  ZFS plain file
        11    3    16K   128K   914M   914M  100.00  ZFS plain file
 
Now, if I'm understanding this output properly, object 4 is composed of
128KB blocks with a total size of 408MB, meaning that it uses 3264 blocks.
Can someone confirm (or correct) that assumption? Also, I note that each
object  (as far as my limited testing has shown) has a single block size
with no internal variation.

Interestingly, all of my zvols seem to use fixed size blocks - that is,
there is no variation in the block sizes - they're all the size defined on
creation with no dynamic block sizes being used. I previously thought that
the -b option set the maximum size, rather than fixing all blocks.  Learned
something today :-)

# zdb -ddbb siovale/testvol
Dataset siovale/testvol [ZVOL], ID 45, cr_txg 4717890, 23.9K, 2 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    1    16K    64K      0    64K    0.00  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop

# zdb -ddbb siovale/tm-media
Dataset siovale/tm-media [ZVOL], ID 706, cr_txg 4426997, 240G, 2 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0,
flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    5    16K     8K   240G   250G   97.33  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop

Re: [zfs-discuss] Dedup memory overhead

2010-01-22 Thread erik.ableson

On 21 Jan. 2010, at 22:55, Daniel Carosone wrote:

 On Thu, Jan 21, 2010 at 05:04:51PM +0100, erik.ableson wrote:
 
 What I'm trying to get a handle on is how to estimate the memory
 overhead required for dedup on that amount of storage.   
 
 We'd all appreciate better visibility of this. This requires:
 - time and observation and experience, and
 - better observability tools and (probably) data exposed for them

I'd guess that since every written block is going to go and ask for the hash 
keys, this data should end up living in the ARC under the MFU ruleset. The 
theory being that if I can determine the maximum memory requirement for these 
keys, I know what my minimum memory baseline will be to guarantee that I won't 
be caught short.

 So the question is how much memory or L2ARC would be necessary to
 ensure that I'm never going back to disk to read out the hash keys. 
 
 I think that's a wrong-goal for optimisation.
 
 For performance (rather than space) issues, I look at dedup as simply
 increasing the size of the working set, with a goal of reducing the
 amount of IO (avoided duplicate writes) in return.

True, but as a practical matter, we've seen that overall performance drops off 
a cliff if you overstep your memory bounds and the system is obliged to go to 
disk to check a newly written block against the hash keys. This is compounded 
by the fact that the ARC is full, so it has to go straight to disk, further 
exacerbating the problem.

It's this particular scenario that I'm trying to avoid, and from the business 
perspective of selling ZFS-based solutions (whether to a client or to an 
internal project), we need to be able to ensure that the performance is 
predictable, with no surprises.

Realizing of course that all of this is based on a slew of uncontrollable 
variables (size of the working set, IO profiles, ideal block sizes, etc.). The 
empirical approach of "give it lots and we'll see if we need to add an L2ARC 
later" is not really viable for many managers (despite the fact that the real 
world works like this).

 The trouble is that the hash function produces (we can assume) random
 hits across the DDT, so the working set depends on the amount of
 data and the rate of potentially dedupable writes as well as the
 actual dedup hit ratio.  A high rate of writes also means a large
 amount of data in ARC waiting to be written at the same time. This
 makes analysis very hard (and pushes you very fast towards that very
 steep cliff, as we've all seen). 

I don't think it would be random, since _any_ write operation on a deduplicated 
filesystem would require a hash check, forcing the keys to live in the MFU. 
However, I agree that a high write rate would put memory pressure on the ARC, 
which could result in the eviction of the hash keys. So the next factor to 
include in memory sizing is the maximum write rate (determined by IO 
availability). With a team of two GbE cards, I could conservatively say that 
I need to size for inbound write IO of 160 MB/s, worst case accumulated over 
the 30-second flush cycle, so say about 5 GB of memory (leaving aside ZIL 
issues etc.). Noting that this is all very back-of-the-napkin estimation, and 
I also need some idea of what my physical storage is capable of ingesting, 
which could add to this value.
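
As a quick check on that napkin figure (assumed numbers only: two teamed GbE
ports at roughly 80 MB/s each, and a 30-second flush interval):

# Worst-case in-flight write data accumulated between flushes
echo "$(( 2 * 80 * 30 )) MB"    # 4800 MB, i.e. roughly 5 GB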

 I also think a threshold on the size of blocks to try deduping would
 help.  If I only dedup blocks (say) 64k and larger, i might well get
 most of the space benefit for much less overhead.

Well - since my primary use case is iSCSI presentation to VMware backed by 
zvols, and I can manually force the block size to 64K on volume creation, this 
reduces the unpredictability a little bit. That's based on the hypothesis that 
zvols use a fixed block size.
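
For reference, that would look something like this (the volume name and size
here are just placeholders):

# zfs create -b 64K -V 500G siovale/vmstore

Every block in that zvol is then a 64K record, which is what makes the
per-block overhead arithmetic above halfway predictable.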


[zfs-discuss] Dedup memory overhead

2010-01-21 Thread erik.ableson
Hi all,

I'm going to be trying out some tests using b130 for dedup on a server with 
about 1,7Tb of useable storage (14x146 in two raidz vdevs of 7 disks).  What 
I'm trying to get a handle on is how to estimate the memory overhead required 
for dedup on that amount of storage.  From what I gather, the dedup hash keys 
are held in ARC and L2ARC and as such are in competition for the available 
memory.

So the question is how much memory or L2ARC would be necessary to ensure that 
I'm never going back to disk to read out the hash keys. Better yet would be 
some kind of algorithm for calculating the overhead. eg - averaged block size 
of 4K = a hash key for every 4k stored and a hash occupies 256 bits. An 
associated question is then how does the ARC handle competition between hash 
keys and regular ARC functions?

Based on these estimations, I think that I should be able to calculate the 
following:
1,7             TB
1740,8          GB
1782579,2       MB
1825361100,8    KB
4               average block size (KB)
456340275,2     blocks
256             hash key size (bits)
1,16823E+11     hash key overhead (bits)
14602888806,4   hash key overhead (bytes)
14260633,6      hash key overhead (KB)
13926,4         hash key overhead (MB)
13,6            hash key overhead (GB)
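
For anyone who wants to rerun that arithmetic with their own numbers, here is
the same worst-case estimate as a quick shell calculation (1,7 TB of data, a
4K average block size and a 256-bit key are the assumptions from the table
above, not measurements):

# Worst-case hash key overhead under the assumptions above
CAPACITY_KB=$(( 17 * 1024 ** 3 / 10 ))   # ~1,7 TB expressed in KB
AVG_BLOCK_KB=4                           # assumed average block size
KEY_BYTES=32                             # 256-bit hash key

BLOCKS=$(( CAPACITY_KB / AVG_BLOCK_KB ))
echo "blocks: $BLOCKS"                                          # ~456 million
echo "key overhead: $(( BLOCKS * KEY_BYTES / 1024 ** 2 )) MB"   # ~13926 MB, ~13,6 GB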

Of course the big question on this will be the average block size - or better 
yet - to be able to analyze an existing datastore to see just how many blocks 
it uses and what is the current distribution of different block sizes. I'm 
currently playing around with zdb with mixed success  on extracting this kind 
of data. That's also a worst case scenario since it's counting really small 
blocks and using 100% of available storage - highly unlikely. 

# zdb -ddbb siovale/iphone
Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, 
flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  57.0K    64K   77.34  DMU dnode
         1    1    16K     1K  1.50K     1K  100.00  ZFS master node
         2    1    16K    512  1.50K    512  100.00  ZFS delete queue
         3    2    16K    16K  18.0K    32K  100.00  ZFS directory
         4    3    16K   128K   408M   408M  100.00  ZFS plain file
         5    1    16K    16K  3.00K    16K  100.00  FUID table
         6    1    16K     4K  4.50K     4K  100.00  ZFS plain file
         7    1    16K  6.50K  6.50K  6.50K  100.00  ZFS plain file
         8    3    16K   128K   952M   952M  100.00  ZFS plain file
         9    3    16K   128K   912M   912M  100.00  ZFS plain file
        10    3    16K   128K   695M   695M  100.00  ZFS plain file
        11    3    16K   128K   914M   914M  100.00  ZFS plain file
 
Now, if I'm understanding this output properly, object 4 is composed of 128KB 
blocks with a total size of 408MB, meaning that it uses 3264 blocks.  Can 
someone confirm (or correct) that assumption? Also, I note that each object  
(as far as my limited testing has shown) has a single block size with no 
internal variation.
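
The arithmetic behind that block count, for what it's worth (sizes taken at
face value from the zdb output above):

# 408M of data stored in 128K blocks
echo $(( 408 * 1024 / 128 ))    # 3264 blocks for object 4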

Interestingly, all of my zvols seem to use fixed size blocks - that is, there 
is no variation in the block sizes - they're all the size defined on creation 
with no dynamic block sizes being used. I previously thought that the -b option 
set the maximum size, rather than fixing all blocks.  Learned something today 
:-)

# zdb -ddbb siovale/testvol
Dataset siovale/testvol [ZVOL], ID 45, cr_txg 4717890, 23.9K, 2 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    1    16K    64K      0    64K    0.00  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop

# zdb -ddbb siovale/tm-media
Dataset siovale/tm-media [ZVOL], ID 706, cr_txg 4426997, 240G, 2 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, 
flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    5    16K     8K   240G   250G   97.33  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop



Re: [zfs-discuss] Dedup memory overhead

2010-01-21 Thread Richard Elling
On Jan 21, 2010, at 8:04 AM, erik.ableson wrote:

 Hi all,
 
 I'm going to be trying out some tests using b130 for dedup on a server with 
 about 1,7Tb of useable storage (14x146 in two raidz vdevs of 7 disks).  What 
 I'm trying to get a handle on is how to estimate the memory overhead required 
 for dedup on that amount of storage.  From what I gather, the dedup hash keys 
 are held in ARC and L2ARC and as such are in competition for the available 
 memory.

... and written to disk, of course.

For ARC sizing, more is always better.

 So the question is how much memory or L2ARC would be necessary to ensure that 
 I'm never going back to disk to read out the hash keys. Better yet would be 
 some kind of algorithm for calculating the overhead. eg - averaged block size 
 of 4K = a hash key for every 4k stored and a hash occupies 256 bits. An 
 associated question is then how does the ARC handle competition between hash 
 keys and regular ARC functions?

AFAIK, there is no special treatment given to the DDT. The DDT is stored like
other metadata and (currently) not easily accounted for.

Also the DDT keys are 320 bits. The key itself includes the logical and physical
block size and compression. The DDT entry is even larger.

I think it is better to think of the ARC as caching the uncompressed DDT
blocks which were written to disk.  The number of these will be data dependent.
zdb -S poolname will give you an idea of the number of blocks and how well
dedup will work on your data, but that means you already have the data in a
pool.
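
For example, against a placeholder pool name (zdb only reads the pool; it does
not modify anything):

# zdb -S tank

The simulated DDT histogram it prints gives you real block counts to feed into
the kind of per-block overhead arithmetic discussed elsewhere in this thread.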
 -- richard


 Based on these estimations, I think that I should be able to calculate the 
 following:
 1,7             TB
 1740,8          GB
 1782579,2       MB
 1825361100,8    KB
 4               average block size (KB)
 456340275,2     blocks
 256             hash key size (bits)
 1,16823E+11     hash key overhead (bits)
 14602888806,4   hash key overhead (bytes)
 14260633,6      hash key overhead (KB)
 13926,4         hash key overhead (MB)
 13,6            hash key overhead (GB)
 
 Of course the big question on this will be the average block size - or better 
 yet - to be able to analyze an existing datastore to see just how many blocks 
 it uses and what is the current distribution of different block sizes. I'm 
 currently playing around with zdb with mixed success  on extracting this kind 
 of data. That's also a worst case scenario since it's counting really small 
 blocks and using 100% of available storage - highly unlikely. 
 
 # zdb -ddbb siovale/iphone
 Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects
 
ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, 
 flags 0x0
 
    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  57.0K    64K   77.34  DMU dnode
         1    1    16K     1K  1.50K     1K  100.00  ZFS master node
         2    1    16K    512  1.50K    512  100.00  ZFS delete queue
         3    2    16K    16K  18.0K    32K  100.00  ZFS directory
         4    3    16K   128K   408M   408M  100.00  ZFS plain file
         5    1    16K    16K  3.00K    16K  100.00  FUID table
         6    1    16K     4K  4.50K     4K  100.00  ZFS plain file
         7    1    16K  6.50K  6.50K  6.50K  100.00  ZFS plain file
         8    3    16K   128K   952M   952M  100.00  ZFS plain file
         9    3    16K   128K   912M   912M  100.00  ZFS plain file
        10    3    16K   128K   695M   695M  100.00  ZFS plain file
        11    3    16K   128K   914M   914M  100.00  ZFS plain file
 
 Now, if I'm understanding this output properly, object 4 is composed of 128KB 
 blocks with a total size of 408MB, meaning that it uses 3264 blocks.  Can 
 someone confirm (or correct) that assumption? Also, I note that each object  
 (as far as my limited testing has shown) has a single block size with no 
 internal variation.
 
 Interestingly, all of my zvols seem to use fixed size blocks - that is, there 
 is no variation in the block sizes - they're all the size defined on creation 
 with no dynamic block sizes being used. I previously thought that the -b 
 option set the maximum size, rather than fixing all blocks.  Learned 
 something today :-)
 
 # zdb -ddbb siovale/testvol
 Dataset siovale/testvol [ZVOL], ID 45, cr_txg 4717890, 23.9K, 2 objects
 
    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    1    16K    64K      0    64K    0.00  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop
 
 # zdb -ddbb siovale/tm-media
 Dataset siovale/tm-media [ZVOL], ID 706, cr_txg 4426997, 240G, 2 objects
 
ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, 
 flags 0x0
 
    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    5    16K     8K   240G   250G   97.33  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop
 
 

Re: [zfs-discuss] Dedup memory overhead

2010-01-21 Thread Andrey Kuzmin
On Thu, Jan 21, 2010 at 10:00 PM, Richard Elling
richard.ell...@gmail.com wrote:
 On Jan 21, 2010, at 8:04 AM, erik.ableson wrote:

 Hi all,

 I'm going to be trying out some tests using b130 for dedup on a server with 
 about 1,7Tb of useable storage (14x146 in two raidz vdevs of 7 disks).  What 
 I'm trying to get a handle on is how to estimate the memory overhead 
 required for dedup on that amount of storage.  From what I gather, the dedup 
 hash keys are held in ARC and L2ARC and as such are in competition for the 
 available memory.

 ... and written to disk, of course.

 For ARC sizing, more is always better.

 So the question is how much memory or L2ARC would be necessary to ensure 
 that I'm never going back to disk to read out the hash keys. Better yet 
 would be some kind of algorithm for calculating the overhead. eg - averaged 
 block size of 4K = a hash key for every 4k stored and a hash occupies 256 
 bits. An associated question is then how does the ARC handle competition 
 between hash keys and regular ARC functions?

 AFAIK, there is no special treatment given to the DDT. The DDT is stored like
 other metadata and (currently) not easily accounted for.

 Also the DDT keys are 320 bits. The key itself includes the logical and 
 physical
 block size and compression. The DDT entry is even larger.

Looking at the dedupe code, I noticed that on-disk DDT entries are
compressed less efficiently than possible: the key is not compressed at
all (I'd expect roughly a 2:1 compression ratio with sha256 data),
while the other entry data is currently passed through the zle compressor only
(I'd expect this one to be less efficient than off-the-shelf
compressors, feel free to correct me if I'm wrong). Is this v1, going
to be improved in the future?

Further, with huge dedupe memory footprint and heavy performance
impact when DDT entries need to be read from disk, it might be
worthwhile to consider compression of in-core ddt entries
(specifically for DDTs or, more generally, making ARC/L2ARC
compression-aware). Has this been considered?

Regards,
Andrey


 I think it is better to think of the ARC as caching the uncompressed DDT
 blocks which were written to disk.  The number of these will be data 
 dependent.
 zdb -S poolname will give you an idea of the number of blocks and how well
 dedup will work on your data, but that means you already have the data in a
 pool.
  -- richard


 Based on these estimations, I think that I should be able to calculate the 
 following:
 1,7             TB
 1740,8          GB
 1782579,2       MB
 1825361100,8    KB
 4               average block size (KB)
 456340275,2     blocks
 256             hash key size (bits)
 1,16823E+11     hash key overhead (bits)
 14602888806,4   hash key overhead (bytes)
 14260633,6      hash key overhead (KB)
 13926,4         hash key overhead (MB)
 13,6            hash key overhead (GB)

 Of course the big question on this will be the average block size - or 
 better yet - to be able to analyze an existing datastore to see just how 
 many blocks it uses and what is the current distribution of different block 
 sizes. I'm currently playing around with zdb with mixed success  on 
 extracting this kind of data. That's also a worst case scenario since it's 
 counting really small blocks and using 100% of available storage - highly 
 unlikely.

 # zdb -ddbb siovale/iphone
 Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects

    ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, 
 flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  57.0K    64K   77.34  DMU dnode
         1    1    16K     1K  1.50K     1K  100.00  ZFS master node
         2    1    16K    512  1.50K    512  100.00  ZFS delete queue
         3    2    16K    16K  18.0K    32K  100.00  ZFS directory
         4    3    16K   128K   408M   408M  100.00  ZFS plain file
         5    1    16K    16K  3.00K    16K  100.00  FUID table
         6    1    16K     4K  4.50K     4K  100.00  ZFS plain file
         7    1    16K  6.50K  6.50K  6.50K  100.00  ZFS plain file
         8    3    16K   128K   952M   952M  100.00  ZFS plain file
         9    3    16K   128K   912M   912M  100.00  ZFS plain file
        10    3    16K   128K   695M   695M  100.00  ZFS plain file
        11    3    16K   128K   914M   914M  100.00  ZFS plain file

 Now, if I'm understanding this output properly, object 4 is composed of 
 128KB blocks with a total size of 408MB, meaning that it uses 3264 blocks.  
 Can someone confirm (or correct) that assumption? Also, I note that each 
 object  (as far as my limited testing has shown) has a single block size 
 with no internal variation.

 Interestingly, all of my zvols seem to use fixed size blocks - that is, 
 there is no variation in the block sizes - they're all the size defined on 
 creation with no dynamic block sizes being used. I previously thought that 
 the -b option set the maximum size, rather than fixing all blocks.  Learned 
 something today :-)

 # zdb -ddbb 

Re: [zfs-discuss] Dedup memory overhead

2010-01-21 Thread Daniel Carosone
On Thu, Jan 21, 2010 at 05:04:51PM +0100, erik.ableson wrote:

 What I'm trying to get a handle on is how to estimate the memory
 overhead required for dedup on that amount of storage.   

We'd all appreciate better visibility of this. This requires:
 - time and observation and experience, and
 - better observability tools and (probably) data exposed for them

 So the question is how much memory or L2ARC would be necessary to
 ensure that I'm never going back to disk to read out the hash keys. 

I think that's a wrong-goal for optimisation.

For performance (rather than space) issues, I look at dedup as simply
increasing the size of the working set, with a goal of reducing the
amount of IO (avoided duplicate writes) in return.

If saving one large async write costs several small sync reads, you
fall off a very steep performance cliff, especially for IOPS-limited
seeking media. However, it doesn't matter whether those reads are for
DDT entries or other filesystem metadata necessary to complete the
write. Nor does it even matter if those reads are data reads, for
other processes that have been pushed out of ARC because of the larger
working set.  So I think it's right that arc doesn't treat DDT entries
specially.

The trouble is that the hash function produces (we can assume) random
hits across the DDT, so the working set depends on the amount of
data and the rate of potentially dedupable writes as well as the
actual dedup hit ratio.  A high rate of writes also means a large
amount of data in ARC waiting to be written at the same time. This
makes analysis very hard (and pushes you very fast towards that very
steep cliff, as we've all seen). 

Separately, what might help is something like dedup=opportunistic
that would keep the working set smaller:
 - dedup the block IFF the DDT entry is already in (l2)arc
 - otherwise, just write another copy
 - maybe some future async dedup cleaner, using bp-rewrite, to tidy
   up later.
I'm not sure what, in this scheme, would ever bring DDT entries into
cache, though.  Reads for previously dedup'd data?

I also think a threshold on the size of blocks to try deduping would
help.  If I only dedup blocks (say) 64k and larger, i might well get
most of the space benefit for much less overhead.

--
Dan.



Re: [zfs-discuss] Dedup memory overhead

2010-01-21 Thread Daniel Carosone
On Fri, Jan 22, 2010 at 08:55:16AM +1100, Daniel Carosone wrote:
 For performance (rather than space) issues, I look at dedup as simply
 increasing the size of the working set, with a goal of reducing the
 amount of IO (avoided duplicate writes) in return.

I should add "and avoided future duplicate reads" in those parentheses
as well. 

A CVS checkout, with identical CVS/Root files in every directory, is a
great example. Every one of those files is read on cvs update.
Developers often have multiple checkouts (different branches) from the
same server. Good performance gains can be had by avoiding potentially
many thousands of extra reads and cache entries, whether with dedup or
simply by hardlinking them all together.   I've hit the 64k limit on
hardlinks to the one file more than once with this, on bsd FFS.

It's not a great example for my suggestion of a threshold lower
blocksize for dedup, however :-/

--
Dan.





Re: [zfs-discuss] Dedup memory overhead

2010-01-21 Thread Mike Gerdts
On Thu, Jan 21, 2010 at 2:51 PM, Andrey Kuzmin
andrey.v.kuz...@gmail.com wrote:
 Looking at dedupe code, I noticed that on-disk DDT entries are
 compressed less efficiently than possible: key is not compressed at
 all (I'd expect roughly 2:1 compression ration with sha256 data),

A cryptographic hash such as sha256 should not be compressible.  A
trivial example shows this to be the case:

for i in {1..10000} ; do
echo $i | openssl dgst -sha256 -binary
done > /tmp/sha256

$ gzip -c sha256 > sha256.gz
$ compress -c sha256 > sha256.Z
$ bzip2 -c sha256 > sha256.bz2

$ ls -go sha256*
-rw-r--r--   1  320000 Jan 22 04:13 sha256
-rw-r--r--   1  428411 Jan 22 04:14 sha256.Z
-rw-r--r--   1  321846 Jan 22 04:14 sha256.bz2
-rw-r--r--   1  320068 Jan 22 04:14 sha256.gz

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss