Re: Hammer deduplication needs for RAM size

2011-04-23 Thread Tomas Bodzar
On Fri, Apr 22, 2011 at 10:12 PM, Matthew Dillon
dil...@apollo.backplane.com wrote:

> :Hi all,
> :
> :can someone compare/describe the RAM requirements of deduplication in
> :Hammer?  There's something interesting about deduplication in ZFS:
> :http://openindiana.org/pipermail/openindiana-discuss/2011-April/003574.html
> :
> :Thx
>
>     The RAM is basically needed to store matching CRCs.  The on-line dedup
>     uses a limited, fixed-size hash table to remember CRCs, designed to
>     match recently read data with future written data (e.g. 'cp').
>
>     The off-line dedup (when you run 'hammer dedup ...' or
>     'hammer dedup-simulate ...') keeps track of ALL data CRCs while
>     it scans the filesystem B-Tree.  It will happily use lots of swap
>     space if it comes down to it, which is probably a bug.  But that's
>     how it works now.
>
>     Actual file data is not persistently cached in memory.  It is read only
>     when the dedup locates a potential match, sticks around in a limited
>     cache before being thrown away, and is re-read as needed.

The discussion there goes on to mention a rule of thumb of 1-3GB of RAM
per 1TB of data.  Judging by
http://blogs.sun.com/roch/entry/dedup_performance_considerations1 it
looks like that data is kept persistently cached in memory.  Is that
the reason for the higher RAM usage with ZFS dedup compared with
Hammer?
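
As a back-of-the-envelope check on where that rule of thumb might come
from (an assumption based on commonly cited ZFS figures, not something
stated in the messages above): ZFS keeps one dedup-table entry per
unique block, and each in-core entry is usually quoted at a few hundred
bytes, so for example:

/*
 * Rough estimate of ZFS dedup-table RAM for 1 TB of unique data.  The
 * 128K recordsize and ~320 bytes per in-core DDT entry are assumptions
 * taken from commonly cited figures, not from this thread.
 */
#include <stdio.h>

int
main(void)
{
    double data_bytes  = 1e12;      /* 1 TB of unique data */
    double record_size = 131072;    /* default 128K recordsize */
    double entry_bytes = 320;       /* rough in-core size of one DDT entry */

    double entries = data_bytes / record_size;
    printf("~%.1fM DDT entries -> ~%.1f GB of RAM\n",
           entries / 1e6, entries * entry_bytes / 1e9);
    /* prints about 7.6M entries -> ~2.4 GB, i.e. the 1-3GB-per-TB range;
       smaller records push the number up */
    return 0;
}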


>                                         -Matt
>                                         Matthew Dillon
>                                         dil...@backplane.com




Re: Hammer deduplication needs for RAM size

2011-04-22 Thread Venkatesh Srinivas
I deduped a dataset that went from ~600G down to 396G on a system with
256MB of physical RAM and a 32GB swapcache.  Peak virtual memory size
of 'hammer dedup' was in the 700MB range.  double_buffer was on.
Performance was pretty reasonable and the system was plenty usable the
whole time.  Don't remember how long it took, though.

-- vs


Re: Hammer deduplication needs for RAM size

2011-04-22 Thread Matthew Dillon

:Hi all,
:
:can someone compare/describe the RAM requirements of deduplication in
:Hammer?  There's something interesting about deduplication in ZFS:
:http://openindiana.org/pipermail/openindiana-discuss/2011-April/003574.html
:
:Thx

The RAM is basically needed to store matching CRCs.  The on-line dedup
uses a limited, fixed-size hash table to remember CRCs, designed to
match recently read data with future written data (e.g. 'cp').
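
To make that concrete, here is a minimal sketch of such a bounded CRC
table.  It is only an illustration of the idea, not the actual HAMMER
code; the names and the table size are made up:

/*
 * Minimal sketch of the on-line dedup idea described above (not the
 * actual HAMMER code).  A small, fixed-size table remembers the CRC and
 * location of recently read blocks; when new data is about to be
 * written, a matching CRC marks it as a dedup candidate.  The data
 * itself still has to be compared before any block is actually shared.
 */
#include <stdint.h>

#define DEDUP_CACHE_SIZE 4096          /* fixed size keeps memory bounded */

struct dedup_cache_entry {
    uint32_t crc;                      /* CRC of the block's data */
    uint64_t data_offset;              /* where that data lives on media */
    int      valid;
};

static struct dedup_cache_entry dedup_cache[DEDUP_CACHE_SIZE];

/* Remember the CRC of a block we just read; collisions simply overwrite. */
void
dedup_cache_insert(uint32_t crc, uint64_t data_offset)
{
    struct dedup_cache_entry *e = &dedup_cache[crc % DEDUP_CACHE_SIZE];

    e->crc = crc;
    e->data_offset = data_offset;
    e->valid = 1;
}

/* On write: was a block with the same CRC read recently? */
int
dedup_cache_lookup(uint32_t crc, uint64_t *data_offsetp)
{
    struct dedup_cache_entry *e = &dedup_cache[crc % DEDUP_CACHE_SIZE];

    if (e->valid && e->crc == crc) {
        *data_offsetp = e->data_offset;
        return 1;                      /* candidate, verify before reusing */
    }
    return 0;
}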

The off-line dedup (when you run 'hammer dedup ...' or
'hammer dedup-simulate ...') keeps track of ALL data CRCs while
it scans the filesystem B-Tree.  It will happily use lots of swap
space if it comes down to it, which is probably a bug.  But that's
how it works now.

Actual file data is not persistently cached in memory.  It is read only
when the dedup locates a potential match, sticks around in a limited
cache before being thrown away, and is re-read as needed.
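
For comparison, a similarly minimal sketch of the off-line pass (again
an illustration, not the real 'hammer dedup' code; read_block() is a
hypothetical helper standing in for the real media I/O):

/*
 * Every data CRC seen while scanning the B-Tree is remembered, so memory
 * use grows with the amount of data and may spill into swap.  A repeated
 * CRC is only a hint; the data is read back and compared byte for byte.
 */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct crc_node {
    uint32_t         crc;
    uint64_t         data_offset;
    struct crc_node *next;
};

#define CRC_BUCKETS (1 << 20)
static struct crc_node *crc_table[CRC_BUCKETS];    /* grows without bound */

/* Hypothetical helper: read one block back from the media. */
extern int read_block(uint64_t data_offset, void *buf, size_t len);

/*
 * Record one (crc, offset) pair from the B-Tree scan.  Returns the offset
 * of a previously seen block with the same CRC, or 0 if the CRC is new.
 */
uint64_t
crc_record(uint32_t crc, uint64_t data_offset)
{
    struct crc_node *n;

    for (n = crc_table[crc % CRC_BUCKETS]; n != NULL; n = n->next) {
        if (n->crc == crc)
            return n->data_offset;     /* candidate duplicate */
    }
    n = malloc(sizeof(*n));            /* every CRC is kept, hence the RAM */
    if (n == NULL)
        return 0;                      /* out of memory, skip this record */
    n->crc = crc;
    n->data_offset = data_offset;
    n->next = crc_table[crc % CRC_BUCKETS];
    crc_table[crc % CRC_BUCKETS] = n;
    return 0;
}

/*
 * A CRC match alone never merges anything: both blocks are re-read and
 * compared byte for byte first.
 */
int
verify_duplicate(uint64_t off1, uint64_t off2, size_t len)
{
    char *b1 = malloc(len), *b2 = malloc(len);
    int same = 0;

    if (b1 != NULL && b2 != NULL &&
        read_block(off1, b1, len) == 0 &&
        read_block(off2, b2, len) == 0)
        same = (memcmp(b1, b2, len) == 0);
    free(b1);
    free(b2);
    return same;
}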

-Matt
Matthew Dillon 
dil...@backplane.com


Hammer deduplication needs for RAM size

2011-04-21 Thread Tomas Bodzar
Hi all,

can someone compare/describe the RAM requirements of deduplication in
Hammer?  There's something interesting about deduplication in ZFS:
http://openindiana.org/pipermail/openindiana-discuss/2011-April/003574.html

Thx