On Fri, Apr 22, 2011 at 10:12 PM, Matthew Dillon <dil...@apollo.backplane.com> wrote:
>
> :Hi all,
> :
> :can someone compare/describe the RAM size needed by deduplication in
> :Hammer? There's something interesting about deduplication in ZFS:
> :http://openindiana.org/pipermail/openindiana-discuss/2011-April/003574.html
> :
> :Thx
>
>     The RAM is basically needed to store matching CRCs. The on-line dedup
>     uses a limited fixed-sized hash table to remember CRCs, designed to
>     match recently read data with future written data (e.g. 'cp').
>
>     The off-line dedup (when you run 'hammer dedup ...' or
>     'hammer dedup-simulate ...') will keep track of ALL data CRCs when
>     it scans the filesystem B-Tree. It will happily use lots of swap
>     space if it comes down to it, which is probably a bug. But that's
>     how it works now.
>
>     Actual file data is not persistently cached in memory. It is read only
>     when the dedup locates a potential match, sticks around in a limited
>     cache before getting thrown away, and will be re-read as needed.
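[As an illustration of the on-line dedup idea described above (a fixed-size table of recently seen CRCs, so a future write can be matched against recently read data), here is a minimal sketch in C. The table size, struct layout, and function names are assumptions for illustration only, not HAMMER's actual implementation; a real match would still require reading back and comparing the candidate data byte-for-byte.]

/*
 * Minimal sketch (not HAMMER code): a fixed-size table that remembers
 * the CRC and media offset of recently read blocks.  A later write whose
 * CRC hits an occupied slot is only a dedup *candidate*; the stored data
 * must still be read back and compared before the blocks are shared.
 */
#include <stdint.h>

#define DEDUP_CACHE_SIZE 4096          /* fixed size, so RAM use stays bounded */

struct dedup_entry {
    uint32_t crc;                      /* CRC of the block data */
    uint64_t data_offset;              /* where that data lives on media */
    int      valid;
};

static struct dedup_entry dedup_cache[DEDUP_CACHE_SIZE];

/* Remember the CRC of a block we just read (e.g. the source side of a 'cp'). */
void
dedup_cache_insert(uint32_t crc, uint64_t data_offset)
{
    struct dedup_entry *ent = &dedup_cache[crc % DEDUP_CACHE_SIZE];

    ent->crc = crc;                    /* overwrite on collision: it is only a cache */
    ent->data_offset = data_offset;
    ent->valid = 1;
}

/*
 * On write, look for a matching CRC.  Returns 1 and the candidate offset
 * if found; the caller must verify the actual data before deduplicating.
 */
int
dedup_cache_lookup(uint32_t crc, uint64_t *data_offset)
{
    struct dedup_entry *ent = &dedup_cache[crc % DEDUP_CACHE_SIZE];

    if (ent->valid && ent->crc == crc) {
        *data_offset = ent->data_offset;
        return 1;
    }
    return 0;
}

[The off-line 'hammer dedup' pass differs from this sketch in that it records the CRCs of everything it finds while scanning the B-Tree, so its memory footprint grows with the amount of data rather than staying bounded.]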
Their discussion continues with a rule of thumb of 1-3 GB of RAM per 1 TB of data. Judging by http://blogs.sun.com/roch/entry/dedup_performance_considerations1, it looks like ZFS keeps its dedup table data persistently cached in memory. Is that the reason ZFS dedup uses more RAM than Hammer?

>                                       -Matt
>                                       Matthew Dillon
>                                       <dil...@backplane.com>
>