:Hi all,
:
:can someone compare/describe the RAM requirements of deduplication in
:HAMMER? There is something interesting about deduplication in ZFS here:
:http://openindiana.org/pipermail/openindiana-discuss/2011-April/003574.html
:
:Thx
The RAM is basically needed to store matching CRCs.

The on-line dedup uses a limited, fixed-size hash table to remember
CRCs, designed to match recently read data against future written data
(e.g. 'cp').

The off-line dedup (when you run 'hammer dedup ...' or
'hammer dedup-simulate ...') keeps track of ALL data CRCs as it scans
the filesystem B-Tree. It will happily use lots of swap space if it
comes down to it, which is probably a bug, but that's how it works now.

Actual file data is not persistently cached in memory. It is read only
when the dedup locates a potential CRC match, sticks around in a
limited cache, and is then thrown away; it will be re-read as needed.

-Matt
Matthew Dillon
<dil...@backplane.com>
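To make the on-line mechanism concrete, here is a minimal C sketch of a
fixed-size, direct-mapped CRC table of the kind described above. All
names, sizes, and the slot layout are hypothetical illustrations, not
HAMMER's actual data structures; the point is only that collisions
overwrite old slots, which is what keeps memory use bounded.

```c
#include <stdint.h>

#define DEDUP_TABLE_SIZE 4096          /* fixed slot count, hypothetical */

struct dedup_slot {
    uint32_t crc;                      /* CRC of recently read data */
    uint64_t data_offset;              /* where that data lives on media */
    int      valid;
};

static struct dedup_slot dedup_table[DEDUP_TABLE_SIZE];

/* Read path: remember the CRC of data that was just read. */
void
dedup_record_read(uint32_t crc, uint64_t data_offset)
{
    struct dedup_slot *slot = &dedup_table[crc % DEDUP_TABLE_SIZE];

    slot->crc = crc;                   /* overwrite on collision */
    slot->data_offset = data_offset;
    slot->valid = 1;
}

/*
 * Write path: if the CRC of the buffer about to be written matches a
 * remembered read, return its offset so the caller can byte-compare
 * the actual data before sharing the block.
 */
int
dedup_match_write(uint32_t crc, uint64_t *match_offset)
{
    struct dedup_slot *slot = &dedup_table[crc % DEDUP_TABLE_SIZE];

    if (slot->valid && slot->crc == crc) {
        *match_offset = slot->data_offset;
        return 1;                      /* candidate: verify, then dedup */
    }
    return 0;
}
```

With a table like this, a 'cp' works out naturally: reading the source
file populates slots, and the subsequent writes of identical data hit
matching CRCs.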
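The off-line pass behaves differently because nothing bounds its CRC
set. The following sketch shows that memory behavior under the
simplifying assumption that the scan just accumulates every CRC it
encounters in a growing array; the real 'hammer dedup' code is more
involved, but the growth with B-Tree size is why a large filesystem
can push the process into swap.

```c
#include <stdint.h>
#include <stdlib.h>

struct crc_entry {
    uint32_t crc;
    uint64_t data_offset;
};

static struct crc_entry *crc_set;      /* grows with records scanned */
static size_t crc_count, crc_cap;

/* Record one CRC per data record visited during the B-Tree scan. */
void
crc_set_add(uint32_t crc, uint64_t data_offset)
{
    if (crc_count == crc_cap) {
        crc_cap = crc_cap ? crc_cap * 2 : 1024;
        crc_set = realloc(crc_set, crc_cap * sizeof(*crc_set));
        if (crc_set == NULL)
            abort();                   /* out of memory (or deep in swap) */
    }
    crc_set[crc_count].crc = crc;
    crc_set[crc_count].data_offset = data_offset;
    crc_count++;
}
```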