On Dec 11, 2011 5:12 AM, "Nathan Kroenert" <nat...@tuneunix.com> wrote:
> On 12/11/11 01:05 AM, Pawel Jakub Dawidek wrote:
>> On Wed, Dec 07, 2011 at 10:48:43PM +0200, Mertol Ozyoney wrote:
>>> Unfortunately the answer is no. Neither the L1 nor the L2 cache is dedup-aware.
>>> The only vendor I know that can do this is NetApp.
>> And you really work at Oracle?:)
>> The answer is definitely yes. The ARC caches on-disk blocks, and dedup just
>> references those blocks. When you read, the dedup code is not involved at all.
>> Let me show it to you with a simple test:
>> Create a file (dedup is on):
>> # dd if=/dev/random of=/foo/a bs=1m count=1024
>> Copy this file so that it is deduped:
>> # dd if=/foo/a of=/foo/b bs=1m
>> Export the pool so all cache is removed and reimport it:
>> # zpool export foo
>> # zpool import foo
>> Now let's read one file:
>> # dd if=/foo/a of=/dev/null bs=1m
>> 1073741824 bytes transferred in 10.855750 secs (98909962 bytes/sec)
>> We read file 'a' and all its blocks are in cache now. The 'b' file
>> shares all the same blocks, so if ARC caches blocks only once, reading
>> 'b' should be much faster:
>> # dd if=/foo/b of=/dev/null bs=1m
>> 1073741824 bytes transferred in 0.870501 secs (1233475634 bytes/sec)
>> Now look at that: 'b' was read 12.5 times faster than 'a', with no disk
>> activity. Magic? :)
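(The 12.5x figure follows directly from the two elapsed times `dd` reported; a
quick sanity check of the arithmetic:)

```shell
# Ratio of the two transfer times quoted above: cold read vs. ARC-cached read.
awk 'BEGIN { printf "%.1fx\n", 10.855750 / 0.870501 }'
# → 12.5x
```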
> Hey all,
> That reminds me of something I have been wondering about... Why only 12x
faster? If we are effectively reading from memory - as compared to a disk
reading at approximately 100MB/s (which is about an average PC HDD reading
sequentially), I'd have thought it should be a lot faster than 12x.
> Can we really only pull stuff from cache at only a little over one
gigabyte per second if it's dedup data?
The second file may have the same data, but not the same metadata (the inode
number at least must be different), so its znode must be read in, and that
will slow reading the copy down a bit.
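(You can see that the two files are distinct objects with separate metadata
even when their data blocks are shared; this sketch uses illustrative /tmp
paths on an ordinary filesystem, but the same holds for deduped files on ZFS,
where each file keeps its own znode:)

```shell
# Create a file and a byte-identical copy; dedup may share the data blocks,
# but each file still has its own inode/znode that must be read separately.
dd if=/dev/zero of=/tmp/a bs=1k count=1 2>/dev/null
cp /tmp/a /tmp/b
ls -i /tmp/a /tmp/b    # two different inode numbers: separate metadata per file
```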
zfs-discuss mailing list