On 12/11/11 01:05 AM, Pawel Jakub Dawidek wrote:
On Wed, Dec 07, 2011 at 10:48:43PM +0200, Mertol Ozyoney wrote:
Unfortunately the answer is no. Neither the L1 nor the L2 cache is dedup-aware.
The only vendor I know of that can do this is NetApp.
And you really work at Oracle? :)
The answer is definitely yes. The ARC caches on-disk blocks, and dedup just
references those blocks. When you read, the dedup code is not involved at all.
Let me show you with a simple test:
Create a file (dedup is on):
# dd if=/dev/random of=/foo/a bs=1m count=1024
Copy this file so that it is deduped:
# dd if=/foo/a of=/foo/b bs=1m
Export the pool so the cache is emptied, then reimport it:
# zpool export foo
# zpool import foo
Now let's read one file:
# dd if=/foo/a of=/dev/null bs=1m
1073741824 bytes transferred in 10.855750 secs (98909962 bytes/sec)
We read file 'a' and all its blocks are now in the cache. The file 'b'
shares all the same blocks, so if the ARC caches blocks only once, reading
'b' should be much faster:
# dd if=/foo/b of=/dev/null bs=1m
1073741824 bytes transferred in 0.870501 secs (1233475634 bytes/sec)
Now look at that: 'b' was read 12.5 times faster than 'a', with no disk
activity at all.
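The behaviour above can be sketched with a toy model (plain Python for illustration, not ZFS code, and all names here are made up): blocks are cached by their on-disk address, and a deduped file simply points at the same addresses as the original, so reading the second file is all cache hits.

```python
# Toy model of an ARC-style block cache with dedup (illustration only).
class BlockCache:
    def __init__(self):
        self.cache = {}               # block address -> data
        self.hits = self.misses = 0

    def read(self, addr, disk):
        if addr in self.cache:
            self.hits += 1            # served from cache, no disk access
        else:
            self.misses += 1          # cold read: fetch from "disk"
            self.cache[addr] = disk[addr]
        return self.cache[addr]

disk = {n: f"block{n}" for n in range(4)}
file_a = [0, 1, 2, 3]                 # file a's block pointers
file_b = [0, 1, 2, 3]                 # deduped copy: same on-disk blocks

arc = BlockCache()
for addr in file_a:
    arc.read(addr, disk)              # first read: 4 misses
for addr in file_b:
    arc.read(addr, disk)              # deduped read: 4 hits
print(arc.misses, arc.hits)           # -> 4 4
```

Because 'b' carries no blocks of its own, only pointers, the cache never stores anything twice, which is exactly why the second `dd` above never touches the disk.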
That reminds me of something I have been wondering about... Why only 12x
faster? If we are effectively reading from memory, compared to a disk
reading at roughly 100 MB/s (about average for a PC HDD reading
sequentially), I'd have thought it should be a lot faster than 12x.
Can we really only pull data from the cache at a little over one gigabyte
per second when it is deduped?
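One plausible answer, sketched as a rough Python experiment (not ZFS code, and only an assumption about where the time goes): even a naive loop that walks a cached buffer in 1 MiB chunks, like `dd bs=1m` does, moves data at several GB/s on modern hardware, so the ~1.2 GB/s seen above is probably dominated by per-request overhead (dd's read(2) syscalls, copies out of the kernel, ARC lookups) rather than by raw memory bandwidth.

```python
import time

def stream_read(buf: bytes, chunk: int = 1 << 20) -> float:
    """Walk `buf` in 1 MiB chunks (mirroring `dd bs=1m`) and
    return the observed throughput in bytes per second."""
    start = time.perf_counter()
    total = 0
    for off in range(0, len(buf), chunk):
        total += len(buf[off:off + chunk])  # each slice is one in-memory copy
    return total / (time.perf_counter() - start)

data = bytes(256 * 1024 * 1024)  # 256 MiB of zeros stands in for the cached file
print(f"{stream_read(data) / 1e9:.2f} GB/s")
```

The absolute number is machine-dependent, but the gap between this in-process copy rate and the 1.2 GB/s measured through `dd` hints that each 1 MiB request pays a fixed cost on top of the memory copy itself.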
zfs-discuss mailing list