On Jul 31, 2012, at 8:05 PM, opensolarisisdeadlongliveopensolaris wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Richard Elling
>> I believe what you meant to say was "dedup with HDDs sux." If you had
>> used fast SSDs instead of HDDs, you would find dedup to be quite fast.
>> -- richard
> Yes, but this is a linear scale.
No, it is definitely NOT a linear scale. Study Amdahl's law a little more.
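
To put rough numbers on why the scale is not linear, here is a minimal sketch
of Amdahl's law. The fractions of run time spent in random I/O and the 100x
device factor are made-up assumptions for illustration, not measurements, and
amdahl_speedup is just a name picked for this example.

```python
def amdahl_speedup(fraction_improved, factor):
    """Overall speedup when only `fraction_improved` of the run time
    becomes `factor` times faster (Amdahl's law)."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / factor)


# Assumption: the SSD makes the random-I/O portion of the work 100x faster
# and leaves everything else unchanged.
DEVICE_FACTOR = 100.0

for fraction in (0.50, 0.90, 0.99, 1.00):
    overall = amdahl_speedup(fraction, DEVICE_FACTOR)
    print(f"{fraction:.0%} of time in random I/O -> {overall:.1f}x overall")
```

Unless essentially all of the time goes to the part that got faster, the
overall gain falls well short of the 100x device factor, so multiplying and
dividing whole-system factors does not describe what happens.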
> Suppose an SSD without dedup is 100x faster than a HDD without dedup. And
> suppose dedup slows down a system by a factor of 10x. Now your SSD with
> dedup is only 10x faster than the HDD without dedup. So "quite fast" is a
> relative term.
Of course it is.
> The SSD with dedup is still faster than the HDD without dedup, but it's also
> slower than the SSD without dedup.
Duh. With dedup you are trading IOPS for space. In general, HDDs have lots of
space and terrible IOPS. SSDs have less space, but more IOPS. Obviously, as you
point out, the best solution is lots of space and lots of IOPS.
> The extent of fibbing I'm doing is thusly: In reality, an SSD is about
> equally fast with HDD for sequential operations, and about 100x faster for
> random IO. It just so happens that the dedup performance hit is almost
> purely random IO, so it's right in the sweet spot of what SSD's handle well.
In the vast majority of modern systems, there are no sequential I/O workloads.
That is a myth propagated by people who still think HDDs can be fast.
> You can't use an overly simplified linear model like I described above - In
> reality, there's a grain of truth in what Richard said, and also a grain of
> truth in what I said. The real truth is somewhere in between what he said
> and what I said.
But closer to my truth :-)
> No, the SSD will not perform as well with dedup as it does without dedup.
> But the "suppose dedup slows down by 10x" that I described above is not
> accurate. Depending on what you're doing, dedup might slow down an HDD by
> 20x, and it might only slow down SSD by 4x doing the same work load. Highly
> variable, and highly dependent on the specifics of your workload.
You are making the assumption that the system is not bandwidth limited. This is
a good assumption for the HDD case, because the media bandwidth is much less
than the interconnect bandwidth. For SSDs, this assumption is not necessarily
true. There are SSDs that are bandwidth constrained on the interconnect, and in
those cases, your model fails.
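
A minimal sketch of the bandwidth point, assuming the deliverable throughput is
simply the smaller of the media bandwidth and the interconnect bandwidth. All
of the MB/s figures are hypothetical round numbers, not specs for any real
device, and effective_bandwidth is a name invented for this example.

```python
def effective_bandwidth(media_mb_s, link_mb_s):
    """Throughput is capped by whichever is slower: the media or the link."""
    return min(media_mb_s, link_mb_s)


# Hypothetical figures, chosen only to show the two cases.
LINK_MB_S = 600.0
MEDIA_MB_S = {"HDD": 150.0, "SSD": 1500.0}

for name, media in MEDIA_MB_S.items():
    limit = "interconnect" if media > LINK_MB_S else "media"
    effective = effective_bandwidth(media, LINK_MB_S)
    print(f"{name}: {effective:.0f} MB/s, limited by the {limit}")
```

With these numbers the HDD never gets near the link limit, so ignoring the
interconnect is harmless there. The SSD hits the link limit first, and a model
that only scales media speed overstates what it can deliver.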