Thanks for running and publishing the tests :)

A comment on your testing technique follows, though.

2011-12-29 1:14, Brad Diggs wrote:
As promised, here are the findings from my testing. I created 6
directory server instances ...

However, once I started modifying the data of the replicated directory
server topology, the caching efficiency
quickly diminished. The following table shows that the delta for each
instance increased by roughly 2GB
after only 300k changes.

I suspect the divergence in data as seen by ZFS deduplication most
likely occurs because deduplication
occurs at the block level rather than at the byte level. When a write is
sent to one directory server instance,
the exact same write is propagated to the other 5 instances and
therefore should be considered a duplicate.
However this was not the case. There could be other reasons for the
divergence as well.

Hello, Brad,

If you tested with Sun DSEE (and I have no reason to
believe other descendants of iPlanet Directory Server
would work differently under the hood), then there are
two factors hindering your block-dedup gains:

1) The data is stored in the backend BerkeleyDB binary
file; in Sun DSEE7 and/or on ZFS it may also be
compressed. ZFS dedup matches whole blocks, so identical
data only dedups if it is also identically aligned, and
it is quite unlikely you'd get that often enough. Each
database may position the same userdata blocks at
different offsets due to garbage collection or whatever
other optimisation the DB performs, making the on-disk
blocks different and undedupable.
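A toy sketch (plain Python, not ZFS internals) of why alignment
matters: splitting the same bytes into fixed-size blocks yields no
common blocks once the data is shifted by even a few bytes. The
8-byte "recordsize" is illustrative only; ZFS blocks are far larger.

```python
import hashlib
import random

BLOCK = 8  # hypothetical tiny "recordsize" for readability

def block_hashes(data):
    """Hash each fixed-size block, as a block-level dedup engine would."""
    return {hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)}

# Non-repeating pseudo-random "database" payload, seeded for repeatability.
rnd = random.Random(42)
payload = bytes(rnd.randrange(256) for _ in range(1024))

same_offset = payload          # replica stored the bytes at the same offsets
shifted = b"xyz" + payload     # replica placed the very same bytes 3 bytes later

print(len(block_hashes(payload) & block_hashes(same_offset)))  # 128: full overlap
print(len(block_hashes(payload) & block_hashes(shifted)))      # 0: nothing dedups
```

Byte-level dedup would still find the shifted payload; block-level
dedup, as in ZFS, sees 256 unrelated blocks.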

You might check whether the database can be tuned to
write in sector-sized to minimum-block-sized (512 B/4096 B)
records and to consistently use the same DSEE compression
(or lack thereof) - in that case you might get more
identical blocks and win with dedup. But you'll likely
lose with compression, especially of the mostly-empty
sparse structure which a database initially is.
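On the ZFS side, the matching tuning might look something like this
(the dataset name "pool/dsee" is hypothetical; apply the same settings
on every replica so block contents stay comparable):

```shell
# Match recordsize to the DB page size so identical pages land in
# identically sized, identically aligned blocks.
zfs set recordsize=4K pool/dsee
zfs set dedup=on pool/dsee
# Keep compression uniform across replicas (off, or the same algorithm).
zfs set compression=off pool/dsee

# Estimate the would-be dedup ratio before committing to dedup for real:
zdb -S pool
```

This is a sketch, not a recommendation; measure with zdb -S first,
since a low simulated ratio means dedup's RAM cost buys you nothing.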

2) During replication each database actually becomes
unique. There are hidden records with an "ns" prefix which
mark when the record was created and replicated, who
initiated it, etc. The timestamps alone already
guarantee uniqueness ;)

This might be an RFE for the DSEE team though - to keep
such volatile metadata separately from userdata. Then
your DS instances would more likely dedup well after
replication, and unique metadata would be stored
separately and stay unique. You might even keep it in
a different dataset with no dedup, then... :)
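A toy sketch of why separating that metadata would help: the same user
entry hashed together with per-replica operational attributes never
matches across replicas, while the user-data portion on its own is
byte-identical. The attribute names below are illustrative, not
DSEE's exact schema.

```python
import hashlib

# The same user entry on two replicas...
user_data = b"uid=jdoe,ou=people\ncn=John Doe\nmail=jdoe@example.com\n"

# ...but each replica attaches its own volatile metadata (timestamps,
# originator, etc.). Values here are made up for illustration.
meta1 = b"nsuniqueid=aa-20111229011400Z\nmodifiersname=replica1\n"
meta2 = b"nsuniqueid=bb-20111229011401Z\nmodifiersname=replica2\n"

mixed1 = hashlib.sha256(user_data + meta1).digest()
mixed2 = hashlib.sha256(user_data + meta2).digest()
print(mixed1 == mixed2)  # False: interleaved metadata defeats dedup

# Stored separately, the user-data portion stays byte-identical:
print(hashlib.sha256(user_data).digest() ==
      hashlib.sha256(user_data).digest())  # True: this part would dedup
```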


So, at the moment, this expectation does not hold true:
  "When a write is sent to one directory server instance,
  the exact same write is propagated to the other five
  instances and therefore should be considered a duplicate."
These writes are not exact.

//Jim Klimov

zfs-discuss mailing list
