> From: Matthew Ahrens [mailto:mahr...@delphix.com]
> Sent: Wednesday, May 25, 2011 6:50 PM
> 
> The DDT is a ZAP object, so it is an on-disk hashtable, free of O(log(n))
> rebalancing operations.  It is written asynchronously, from syncing
> context.  That said, for each block written (unique or not), the DDT must
> be updated, which means reading and then writing the block that contains
> that dedup table entry, and the indirect blocks to get to it.  With a
> reasonably large DDT, I would expect about 1 write to the DDT for every
> block written to the pool (or "written" but actually dedup'd).
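To make sure I'm reading that right, here's a little Python sketch of the
bookkeeping as I understand it.  The indirect depth is my own guess for
illustration, not a number taken from the on-disk format:

# Back-of-envelope model of the I/O Matt describes: for every block
# written, ZFS must read the DDT leaf (plus the indirects to reach it)
# and later write it back from syncing context.  The depth below is an
# assumption for the sketch, not the actual on-disk layout.

INDIRECT_DEPTH = 2            # assumed levels of indirect blocks

def io_per_unique_block(ddt_cached_in_arc):
    """Return (reads, writes) needed to update the DDT for one block."""
    writes = 1                # the DDT leaf is rewritten at sync time
    if ddt_cached_in_arc:
        reads = 0             # leaf and indirects already in memory
    else:
        reads = 1 + INDIRECT_DEPTH   # the leaf plus each indirect level
    return reads, writes

print(io_per_unique_block(ddt_cached_in_arc=True))   # (0, 1)
print(io_per_unique_block(ddt_cached_in_arc=False))  # (3, 1)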

So ... if the DDT were already cached completely in ARC and I write a new
unique block to a file, ideally I would hope (after write buffering, since
all of this is async) that a single write would be issued to disk.  It
would be the aggregate of the new data block plus the new DDT entry, and
because of write aggregation it should cost literally a single seek+latency
penalty.
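Something like this toy model is what I have in mind.  The 128 KB
aggregation window is an assumed figure for the illustration, not the
actual vdev aggregation limit:

# Toy illustration of write aggregation: small dirty buffers destined
# for adjacent offsets are coalesced into one contiguous I/O, so the
# disk pays one seek + rotation instead of one per buffer.
# AGG_LIMIT is an assumption for this sketch.

AGG_LIMIT = 128 * 1024

def aggregate(writes):
    """Coalesce (offset, size) writes that abut, up to AGG_LIMIT."""
    merged = []
    for off, size in sorted(writes):
        if (merged
                and off == merged[-1][0] + merged[-1][1]
                and merged[-1][1] + size <= AGG_LIMIT):
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((off, size))
    return merged

# New data block plus its DDT entry, allocated adjacently:
# two logical writes, one physical I/O.
print(aggregate([(0, 8192), (8192, 512)]))   # [(0, 8704)]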

In reality, additional writes will most likely be necessary to update the
parent block pointers, parent DDT branches, and so forth, but hopefully
that's all managed well and kept to a minimum.  So maybe a single new write
ultimately costs a dozen times the disk access time...
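Back-of-envelope, with every count below being an assumption on my part
(including the assumed duplicate copies of metadata), that lands right
around a dozen:

# Rough write-amplification arithmetic behind "a dozen times the disk
# access time".  All counts here are illustrative assumptions.

data_block_writes = 1   # the new unique block itself
ddt_leaf_writes   = 1   # its dedup-table entry
indirect_writes   = 3   # file indirects + DDT indirects, assumed
metadata_writes   = 2   # dnode, spacemap updates, assumed
copies            = 2   # assume metadata is written in duplicate

total = data_block_writes + copies * (ddt_leaf_writes
                                      + indirect_writes
                                      + metadata_writes)
print(total)  # 13 -- roughly "a dozen" disk accesses per logical write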

I'm homing in on this, but so far what I'm seeing is ... zpool iostat
indicates 1000 reads taking place for every 20 writes.  This is on a
literally 100% idle pool, where the only activity in the system is me
performing this write benchmark.  The only logical explanation I can see
for this behavior is that the DDT must not be cached in ARC.  So every
write yields a flurry of random reads ... 50 or so.
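The arithmetic is trivial, but worth writing down: if every one of those
reads is a DDT miss being fetched from disk, the numbers are consistent
with the DDT not being resident in ARC at all.

# Ratio observed with `zpool iostat` on an otherwise idle pool.

reads_per_interval  = 1000
writes_per_interval = 20

print(reads_per_interval / writes_per_interval)  # 50.0 random reads/write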

Anyway, like I said, still exploring this.  No conclusions drawn yet.
