[EMAIL PROTECTED] wrote:
I have been looking at the ZFS source, trying to get up to speed on the internals. One thing that interests me about the fs is what appears to be low-hanging fruit: block squishing via CAS (Content Addressable Storage). I think that in addition to lzjb compression, squishing blocks that contain the same data would buy a lot of space for administrators in many common workflows. I am writing to see if I can get some feedback from people who know the code better than I do -- are there any gotchas in my logic?

Assumptions:
 - SHA-256 is the checksum used (Fletcher2/4 have too many collisions; SHA-256 gives about 2^128 collision resistance, if I remember correctly).
 - The SHA-256 hash is taken on the data portion of the block as it exists on disk; the metadata structure is hashed separately.
 - The current metadata structure has a reserved bit portion to be used in the future.

Description of change:

Creates: the filesystem goes through its normal process of writing a block and creating the checksum. Before the step where the metadata tree is pushed, the checksum is checked against a global checksum tree to see if there is any match.
 - If a match exists: insert a metadata placeholder for the block that references the already existing block on disk, increment a number_of_links counter in the metadata to keep track of the pointers referencing that block, and free the new block that was just written and checksummed so it can be reused in the future.
 - If there is no match: update the checksum tree with the new checksum and continue as normal.

Deletes: the normal process, except that the number_of_links count is decremented, and if it is still non-zero the block is not freed. Clean up the checksum tree as needed.

What this requires:
 - A new flag in the metadata that can tag a block as a CAS block.
 - A checksum tree that allows easy, fast lookup of checksum keys.
 - A counter in the metadata or hash tree that tracks the links back to each block.
 - Some additions to the userland apps to push the config/enable the mode.

Does this seem feasible? Are there any blocking points that I am missing or unaware of? I am just posting this for discussion; it seems very interesting to me.
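To make the create/delete paths concrete, here is a very rough standalone C sketch of what I have in mind. The names (cas_entry_t, cas_write_block, cas_delete_block, free_block) are invented for illustration, none of this is actual ZFS code, and the "checksum tree" is reduced to a flat in-memory table just to show the logic:

/*
 * Rough userland sketch of the write/delete paths described above.
 * The "checksum tree" is a flat in-memory table here; the real thing
 * would need a persistent structure with fast lookup by checksum key.
 */
#include <stdint.h>
#include <string.h>

#define CAS_MAX 1024

typedef struct cas_entry {
    uint8_t  ce_hash[32];   /* SHA-256 of the data portion of the block */
    uint64_t ce_block;      /* disk address of the existing copy        */
    uint64_t ce_nlinks;     /* block pointers referencing that copy     */
} cas_entry_t;

static cas_entry_t cas_table[CAS_MAX];
static int cas_count;

/* placeholder for returning a freshly written block to the free list */
static void free_block(uint64_t block) { (void)block; }

/*
 * Called after the block has been written and checksummed, before the
 * metadata tree is pushed.  Returns the block address the new block
 * pointer should reference.
 */
uint64_t
cas_write_block(const uint8_t hash[32], uint64_t new_block)
{
    for (int i = 0; i < cas_count; i++) {
        if (memcmp(cas_table[i].ce_hash, hash, 32) == 0) {
            /* match: share the existing copy, reclaim the new write */
            cas_table[i].ce_nlinks++;
            free_block(new_block);
            return (cas_table[i].ce_block);
        }
    }

    /* no match: remember this block for future writers */
    if (cas_count < CAS_MAX) {
        memcpy(cas_table[cas_count].ce_hash, hash, 32);
        cas_table[cas_count].ce_block = new_block;
        cas_table[cas_count].ce_nlinks = 1;
        cas_count++;
    }
    return (new_block);
}

/*
 * Delete path: drop one reference; only free the block once nobody
 * else points at it, then clean up the table entry.
 */
void
cas_delete_block(const uint8_t hash[32], uint64_t block)
{
    for (int i = 0; i < cas_count; i++) {
        if (memcmp(cas_table[i].ce_hash, hash, 32) == 0) {
            if (--cas_table[i].ce_nlinks == 0) {
                free_block(block);
                cas_table[i] = cas_table[--cas_count];
            }
            return;
        }
    }
    free_block(block);  /* not a CAS block: normal free */
}

In the real filesystem the table would of course have to be the on-disk, fast-lookup checksum tree mentioned above, and the reference count would live with the metadata so it survives reboots.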
Note that you'd actually have to verify that the blocks were the same; you cannot count on the hash function alone. If you didn't do this, anyone discovering a collision could destroy the colliding blocks/files. Val Henson wrote a paper on this topic; there's a copy here:

http://infohost.nmt.edu/~val/review/hash.pdf

- Bart

Bart Smaalders			Solaris Kernel Performance
[EMAIL PROTECTED]		http://blogs.sun.com/barts
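For concreteness, the verify-before-sharing step Bart describes could slot into the sketch above just before the existing block is reused -- roughly like the following, where read_block() is an assumed helper that fetches a block's contents from disk (again, not actual ZFS code):

/*
 * On a checksum match, read back the candidate block and compare it
 * byte for byte before sharing it, so that a hash collision costs only
 * a missed dedup opportunity rather than silently aliasing unrelated data.
 */
#include <stdint.h>
#include <string.h>

extern int read_block(uint64_t block, void *buf, size_t size); /* assumed helper */

int
cas_blocks_identical(uint64_t existing_block, const void *new_data,
    size_t size, void *scratch)
{
    if (read_block(existing_block, scratch, size) != 0)
        return (0);     /* can't read it back: treat as different */
    return (memcmp(scratch, new_data, size) == 0);
}

The extra read only happens on a checksum match, so the common-case write path is unchanged; on a mismatch the write simply proceeds as a normal, non-shared block.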