david wrote: > > Wow, the project looks really great! I'm curious why you chose to use > Merkle trees, however, instead of a simple hash list. What's the advantage > of the additional complexity?
Thanks! The reason for the Merkle Tree is so that you can verify the correctness of the first part of the file sooner. We call this "alacrity" -- the elapsed time between deciding you want to download a file and being able to read the first few bits of cleartext. With a simple hash list and a very large file, you have to download a very long hash list (or else a short hash list and a very large block to check if it matches the first hash in the list) before you can safely use the first byte of the data. Brian Warner wrote a script [1] to calculate the overhead (bytes used for verification metadata divided by bytes used for data) and alacrity (bytes downloaded before being able to use the first byte of cleartext) for the hash-list and Merkle Tree options. See appended output from that script. It turns out that this might be useful not only for people who are impatient to use the first part of a file that they are downloading, but also for probabilistic verifiers who want to sample only one or a few blocks instead of downloading the whole thing. By the way, here is a document with pictures describing how this part of Tahoe works: [2]. Regards, Zooko [1] http://allmydata.org/trac/tahoe/browser/misc/sizes.py [2] http://allmydata.org/trac/tahoe/wiki/TahoeIssues/FileEncoding In the following, k is the arity of the hash tree, and d is the depth of it. So the hash list approach is where d == 1. ------- begin appended output from misc/sizes.py $ python -OOu ./misc/sizes.py --mode beta mode=beta # k=num_subblocks, d=1 # each subblock has a 20-byte hash storage storage Size blocksize overhead overhead k d alacrity (bytes) (%) ------- ------- -------- -------- ---- -- -------- 4 0.16 1.95k 50000.00 1 1 0.16 16 0.64 1.95k 12500.00 1 1 0.64 64 2.56 1.95k 3125.00 1 1 2.56 256 10.24 1.95k 781.25 1 1 10.24 1k 40.96 1.95k 195.31 1 1 40.96 4k 163.84 1.95k 48.83 1 1 163.84 16k 655.36 1.95k 12.21 1 1 655.36 64k 2.56k 1.95k 3.05 1 1 2.56k 256k 10.24k 1.95k 0.76 1 1 10.24k 1M 40.96k 1.95k 0.19 1 1 40.96k 4M 163.84k 3.91k 0.10 2 1 81.94k 16M 655.36k 15.62k 0.10 8 1 82.06k 64M 2.56M 62.50k 0.10 32 1 82.53k 256M 10.24M 250 k 0.10 128 1 84.40k 1G 40.96M 1000 k 0.10 512 1 91.90k 4G 163.84M 3.91M 0.10 2048 1 121.90k 16G 655.36M 15.62M 0.10 8192 1 241.90k 64G 2.56G 62.50M 0.10 32768 1 721.90k 256G 10.24G 250 M 0.10 131072 1 2.58M 1T 40.96G 1000 M 0.10 524288 1 10.08M $ python -OOu ./misc/sizes.py --mode gamma mode=gamma # self.subblock_arity = k = arity storage storage Size blocksize overhead overhead k d alacrity (bytes) (%) ------- ------- -------- -------- ---- -- -------- 4 0.16 0 0.00 2 0 0.16 16 0.64 0 0.00 2 0 0.64 64 2.56 0 0.00 2 0 2.56 256 10.24 0 0.00 2 0 10.24 1k 40.96 0 0.00 2 0 40.96 4k 163.84 0 0.00 2 0 163.84 16k 655.36 0 0.00 2 0 655.36 64k 2.56k 0 0.00 2 0 2.56k 256k 10.24k 0 0.00 2 0 10.24k 1M 40.96k 0 0.00 2 0 40.96k 4M 163.84k 3.91k 0.10 2 1 81.94k 16M 655.36k 27.34k 0.17 2 3 81.98k 64M 2.56M 121.09k 0.18 2 5 82.02k 256M 10.24M 496.09k 0.19 2 7 82.06k 1G 40.96M 1.95M 0.19 2 9 82.10k 4G 163.84M 7.81M 0.19 2 11 82.13k 16G 655.36M 31.25M 0.19 2 13 82.17k 64G 2.56G 125 M 0.19 2 15 82.21k 256G 10.24G 500 M 0.19 2 17 82.25k 1T 40.96G 1.95G 0.19 2 19 82.29k _______________________________________________ p2p-hackers mailing list [email protected] http://lists.zooko.com/mailman/listinfo/p2p-hackers
