On Jan 7, 2009, at 17:07, Shawn Willden wrote:

> If the images are coming from a handful of 256 kbps connections
> where Tahoe is bandwidth-capped to use no more than 100 kbps in
> order to keep some bandwidth available for other stuff (does Tahoe
> have bandwidth limiting? If not, it probably needs it), then the
> aggregate data stream may be no more than 400-500 kbps.
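For a rough sense of scale, here is a back-of-the-envelope sketch of that arithmetic (the photo size is an assumed figure, not from this thread):

```python
# Back-of-the-envelope: aggregate download rate when pulling shares
# from several upload-capped peers, and how long a photo would take.

def aggregate_kbps(num_peers, cap_kbps):
    """Best case: downloads from capped peers add up linearly."""
    return num_peers * cap_kbps

def transfer_seconds(size_bytes, rate_kbps):
    """Seconds to move size_bytes at rate_kbps (kilobits per second)."""
    return (size_bytes * 8) / (rate_kbps * 1000)

# Shawn's scenario: a handful (say 5) of peers capped at 100 kbps each.
print(aggregate_kbps(5, 100))                    # 500 (kbps)

# A 500 KB photo (assumed size) over that 500 kbps aggregate:
print(round(transfer_seconds(500_000, 500), 1))  # 8.0 (seconds)
```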
No it doesn't, and yes it probably does:

http://allmydata.org/trac/tahoe/ticket/224 # bandwidth throttling

What if the data is coming from ten different connections, each of
which runs at about 100 kbps? Do you think that might be sufficiently
high bandwidth for photo sharing?

> And let's not even talk about HD video.

Oh, but I like HD video!

http://testgrid.allmydata.org:3567/uri/URI%3ADIR2%3Adjrdkfawoqihigoett4g6auz6a%3Ajx5mplfpwexnoqff7y5e4zjus4lidm76dcuarpct7cckorh2dpgq/video/dirac_codec-use_VLC_to_play

> I'd expect a lot bigger performance issue from the erasure coding
> (BTW: ever considered Tornado coding instead of Reed-Solomon?).

Yes! In fact, allmydata.com once had a commercial license from Digital
Fountain to use their patented erasure codes and their proprietary
implementation. That erasure code was used in the previous generation
of allmydata.com's commercial product. When we decided to open source
the next generation, which became Tahoe and Allmydata v3.0, we also
investigated whether we could make do with an unpatented, open source
erasure code, and discovered that Rizzo's classic Reed-Solomon
implementation worked great -- it became zfec.

Also, zfec and Tahoe cheat as much as possible. The best optimization
is always to cheat and not do the work at all. The first K shares that
zfec creates are actually just the content of the file -- i.e., what
you would get from running the unix "split" command. Tahoe tries to
download those K shares (what we call "primary shares") first, and the
more of them it gets, the less math it actually has to do to
reconstruct the file. If it gets all K of the primary shares then it
does no math at all when downloading -- it just "unsplits". :-)

By the way, there is a paper coming out in FAST '09 about the
performance of open source software erasure codes that measures zfec
among others.

> Okay, so here's a possibility.
> If I can ensure that K shares are stored on my mom's machine, and if
> Tahoe is clever enough to use those shares when she's browsing those
> files (doesn't seem difficult), rather than pulling from the network,
> then perhaps browsing my photos will be fast enough. The RS
> reconstruction and the decryption shouldn't be a big deal, and
> neither should applying a short sequence of forward deltas. Some
> performance testing is in order.

Yes! Performance testing! We have some automated performance
measurements here:

http://allmydata.org/trac/tahoe/wiki/Performance

It is telling me that my patches last week to refactor immutable files
and make them more robust against bizarre patterns of corruption have
slowed down the upload and download speeds. :-(

By the way, one unfortunate thing about the way that we are using
rrdtool to keep those performance graphs is that we're losing history
which is more than one year old. :-(

> Cool. That's probably good enough that the added optimization of
> avoiding the storage of common files completely isn't worth the
> effort.

Have you seen this thread? It might be a good project for you, as it
is self-contained, requires minimal changes to the Tahoe core itself,
and is closely related to your idea about good backup:

http://allmydata.org/pipermail/tahoe-dev/2008-September/000809.html

> Okay. I grabbed the darcs repo (dang is that sloowww! Anybody for
> switching to git? ;-)) and I'll start from there.

I updated the instructions on http://allmydata.org/trac/tahoe/wiki/Dev
to suggest using darcs-v2 and to warn that using darcs-v1 will take
tens of minutes for the initial get.

I would entertain the idea of switching to git, even though I love
darcs, contribute to darcs, and use it all the time, solely in order
to be more friendly toward potential contributors who love git.

> I haven't had a chance to look through the code much yet. Is there
> an overview document somewhere that covers the structure?
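As an aside, the "primary shares" trick described earlier is why keeping K shares on the local machine makes browsing cheap: with all K primary shares at hand, reconstruction is mere concatenation. A pure-Python illustration of the idea (this is NOT zfec's actual API):

```python
# Illustration of "the first K shares are just a split of the file".
# If a downloader obtains all K primary shares, it can skip the
# Reed-Solomon math entirely and simply "unsplit" them.

def make_primary_shares(data: bytes, k: int) -> list[bytes]:
    """Cut data into k roughly equal pieces, like the unix "split" command."""
    n = -(-len(data) // k)  # ceiling division: bytes per share
    return [data[i * n:(i + 1) * n] for i in range(k)]

def unsplit(primary_shares: list[bytes]) -> bytes:
    """With all k primary shares present, downloading is just this."""
    return b"".join(primary_shares)

data = b"these pretzels are making me thirsty"
shares = make_primary_shares(data, k=3)
assert unsplit(shares) == data  # no decoding math needed
```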
Start here:

http://allmydata.org/trac/tahoe/wiki/Doc

Then update the wiki and/or submit patches making it easier for the
next person who starts there to find what they are looking for. :-)

Regards,

Zooko

---
Tahoe, the Least-Authority Filesystem -- http://allmydata.org
store your data: $10/month -- http://allmydata.com/?tracking=zsig
_______________________________________________
tahoe-dev mailing list
[email protected]
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
