Dear Jimmy and Terrell:

Very interesting! I hope you proceed with this.

I would like to second Terrell's suggestion about performance: "try it and see". There are some detailed resources about performance, but I don't advise that you look at these, *yet*:

* https://tahoe-lafs.org/trac/tahoe-lafs/wiki/Performance
* https://tahoe-lafs.org/trac/tahoe-lafs/ticket/932# benchmark Tahoe-LAFS compared to nosql dbs
* https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/docs/performance.rst

Keep in mind that Tahoe-LAFS itself may get performance improvements in future releases. Most of the performance problems are not "deep" slowdowns caused by some unique Tahoe-LAFS feature, but rather "shallow" slowdowns, such as the client unnecessarily pausing between sending successive network requests. The new MDMF format that Kevan wrote accidentally came in at about three times as fast as the current immutable (CHK) format on a LAN. That probably indicates that someone could optimize the CHK format to be *more* than three times as fast as it is now, without having to change the format.

Since performance optimization is one of those things that the open source community is especially good at -- the goals are relatively easy for everyone to agree on, and it is fun and rewarding -- I'm hopeful that someone will volunteer to work on that.

There is one simple strategy you can take when you are doing this to avoid unnecessary performance problems: don't store things under a relative pathname, like $DIRCAP/foo/bar, or $DIRCAP/foo, when you can instead store those things directly under their cap. So for example, uploading a file and getting back its immutable (CHK) cap and then using that cap to later download that file will always be somewhat faster than uploading a file *into a directory* and then using the cap to the directory plus the filename to download that file.

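To make the two access patterns concrete, here is a minimal sketch against the Tahoe-LAFS web API. It assumes a local gateway listening on http://127.0.0.1:3456; the helper names (cap_url, child_url, upload_by_cap, upload_into_dir, download) are made up for illustration and are not part of Tahoe-LAFS. Treat it as a starting point for "try it and see", not as a reference client.

```python
import urllib.request
from urllib.parse import quote

# Assumption: a local Tahoe-LAFS gateway on this address and port.
GATEWAY = "http://127.0.0.1:3456"


def cap_url(cap, gateway=GATEWAY):
    """URL that addresses an object directly by its cap."""
    return f"{gateway}/uri/{quote(cap, safe='')}"


def child_url(dircap, path, gateway=GATEWAY):
    """URL that addresses an object by a relative path under a directory cap."""
    segments = "/".join(quote(seg, safe="") for seg in path.split("/"))
    return f"{cap_url(dircap, gateway)}/{segments}"


def _put(url, data):
    """Send an HTTP PUT and return the response body as stripped text."""
    req = urllib.request.Request(url, data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("ascii").strip()


def upload_by_cap(data, gateway=GATEWAY):
    """PUT /uri: store an unlinked file; the response body is its cap."""
    return _put(f"{gateway}/uri", data)


def upload_into_dir(dircap, path, data, gateway=GATEWAY):
    """PUT /uri/$DIRCAP/path: store the file, then link it into the directory.

    The gateway has to update the directory as well, which is the extra cost.
    """
    return _put(child_url(dircap, path, gateway), data)


def download(url):
    """GET the given URL and return the raw bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```

With a gateway running, download(cap_url(upload_by_cap(b"hello"))) exercises the direct-cap path, while upload_into_dir(dircap, "foo/bar", b"hello") followed by download(child_url(dircap, "foo/bar")) exercises the directory path. Timing the two against each other on your own grid is the simplest way to see how much the directory update costs you.
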
It is easy to understand why: the latter sort of upload -- the sort that uploads the file into a directory under a filename -- is implemented by first doing the former sort of upload -- uploading the file to an immutable cap -- and then downloading the current version of the directory, rewriting it to add the new child entry, and uploading a new version of that directory. This is always slower than doing just the first step, and it can be a *lot* slower if it is an SDMF-format directory with many child entries in it.

I think that was the performance issue that I brought up with respect to git-annex: it was storing things under their filenames within a directory, and I wondered if it could be changed to store things under their cap.

Regards,

Zooko

_______________________________________________
tahoe-dev mailing list
[email protected]
http://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
