dear tahoe-dev,

(first of all, I'd like to apologize for asking the following questions about concepts I don't have a solid grasp of. Please point me to any concepts or docs you feel I should review in more depth before tackling these things.)
I'm very interested in tahoe-lafs, from a "cypherpunk" perspective. I've successfully installed and used it in a small grid. I believe it has the design features that we were looking for (in terms of resilient, secure, self-managed infrastructure). But I feel sort of stuck thinking about how to build applications on top of it. Reading through the fs API, it makes sense, but it is still a magical black box for me.

My programming background is in webapps, so I find myself mostly lost when handling concepts related to crypto + p2p (although I *think* I understand them at a high level). I'm trying to change this :). Maybe I'm too used to the classical single-server mvc way of doing things.

If I understand it correctly, tahoe-lafs is "only" a *distributed, secure filesystem*. From reading this thread [0], I assume that if I want to design any kind of "distributed webapp", I would need a layer on top of tahoe-lafs that would take care of:

- managing the write- and read-caps. If I don't trust the server on which the app is running, I should run my own node to upload files and be responsible for managing my own keys.
- implementing ACLs: "distributing" read-caps to whoever should be able to read the files.

(Is this correct so far?)

So I assume the logical way to have something running "on top" of tahoe would be to use a traditional database in any framework (looking at the django canopy implementation, for instance), and to delegate the storage of *files* to the grid. I.e., I upload a file to my traditional app, and it stores it on the grid, keeping the caps in the database (or in another file on the grid). If I want to share my file with a friend, I share the read-cap by any means I can think of (using ostatus protocols, or xmpp, for instance). This dual scheme (a single-server database for data, a distributed fs for "media") is what I had in mind.
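To make the dual scheme above concrete, here is a minimal sketch in Python. It assumes a Tahoe-LAFS gateway on 127.0.0.1:3456 and uses the web API's `PUT /uri` call, whose response body is the cap of the uploaded file. The sqlite table layout and the helper names (`open_db`, `upload_to_grid`, `store_file`, `cap_for`) are made up for illustration; this is only how I picture it, not how canopy or any existing app does it.

```python
import sqlite3
import urllib.request

# Assumption: a local Tahoe-LAFS gateway on its usual web port.
GATEWAY = "http://127.0.0.1:3456"


def upload_to_grid(data: bytes, gateway: str = GATEWAY) -> str:
    """Upload bytes via PUT /uri; the response body is the file's cap."""
    req = urllib.request.Request(gateway + "/uri", data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("ascii").strip()


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the app's 'traditional' database (hypothetical schema)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(name TEXT PRIMARY KEY, cap TEXT NOT NULL)"
    )
    return db


def store_file(db, name, data, upload=upload_to_grid):
    """Put the file contents on the grid and keep only the cap locally."""
    cap = upload(data)
    db.execute("INSERT INTO files (name, cap) VALUES (?, ?)", (name, cap))
    db.commit()
    return cap


def cap_for(db, name):
    """Look up the cap to hand to a friend (over xmpp, ostatus, ...)."""
    row = db.execute(
        "SELECT cap FROM files WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

The `upload` parameter is injectable so the database side can be tried without a running node. Note that whoever can read this database can read every file, so the database itself becomes the thing to protect.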
But thinking about this, I was wondering:

- how could the rest of the data, i.e. all or part of the app's database, also be stored in the grid and shared only with selected end-users? (I guess the obvious answer would be "serializing and uploading to the grid", and then deserializing + syncing on the read side?)
- do I need to come up with an extra communication layer to share the read-caps with "friend" nodes, or could I somehow make use of the underlying DHT?

On the other hand, I've been playing a little with alternatives to rdbms, like rdf triplestores (having them on the grid with proper ACLs sounds really good) and document-oriented dbs (some mongo and couch), and I've been delighted with their practical advantages (being schema-free) for building stuff (although again I must be lacking the background to fully appreciate their implications). I say this because I was a bit confused when I discovered [1][2] that the fs layer in tahoe-lafs is in fact built upon a distributed key-value store layer. Could this key-value store be used for purposes other than the top-level fs abstraction? (I'm thinking about indexing and querying data chunks.) I guess the answer might be no, since the keys are not human-meaningful?

I came to these questions while thinking again about how something like diaspora could be ported to work on top of tahoe, and again, I see some conceptual barriers from my limited webdev perspective: a key-value or document store is something I can readily query and filter on the fly, while a "file" is an abstraction I have to write/read as a unit, and process before building any complex app that needs to filter/sort data in an efficient way. Besides, the couchapp diaspora port [3] seems very interesting because of couchdb's built-in features for selective replication.
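The "serialize and upload, then deserialize on the read side" idea, and the barrier it runs into, can be sketched like this. The `put` and `get` callables are hypothetical stand-ins for whatever talks to the grid (for example, thin wrappers around the gateway's `PUT /uri` and `GET /uri/$CAP`), and `FakeGrid` is just an in-memory substitute so the sketch runs without a node.

```python
import json


def publish_records(records, put):
    """Serialize the whole record set as one file; return its cap."""
    blob = json.dumps(records, sort_keys=True).encode("utf-8")
    return put(blob)


def query_records(cap, get, predicate):
    """Fetch the whole file, deserialize it, and only then filter."""
    records = json.loads(get(cap).decode("utf-8"))
    return [r for r in records if predicate(r)]


class FakeGrid:
    """In-memory stand-in for the grid (an assumption, for illustration)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        cap = "fake-cap-%d" % len(self._blobs)
        self._blobs[cap] = data
        return cap

    def get(self, cap):
        return self._blobs[cap]


if __name__ == "__main__":
    grid = FakeGrid()
    cap = publish_records(
        [{"author": "cal", "text": "hi"}, {"author": "bob", "text": "yo"}],
        grid.put,
    )
    print(query_records(cap, grid.get, lambda r: r["author"] == "cal"))
```

Sharing then means handing the cap only to the selected readers, but every query has to download and parse the entire file before it can filter anything, which is exactly the gap I see compared with querying a document store.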
I'd really like to contribute towards seeing something similar based on strong crypto and distributedness, but as I explained above, I'm completely lost as soon as I start to think about how to connect the html+js frontend to the storage grid, and what the role of the database in between should be (hmm, something along the lines of what's discussed here [4]... is html/js <--> storage grid the only possible answer?)

Again, I'd like to apologize if any of the above is complete nonsense; I'd be grateful if you could point out any errors in my understanding and anything I should assimilate before firing off these kinds of questions :)

thanks for your time and any thoughts,
cal.

[0] http://www.mail-archive.com/[email protected]/msg10782.html
[1] http://events.ccc.de/congress/2009/wiki/Tahoe-LAFS_Workshop
[2] http://tahoe-lafs.org/trac/tahoe-lafs/ticket/869
[3] http://github.com/maxogden/couchappspora
[4] http://identi.ca/conversation/54300294#notice-54798703

_______________________________________________
tahoe-dev mailing list
[email protected]
http://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
