For a while now I have been planning a project for a distributed database that integrates every kind of data. I have primarily focused my effort on the database schema aspect of it, trying to find more sophisticated ways of structuring data than the concepts of traditional relational databases offer. I had not spent much time pursuing information and ideas about the network infrastructure of the system, although what I had in mind in that regard was something like Freenet. Recently I became familiar with the Freenet Project, and I am happy to see such a project in existence. I share the core values on which it is being built.
However, whereas the Freenet Project aims at creating a simple document storage and retrieval system, my project aims at creating a distributed database. There are many things the latter could do that the former cannot. As a first example, imagine the kinds of database applications you use over the web: apartment finding, personal ads, restaurant and entertainment info, etc.

The major way in which this project can be an advance over the web is to, at a certain layer of abstraction, disassociate data from its location on the network. When you want a particular piece of data, you simply ask for that data. You don't have to figure out which website to look on, page through Google search results, or deal with many websites that each have a unique, unfamiliar user interface. As for inserting data, you simply insert it, and it will reach the individuals interested in that kind of data. At the layer of abstraction the common user deals with on a daily basis, the data is simply "on the network"; it is not at a particular location on the network.

As for e-commerce, companies can keep their inventory in this distributed-database freenet. Orders can be put into the freenet and propagated back to the companies via the general mechanisms that are part of the network infrastructure. E-mail would be defined by a simple database schema (To, From, Message Body, etc.) and would propagate from sender to recipient via those same general mechanisms.

By only a small stretch of my original idea, the project becomes a more general case of a document storage and retrieval system. As a first approximation you could think of it as a multi-media database; that is, a database in which some of the entries are files. For example, the data associated with a particular publication could also include the publication itself (e.g. a file in PDF format).
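To make the "e-mail as a database schema" and "multi-media database" examples concrete, here is a rough Python sketch. The record names, field names, and types are purely illustrative, not a proposal:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# E-mail defined as a plain database record rather than a wire protocol.
@dataclass
class EmailRecord:
    to: str        # recipient identifier (location-independent)
    sender: str    # sender identifier (location-independent)
    subject: str
    body: str
    sent_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# A "multi-media" entry: ordinary fields plus the document itself as a file.
@dataclass
class PublicationRecord:
    title: str
    authors: list
    pdf: bytes     # the publication itself, e.g. the raw PDF file
```

Inserting either record would be an ordinary database insert; the network infrastructure would then carry it to whoever subscribes to that kind of data.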
In thinking about how my project and the Freenet Project might come together, I came up with four "layers of abstraction" that describe such a system. It is by no means a complete description of an architecture, but depending on what other people think of it, it may be a starting point for further discussion. I briefly read about "emergent systems" in a biology textbook and found the idea very interesting. It seems to me that an "emergent system" is not a physically existing thing; rather, it is a descriptive tool for painting a "big picture" of a very complex system. I see the five- or seven-layer models of network protocols as employing the same general method of description, though that case is slightly different in that we are creating the system itself, not just a description of it.

The first layer is the lowest-level set of protocols that can be used to communicate via the internet: IP, UDP, and TCP. This is the basic means available to us for implementing something more desirable and complex on top of. If we had the means to redefine this layer to suit the values of Freenet, that would be ideal, but I will assume we lack the persuasive power to affect its make-up. Layer 1 has some properties that we, as believers in the Freenet cause, don't like; namely, IP addresses and the potential for eavesdroppers. Layer 2 exists to deal with those properties. It provides corrective measures on top of layer 1 in order to conceal the physical location of peers from each other and to make an eavesdropper's success extremely unlikely. Layer 2 is a precursor to layer 3: with the problems of anonymous message exchange and secure communication channels handled by layer 2, layer 3 can focus on its more high-level job. The end result of creating layer 2 is a more ideal version of the layer 1 network, on top of which we are in a better position to create layer 3.
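One way to picture the contract layer 2 offers to layer 3 is as an interface: messages are addressed to identifiers, never to network locations. This is only a hypothetical sketch; the name `AnonymousTransport` and the choice of a hash-like string for `NodeId` are my assumptions, not anything the layers prescribe:

```python
from abc import ABC, abstractmethod
from typing import Tuple

# Assumption: a NodeId is some opaque, location-independent string
# (e.g. a hash of a public key) -- deliberately NOT an IP address.
NodeId = str

class AnonymousTransport(ABC):
    """What layer 3 sees of layer 2: messages go to identifiers, not locations."""

    @abstractmethod
    def send(self, recipient: NodeId, payload: bytes) -> None:
        """Deliver payload; neither party learns the other's physical location."""

    @abstractmethod
    def receive(self) -> Tuple[NodeId, bytes]:
        """Block until a message arrives; return (sender id, payload)."""
```

Layer 3 would be written entirely against this interface, so it never touches IP addresses at all.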
The problem of creating layer 2 can be stated like this: create an environment in which messages can be passed from sender to receiver such that neither party has knowledge of the other's physical location, and in which there is little or no chance that an eavesdropper succeeds in learning who is communicating or the contents of the messages being sent. From this arises the concept of a physical-location-independent identifier which machines use to talk to each other. (Note that a single machine could have more than one location-independent identifier if it wants.)

Those are the design goals of layer 2. I am not an expert in the area, so I am hesitant to share the rough idea I have of how they could be achieved. But from what I have read of anonymous remailers, it seems you could use the same basic notions here: a node on the network communicates via anonymous remailer-like relays. Certain configuration information would have to be set for a networked node, such as which relays are trustworthy for which types of data, and how reliable they are. The two participants in a message exchange then each have a "line of defense": the list of relays through which the message passes. (The configuration information itself could be shared between users at layer 4.)

Layer 3 is the distributed-database layer. It provides the functionality for inserting, updating, and retrieving data distributed across the network. Its goal is to make data independent of physical location for the sake of layer 4, while managing the complexities of a distributed database. Knowledge of where data is located on the network has to be maintained in some hierarchical fashion, as it is in DNS. It will have to differ from DNS, though, which has "authoritative" sources of data. Instead, data will be spread across the network at different locations, and inserts and updates of data will propagate to all those locations.
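A toy sketch of layer 3's propagation rule may help: every node subscribed to a kind of data receives every insert and update of that kind, and each node keeps only the newest version it has seen, so a request can be answered from any of them. All names here are hypothetical, and the network transport is replaced by direct function calls purely for illustration:

```python
from collections import defaultdict

class Node:
    """One location on the network that holds copies of data."""
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Keep only the newest version seen, so this node can answer
        # a request with the most up-to-date data it has.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

class Network:
    """Stand-in for the routing layer: kind-of-data -> subscribed nodes."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, kind, node):
        self.subscribers[kind].append(node)

    def insert(self, kind, key, version, value):
        # "Complete" routing: every subscribed node is informed.
        for node in self.subscribers[kind]:
            node.apply(key, version, value)
```

After an insert propagates, any subscribed node can serve a request for that key; a real design would of course also need the hierarchical location knowledge and efficiency properties described above.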
Requests for data may be fulfilled by any of the locations holding it. Our goal is to design a system in which a request for a piece of data returns the most up-to-date data possible. So we want routing that is (a) complete - all subscribed nodes are informed of inserts and updates - and (b) efficient - inserts and updates reach all subscribed nodes as quickly as possible.

While layer 4 "knows" what data it wants to receive, insert, or update, it does not itself "know" how to get or place it. It passes those requests on to layer 3 as if layer 3 were a single database. As a first approximation, you can imagine layer 4 passing SQL-like commands to layer 3.

_______________________________________________
freenet-tech mailing list
[EMAIL PROTECTED]
http://lists.freenetproject.org/mailman/listinfo/tech
