I now have some actual code for DistribNet which I will post later this week, as soon as the DistribNet sourceforge project becomes active.
The only thing I have implemented is storing and retrieval of data keys. Here is the latest overview of DistribNet. I describe the details of how data keys are managed at the end. Please let me know what you think of them. I am especially interested in what you think of my choice of block sizes.

DistribNet: a global peer-to-peer internet file system which anyone can tap into or add content to.

Meta Goals:

*) To allow anyone, possibly anonymously, to publish web sites without having to pay for the bandwidth of a commercial provider or having to put up with the increasingly ad-ridden free web sites. One should not have to worry about bandwidth considerations at all.

*) Bring back the sense of community on the Internet that was present before the internet became so commercialized.

*) Serve as an efficient replacement for current file sharing networks such as Morpheus and Gnutella.

*) To have the network stable and working before some commercial company designs a proprietary network similar to what I envision that can only be accessed via a freely available but not FSF-approved license.

(Possibly Impossible) Goals:

*) *Really* fast lookups to find data. The worst case should be O(log(n)) and the average case should be O(1) or very close to it.

*) Actually retrieving the data should also be really fast. Popular data should be sitting on the same subnet. On average it should be as fast as or faster than a typical web site (such as slashdot, google, etc.). It should make effective use of the topology of the internet to minimize network load and maximize performance.

*) General searching based on keywords will be built into the protocol from the beginning. The searching facility will be designed in such a way as to make message boards trivial to implement.

*) Ability to update data while keeping old revisions around, so data never disappears until it is truly unwanted. No one person will have the power to delete data once it spreads throughout the network.
*) Will try very hard to keep all but the most unpopular content from falling off the network. Basically, before deleting a locally unpopular key a node will first check whether other nodes are storing the key and how popular they find it. If not enough nodes are storing the key and there is any indication that the data may be useful at a later date, the node will not delete it unless it absolutely has to. And if it does delete the key, it will first try uploading it to other nodes with more disk space available.

*) Ability to store data indefinitely if someone is willing to provide the space for it (and being able to find that data in log(n) time).

*) Extremely robust, so that the only way to kill the network is to disable almost all of the nodes. The network should still function even if, say, 90% of it goes down.

*) Extremely efficient CPU-wise, so that a fully functional node can run in the background and only take 1-2% of the CPU.

Applications:

I would like the protocol to be able to effectively support (i.e. without the ugly hacks that many of the applications for Freenet use):

1) Efficient web-like sites (with an HTTP gateway to make browsing easy)
2) Efficient sharing of files large and small
3) Public message forums (with an IMAP gateway to make reading easy)
4) Private email (with the message encrypted so only the intended recipient can read it; again with an IMAP gateway)
5) Streaming media
6) Online chat (with a possible IRC or similar gateway)

Anti-Goals: (Also see philosophy for why I don't find these issues that important.)

*) Complete anonymity for the browser. I want to focus first on performance, then on anonymity. In fact I plan to use extensive logging in the development versions so that I can track network performance and quickly catch performance bugs. As DistribNet stabilizes, anonymity will be improved at the expense of logging. The initial version will only use cryptography when absolutely necessary (for example, key signing). Most communications will be done in the clear.
After DistribNet stabilizes, encryption will slowly be added. When I add encryption I will carefully monitor the effect it has on CPU load, and if it proves to be expensive I will make it optional. Please note that I still wish to allow for anonymous posting of content. However, without encryption, it probably won't be as anonymous as Freenet or your GNet.

*) Data in the cache will be stored in a straightforward manner. No attempt will be made to prevent the node operator from knowing what is in his own cache. Also, very little attempt will be made to prevent others from knowing what is in a particular node's cache.

Philosophy:

*) I have nothing against complete anonymity; it is just that I am afraid both Freenet and GNet are designed more around the anonymity and privacy issues than around the performance and scalability issues.

*) For most types of content the level of anonymity that Freenet and GNet offer is simply not needed. Even for copyrighted and censored material there is, in general, little risk in actually viewing the information, because it is simply impractical to go after every single person who accesses forbidden information. Almost all of the time the lawsuits and such go after the original distributors of the information, not the viewers. Therefore DistribNet will aim to provide anonymity for distributing information, but not for actually viewing it. However, since there *is* some information where even viewing it is extremely risky, DistribNet will eventually be able to provide the same level of anonymity that Freenet or GNet offers, but it will be completely optional.

*) I also believe that knowing what is in one's own datastore, and being able to block certain types of material from one's own node, is not that big of a deal. Unless almost everyone blocks a certain type of information, the availability of blocked information will not be harmed.
This is because even if 90% of the nodes block, say, kiddie porn, the information will still be available on the other 10% of the nodes, which, if the network is designed correctly, should be more than enough for anyone to get at blocked information. Furthermore, since the source code for DistribNet will be protected under the GPL or a similar license, it will be completely impractical for others to force a significant number of nodes to block information. Due to the dynamic nature of the cache, I find it legally difficult to hold anyone responsible for the contents of their cache, as it is constantly changing.

DistribNet Key Types:

There will essentially be two types of keys: map keys and data keys. Map keys will be uniquely identified in a similar manner to Freenet's SSK keys. Data keys will be identified in a similar manner to Freenet's CHK keys.

Map keys will contain the following information:

* Short Description
* Public Namespace Key
* Timestamped Index pointers
* Timestamped Data pointers

_At any given point in time_ each map key will only be associated with one index pointer and one data pointer. Map keys can be updated by appending a new index or data pointer to the existing list. By default, when a map key is queried only the most recent pointer will be returned. However, older pointers are still there and may be retrieved by specifying a specific date. Thus, map keys may be updated, but information is never lost or overwritten.

Data keys will be very much like Freenet's CHK keys except that they will not be encrypted. Since they are not encrypted, delta compression may be used to save space.

There will not be anything like Freenet's KSK keys, as those proved to be completely insecure. Instead, map keys may be requested without a signature. If there is more than one map key by that name then a list of keys is presented, sorted by popularity. To make such a list meaningful, every public key in DistribNet will have a descriptive string associated with it.
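To make the append-only map-key semantics concrete, here is a rough C sketch of what a map key record and its date-based lookup might look like. The field names and sizes here are my own illustration, not part of any spec:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only -- field names and sizes are guesses, not a spec. */
typedef struct {
    uint64_t timestamp;  /* when this pointer was appended              */
    uint8_t  hash[20];   /* SHA-1 of the index/data block pointed to    */
} StampedPtr;

typedef struct {
    char        desc[64];    /* short description                       */
    uint8_t     ns_key[20];  /* public namespace key                    */
    size_t      n_idx, n_data;
    StampedPtr *idx_ptrs;    /* append-only, oldest first               */
    StampedPtr *data_ptrs;   /* append-only, oldest first               */
} MapKey;

/* Return the newest pointer with timestamp <= `at`, or NULL if none.
   A plain query would pass at = UINT64_MAX to get the latest revision. */
const StampedPtr *map_lookup(const StampedPtr *v, size_t n, uint64_t at)
{
    const StampedPtr *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (v[i].timestamp <= at)
            best = &v[i];  /* entries are in append (ascending) order */
    return best;
}
```

Since pointers are only ever appended, an update never destroys an old revision; asking for an earlier date simply walks back to the pointer that was current at that time.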
Data Key Details:

Data keys will be stored in blocks with a maximum size of just under 32K. If an object is larger than 32K it will be broken down into smaller chunks, and an index block, also with a maximum size of about 32K, will be created so that the final object can be reassembled. If an object is too big to be indexed by one index block, the index blocks themselves will be split up. This can be done as many times as necessary, therefore providing the ability to store files of arbitrary size. DistribNet will use 64-bit integers to store the file size, therefore supporting file sizes up to 2^64 - 1 bytes.

Data keys will be retrieved by blocks rather than all at once. When a client first requests a data key that is too large to fit in one block, an index block will be returned. It is then up to the client to figure out how to retrieve the individual blocks. For efficiency reasons a node can be asked which blocks it has based on a given index block, rather than having to ask for each and every data block.

Data and index blocks will be indexed based on the SHA-1 hash of their contents. The hashed content of an index block does not include the index header, therefore allowing the client to verify that a block really is an index block.

The exact numbers are as follows:

  Data block size: 2^15 - 128 = 32640
  Index block header size: 40
  Maximum number of keys per index block: 1630
  Key size: 20

Maximum object sizes:

  direct   => 2^14.99 bytes, about 31.9K
  1 level  => 2^25.66 bytes, about 50.7 megs
  2 levels => 2^36.34 bytes, about 80.8 gigs
  3 levels => 2^47.01 bytes, about 129 tera
  4 levels => 2^57.68 bytes
  5 levels => 2^68.35 bytes (but limited to 2^64 - 1)

Index block layout:

  struct IdxBlock {
    char   id[6];         /* id is "IDX?", where ? is the level */
    uint16 key_count;
    uint64 real_size;
    byte   pad[24];
    byte   keys[1630][20];
  };

Data blocks do not contain a header; however, the client is told ahead of time what type of block it is receiving.

Why 32640?
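As a quick sanity check, the capacity figures in the table above follow directly from the two constants. This is a throwaway calculation, not DistribNet code:

```c
#include <stdint.h>

#define BLOCK_SIZE   32640ULL  /* 2^15 - 128                           */
#define KEYS_PER_IDX 1630ULL   /* (32640 - 40 byte header) / 20 bytes  */

/* Largest object addressable with `levels` levels of index blocks.
   levels = 0 means a bare data block.  The product overflows uint64
   at 5 levels, which is why sizes are capped at 2^64 - 1 anyway. */
uint64_t max_object_size(int levels)
{
    uint64_t blocks = 1;
    for (int i = 0; i < levels; i++)
        blocks *= KEYS_PER_IDX;   /* data blocks reachable per level */
    return blocks * BLOCK_SIZE;
    /* 0 -> 32640 (~31.9K), 1 -> 53203200 (~50.7M),
       2 -> 86721216000 (~80.8G), 3 -> 141355582080000 (~129T) */
}
```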
A block size of just under 32K was chosen because I wanted a size which would allow most text files to fit in one block, most other files to fit with one level of indexing, and just about anything anybody would think of transferring on a public network to fit with two levels, and 32K worked out perfectly. Also, files around 32K in size are rather rare, thereby preventing a lot of unnecessary splitting of files that don't quite make it. 32640 rather than exactly 32K was chosen to allow some additional information to be transferred with the block without pushing the total size over 32K. 32640 can also be stored nicely in a 16-bit integer without having to worry about whether it is signed or unsigned.

Lookup Details:

Lookup will probably be done using the Chord protocol. See http://www.pdos.lcs.mit.edu/chord/

-- 
http://kevin.atkinson.dhs.org
_______________________________________________
freenet-tech mailing list
[EMAIL PROTECTED]
http://lists.freenetproject.org/mailman/listinfo/tech