[freenet-tech] More on DistribNet

Kevin Atkinson Sun, 17 Feb 2002 22:11:03 -0800


Here is an update to my original post about DistribNet with a good deal
of information added.


I really like the general idea behind Freenet; however, I believe
Freenet is overly concerned about anonymity.  Therefore, unless
someone talks me out of it, I am strongly considering starting my own
project called DistribNet, which will be similar to Freenet but
different in a number of key areas.  It will also try to avoid some of
the problems Freenet has been having.

*** Comparison to Freenet

*) Focus more on speed and scalability than anonymity.  The goal of
   DistribNet is to be as fast as or faster than the Web for any
   pages of reasonable popularity.

*) No fancy datastore.  Use the file system for storing keys.  No
   attempt to disguise what is in one's own datastore.  Nothing is
   encrypted by default.

   Since data is stored in a straightforward manner there is little
   chance the "datastore" will get corrupted and have to be reset.
   Also, since data is no longer in a fixed-size file, the size of the
   datastore can be controlled via both soft and hard quotas.

   Finally, support will be added for shrinking the datastore so that
   there is no reason anyone cannot donate almost all of their
   unused space to DistribNet.

*) The protocol will be well defined and kept as simple as possible.
   Transferring of data from one node to another will likely use the
   HTTP protocol for simplicity.

*) By default no attempt will be made to prevent other nodes from
   knowing what is in another node's datastore.

*) Data will not have to be routed through other nodes.  Instead most
   data will be sent directly from one node to another.

Please note the "by default" part.  The eventual goal is to support the
same level of anonymity that Freenet offers, but that is not
DistribNet's primary focus.
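
As a rough illustration of the file-system datastore with soft and hard
quotas described in the list above, here is a minimal Python sketch.
The class name FileStore, the one-file-per-key layout, and the quota
semantics (refuse writes past the hard quota, merely report when over
the soft quota) are assumptions made for the example; the post does
not fix any of them.

```python
import os
import tempfile


class FileStore:
    """Plain-directory datastore: one unencrypted file per key."""

    def __init__(self, soft_quota, hard_quota, root=None):
        # root defaults to a fresh temporary directory, handy for experiments
        self.root = root or tempfile.mkdtemp(prefix="distribnet-")
        self.soft_quota = soft_quota  # bytes; above this, start evicting
        self.hard_quota = hard_quota  # bytes; never exceeded
        os.makedirs(self.root, exist_ok=True)

    def _path(self, key):
        # Assumes keys are safe file names, e.g. hex digests.
        return os.path.join(self.root, key)

    def used(self):
        """Bytes currently stored, read straight from the file system."""
        return sum(os.path.getsize(self._path(k)) for k in os.listdir(self.root))

    def have_key(self, key):
        return os.path.isfile(self._path(key))

    def store_key(self, key, data):
        """Write data under key; refuse if the hard quota would be exceeded."""
        if self.have_key(key):
            return True
        if self.used() + len(data) > self.hard_quota:
            return False
        with open(self._path(key), "wb") as f:
            f.write(data)
        return True

    def over_soft_quota(self):
        return self.used() > self.soft_quota

    def load_key(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()
```

Because usage is computed from the directory itself, shrinking the
datastore in this sketch would amount to lowering the two quotas and
evicting keys until over_soft_quota() is false again.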

However, DistribNet will be the same as Freenet in several key areas.

*) Will allow anyone to anonymously post content to the network 

*) Completely decentralized

*) Content will be stored in a fashion similar to how data is stored
   in Freenet.

In addition, DistribNet will differ from Freenet as it is now in the
following ways:

*) The ability for one to share content that is on one's hard drive
   or to fetch content from the Web or other networks when it
   is more effective to do so.

*) Searching and support for "updateable" keys will be built into the
   protocol from the beginning.  The search facility will be designed
   in such a way as to make message boards trivial to implement.

*) Will try very hard to keep all but the most unpopular content from
   falling off the network.  I have not worked all the details out yet
   but basically before a node deletes content it will check to see
   that other nodes have the content in their datastores.  If there
   are not enough nodes which have the content it won't delete it
   unless it is truly unpopular content that has been around for a
   while or it can find another node to accept the content it wants to
   get rid of.
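
The deletion rule in the last item can be sketched as a small decision
function.  This is only one possible reading of it: the function name
may_delete and the three thresholds below are hypothetical, since the
post explicitly leaves the details open.

```python
MIN_REPLICAS = 3         # assumed: copies elsewhere needed before deleting
UNPOPULAR_AGE_DAYS = 90  # assumed meaning of "around for a while"
UNPOPULAR_HITS = 0       # assumed meaning of "truly unpopular"


def may_delete(replica_count, age_days, recent_hits, handoff_accepted):
    """Return True if a node may drop its local copy of a key.

    replica_count    -- other nodes known to hold the key
    age_days         -- how long the key has been on the network
    recent_hits      -- recent queries seen for the key
    handoff_accepted -- True if another node agreed to take the key
    """
    if replica_count >= MIN_REPLICAS:
        return True  # enough other nodes already hold it
    if age_days >= UNPOPULAR_AGE_DAYS and recent_hits <= UNPOPULAR_HITS:
        return True  # old and truly unpopular
    return handoff_accepted  # otherwise only after handing it off
```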

By providing support for updateable keys from the start and using a
simple datastore which will be very hard to corrupt and should never
have to be reset, I hope to eliminate most queries for nonexistent
data, which I have a feeling account for a good deal of the network
traffic in Freenet.

DistribNet may also be able to participate in other networks like
Freenet itself and Gnutella.  However, due to the differences in the
way DistribNet and the other networks operate this may not be
possible.

*** Philosophy behind DistribNet

For most types of things the level of anonymity that Freenet offers is
simply not needed.  Even for copyrighted and censored material there
is, in general, little risk in actually viewing the information
because it is simply impractical to go after every single person who
accesses forbidden information.  Almost all of the time the lawsuits
and such are after the original distributors of the information and
not the viewers.  Therefore DistribNet will offer the same level of
anonymity that Freenet offers for distributing information, but not
for actually viewing it.  However, since there *is* some information
where even viewing it is extremely risky, DistribNet will eventually
be able to provide the same level of anonymity that Freenet offers,
but it will be completely optional.

I also believe that knowing what is in one's own datastore and being
able to block certain types of material from one's own node is not
that big of a deal.  Unless almost everyone blocks a certain type of
information the availability of blocked information will not be
harmed.  This is because even if 90% of the nodes block, say, kiddie
porn, the information will still be available on the other 10% of the
nodes which, if the network is designed correctly, should be more than
enough for anyone to get at blocked information.  Furthermore, since
the source code for DistribNet will be protected under the GPL or a
similar license, it will be completely impractical for others to force
a significant number of nodes to block information.

*** DistribNet Architecture

I have not worked out all the details of how DistribNet will work, but
here is what I have so far:

There will essentially be two types of keys: map keys and data keys.
Map keys will be uniquely identified in a manner similar to Freenet's
SSK keys.  Data keys will be identified in a manner similar to
Freenet's CHK keys.

Map keys will contain the following information:

  * Short Description
  * Public Namespace Key
  * Timestamped Index pointers
  * Timestamped Data pointers

_At any given point in time_ each map key will only be associated with
one index pointer and one data pointer.  Map keys can be updated by
appending a new index or data pointer to the existing list.  By
default, when a map key is queried only the most recent pointer will
be returned.  However, older pointers are still there and may be
retrieved by specifying a specific date.  Thus, map keys may be
updated, but information is never lost or overwritten.
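
The append-only behaviour of map keys can be sketched as follows.  The
class name MapKey, the use of plain numbers as timestamps, and the
restriction to data pointers (index pointers would work the same way)
are assumptions made for the example.

```python
import bisect


class MapKey:
    """Append-only list of timestamped data pointers."""

    def __init__(self, description, namespace_key):
        self.description = description
        self.namespace_key = namespace_key
        self._times = []     # timestamps, kept in increasing order
        self._pointers = []  # pointers, parallel to _times

    def append(self, timestamp, pointer):
        """Add a newer pointer; older ones are never overwritten."""
        if self._times and timestamp <= self._times[-1]:
            raise ValueError("pointers must be appended in time order")
        self._times.append(timestamp)
        self._pointers.append(pointer)

    def lookup(self, at=None):
        """Return the pointer in effect at time `at` (default: the latest)."""
        if not self._times:
            return None
        if at is None:
            return self._pointers[-1]
        i = bisect.bisect_right(self._times, at)
        return self._pointers[i - 1] if i else None
```

A query without a date returns the newest pointer, while a query with
a date returns whichever pointer was current at that moment, so an
update never destroys history.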

Data keys will be very much like Freenet's CHK keys except that they will
not be encrypted.  Since they are not encrypted, delta compression may
be used to save space.
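
A minimal sketch of a content-derived data key follows.  The post does
not name a hash function, so SHA-256 and the function names data_key
and verify are assumptions; the point is only that, with unencrypted
data, any node can recompute the key and check what it downloaded.

```python
import hashlib


def data_key(data: bytes) -> str:
    """Derive the key for a block of data from its content alone."""
    return hashlib.sha256(data).hexdigest()


def verify(key: str, data: bytes) -> bool:
    """Check that downloaded data really belongs to the requested key."""
    return data_key(data) == key
```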

There will not be anything like Freenet's KSK keys as those proved to
be completely insecure.  Instead map keys may be requested without a
signature.  If there is more than one map key by that name then a list
of keys is presented, sorted by popularity.  To make such a list
meaningful every public key in DistribNet will have a descriptive
string associated with it.
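
An unsigned lookup of this kind might look like the sketch below.  The
record layout (name, public_key, description, popularity) and the
function name resolve_name are hypothetical, and how popularity would
actually be measured is left open.

```python
def resolve_name(map_keys, name):
    """Return (public_key, description) pairs for `name`, most popular first."""
    matches = [m for m in map_keys if m["name"] == name]
    matches.sort(key=lambda m: m["popularity"], reverse=True)
    return [(m["public_key"], m["description"]) for m in matches]
```

The descriptive string is what lets a person pick the right entry when
several publishers have used the same name.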

Queries for keys will be handled similarly to Freenet, but instead of
returning the actual data a pointer to a node which can easily
provide the data is returned.  The data can then be transferred
directly from one node to another.  Once transferred the data will be
cached in the local node.  If a particular node notices a large number
of queries for a key that it does not have it may choose to store a
copy in its own cache, thereby providing performance benefits similar
to those of Freenet's routing.
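
The "cache it if enough people ask for it" rule could be sketched like
this.  The class name MissTracker and the threshold value are
assumptions; the post only says "a large number" of queries.

```python
from collections import Counter

CACHE_AFTER_MISSES = 5  # assumed threshold for "a large number"


class MissTracker:
    """Counts queries for keys this node does not hold."""

    def __init__(self, threshold=CACHE_AFTER_MISSES):
        self.threshold = threshold
        self.misses = Counter()

    def record_miss(self, key):
        """Record one query for a missing key.

        Returns True once the key has been asked for often enough that
        the node should fetch a copy and cache it locally.
        """
        self.misses[key] += 1
        return self.misses[key] >= self.threshold
```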

*** Pseudo Code and Implementation Notes

Here is the beginning of how I think the network should function.  I
only deal with data keys here and very little is done in terms of
routing or protecting the network against attacks.  Also, even though
the code presented here is serial, when actually implemented a good
deal of the network stuff will be done in parallel, both by using
threads and non-blocking I/O.  Threads will be kept to a minimum.

I have not decided what language I will use but I most likely will be
doing this in C and C++ using POSIX system calls.  It should also work
under Win32 using GCC and the Cygwin library; however, I will rely on
someone else to test and debug the Win32 port as I personally hate
Windows and only use it when I have to.

# Note: Functions in mixed case LikeThis will likely involve
# contacting another node over the network.  Functions in lower case
# are local functions

# Global data structures

Node:
  other nodes
  key index
  local keys

OtherNode:
  id
  query response time
  average download time
  network distance
  reliability

KeyIndex:
  key
  last query:
    try
    date
  query log
  node list (where the data can be downloaded from)

LocalKey:
  key
  query log
  data

QUERY_BRANCH_FACTOR = 3;
UPLOAD_BRANCH_FACTOR = 5;
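
For readers who prefer something executable, the global data
structures above could be written as Python dataclasses roughly as
follows.  The field types, units and default values are assumptions,
and `try` is renamed try_level because it is a reserved word in
Python.

```python
from dataclasses import dataclass, field


@dataclass
class OtherNode:
    id: str
    query_response_time: float = 0.0    # seconds (assumed unit)
    average_download_time: float = 0.0  # seconds (assumed unit)
    network_distance: int = 0           # hops (assumed unit)
    reliability: float = 1.0            # fraction of successful requests


@dataclass
class LastQuery:
    try_level: int = 0  # "try" in the pseudocode
    date: float = 0.0   # time of the query


@dataclass
class KeyIndex:
    key: str
    last_query: LastQuery = field(default_factory=LastQuery)
    query_log: list = field(default_factory=list)
    node_list: list = field(default_factory=list)  # where to download from


@dataclass
class LocalKey:
    key: str
    data: bytes
    query_log: list = field(default_factory=list)


@dataclass
class Node:
    other_nodes: dict = field(default_factory=dict)  # id  -> OtherNode
    key_index: dict = field(default_factory=dict)    # key -> KeyIndex
    local_keys: dict = field(default_factory=dict)   # key -> LocalKey


QUERY_BRANCH_FACTOR = 3
UPLOAD_BRANCH_FACTOR = 5
```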

# DataQuery returns a list of nodes which can easily make the key
# available for download, or AlreadyQueried if this node has already
# been queried for a given handle

DataQuery (key, try, handle)
  if already_responded(handle)
    return AlreadyQueried
  key_info = key_index[key];
  decrement try
  # reuse the cached node list if the query budget is used up or a
  # recent query already searched at least this deep
  if try <= 0 OR
      (not expired key_info.last_query AND
       key_info.last_query.try >= try)
    return key_info.node_list
  candidate_nodes = select_candidate_nodes(key, FOR_QUERY)
  nodes_queried = 0;
  need_to_query = min(try, QUERY_BRANCH_FACTOR)
  while (nodes_queried < need_to_query AND candidate_nodes not empty)
    node = candidate_nodes.pop;
    result = node.DataQuery(key,
                            random_round(try / QUERY_BRANCH_FACTOR),
                            handle)
    if result != AlreadyQueried
      append result to key_info.node_list
      increment nodes_queried
  loop
  key_info.last_query.try = try
                     .date  = NOW
  return key_info.node_list
End
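
The pseudocode calls random_round(try / QUERY_BRANCH_FACTOR) without
defining it.  One plausible reading, and it is only an assumption, is
stochastic rounding: round up with probability equal to the fractional
part, so that on average the remaining query budget is split evenly
among the branches.

```python
import math
import random


def random_round(x, rng=random):
    """Round x up with probability equal to its fractional part, else down.

    The expected value of the result equals x, so dividing a budget by
    the branch factor neither inflates nor starves it on average.
    """
    low = math.floor(x)
    return low + (1 if rng.random() < x - low else 0)
```

With a budget of 4 and a branch factor of 3, each branch would receive
either 1 or 2, averaging 4/3 over many queries.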

# RetriveData attempts to fetch a key from the network.  It returns
# the actual data or an error
RetriveData (key)
  if have_key(key)
    return local_keys[key].data
  try = 1
  node_list = {}
  # widen the search until an acceptable node turns up
  until acceptable_node(node_list) or try > MAX_TRY
    node_list = DataQuery(key, try, create_handle)
    try = next_try_level(try);
  loop
  if node_list = {}
    return DataNotFound
  node_to_use = best_node(node_list)
  return node_to_use.Download(key)
End
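
The helpers next_try_level and best_node are left undefined above.  As
a sketch only, the search could widen by doubling the try level, and
the best node could be the one with the lowest expected time per
successful download.  Both policies, the dictionary field names, and
the extra helper try_schedule are assumptions.

```python
def next_try_level(try_level):
    """Assumed escalation policy: double the search effort each round."""
    return try_level * 2


def try_schedule(max_try):
    """List the try levels a retrieval would go through up to max_try."""
    levels, t = [], 1
    while t <= max_try:
        levels.append(t)
        t = next_try_level(t)
    return levels


def best_node(nodes):
    """Pick the node with the lowest expected time per successful download."""
    def cost(n):
        total = n["query_response_time"] + n["average_download_time"]
        return total / max(n["reliability"], 1e-9)  # avoid division by zero
    return min(nodes, key=cost)
```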

# Download the key from this node
Download (key)
  unless have_key(key)
    return DontHaveKey
  return local_keys[key].data
End
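
Since node-to-node transfer is likely to use HTTP, Download could map
onto a plain GET request.  The sketch below uses Python's standard
http.server module; the /key/<key> URL layout, the LOCAL_KEYS
dictionary, and the use of 404 for DontHaveKey are assumptions, not
part of any defined protocol.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

LOCAL_KEYS = {}  # key -> bytes; stands in for the node's local keys


def lookup(path):
    """Map a request path such as /key/<key> to (status, body)."""
    prefix = "/key/"
    if not path.startswith(prefix):
        return 400, b"bad request"
    data = LOCAL_KEYS.get(path[len(prefix):])
    if data is None:
        return 404, b"DontHaveKey"
    return 200, data


class DownloadHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = lookup(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve local keys on 127.0.0.1 until interrupted."""
    HTTPServer(("127.0.0.1", port), DownloadHandler).serve_forever()
```

Keeping the path-to-response mapping in a separate function makes the
protocol decision easy to test without opening a socket.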

discover
  foreach other_node in other_nodes
    ExchangeInfo(this_node,  other_node)
  loop
End

# Upload a key to the network.  Returns a list of nodes the data was
# uploaded to, or AlreadyUploaded if this node has already handled the
# given handle.
Upload (key, data, try, handle)
  if already_responded(handle)
    return AlreadyUploaded
  if (will_accept_key(key))
    store_key(key, data)
    decrement try
  candidate_nodes = select_candidate_nodes(key, FOR_UPLOAD)
  nodes_queried = 0;
  need_to_query = min(try, UPLOAD_BRANCH_FACTOR)
  nodes_with_data = {}
  while (nodes_queried < need_to_query AND candidate_nodes not empty)
    node = candidate_nodes.pop;
    result = node.Upload(key, data,
                         random_round(try / UPLOAD_BRANCH_FACTOR),
                         handle)
    next if result = AlreadyUploaded
    # verify that it made it
    check_node = select_random_node(result)
    new_data = check_node.Download(key)
    next if new_data != data # i.e. the download failed
    append result to nodes_with_data
    increment nodes_queried
  loop
  # verify that the data is now properly indexed
  result = DataQuery(key, MAX_TRY, create_handle);
  result = nodes_with_data also_in result
  if (|nodes_with_data| - |result| > ERROR_THRESHOLD)
    ... what to do?
  return result
End
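
The final indexing check is a set intersection followed by a
comparison against ERROR_THRESHOLD.  A small sketch, with the function
name check_indexed and the threshold value chosen arbitrarily for the
example:

```python
ERROR_THRESHOLD = 1  # assumed value; the post does not give one


def check_indexed(nodes_with_data, query_result):
    """Compare nodes that took the upload with nodes a query can find.

    Returns (confirmed, too_many_missing): the nodes present in both
    lists (the "also_in" step) and whether more than ERROR_THRESHOLD
    uploaded copies failed to show up in the query result.
    """
    uploaded = set(nodes_with_data)
    confirmed = uploaded & set(query_result)
    missing = len(uploaded) - len(confirmed)
    return confirmed, missing > ERROR_THRESHOLD
```

What a node should do when too many copies are missing is the open
question the pseudocode itself leaves unanswered.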

*** Conclusion

I really think I can make this work and I strongly believe such a
network has great potential.  My eventual hope is that it will replace
networks like Gnutella and Morpheus and will also eliminate the need
for personal home pages on pop-up-city web sites.

-- 
http://kevin.atkinson.dhs.org

