[ Postmaster: Please forward this message to developers of your
distributed programs, such as Freenet and Mojo Nation. -- Brad Allen
<[EMAIL PROTECTED]> ]
Subject: Distributed ideas missing -- pls fwd to developers:
appropriate redundancy, sizing flexibility, appropriate
authentication and security, compression, gateways and caching.
Back in 1984 (A.D., Gregorian Calendar, Earth, Sol, Milky Way) I
pretty much thought of all of the concepts needed for reliable
distributed information (and, for that matter, thought, but I'm not
covering that here). Judging by one review -- "MojoNation V0.956.1
Review Posted by erik on Sun Mar 4th, 2001 02:58:26 PM, Reviews" on
InfoAnarchy.Org -- your system is missing some of the things I
thought of:
* Redundancy. According to that review, your system does not
  analyze the probability of pieces of a file being available or
  the probability of servers being available. Each server needs a
  probability score, plus a duality of both a minimum probability
  redundancy and a minimum absolute redundancy. For instance, a
  transparent gateway onto an invisible virtual redundant network,
  with many well-designed parallel/redundant systems and effective
  RAID drives behind it, may earn a rather high probability score
  and contribute heavily toward some probability threshold, say
  900% (borrowing the figure DNS used for the TLDs, just as an
  example); yet a set of servers summing to more than 900%
  probability can still be too small combinatorially, because the
  disconnection of one very good server would leave so big a gap
  that the possibility is itself a big problem. Therefore a minimum
  of, say, 15 servers per file would also be enforced. Perhaps a
  downward-sloping (diminishing-returns) function f could be
  applied to the actual probabilities before they are summed, so
  that a single metric suffices and 2*f(100%) is less than
  4*f(40%); the fact that there are four servers rather than two
  then improves the score, as it should. A sketch of this check
  appears after this list. Choosing good functions is part of the
  mathematics of the program and requires good long-term analysis
  and quality control.
* Automatic flexibility in sizing. Some bandwidth is really slow
  (like a low-hertz connection to Pluto -- which raises another
  issue, high latency: it may make sense to start programming
  networks today to handle high-latency situations just as
  reliably and efficiently as low-latency ones, so that space
  travel will not require modifications to the code); for
  instance, a 56Kbps modem is quite slow compared to a 10 megabit
  per second cable modem. It is quite OK to place a one-megabyte
  "chunk" on a 10 megabit per second cable modem with excellent
  connections (say, a server running in reverse on a typical Time
  Warner of Manhattan Road Runner connection -- typical download
  speeds on that system are quite impressive, even across
  administrative barriers to other networks, since they have great
  connectivity), whereas even a 20KB chunk can be equivalent or
  harder on some legacy phone-modem networks. This flexibility in
  sizing would use the recorded probability and speed experience
  for the chunks as well; a sizing sketch also appears after this
  list. In this way, a user could allow files on their system to
  become part of the overall network, and those files, rather than
  having to be explicitly uploaded, would instead have the
  redundancy, probability, sizing, and network-closeness metrics
  automatically register as wrong, and the (distributed) network
  would automatically be jarred into migrating those files away
  from the server. The tedium of explicitly waiting for uploads
  would be mitigated, and mojo would be earned a bit less
  explicitly but more easily and efficiently to start with. The
  difference may be subtle -- simply "backgrounding" the initial
  uploads -- but since the files are indexed sooner, availability
  is genuinely better. A larger set of files, with clients wanting
  certain files before others, will help.
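Here is a minimal Python sketch of the dual-threshold redundancy
check from the first item above. The constants and the concave
weight function are illustrative placeholders only; sqrt is chosen
merely because it satisfies the stated inequality
2*f(100%) < 4*f(40%):

    import math

    MIN_SERVERS = 15       # absolute redundancy floor from the text
    SCORE_THRESHOLD = 9.0  # the "900%" threshold, as a weighted sum

    def weight(p):
        # Downward-sloping credit for reliability: four 40% servers
        # outscore two 100% servers, since
        # 2*sqrt(1.0) = 2.0 < 4*sqrt(0.4) ~= 2.53.
        return math.sqrt(p)

    def placement_ok(availabilities):
        # availabilities: estimated up-probability of each server
        # currently holding a piece of the file.
        score = sum(weight(p) for p in availabilities)
        return (score >= SCORE_THRESHOLD
                and len(availabilities) >= MIN_SERVERS)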
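And a minimal sketch of bandwidth-adaptive chunk sizing from the
second item; the target time, floor, and ceiling are made-up
numbers, and a real implementation would also weigh latency and the
per-chunk probability experience:

    TARGET_SECONDS = 8.0         # rough time budget per chunk
    MIN_CHUNK = 16 * 1024        # floor for very slow links
    MAX_CHUNK = 4 * 1024 * 1024  # ceiling for very fast links

    def chunk_size(bytes_per_second, success_rate):
        # Scale the chunk so one transfer finishes in roughly
        # constant time, discounted by how often this peer's
        # transfers have succeeded.
        size = int(bytes_per_second * TARGET_SECONDS * success_rate)
        return max(MIN_CHUNK, min(MAX_CHUNK, size))

    # A 56Kbps modem (~7 KB/s at 90% success) gets ~50KB chunks; a
    # 10Mbps cable modem (~1.25 MB/s) is clipped to the 4MB ceiling.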
I also did not see mention of these other ideas:
* Bitwise comparison of file subchunks, with compression as a goal.
  Unique IDs would be given to subchunks, so that files with
  equivalent subchunks have those subchunks stored much more
  efficiently, with a resulting better distribution; a sketch of
  such a subchunk store appears after this list. This way, for
  example, two substantially similar versions of a text document
  will take much less space to store exactly on the network. In
  more complex cases, compression algorithms could use existing
  files as a compression similarity cache for initialized
  compression sets, extracting more compression, and acceptability
  indexes could be used (for instance, where a certain amount of
  loss is acceptable, the downloading site can use a less perfect
  copy yet still gain speed if closer or more attainable previous
  or similar versions correspond to the initialized cache; this
  would have to be calculated with respect to the resources used
  all around, including CPU). This is actually useful in
  low-bandwidth systems. Take a supercomputer in space and a
  supercomputer on Earth as a model, then interpolate down to
  minicomputers and finally to lower-grade computers: the
  supercomputer in space may in certain situations have bad
  communications with Earth, yet gain wonderful speed increases
  from this type of compression. Also, in military or political
  situations, encryption may be restricted and so itself limit
  bandwidth; and bandwidth may be limited by other factors --
  telecommunications sabotage by corporate customer gouging, the
  affordable amount of cell-phone digital signal bandwidth, etc.
* Caching of various content types. HTTP and FTP content are the
  obvious cases; these could be cached. In fact, comparison in this
  realm is wonderful: it could reduce the redundancy that mirrors
  create in the HTTP and FTP realms to nothing more than the usual
  redundancy kept for one copy of the original, plus an index of
  all the various ways to refer to the original mirrors. A file
  that has not yet been compared can be fully downloaded bitwise by
  a set of computers close to the HTTP or FTP site, then compared
  against stored chunks of files that are close to the requesting
  client; instead of sending the entire file across, simply
  acknowledging the perfect copies through the compression system
  above makes a transparent download possible (see the
  gateway-transfer sketch after this list). The additional fact
  that the original sources are equal or similar (and compressible
  in that way) would also be recorded. This has obvious web speedup
  possibilities. This type of functioning could be programmed into
  a general gateway that could front any service (Gopher as well,
  and any other file distribution system, such as any or all of
  those mentioned on infoanarchy.org).
* Indexing should just be another object type; I am surprised to
  hear that an index search takes a long time. Indexing data should
  be distributed dynamically so that it can be found fast. For
  instance, if you order entries by strict byte order (string
  order, or lookup order), you can assign subranges according to
  redundancy and probability needs, such as:
    + server 2 contains index entries A-F and W
    + server 6 contains index entries A-Z
    + server 12 contains index entries G-V and X-Z
    + server 15 contains index entries A-G and Z
    + server 20 contains index entries H-Y
  The actual assignments would again be driven by probability,
  redundancy, and distribution metrics (in the case of Mojo Nation,
  including the Mojo metric), etc.; a routing sketch appears after
  this list. This way, searches would be quite fast.
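A minimal sketch of the subchunk store from the compression item
above, in Python. SHA-1 as the subchunk ID and the fixed-size split
are assumptions made for brevity; a content-defined split would
also catch insertions, not just aligned changes:

    import hashlib

    class ChunkStore:
        # Content-addressed store: a subchunk's unique ID is the
        # hash of its bytes, so equivalent subchunks appearing in
        # different files are stored only once.
        def __init__(self):
            self.chunks = {}

        def put(self, data):
            cid = hashlib.sha1(data).hexdigest()
            self.chunks.setdefault(cid, data)
            return cid

        def store_file(self, data, size=64 * 1024):
            # A file becomes a list of subchunk IDs; two similar
            # versions of a document share every subchunk that
            # compares bitwise equal.
            return [self.put(data[i:i + size])
                    for i in range(0, len(data), size)]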
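Building on that hypothetical ChunkStore, a sketch of the gateway
transfer from the caching item: the gateway near the HTTP or FTP
origin sends only the subchunk IDs, and the requesting client pulls
just the subchunks it cannot already acknowledge as perfect copies:

    def gateway_transfer(gateway, client, chunk_ids):
        # gateway, client: ChunkStore instances on either side of
        # the slow link; chunk_ids: the file's ordered subchunk IDs.
        missing = [cid for cid in chunk_ids
                   if cid not in client.chunks]
        for cid in missing:
            client.chunks[cid] = gateway.chunks[cid]
        return len(missing)  # subchunks actually sent across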
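And a sketch of routing a search to the index shards in the
indexing item, using the hypothetical range assignments shown
there:

    # Each server advertises the first-letter ranges it indexes.
    SHARDS = {
        2:  [('A', 'F'), ('W', 'W')],
        6:  [('A', 'Z')],
        12: [('G', 'V'), ('X', 'Z')],
        15: [('A', 'G'), ('Z', 'Z')],
        20: [('H', 'Y')],
    }

    def servers_for(term):
        # Route a lookup to every server whose advertised range
        # covers the term's leading byte; the overlap between
        # shards is what provides the redundancy.
        c = term[0].upper()
        return sorted(s for s, ranges in SHARDS.items()
                      if any(lo <= c <= hi for lo, hi in ranges))

    # servers_for("gnutella") -> [6, 12, 15]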
Of course, the entire protocol needs appropriate security, including
authentication and other cryptographic means. The major problem is
someone asserting that such-and-such a file is really the real thing
when it isn't; imagine a chunk of a file that has a worm in it.
Multiple hashes (MD5, RMD160, SHA1, CRCs, etc.) used to crosscheck
entire files against a larger set of confirmation sources would be
entirely pertinent; a sketch follows. Files should habitually carry
many signatures from many sources, and there should pretty much be a
full set of authenticity data for every object, whatever its
purpose. Also, encryption should be usable where appropriate.
Groups, chunks of encrypted files, etc. should be quite usable via
public-key cryptography even in such a distributed system.
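A minimal sketch of that multi-hash crosscheck, assuming Python's
hashlib; the CRCs are omitted, and ripemd160 availability depends
on the local OpenSSL build:

    import hashlib

    ALGOS = ('md5', 'sha1', 'ripemd160')

    def digests(data):
        # Several independent digests: passing off a wormy chunk as
        # genuine then requires colliding all of them at once.
        return {name: hashlib.new(name, data).hexdigest()
                for name in ALGOS
                if name in hashlib.algorithms_available}

    def verify(data, published):
        # published: digests gathered from a larger set of
        # confirmation sources; any mismatch rejects the chunk.
        actual = digests(data)
        return all(actual.get(name) == h
                   for name, h in published.items())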
I copy this to a small set of distributed object transmission (and
caching/storage) developers so that the ideas will not be lost or
slowed down by my not being rich enough to implement them on my own.
(My only lack is money to keep me alive and give me solace to
program; where I live, dog barks interrupt my every thought so as to
make it useless. I can only give this information to you out of both
desperation and the fact that I already ingrained it in my brain in
1984. This is not a total dump of my brain. Omission does not imply
forfeiture of ownership. Permission to use these obvious ideas, so
long as you do not lie about their origin, is granted.)
[This message was a quick hack, and is not intended to be the
perfect result of a quality meditation. Please do your own
brainstorming using this as seed and start your own meditation; or,
if you already have a performing implemented application, just
upgrade it to include these ideas so that other meditators can have
more seed sourced from implemented application usage.]
Sincerely yours,
Brad Allen <[EMAIL PROTECTED]>