Hi. Attached is a spec I would like some comment on. Apologies for the
cross-post; it could form a basis for numerous EOF protocols, and it is
certainly relevant to freenet-tech.
--
The road to Tycho is paved with good intentions
Hash Resolution System
----------------------
The Hash Resolution System is a distributed application for freenet that
allows verifiable retrieval of files based on their MD5 or SHA1 hashes (and
potentially any other suitable hash). It is useful in amongst other things:
a) apt-get over freenet, without a central trusted insertor; each client
downloads the Packages file from the official debian server, which includes
an MD5 checksum (don't ask me why they aren't using SHA-1) of each package
file. Then the client feeds the CHKs into the HRS, and if found it uses the
files from freenet. Since the downloads can occur in parallel, from different
nodes, this can lead to fairly good performance in terms of throughput, with
very little load on the debian origin servers (of course it will take some
time to get started). Those files not found are downloaded, and a certain
portion of them (a fixed size?) are inserted into freenet via the HRS. Thus,
there is no central authority (apart from the debian servers themselves), and
apt-get-over-freenet can be prototyped without completely swamping freenet.
This is a temporary solution, but far more reasonable than the alternatives.
b) web caching/prefetch over freenet. Some web servers support the Content-MD5
header; these same web servers generally allow one to open a connection,
fetch the headers (including the hash) of a load of files, then disconnect, all
in a very small number of packets (thanks to keep-alives and pipelining).
c) reconstructing a system, as much as possible, from a tripwire snapshot i.e.
there exist systems which take MD5's of every file on the system on a regular
basis, and store this in a database. Granted, this could be accomplished by
adding CHKs, calculated in the standard way, to the hashes supported by such
software
d) generally any time you want to know 'what file has this MD5 [optional: and
this length]'.
e) this could be used for looking up the decryption key for a CHK in a
datastore. This use is strongly discouraged as it is bad for freenet, but it
is possible.
HRS provides a way to resolve a {MD5, length} pair (or just an MD5) to a CHK
(possibly of a splitfile). Any data returned is valid by virtue of having been
checked against the hash. HRS does not modify the node core; it is purely
client side.
Requesting Data
---------------
a) Look up the sequence KSK@<md5>-<length>. This is a publicly writable
list set up using KeyIndexClient/FCP layer 2. Returned data will be one or
more of the following:
CHK of the actual data, or splitfile spec for it (in standard metadata format)
P - Public key of insertor
R - Random garbage for insertor (SSK encryption entropy)
Note that this is spammable.
b) Looking up SSK@<P>,<R>/<md5>-<length> for a known insertor. We keep an
ordered list of known insertors; each one has P, R and D, the total in bytes
that has been successfully downloaded from this insertor. We fetch from the
top <n> best insertors initially, moving on to less successful insertors.
When the list becomes too big, the least successful insertor is removed; this
is biased by the number of attempts to download data through that insertor,
so it is not necessarily the insertor with the least D (exact algorithm to
be decided). This request returns a simple redirect to the actual data. If it
is not a simple redirect to the data, the insertor is removed from the list.
If the data does not have the correct hash or length, the insertor is removed
from the list. Each insertor may have a file (sequence? DBR?)
SSK@<P>,<R>/friends, which lists insertors known to that insertor; this may be
used to find out about new insertors once the client has an insertor list (at
that point, using the KSK search is deprecated, a measure of last resort).
The insertors are listed in the order that the insertor referring to them
keeps them in, but they are all put into the list with D=0, at the bottom.
Thus we have a crude, but hopefully adequate, web of trust.
Inserting Data
--------------
An insertor instance creates a subspace P and some random (readable) entropy
R. Each file is inserted as a CHK, then the metadata described above is
inserted under the sequences KSK@<md5>-<length>, KSK@<md5>, (and whatever
other hashes), and under SSK@<P>,<R>/<md5>-<length> and SSK@<P>,<R>/<md5>
(these are not sequences, they are plain redirects). If the insertor instance
is also a client instance (as is useful in e.g. apt-get over freenet or other
applications where total anonymity is not absolutely critical), it inserts
SSK@<P>,<R>/friends (either a DBR, or a sequence, depending on update schedule;
each file is simply a list of SSK@<P>,<R>'s; metadata in the original file
specifies whether it is a redirect or a sequence).
Requirements
------------
Requird: FCP layer 2
Helpful: standard metadata for retrieving a sequence.
Note 1: MD5 security: It is well known that there are weaknesses in the MD5
algorithm; this is why Freenet uses SHA-1. However, I haven't seen a mechanism
to create a file with a given MD5 yet (collisions have been found?) - if
somebody has seen this, please email me. This will be particularly hard when
looking up the tuple {MD5, length}.