I just came up with some interesting ideas on splitfiles.  Some of these may 
have already been discussed, although I have had a hard time finding anything 
on these topics in the documentation and mailing lists.  The ideas are:

1.  Random segment choosing.
  After receiving the list of pieces, the node should download them in a 
random order so that, when some downloads fail, the end pieces are no harder 
to grab than the rest.  Redundant segments should be included in the random 
list so that they are no harder to retrieve either.  The node can stop 
downloading as soon as it has enough segments to recreate the file.
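
  As a rough sketch of what I mean (Python, nothing to do with the real node 
code): `fetch` is a made-up callback that returns the segment data or None on 
failure, and `needed` is a made-up parameter for how many segments are enough 
to rebuild the file.

```python
import random

def fetch_in_random_order(keys, needed, fetch, rng=None):
    """Try segments in random order and stop once `needed` have arrived.

    keys   -- every segment key in the splitfile, data and redundant alike
    needed -- how many segments are enough to rebuild the file (assumed known)
    fetch  -- hypothetical callback: key -> bytes, or None if the request failed
    """
    rng = rng or random.Random()
    order = list(keys)
    rng.shuffle(order)          # data and redundant segments are mixed together
    got = {}
    for key in order:
        if len(got) >= needed:  # enough segments, no need to ask for more
            break
        data = fetch(key)
        if data is not None:
            got[key] = data
    return got
```

  A caller would check len(result) >= needed afterwards and retry the failed 
keys if it came up short.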

2.  Download recovery.
  Splitfiles allow nodes to easily recover from broken downloads so that all 
the data does not need to be retransferred.  This is basically automatic 
because of the nature of Freenet, but it would be more convenient for 
inexperienced users to be able to click on a partial file and finish the 
download.  This would require a download manager client.
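
  A download manager could do something like the following.  This is only a 
sketch: it assumes the segments already fetched are kept as one file per 
segment in a local directory, that the keys are usable as file names, and 
that `fetch` is a made-up callback returning the data or None on failure.

```python
import os

def missing_segments(keys, cache_dir):
    """Keys whose data is not yet stored in cache_dir."""
    return [k for k in keys
            if not os.path.exists(os.path.join(cache_dir, k))]

def resume_download(keys, needed, cache_dir, fetch):
    """Fetch only what an earlier, interrupted session did not finish.

    Returns True once at least `needed` segments are on disk.
    """
    os.makedirs(cache_dir, exist_ok=True)
    todo = missing_segments(keys, cache_dir)
    have = len(keys) - len(todo)
    for key in todo:
        if have >= needed:
            break
        data = fetch(key)            # hypothetical: bytes, or None on failure
        if data is None:
            continue
        path = os.path.join(cache_dir, key)
        with open(path + ".part", "wb") as f:
            f.write(data)
        os.replace(path + ".part", path)  # a segment only counts once complete
        have += 1
    return have >= needed
```

  Writing to a ".part" file first means a crash in the middle of a segment 
never leaves a half-written file that looks finished.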

3.  Data synchronization.
  Let's say I'm using Debian's CD image tool (it downloads all the files to 
create a pseudo-image, then runs rsync to turn it into the official image) 
over Freenet.  The tool could link to all the files that are needed to 
construct the pseudo-image and download them from Freenet.  Then, it 
could download special splitfile meta-data.  This metafile would include a 
rolling checksum on the entire file and then strong checksums on all the 
segments (which would already be there if CHKs are used for the segments). 
The pseudo-image program could then do some rsync-like magic to calculate, at 
every offset, where the checksums on the segments of the pseudo-image match 
those on the official image.  The program could then download just the 
segments it does not have and/or the redundant segments, then reconstruct the 
file.  If a new version of the CD is posted on Freenet, then the sync program 
can use this method to download only the changes.
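
  To make the "rsync-like magic" concrete, here is a small sketch.  It 
assumes fixed-size segments, an rsync-style weak checksum that can be rolled 
one byte at a time, and SHA-1 standing in for whatever strong hash the 
segment keys provide.  The function names and the metadata layout are made up 
for illustration.

```python
import hashlib

M = 1 << 16

def weak(block):
    """rsync-style weak checksum (a, b) of a block of bytes."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - j) * x for j, x in enumerate(block)) % M
    return a, b

def make_metadata(target, seg_size):
    """(weak, strong) checksum pair for each segment of the official image."""
    meta = []
    for off in range(0, len(target), seg_size):
        seg = target[off:off + seg_size]
        meta.append((weak(seg), hashlib.sha1(seg).digest()))
    return meta

def find_local_segments(local, meta, seg_size):
    """Map segment index -> offset in `local` where that segment already exists.

    A final segment shorter than seg_size is never matched in this sketch,
    so it is simply treated as missing.
    """
    by_weak = {}
    for idx, (w, strong) in enumerate(meta):
        by_weak.setdefault(w, []).append((idx, strong))
    found = {}
    n = seg_size
    if len(local) < n:
        return found
    a, b = weak(local[:n])
    k = 0
    while True:
        # Cheap weak check first; confirm with the strong hash only on a hit.
        for idx, strong in by_weak.get((a, b), ()):
            if idx not in found and hashlib.sha1(local[k:k + n]).digest() == strong:
                found[idx] = k
        if k + n >= len(local):
            break
        # Roll the window one byte: drop local[k], take in local[k + n].
        a_new = (a - local[k] + local[k + n]) % M
        b = (b - n * local[k] + a_new) % M
        a = a_new
        k += 1
    return found
```

  The segments to request are then the indexes not in the returned map; the 
rest can be copied out of the pseudo-image at the offsets found.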

4.  Redundancy reduction.
  Take the previous example where two similar versions of the Debian CD are 
posted on Freenet.  Let's say version A is already well-established in 
Freenet and version B is going to be inserted.  If variable-size 
splitfiles are used, B could be designed to reuse the identical segments in A.  
(This is quite different from a diff.)  The checksum data for A could be 
downloaded and compared against B's data.  Matching segments could be listed 
in order in B's splitfile index, and the missing ones would be added.  The 
problem, though, is that the missing data might not fit evenly between the 
already-existing segments.  This would make a few segments differ in size 
from the others.
  Going further, a complete checksum index with segment connectivity could 
be added to Freenet.  Every linked segment in Freenet would be listed there, 
along with each splitfile that references it.  Nodes wanting to put data in 
Freenet could download a checksum list of the most-linked segments, then 
upload data using these segments.  An index might imply centralization, but a 
decent checksum list could be pulled from the inserter's data store or some 
other decentralized resource.
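
  Here is a sketch of how an inserter might lay out version B so that it 
reuses A's segments, as described at the start of this item.  It assumes A 
was split into fixed-size segments and that only their strong hashes are 
known (SHA-1 is again a stand-in).  The plan format and function names are 
made up.  Hashing every offset like this is slow; a rolling weak checksum 
would be used in practice to filter candidates first.

```python
import hashlib

def _emit_new(plan, data, seg_size):
    """Cut a gap into new segments no larger than seg_size (sizes may vary)."""
    for off in range(0, len(data), seg_size):
        plan.append(("new", data[off:off + seg_size]))

def plan_insert(b_data, a_hashes, seg_size):
    """Describe B as a sequence of ("reuse", index_in_A) and ("new", bytes)."""
    known = {h: i for i, h in enumerate(a_hashes)}
    plan = []
    gap_start = 0
    pos = 0
    while pos < len(b_data):
        window = b_data[pos:pos + seg_size]
        idx = None
        if len(window) == seg_size:
            idx = known.get(hashlib.sha1(window).digest())
        if idx is None:
            pos += 1                  # no match here; this byte joins the gap
            continue
        _emit_new(plan, b_data[gap_start:pos], seg_size)
        plan.append(("reuse", idx))   # point at A's existing segment
        pos += seg_size
        gap_start = pos
    _emit_new(plan, b_data[gap_start:], seg_size)
    return plan
```

  Only the "new" entries need to be inserted; the gaps between reused 
segments are exactly where the odd-sized segments come from.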

If implemented, these ideas would make Freenet a little more complex, but 
more efficient in both storage and bandwidth requirements.  I would like 
to get into some Freenet development, but I don't have a good idea where 
to start.  Any suggestions?

-Scott Young

_______________________________________________
freenet-tech mailing list
[EMAIL PROTECTED]
http://lists.freenetproject.org/mailman/listinfo/tech
