On Tuesday 08 April 2008 12:36, Matthew Toseland wrote:
> On Tuesday 08 April 2008 00:36, Ian Clarke wrote:
> > http://video.google.com/videoplay?docid=-2372664863607209585
> >
> > He mentions Freenet's use of this technique about 10-15 minutes in,
> > they also use erasure codes, so it seems they are using a few
> > techniques that we also use (unclear about whether we were the direct
> > source of this inspiration).
>
> They use 500% redundancy in their RS codes. Right now we use 150% (including
> the original 100%). Maybe we should increase this? A slight increase to say
> 200% or 250% may give significantly better performance, despite the increased
> overhead...
In fact, I think I can justify a figure of 200% (the original plus 100%, so
128 -> 255 blocks, which fits within the 8-bit fast encoding limit). On
average, in the long term, a block will be stored on 3 nodes. Obviously a lot
of popular data will be stored on more than 3 nodes, but as far as the
datastore is concerned, 3 is the approximate figure: on an average node with
a 1GB datastore, the 512MB cache has a lifetime of less than a day, data
lasts a lot longer in the store, and by design it ends up in the store on
roughly 3 nodes. Multiply that by two from splitfile redundancy and we get a
total redundancy of 6.

Wuala works well with a factor of 5 redundancy... but that's entirely due to
FEC. They simulated ordinary replication and needed a factor of 24 to be
reliable, but only a factor of 5 with FEC. So maybe what we need is less
network-level redundancy and more FEC-level redundancy - that is, redundancy
in the data itself rather than in how many nodes store each block? (There is
a back-of-the-envelope sketch of the numbers at the end of this mail.) IMHO
we can't reduce the network-level redundancy much below the current
store-on-3-nodes, because we use Freenet for things other than splitfiles -
Frost posts, the top level block, ... The top level block is a special case:
it will usually be fetchable, because anyone trying to fetch the splitfile
will fetch it, even if they give up afterwards, and even if they just
followed a link in fproxy, got a size warning and changed their mind...

Wuala's simulations assume 25% uptime, and they don't give nodes any extra
storage unless they have at least 17% uptime. Can we implement something
similar? We would have to ignore low-uptime nodes when determining whether
we are a sink for a key (a rough sketch of this is also below); the problem
is that we'd have to reliably tell whether a node has low uptime... On
opennet there is enough connection churn that we're unlikely to have known a
node for the many days needed to measure this. We could reduce the
connection churn, but that would come at the cost of reduced connectivity -
when a node disconnects, we give it a few minutes to reconnect, and then we
move on. A full-blown reputation system like Wuala's would be a lot of work
and a lot of debugging...

>
> Also, they discourage low uptime nodes by not giving them any extra storage.
> I'm not sure exactly what we can do about this, but it's a problem we need to
> deal with.
>
> We should also think about randomising locations less frequently. It can take
> a while to recover, and the current code randomizes roughly every 13 to 22
> hours. It may be useful to increase this significantly? Unfortunately this
> parameter is very dependent on the network size and so on, it's not really
> something we can get a good value for from simulations... I suggest we
> increase it by say a factor of 4, and if we get major location distribution
> issues, we can reduce it again. This may be important.
>
> >
> > Ian.
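
Here is the back-of-the-envelope model I mentioned above. It is only a
sketch: it assumes every stored copy of a block is reachable independently
with probability equal to node uptime, and the 25% uptime, factor-5 and
factor-24 figures are Wuala's, not anything we have measured. With those
caveats it compares plain replication against pure FEC, and then our own
case of 128-of-192 (current 150%) versus 128-of-255 (proposed 200%) with
each block stored on ~3 nodes.

import java.util.Locale;

// Availability sketch only, NOT node code. Assumes each stored copy of a
// block is up independently with probability p; ignores caching, healing,
// correlated churn, and the real uptime distribution.
public class RedundancySketch {

    // P(at least one of r full replicas is reachable)
    static double replication(double p, int r) {
        return 1.0 - Math.pow(1.0 - p, r);
    }

    // P(at least k of n FEC blocks are reachable), each up independently
    // with probability q. Binomial tail, summed in log space so n=640 works.
    static double fecSegment(double q, int n, int k) {
        double total = 0.0;
        for (int i = k; i <= n; i++)
            total += Math.exp(logChoose(n, i)
                              + i * Math.log(q)
                              + (n - i) * Math.log(1.0 - q));
        return total;
    }

    // log of the binomial coefficient C(n, i)
    static double logChoose(int n, int i) {
        double r = 0.0;
        for (int j = 1; j <= i; j++)
            r += Math.log(n - i + j) - Math.log(j);
        return r;
    }

    public static void main(String[] args) {
        double p = 0.25; // Wuala's assumed node uptime

        // Wuala's comparison: plain replication vs pure FEC, single copies.
        System.out.printf(Locale.US, "replication x5:       %.4f%n",
                replication(p, 5));
        System.out.printf(Locale.US, "replication x24:      %.4f%n",
                replication(p, 24));
        System.out.printf(Locale.US, "FEC, 128 of 640:      %.4f%n",
                fecSegment(p, 640, 128));

        // Our case: each block is also stored on ~3 nodes, so per-block
        // availability is 1-(1-p)^3 rather than p.
        double perBlock = replication(p, 3);
        System.out.printf(Locale.US, "128 of 192, 3 nodes:  %.4f%n",
                fecSegment(perBlock, 192, 128));
        System.out.printf(Locale.US, "128 of 255, 3 nodes:  %.4f%n",
                fecSegment(perBlock, 255, 128));
    }
}

Under these assumptions a factor of 24 in plain replication and a factor of
5 in FEC both come out above 99% at 25% uptime, which fits what Wuala claim,
while a factor of 5 in plain replication does not. For our numbers,
128-of-255 on top of 3 stores per block also stays above 99%, whereas the
current 128-of-192 collapses at uptimes that low. Don't read too much into
the exact figures - in practice requests and caching put popular blocks on
far more than 3 nodes, and the independence assumption flatters everything -
but it does support going to 200% rather than trying to store each block on
more nodes.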
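
And here is the sink-check idea in code, purely hypothetical: PeerInfo,
distance() and estimatedUptime are stand-ins for whatever we actually track,
not real classes in the tree, and the 17% threshold is simply Wuala's
figure. The point is only that the change itself is small *if* we can get a
trustworthy per-peer uptime estimate, which as I said above is the hard part
on opennet.

import java.util.List;

// Hypothetical sketch: when deciding whether we are a "sink" for a key
// (closer to its location than any of our peers), ignore peers whose
// measured uptime is below a threshold, so that low-uptime nodes cannot
// stop the data from being stored somewhere more stable.
public class SinkCheckSketch {

    // Stand-in for whatever per-peer state the node keeps.
    static class PeerInfo {
        final double location;        // position in the keyspace, [0, 1)
        final double estimatedUptime; // fraction of time seen online, [0, 1]
        PeerInfo(double location, double estimatedUptime) {
            this.location = location;
            this.estimatedUptime = estimatedUptime;
        }
    }

    // Wuala's cutoff, used here purely for the sake of argument.
    static final double MIN_UPTIME_FOR_SINK = 0.17;

    // Circular keyspace distance between two locations in [0, 1).
    static double distance(double a, double b) {
        double d = Math.abs(a - b);
        return Math.min(d, 1.0 - d);
    }

    // True if we are closer to the key than every peer we would trust to
    // actually hold the data.
    static boolean shouldStore(double keyLocation, double myLocation,
                               List<PeerInfo> peers) {
        double myDistance = distance(myLocation, keyLocation);
        for (PeerInfo peer : peers) {
            if (peer.estimatedUptime < MIN_UPTIME_FOR_SINK)
                continue; // pretend low-uptime peers are not there
            if (distance(peer.location, keyLocation) < myDistance)
                return false; // a reliable peer is closer; let it be the sink
        }
        return true;
    }
}

Note that this only changes who counts as a sink; it does not stop us
routing to or caching on low-uptime nodes, so it should not cost us
connectivity the way reducing connection churn would.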
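
On the location randomisation interval in the quoted paragraph: the change
proposed there is nothing more than scaling the existing bounds, something
like the following (the class and method names are made up; only the
current 13-22 hour range is from the earlier mail).

import java.util.Random;

// Hypothetical: pick the delay until the next location randomisation.
// Currently this is roughly uniform in [13h, 22h]; the proposal is simply
// to scale both bounds by the same factor (4 here) and watch whether the
// location distribution degrades.
public class SwapIntervalSketch {

    static final long HOUR_MS = 60L * 60L * 1000L;
    static final double SCALE = 4.0; // proposed increase

    static long nextRandomizationDelayMs(Random rng) {
        double hours = 13.0 + rng.nextDouble() * (22.0 - 13.0);
        return (long) (hours * SCALE * HOUR_MS);
    }
}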