On Wednesday 09 April 2008 16:27, Daniel Cheng wrote:
> On Wed, Apr 9, 2008 at 10:29 PM, Matthew Toseland
> <toad at amphibian.dyndns.org> wrote:
> >
> > On Wednesday 09 April 2008 05:28, Daniel Cheng wrote:
> > > 2008/4/8 Matthew Toseland <toad at amphibian.dyndns.org>:
> > > > On Tuesday 08 April 2008 12:36, Matthew Toseland wrote:
> > > > > On Tuesday 08 April 2008 00:36, Ian Clarke wrote:
> > > > > > http://video.google.com/videoplay?docid=-2372664863607209585
> > > > > >
> > > > > > He mentions Freenet's use of this technique about 10-15 minutes
> > > > > > in. They also use erasure codes, so it seems they are using a
> > > > > > few techniques that we also use (unclear whether we were the
> > > > > > direct source of this inspiration).
> > > > >
> > > > > They use 500% redundancy in their RS codes. Right now we use 150%
> > > > > (including the original 100%). Maybe we should increase this? A
> > > > > slight increase to, say, 200% or 250% may give significantly
> > > > > better performance, despite the increased overhead...
> > > >
> > > > In fact, I think I can justify a figure of 200% (the original plus
> > > > 100%, so 128 -> 255 blocks, to fit within the 8-bit fast encoding
> > > > limit). On average, in the long term, a block will be stored on 3
> > > > nodes. Obviously a lot of popular data will be stored on more nodes
> > > > than 3, but in terms of the datastore, this is the approximate
> > > > figure. On an average node with a 1GB datastore, the 512MB cache
> > > > has a lifetime of less than a day; stuff lasts a lot longer in the
> > > > store, and on average data is stored on 3 nodes (by design).
> > >
> > > I think the downloader would "heal" a broken file by re-inserting the
> > > missing FEC blocks, right?
> > >
> > > If that is the case, I think we can use 300% (or higher) redundancy,
> > > but only insert a random portion of the blocks. When a downloader
> > > downloads the file, he inserts (some other) random FEC blocks for it.
> > > Under this scheme, the inserter doesn't have to pay a high bandwidth
> > > overhead cost, while the redundancy still increases.
> >
> > I'm not worried about inserters paying a high bandwidth cost, actually.
> > Right now inserts are a lot faster than requests. What I'm worried
> > about is that if we have too much redundancy, our overhead in terms of
> > data storage will be rather high, and that reduces the amount of data
> > that is fetchable.
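To make the block arithmetic quoted above concrete, here is a rough Java
sketch. The class name is made up and nothing here is taken from the actual
splitfile code; it only assumes 128 data blocks per segment and the 255-block
ceiling of an 8-bit Reed-Solomon code, and works out the segment sizes at the
redundancy levels under discussion:

public class RedundancySketch {
    // Assumed parameters: 128 data blocks per segment, and at most 255
    // blocks in total so that an 8-bit Reed-Solomon code can be used.
    static final int DATA_BLOCKS = 128;
    static final int MAX_TOTAL_BLOCKS = 255;

    public static void main(String[] args) {
        int[] redundancyPercent = { 150, 200, 250, 300, 500 };
        for (int r : redundancyPercent) {
            // The percentage includes the original 100%, so the check
            // blocks are the remaining (r - 100)% of the data blocks.
            int checkBlocks = DATA_BLOCKS * (r - 100) / 100;
            int total = DATA_BLOCKS + checkBlocks;
            System.out.printf("%3d%%: %d data + %3d check = %3d total (%s)%n",
                    r, DATA_BLOCKS, checkBlocks, total,
                    total <= MAX_TOTAL_BLOCKS ? "fits 8-bit RS"
                                              : "over 255, must be capped");
        }
        // A full 200% would be 256 blocks, one over the 8-bit limit, which
        // is why the figure above is stated as 128 -> 255, i.e. 127 check
        // blocks rather than 128.
    }
}

Anything much above 200% total therefore needs either smaller segments or a
symbol field larger (and slower) than 8 bits.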
> Disks are getting cheaper and cheaper...

Yes, but the flipside is that people want to share bigger and bigger files.

> Also, high data redundancy means we can drop any blocks of them without
> problem, right?

Not necessarily. We have data redundancy precisely because blocks get
dropped for various reasons, e.g. because a node goes offline.

> The only potential problem I have in mind is the LRU drop policy on
> store full. All blocks of an unpopular item may drop around the same
> time if we use this policy.
>
> I think if the redundancy is high enough, we should use:
>   - Random drop of old data on store full.

Hmmm. I dunno... simulations would be interesting (see the rough sketch
below).

>   - LRU drop on cache full.
>
> which should give a good balance of data retention and load balancing.
>
> Regards,
> Daniel Cheng
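On the "simulations would be interesting" point, here is a rough sketch of
one such simulation. Everything in it is an assumption made for illustration:
the class name, network size, store size and churn rate are invented, LRU is
treated as pure insertion order (the file is never requested again), and
routing, caching and healing are ignored. It simply measures how long a
255-block file (128 needed to decode), with each block stored on 3 randomly
chosen nodes, stays reconstructible while every node keeps storing unrelated
blocks and evicts either its oldest block or a random one:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class DropPolicySketch {
    static final int NODES = 100;
    static final int STORE_SIZE = 1000;   // blocks per node store
    static final int FILE_BLOCKS = 255;   // total blocks of the file
    static final int NEEDED = 128;        // blocks required to decode it
    static final int COPIES = 3;          // nodes holding each block
    static final int MAX_ROUNDS = 10000;  // churn rounds (one insert per node per round)

    public static void main(String[] args) {
        System.out.println("LRU drop:    reconstructible for " + run(true) + " rounds");
        System.out.println("Random drop: reconstructible for " + run(false) + " rounds");
    }

    // Runs one simulation and returns the round at which fewer than NEEDED
    // distinct file blocks remain anywhere in the network.
    static int run(boolean lru) {
        Random rng = new Random(1);
        // stores.get(n) is node n's store: entries are a file block index,
        // or -1 for an unrelated block. Index 0 is the oldest entry, so for
        // a never-requested file LRU is simply insertion order.
        List<List<Integer>> stores = new ArrayList<>();
        for (int n = 0; n < NODES; n++) {
            List<Integer> store = new ArrayList<>();
            for (int i = 0; i < STORE_SIZE; i++) store.add(-1); // pre-filled, already full
            stores.add(store);
        }
        // Insert the file: each block is stored on COPIES randomly chosen nodes.
        for (int b = 0; b < FILE_BLOCKS; b++)
            for (int c = 0; c < COPIES; c++)
                insert(stores.get(rng.nextInt(NODES)), b, lru, rng);
        // Churn: every round, every node stores one more unrelated block.
        for (int round = 1; round <= MAX_ROUNDS; round++) {
            for (List<Integer> store : stores) insert(store, -1, lru, rng);
            if (distinctFileBlocks(stores) < NEEDED) return round;
        }
        return MAX_ROUNDS; // still reconstructible when the simulation ended
    }

    // Stores one block, evicting either the oldest or a random existing block.
    static void insert(List<Integer> store, int block, boolean lru, Random rng) {
        store.remove(lru ? 0 : rng.nextInt(store.size()));
        store.add(block);
    }

    // Counts how many distinct blocks of the file survive across all stores.
    static int distinctFileBlocks(List<List<Integer>> stores) {
        boolean[] seen = new boolean[FILE_BLOCKS];
        int count = 0;
        for (List<Integer> store : stores)
            for (int b : store)
                if (b >= 0 && !seen[b]) { seen[b] = true; count++; }
        return count;
    }
}

This only captures the correlated-eviction effect Daniel describes: under LRU
every copy of every block of the file is inserted, and therefore evicted, at
about the same time. Whether that matters in practice would depend on real
store sizes, request patterns and healing, which the sketch ignores.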