Matthew Toseland wrote:
> On Wednesday 09 April 2008 05:28, Daniel Cheng wrote:
>> 2008/4/8 Matthew Toseland <toad at amphibian.dyndns.org>:
>> > On Tuesday 08 April 2008 12:36, Matthew Toseland wrote:
>> > > On Tuesday 08 April 2008 00:36, Ian Clarke wrote:
>> > > > http://video.google.com/videoplay?docid=-2372664863607209585
>> > > >
>> > > > He mentions Freenet's use of this technique about 10-15 minutes in,
>> > > > they also use erasure codes, so it seems they are using a few
>> > > > techniques that we also use (unclear about whether we were the direct
>> > > > source of this inspiration).
>> > >
>> > > They use 500% redundancy in their RS codes. Right now we use 150%
>> > > (including the original 100%). Maybe we should increase this? A slight
>> > > increase to say 200% or 250% may give significantly better performance,
>> > > despite the increased overhead...
>> >
>> > In fact, I think I can justify a figure of 200% (the original plus 100%,
>> > so 128 -> 255 blocks to fit within the 8 bit fast encoding limit). On
>> > average, in the long term, a block will be stored on 3 nodes. Obviously a
>> > lot of popular data will be stored on more nodes than 3, but in terms of
>> > the datastore, this is the approximate figure. On an average node with a
>> > 1GB datastore, the 512MB cache has a lifetime of less than a day; stuff
>> > lasts a lot longer in the store, and on average data is stored on 3 nodes
>> > (by design).
>>
>> I think the downloader would "heal" a broken file by re-inserting the
>> missing FEC blocks, right?
>>
>> If that is the case, I think we can use 300% (or higher) redundancy,
>> but only insert a random portion of them. When a downloader downloads
>> this file, he inserts (some other) random FEC blocks for this file.
>> Under this scheme, the inserter doesn't have to pay a high bandwidth
>> overhead cost, while the redundancy still increases.
>
> I'm not worried about inserters paying a high bandwidth cost actually. Right
> now inserts are a lot faster than requests. What I'm worried about is that if
> we have too much redundancy, our overhead in terms of data storage will be
> rather high, and that reduces the amount of data that is fetchable.
FWIW, my store is 50% full (of 8GB) after several weeks of mostly 24/7 uptime, whereas the cache fills up pretty fast. Also, reinsertions are requested quite often on Frost shortly after an announcement. Could this mean that stores aren't currently being fully exploited?
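
To make the partial-insert idea quoted above a bit more concrete, here is a very rough sketch (not actual Freenet code; the FecEncoder/Inserter interfaces and all names are made up for illustration). The segment is encoded with the full set of check blocks, but the original inserter only uploads a random subset of them; a healing downloader could later regenerate the check blocks and insert a different random subset:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PartialInsertSketch {

    // 128 data + 127 check = 255 total blocks per segment, staying within
    // the 8-bit RS fast-encoding limit mentioned above (~200% redundancy).
    // Going beyond 255 total blocks would need a larger field or another code.
    static final int DATA_BLOCKS = 128;
    static final int CHECK_BLOCKS = 127;

    // How many check blocks the original inserter actually uploads;
    // 64 gives roughly the current 150% up-front cost (128 + 64 = 192 blocks).
    static final int CHECK_BLOCKS_TO_INSERT = 64;

    // Stand-in for whatever FEC encoder the node really uses.
    interface FecEncoder {
        byte[][] encode(byte[][] dataBlocks, int checkBlockCount);
    }

    // Stand-in for the node's block insert mechanism.
    interface Inserter {
        void insert(byte[] block);
    }

    static void insertSegment(byte[][] dataBlocks, FecEncoder fec,
                              Inserter inserter, Random random) {
        // The original data blocks are always inserted.
        for (byte[] block : dataBlocks)
            inserter.insert(block);

        // Generate the full set of check blocks locally...
        byte[][] checkBlocks = fec.encode(dataBlocks, CHECK_BLOCKS);

        // ...but only upload a random subset of them. A downloader that later
        // heals the file would decode the segment, re-encode the check blocks,
        // and insert a different random subset, spreading the upload cost.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < checkBlocks.length; i++)
            indices.add(i);
        Collections.shuffle(indices, random);
        for (int i = 0; i < CHECK_BLOCKS_TO_INSERT; i++)
            inserter.insert(checkBlocks[indices.get(i)]);
    }
}

The point of the sketch is just that redundancy beyond what the inserter pays for up front would come from healing downloaders, rather than from the original insert.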