Matthew Toseland wrote:
> On Wednesday 09 April 2008 05:28, Daniel Cheng wrote:
>> 2008/4/8 Matthew Toseland <toad at amphibian.dyndns.org>:
>> > On Tuesday 08 April 2008 12:36, Matthew Toseland wrote:
>> > > On Tuesday 08 April 2008 00:36, Ian Clarke wrote:
>> > > > http://video.google.com/videoplay?docid=-2372664863607209585
>> > > >
>> > > > He mentions Freenet's use of this technique about 10-15 minutes in,
>> > > > they also use erasure codes, so it seems they are using a few
>> > > > techniques that we also use (unclear about whether we were the direct
>> > > > source of this inspiration).
>> > >
>> > > They use 500% redundancy in their RS codes. Right now we use 150%
>> > > (including the original 100%). Maybe we should increase this? A slight
>> > > increase to say 200% or 250% may give significantly better performance,
>> > > despite the increased overhead...
>> >
>> > In fact, I think I can justify a figure of 200% (the original plus 100%,
>> > so 128 -> 255 blocks to fit within the 8 bit fast encoding limit). On
>> > average, in the long term, a block will be stored on 3 nodes. Obviously a
>> > lot of popular data will be stored on more nodes than 3, but in terms of
>> > the datastore, this is the approximate figure. On an average node with a
>> > 1GB datastore, the 512MB cache has a lifetime of less than a day; stuff
>> > lasts a lot longer in the store, and on average data is stored on 3 nodes
>> > (by design).
>>
>> I think the downloader would "heal" a broken file by re-inserting the
>> missing FEC blocks, right?
>>
>> If that is the case, I think we can use 300% (or higher) redundancy,
>> but only insert a random portion of them. When a downloader downloads
>> this file, he inserts (some other) random FEC blocks for this file.
>> Under this scheme, the inserter doesn't have to pay a high bandwidth
>> overhead cost, while the redundancy still increases.
>
> I'm not worried about inserters paying a high bandwidth cost actually. Right
> now inserts are a lot faster than requests. What I'm worried about is that if
> we have too much redundancy, our overhead in terms of data storage will be
> rather high, and that reduces the amount of data that is fetchable.
FWIW, my store is 50% full (of 8GB) after several weeks of mostly 24/7 uptime, whereas the cache fills up pretty fast. Also, reinsertions are requested quite often on Frost shortly after an announcement. Could this mean that stores aren't currently being fully exploited?
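
To make the partial-insert idea quoted above a bit more concrete, here is a very rough sketch (not actual Freenet code; the FecEncoder/Inserter interfaces and all names are made up for illustration). The segment is encoded with the full set of check blocks, but the original inserter only uploads a random subset of them; a healing downloader could later regenerate the check blocks and insert a different random subset:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PartialInsertSketch {

    // 128 data + 127 check = 255 total blocks per segment, staying within
    // the 8-bit RS fast-encoding limit mentioned above (~200% redundancy).
    // Going beyond 255 total blocks would need a larger field or another code.
    static final int DATA_BLOCKS = 128;
    static final int CHECK_BLOCKS = 127;

    // How many check blocks the original inserter actually uploads;
    // 64 gives roughly the current 150% up-front cost (128 + 64 = 192 blocks).
    static final int CHECK_BLOCKS_TO_INSERT = 64;

    // Stand-in for whatever FEC encoder the node really uses.
    interface FecEncoder {
        byte[][] encode(byte[][] dataBlocks, int checkBlockCount);
    }

    // Stand-in for the node's block insert mechanism.
    interface Inserter {
        void insert(byte[] block);
    }

    static void insertSegment(byte[][] dataBlocks, FecEncoder fec,
                              Inserter inserter, Random random) {
        // The original data blocks are always inserted.
        for (byte[] block : dataBlocks)
            inserter.insert(block);

        // Generate the full set of check blocks locally...
        byte[][] checkBlocks = fec.encode(dataBlocks, CHECK_BLOCKS);

        // ...but only upload a random subset of them. A downloader that later
        // heals the file would decode the segment, re-encode the check blocks,
        // and insert a different random subset, spreading the upload cost.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < checkBlocks.length; i++)
            indices.add(i);
        Collections.shuffle(indices, random);
        for (int i = 0; i < CHECK_BLOCKS_TO_INSERT; i++)
            inserter.insert(checkBlocks[indices.get(i)]);
    }
}

The point of the sketch is just that redundancy beyond what the inserter pays for up front would come from healing downloaders, rather than from the original insert.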