On Wednesday 09 April 2008 16:27, Daniel Cheng wrote:
> On Wed, Apr 9, 2008 at 10:29 PM, Matthew Toseland
> <toad at amphibian.dyndns.org> wrote:
> >
> > On Wednesday 09 April 2008 05:28, Daniel Cheng wrote:
> > > 2008/4/8 Matthew Toseland <toad at amphibian.dyndns.org>:
> > > > On Tuesday 08 April 2008 12:36, Matthew Toseland wrote:
> > > > > On Tuesday 08 April 2008 00:36, Ian Clarke wrote:
> > > > > > http://video.google.com/videoplay?docid=-2372664863607209585
> > > > > >
> > > > > > He mentions Freenet's use of this technique about 10-15 minutes
> > > > > > in. They also use erasure codes, so it seems they are using a
> > > > > > few techniques that we also use (unclear whether we were the
> > > > > > direct source of this inspiration).
> > > > >
> > > > > They use 500% redundancy in their RS codes. Right now we use 150%
> > > > > (including the original 100%). Maybe we should increase this? A
> > > > > slight increase to, say, 200% or 250% may give significantly
> > > > > better performance, despite the increased overhead...
> > > >
> > > > In fact, I think I can justify a figure of 200% (the original plus
> > > > 100%, so 128 -> 255 blocks, to fit within the 8-bit fast encoding
> > > > limit). On average, in the long term, a block will be stored on 3
> > > > nodes. Obviously a lot of popular data will be stored on more nodes
> > > > than 3, but in terms of the datastore, this is the approximate
> > > > figure. On an average node with a 1GB datastore, the 512MB cache
> > > > has a lifetime of less than a day; stuff lasts a lot longer in the
> > > > store, and on average data is stored on 3 nodes (by design).
> > >
> > > I think the downloader would "heal" a broken file by re-inserting the
> > > missing FEC blocks, right?
> > >
> > > If that is the case, I think we can use 300% (or higher) redundancy,
> > > but only insert a random portion of the blocks. When a downloader
> > > downloads the file, he inserts (some other) random FEC blocks for it.
> > > Under this scheme, the inserter doesn't have to pay a high bandwidth
> > > overhead cost, while the redundancy still increases.
> >
> > I'm not worried about inserters paying a high bandwidth cost, actually.
> > Right now inserts are a lot faster than requests. What I'm worried
> > about is that if we have too much redundancy, our overhead in terms of
> > data storage will be rather high, and that reduces the amount of data
> > that is fetchable.
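To make the block arithmetic quoted above concrete, here is a rough Java
sketch. The class name is made up and nothing here is taken from the actual
splitfile code; it only assumes 128 data blocks per segment and the 255-block
ceiling of an 8-bit Reed-Solomon code, and works out the segment sizes at the
redundancy levels under discussion:

public class RedundancySketch {
    // Assumed parameters: 128 data blocks per segment, and at most 255
    // blocks in total so that an 8-bit Reed-Solomon code can be used.
    static final int DATA_BLOCKS = 128;
    static final int MAX_TOTAL_BLOCKS = 255;

    public static void main(String[] args) {
        int[] redundancyPercent = { 150, 200, 250, 300, 500 };
        for (int r : redundancyPercent) {
            // The percentage includes the original 100%, so the check
            // blocks are the remaining (r - 100)% of the data blocks.
            int checkBlocks = DATA_BLOCKS * (r - 100) / 100;
            int total = DATA_BLOCKS + checkBlocks;
            System.out.printf("%3d%%: %d data + %3d check = %3d total (%s)%n",
                    r, DATA_BLOCKS, checkBlocks, total,
                    total <= MAX_TOTAL_BLOCKS ? "fits 8-bit RS"
                                              : "over 255, must be capped");
        }
        // A full 200% would be 256 blocks, one over the 8-bit limit, which
        // is why the figure above is stated as 128 -> 255, i.e. 127 check
        // blocks rather than 128.
    }
}

Anything much above 200% total therefore needs either smaller segments or a
symbol field larger (and slower) than 8 bits.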
> Disks are getting cheaper and cheaper...

Yes, but the flipside is that people want to share bigger and bigger files.

> Also, high data redundancy means we can drop any blocks of them without
> problem, right?

Not necessarily. We have data redundancy precisely because blocks get
dropped for various reasons, e.g. because a node goes offline.

> The only potential problem I have in mind is the LRU drop policy on
> store full. All blocks of an unpopular item may drop around the same
> time if we use this policy.
>
> I think if the redundancy is high enough, we should use:
>   - Random drop of old data on store full.

Hmmm. I dunno... simulations would be interesting (see the rough sketch
below).

>   - LRU drop on cache full.
>
> which should give a good balance of data retention and load balancing.
>
> Regards,
> Daniel Cheng
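On the "simulations would be interesting" point, here is a rough sketch of
one such simulation. Everything in it is an assumption made for illustration:
the class name, network size, store size and churn rate are invented, LRU is
treated as pure insertion order (the file is never requested again), and
routing, caching and healing are ignored. It simply measures how long a
255-block file (128 needed to decode), with each block stored on 3 randomly
chosen nodes, stays reconstructible while every node keeps storing unrelated
blocks and evicts either its oldest block or a random one:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class DropPolicySketch {
    static final int NODES = 100;
    static final int STORE_SIZE = 1000;   // blocks per node store
    static final int FILE_BLOCKS = 255;   // total blocks of the file
    static final int NEEDED = 128;        // blocks required to decode it
    static final int COPIES = 3;          // nodes holding each block
    static final int MAX_ROUNDS = 10000;  // churn rounds (one insert per node per round)

    public static void main(String[] args) {
        System.out.println("LRU drop:    reconstructible for " + run(true) + " rounds");
        System.out.println("Random drop: reconstructible for " + run(false) + " rounds");
    }

    // Runs one simulation and returns the round at which fewer than NEEDED
    // distinct file blocks remain anywhere in the network.
    static int run(boolean lru) {
        Random rng = new Random(1);
        // stores.get(n) is node n's store: entries are a file block index,
        // or -1 for an unrelated block. Index 0 is the oldest entry, so for
        // a never-requested file LRU is simply insertion order.
        List<List<Integer>> stores = new ArrayList<>();
        for (int n = 0; n < NODES; n++) {
            List<Integer> store = new ArrayList<>();
            for (int i = 0; i < STORE_SIZE; i++) store.add(-1); // pre-filled, already full
            stores.add(store);
        }
        // Insert the file: each block is stored on COPIES randomly chosen nodes.
        for (int b = 0; b < FILE_BLOCKS; b++)
            for (int c = 0; c < COPIES; c++)
                insert(stores.get(rng.nextInt(NODES)), b, lru, rng);
        // Churn: every round, every node stores one more unrelated block.
        for (int round = 1; round <= MAX_ROUNDS; round++) {
            for (List<Integer> store : stores) insert(store, -1, lru, rng);
            if (distinctFileBlocks(stores) < NEEDED) return round;
        }
        return MAX_ROUNDS; // still reconstructible when the simulation ended
    }

    // Stores one block, evicting either the oldest or a random existing block.
    static void insert(List<Integer> store, int block, boolean lru, Random rng) {
        store.remove(lru ? 0 : rng.nextInt(store.size()));
        store.add(block);
    }

    // Counts how many distinct blocks of the file survive across all stores.
    static int distinctFileBlocks(List<List<Integer>> stores) {
        boolean[] seen = new boolean[FILE_BLOCKS];
        int count = 0;
        for (List<Integer> store : stores)
            for (int b : store)
                if (b >= 0 && !seen[b]) { seen[b] = true; count++; }
        return count;
    }
}

This only captures the correlated-eviction effect Daniel describes: under LRU
every copy of every block of the file is inserted, and therefore evicted, at
about the same time. Whether that matters in practice would depend on real
store sizes, request patterns and healing, which the sketch ignores.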