On Mon, 05 Dec 2005 22:31:11 +0000, Matthew Toseland wrote:

> On Tue, Dec 06, 2005 at 12:01:29AM +0200, Jusa Saari wrote:
>> From what I've understood, Freenet 0.7 is supposed to handle splitfiles
>> transparently, so that the inserting node fragments and the retrieving
>> node reassembles files automatically, without client programs needing to
>> know or care about the block dis/reassembly. Am I correct?
>
> Pretty much. However, clients (e.g. fproxy) can specify a maximum file
> size.
>>
>> Now, suppose that you have a large file; say, a Linux DVD image. Suppose
>> that you have inserted it with a program like Frost, which only inserts
>> the file when it receives a request for it (a must for sharing a large
>> amount of data). Suppose that just enough blocks have fallen to bitrot
>> that the file cannot be reassembled anymore; getting just a single block
>> reinserted might be enough.
>
> I dispute that inserting the file when you receive a request is "a must".

It is a must, because otherwise the only way to keep a not-very-popular file available is to reinsert it periodically. This wastes network resources and local bandwidth, and may well push other content out of the network. That creates a vicious cycle, since the authors of that content then need to reinsert more often to keep it available.

Freenet is a combination of a cache and a transport system. In the long term, a file can be reached through Freenet only if it is either extremely popular or periodically reinserted from a backing store (hard drive space outside the datastore). Insert-on-demand is absolutely vital for such a scheme to work well. Without it, the reinserts end up flushing each other out of the network, which shortens the reinsert interval, which causes more bitrot, which shortens the interval further, and so on. And of course freesites and other content that can't realistically be reinserted on request, due to the latencies involved, get flushed out too.

In short, insert-on-request is vital if Freenet is to function under any significant load.

>> In the current system such a situation is easy to handle. You can simply ask
>> the inserter for the specific blocks. In the new Freenet, however,
>> blocks are hidden, so the retriever doesn't know which blocks failed,
>> and the inserter has no way of inserting just them. This means that he
>> has to reinsert the entire multi-gigabyte DVD image, which is a huge
>> waste of resources.
>>
>> Now, this could be solved by simply allowing access to the underlying
>> block system, but that is needlessly complex and might lead to problems
>> if the block size or some other aspect of the system ever changes.
>> Instead, I'm suggesting that the insert request can specify the range of
>> bytes to insert; that is, when inserting the multi-gigabyte file, I can
>> specify that I only want to insert bytes from offset to offset2 (and of
>> course I should be able to specify multiple ranges). The retriever
>> should similarly get information about which byte ranges failed. Checkblocks
>> could be assigned to a logical range after the actual file data.
>
> Hmmm. You would of course end up reinserting an entire segment.

Why? If the splitfile code is deterministic, you get exactly the same blocks from the same file. You know that blocks n, m and o failed and the rest presumably succeeded, so why should you reinsert the rest of the blocks in the segment? I can understand that some checkblock algorithm might need to recalculate every block of the segment at once, but you still don't need to *insert* them all.

>> The good sides of this idea are that it should be trivial to implement
>> (just don't insert the blocks that are completely outside all ranges)
>> and would allow inserting just the missing blocks without programs even
>> needing to know that Freenet uses blocks, much less any details of the
>> implementation.
>>
>> Comments?
>
> Not vital at present IMHO. We'll see in future.

For the reasons stated above, I disagree. Please also note that this kind of thing is not something that can simply be added later. It needs support from insert and request tools. If it isn't included until Freenet becomes popular and load goes up, the popular tools will lack support for it: they can't support a feature that didn't exist when they were written, and it will take a long time to get everyone to upgrade. Better to add the feature now, so that the support is already in the tools when it is needed.

It should also be noted that including this support now will keep tools from implementing their own segmentation.
File sizes are growing all the time, so tool authors will need this feature whether it is officially supported or not. Everyone's life will be easier if it is.
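To make the byte-range idea a bit more concrete, here is a rough Python sketch of the mapping an insert or request tool would need. It assumes a fixed block size (the 32 KiB below is a hypothetical figure, not necessarily Freenet's real one), a deterministic splitfile encoding, and check blocks occupying a logical range right after the block-padded file data, as proposed above. It does not model per-segment FEC encoding; it only decides which already-computed blocks would have to be inserted. The function names are made up for illustration and are not part of any Freenet API.

```python
from collections.abc import Iterable

# Hypothetical block size; not necessarily Freenet's real splitfile block size.
BLOCK_SIZE = 32 * 1024


def blocks_for_ranges(file_size: int, n_check_blocks: int,
                      ranges: Iterable[tuple[int, int]],
                      block_size: int = BLOCK_SIZE) -> list[int]:
    """Return sorted indices of blocks touched by any half-open byte range.

    Assumed logical layout: data blocks 0..n_data-1 cover the file (padded
    to a whole block), check blocks follow at n_data..n_data+n_check_blocks-1.
    """
    n_data = (file_size + block_size - 1) // block_size
    logical_size = (n_data + n_check_blocks) * block_size
    wanted: set[int] = set()
    for start, end in ranges:
        # Clamp each range to the logical address space; skip empty ones.
        start, end = max(0, start), min(logical_size, end)
        if start >= end:
            continue
        # Every block overlapping [start, end) must be inserted.
        wanted.update(range(start // block_size, (end - 1) // block_size + 1))
    return sorted(wanted)


def ranges_for_blocks(indices: Iterable[int],
                      block_size: int = BLOCK_SIZE) -> list[tuple[int, int]]:
    """Merge block indices (e.g. failed fetches) into half-open byte ranges."""
    ranges: list[tuple[int, int]] = []
    for i in sorted(set(indices)):
        start, end = i * block_size, (i + 1) * block_size
        if ranges and ranges[-1][1] == start:
            # Adjacent to the previous range: extend it instead of adding one.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # A 100,000-byte file: 4 data blocks (0-3) plus 2 check blocks (4-5).
    report = ranges_for_blocks([1, 3, 4])   # what the retriever would report
    print(report)                           # [(32768, 65536), (98304, 163840)]
    print(blocks_for_ranges(100_000, 2, report))  # [1, 3, 4]
```

The round trip is the point: the retriever reports failed byte ranges without knowing anything about blocks, and the inserter turns those ranges back into exactly the blocks to reinsert, leaving the rest of the segment alone.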
