On Fri, Jan 24, 2003 at 12:01:20AM +0000, Matthew Toseland wrote:
> On Thu, Jan 23, 2003 at 11:50:57PM +0000, Gordan Bobic wrote:
> > On Thursday 23 Jan 2003 5:18 pm, Matthew Toseland wrote:
> > > > Are files never separated into segments, unless FEC is used? What are the
> > > > minimum and maximum sizes for FEC segments?
> > >
> > > It is possible to insert non-redundant splitfiles. They are unreliable
> > > and slow. FEC splitfiles use "chunks" (segments are something else :)).
> >
> > OK, terminology noted. :-)
> >
> > > Fproxy uses 256kB to 1MB chunks, but other clients could use other
> > > sizes. That is however the recommended range for most uses.
> >
> > ...
> >
> > > Splitfiles therefore can fail if too many of the chunks are no longer
> > > fetchable.
> >
> > Is it possible to use smaller chunks? Can you give me a link to a document
> > that explains how to control the use of FEC via fproxy? For example, can I
> > force the use of FEC for files smaller than 1 MB?
>
> No. Your application would not use Fproxy anyway; it would probably use
> a library to talk directly to the node using FCP. What language are you
> considering?
>
> > > > > However if you need to store chunks of more than a meg, you need to
> > > > > use redundant (FEC) splitfiles.
> > > >
> > > > As I said, I was looking for the limits on the small size, rather than
> > > > the large size. Now I know not to go much below 1 KB because it is
> > > > pointless. I doubt I'd ever need to use anything even remotely
> > > > approaching 1 MB for my application. I was thinking about using a size
> > > > between 1 KB and 4 KB, but wasn't sure if the minimum block size might
> > > > have been something quite a bit larger, like 64 KB.
> > >
> > > Well... I don't know. You gain performance from downloading many files
> > > at once (don't go completely over the top though... the splitfile
> > > downloader uses around 10)
> >
> > Doesn't this depend entirely on the limits on the number of threads and
> > concurrent connections, as set in the configuration file? And the hardware
> > and network resources, of course.
>
> It depends on lots of things. The operating system, the memory limit set
> for the process by the command line arguments to the Java VM, and of
> course the hardware. The current Freenet node has fairly limited
> performance due to not using nonblocking I/O, for example. Also, some
> operating systems limit the number of file descriptors that can be open
> at once...
>
> > > - but it means you insert more data, and you
> > > have to decode it; it's not designed for such small chunks, but we know
> > > it can work with sizes close to that from work on streaming... The
> > > overheads on a 1kB CHK are significant (something like 200 bytes?), I'd
> > > use 4kB chunks, at least...
> >
> > Is this overhead included in the amount of space consumed? I.e. does this
> > mean that a 1 KB file + 200 bytes of overhead => 2 KB of storage? Or is
> > the overhead completely separate?
>
> Overhead is separate. The actual data content of the file is rounded up
> to the next power of 2.

Oh, one thing. Data content includes metadata.
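
To put rough numbers on that model, here is an illustrative sketch (Python, not from the thread; the ~200-byte per-key overhead is the estimate quoted above, not a specification):

    def stored_size(data_bytes, metadata_bytes, key_overhead=200):
        """Approximate cost of a single CHK under the model above: data plus
        metadata is rounded up to the next power of two, and a separate
        per-key overhead (very roughly 200 bytes) comes on top of that."""
        content = data_bytes + metadata_bytes
        padded = 1
        while padded < content:
            padded *= 2
        return padded + key_overhead

    # A 1 KB record with 100 bytes of metadata pads to 2048 bytes of
    # content, plus the (separate) ~200 bytes of key overhead.
    print(stored_size(1024, 100))   # -> 2248
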
> > Is the overhead of 200 bytes fixed for all file sizes, or does it vary with
> > the file size?
>
> No, it varies depending on (mostly) the file type.
>
> > > > > > The reason for this is that I am trying to design a database
> > > > > > application that uses Freenet as the storage medium (yes, I know
> > > > > > about FreeSQL, and it doesn't do what I want in the way I want it
> > > > > > done). Files going missing are an obvious problem that needs to be
> > > > > > tackled. I'd like to know what the block size is in order to
> > > > > > implement redundancy padding in the data by exploiting the overheads
> > > > > > produced by the block size, when a single item of data is smaller
> > > > > > than the block that contains it.
> > >
> > > You do know that Freenet is lossy, right? Content which is not accessed
> > > very much will eventually expire.
> >
> > Yes, this is why I am thinking about using DBR. There would potentially be
> > a number of nodes that would once per day retrieve the data, compact it
> > into bigger files, and re-insert it for the next day. This would be
> > equivalent to the vacuum (PostgreSQL) and optimize (MySQL) commands.
> >
> > The daily operation data would involve inserting many small files (one file
> > per record in a table, one file per delete flag, etc.)
> >
> > This would all be gathered, compacted, and re-inserted. Any indices would
> > also get re-generated in the same way.
>
> Hmm. Interesting.
>
> > > > > > This could be optimized out at run-time to make no impact on
> > > > > > execution speed (e.g. skip downloads of blocks that we can
> > > > > > reconstruct from already downloaded segments).
> > > > >
> > > > > Hmm. Not sure I follow.
> > > >
> > > > A bit like a Hamming code, but allowing random access. Because it is
> > > > latency + download that is slow, downloading fewer files is a good
> > > > thing for performance, so I can re-construct some of the pending
> > > > segments rather than downloading them. Very much like FEC, in fact. :-)
> > >
> > > Latency is slow. Downloading many files in series is slow. Downloading
> > > many files in parallel, as long as you don't get bogged down waiting for
> > > the last retry on the last failing block in a non-redundant splitfile,
> > > is relatively fast. By all means use your own codes!
> >
> > I haven't decided what to use for redundancy yet. My biggest reason for
> > using my own method is that it would allow me to pad files to a minimal
> > sensible size (I was thinking about 4 KB), and enable me to skip chunks
> > that are not needed, or back-track to reconstruct a "hole" in the data
> > from the files that are already there.
> >
> > But FEC is very appealing because it already does most of that, so there
> > would be less work involved in the implementation of my application.
> >
> > > > Of course, I might not bother if I can use FEC for it instead, provided
> > > > it will work with very small file sizes (question I asked above).
> > >
> > > Well...
> > >
> > > FEC divides the file into segments of up to 128 chunks (I think).
> > > It then creates 64 check blocks for the 128 chunks (obviously fewer if
> > > fewer original chunks), and inserts the lot, along with a file
> > > specifying the CHKs of all the different chunks inserted for each
> > > segment.
> >
> > Doesn't that mean that with a maximum chunk size of 1 MB, this limits the
> > file size to 128 MB? Or did I misunderstand the maximum chunk size, and it
> > is purely a matter of caching as a factor of the store size?
>
> No. After 128MB, we use more than one segment. Within each segment, we
> need any 128 of the 192 chunks to reconstruct the file.
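
As a sanity check on those numbers, a small sketch (Python) of the layout as described: segments of up to 128 chunks, 64 check blocks for a full segment, and any 128 of a full segment's 192 blocks rebuild it. The check-block count for a partial last segment is a guess (the thread only says "obviously fewer"):

    import math

    def fec_layout(file_size, chunk_size=1024 * 1024):
        """Splitfile layout as described above: cut the file into chunks,
        group chunks into segments of up to 128, and give each segment
        roughly 50% check blocks (64 for a full 128-chunk segment)."""
        chunks = math.ceil(file_size / chunk_size)
        segments = math.ceil(chunks / 128)
        layout = []
        for s in range(segments):
            data = min(128, chunks - s * 128)
            check = math.ceil(data / 2)   # assumed: proportional redundancy
            layout.append((data, check))
        return layout

    # A 200 MB file with 1 MB chunks needs two segments:
    # 128 data + 64 check blocks, then 72 data + 36 check blocks.
    print(fec_layout(200 * 1024 * 1024))
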
> > What is the smallest file size with which FEC can be used sensibly? Would
> > I be correct in guessing this at 2 KB, to create two 1 KB chunks with one
> > 1 KB check block?
>
> I wouldn't recommend it.
>
> > Is FEC fixed at 50% redundancy, or can the amount of redundancy be
> > controlled (e.g. reduced to 25%, if requested)? Or has it been tried and
> > tested that around 50% gives the best results?
>
> Hmm. At the moment it is hard-coded. At some point we may change this.
> The original libraries support other amounts.
>
> > Thanks.
> >
> > Gordan
>
> --
> Matthew Toseland
> [EMAIL PROTECTED]
> Full time freenet hacker.
> http://freenetproject.org/
> Freenet Distribution Node (temporary) at
> http://amphibian.dyndns.org:8889/x-aYyDpMj2E/
> ICTHUS.

--
Matthew Toseland
[EMAIL PROTECTED]
Full time freenet hacker.
http://freenetproject.org/
Freenet Distribution Node (temporary) at
http://amphibian.dyndns.org:8889/x-aYyDpMj2E/
ICTHUS.
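
For completeness, a minimal sketch of the daily gather/compact/re-insert pass described earlier in the thread (Python). The freenet_get/freenet_put helpers and the key naming are purely hypothetical placeholders for whatever FCP client library the application would actually use:

    from datetime import date, timedelta

    def freenet_get(key):
        """Placeholder: fetch one key via an FCP client library (returns
        the data, or None if the key could not be retrieved)."""
        raise NotImplementedError

    def freenet_put(key, data):
        """Placeholder: insert data under a key via an FCP client library."""
        raise NotImplementedError

    def compact_table(table, record_ids, today=None):
        """Daily 'vacuum' pass as described in the thread: pull yesterday's
        per-record files, drop anything flagged as deleted, bundle the
        survivors into one larger file, and re-insert that bundle under
        today's DBR-style key."""
        today = today or date.today()
        yesterday = today - timedelta(days=1)
        survivors = []
        for rid in record_ids:
            if freenet_get(f"{table}/{yesterday}/delete/{rid}") is not None:
                continue                  # a delete flag exists for this record
            record = freenet_get(f"{table}/{yesterday}/record/{rid}")
            if record is not None:        # records that expired are simply lost
                survivors.append(record)
        bundle = b"".join(survivors)      # real code would frame each record
        freenet_put(f"{table}/{today}/compacted", bundle)
        return len(survivors)
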
