Interesting background, indeed. It's as in 0.5, but with multiple levels of manifests... So for ~50% of the files it's not necessary to insert an extra manifest with only the MIME type/filename, and for these files there would be an overhead. Agreed.
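For reference, the size arithmetic and the 32kB rule from the quoted message below can be sketched roughly as follows. This is only an illustration in Python: the 1.5x factor is read off the 700MB -> 1050MB figures, and the names encoded_size, needs_redirect_block and INLINE_LIMIT are made up for this sketch, not taken from the node's code.

```python
KB = 1024
MB = 1024 * KB

# The 32kB limit on top-level metadata mentioned in the quoted message.
INLINE_LIMIT = 32 * KB


def encoded_size(raw_bytes: int) -> int:
    """Size after redundancy encoding, assuming the 1.5x ratio implied by 700MB -> 1050MB."""
    return raw_bytes * 3 // 2


def needs_redirect_block(top_metadata_bytes: int, has_filename: bool) -> bool:
    """True if adding a filename forces an extra redirect block.

    Assumption: top-level metadata above the limit (measured before
    compression) cannot sit inside the manifest, so the manifest has
    to point at it through a separate block.
    """
    return has_filename and top_metadata_bytes > INLINE_LIMIT


# The levels from the quoted example, sizes taken as given there.
levels = [
    ("data", 700 * MB),
    ("metadata", 30 * MB),
    ("metadata of metadata", 500 * KB),
]
for name, raw in levels:
    print(f"{name}: {raw} bytes -> {encoded_size(raw)} bytes inserted")

# Top level: 53kB before compression, so a filename costs one more block.
print(needs_redirect_block(53 * KB, has_filename=True))
```

With the 53kB top level from the example, a filename costs one extra block; a top level under 32kB would fit in the manifest directly.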
On 10/27/06, toad <toad at amphibian.dyndns.org> wrote:
> Here's an important detail:
>
> Creating a splitfile:
>
> We have 700MB of data. We encode it to 1050MB. We insert these blocks.
>
> The metadata is maybe 30MB. We encode this to 45MB, and insert that.
>
> Then the metadata for the metadata is maybe 500kB. We encode this to
> 750kB and insert it.
>
> Then the metadata for the metadata for the metadata is say 53kB before
> compression, 31kB after. If we have a MIME type it can be included at
> this stage.
>
> However, if we have a filename, we must insert an additional redirect
> block, because the top level metadata is >32kB before compression. This
> is an arbitrary limit introduced to prevent huge manifests; manifests
> should normally consist of a bunch of filenames, each mapped to either
> a redirect or a small splitfile. If there is more than 32kB of metadata,
> then we insert it as a separate block.
>
> So the end result is that if the top level metadata (which by definition
> fits in a block) is more than 32kB before compression, we indirect it
> anyway. But if it isn't, we don't, because it fits in the manifest.
> (This is an important optimisation IMHO in the case of normal manifests
> pointing to multiple medium sized files big enough not to fit in
> containers).
>
> Some useful background for our discussion?
>
> On Fri, Oct 27, 2006 at 03:13:07PM +0100, toad wrote:
> > On Fri, Oct 27, 2006 at 03:55:16PM +0200, bbackde at googlemail.com wrote:
> > > >Why would the key be different? As long as he uses the same MIME type
> > > >and the same filename, it should be the same.
> > >
> > > I wrote he changed the filename because he wanted another name. If he
> > > forgets the old name, or if he does not use the old name explicitly,
> > > then he will get a different key.
> > > This is a user error, I know. If he makes no mistake the key is the
> > > same. But this is also not exactly user friendly ;)
> > >
> > > But nevertheless, no problem for me.
> > > And I know what you meant with the checksum, you could do it, and
> > > it would be nice to use SHA-256, as Frost does *g*.
> >
> > Of course. It's been suggested for ease of integration with other
> > systems that we provide one or more enforced checksums in the top-level
> > metadata. If we can spare the CPU cycles we could provide the full range
> > - SHA-256, SHA-1, MD5. But realistically probably just SHA-256, which
> > hopefully others will use anyway. This isn't entirely trivial to
> > implement; how important is it?
> >
> > > If you want to implement it please tell me about this. The reason is
> > > that the new Frost will rely on the fact that the same file creates
> > > the same CHK key. I assume if you add a checksum to metadata the CHK
> > > key changes... and yes I know it's alpha and everything could change
> > > later, but then I will have to reset the filebase for Frost and start
> > > with a new filelist. I think also thaw and freemule would have a hit
> > > when the CHK keys for same content change.
> >
> > Well ... the same data might yield a different key, but unless the top
> > level manifest has disappeared, it would still propagate the original
> > insert. I doubt it would affect Thaw. As a general rule, metadata
> > changes will always be back-compatible, and are likely to be infrequent,
> > but they may happen.
> >
> > >
> > > You mentioned one thing that sounds really good to me. If it would be
> > > possible to have one CHK key for the plain complete binary file (octet
> > > stream, no mime type, no filename) and multiple different CHKs that
> > > transport additional information (filename, mime), this would be
> > > really great. This offers the most flexibility, allows to compare files
> > > if content is the same and more. If the only disadvantage is that you
> > > have to insert 1 additional metadata CHK, then please do it. For 100+
> > > MB files this makes no difference...
> > > What do you think? Could this be done?
> >
> > And for small files, it makes a huge difference. This requires an
> > arbitrary threshold in the node. How do we determine one?
> >
> > I don't see why MIME should vary from inserter to inserter really. Only
> > broken clients use application/octet-stream. :)
> >
> > You can force it to work this way if you want by inserting a CHK with no
> > filename, and then inserting a redirect to it. But you wouldn't at
> > present be able to ask the node for the redirect target.
> > >
> > > On 10/27/06, toad <toad at amphibian.dyndns.org> wrote:
> > > >On Fri, Oct 27, 2006 at 02:51:25PM +0200, bbackde at googlemail.com wrote:
> > > >> I understand your points.
> > > >>
> > > >> I was not on the techlist before 2 months, that's why I ask now.
> > > >
> > > >Ok.
> > > >>
> > > >> After reading your text I think Frost has no hit if this behaviour is
> > > >> changed. Frost will just provide an empty "TargetFilename=" and it
> > > >> will never use a filename for requests or inserts, Frost will only
> > > >> request the pure CHK@ key.
> > > >> As I understood this continues to work, an application can still use
> > > >> an empty TargetFilename??? And this will work forever, true?
> > > >
> > > >Sure.
> > > >>
> > > >> As long as this works as is nice for me...
> > > >>
> > > >> Thanks for your clarifications.
> > > >>
> > > >> >Why should Freenet URIs behave so radically differently to any other
> > > >> >kind of URI in the known universe?
> > > >>
> > > >> imho this was a big advantage of freenet. In freenet all is free and
> > > >> you can't trust no one. But if someone gained some trust e.g. on Frost
> > > >> or over a freesite, and this person provides a CHK key, everyone
> > > >> always knows that the data behind this key are ok (theoretically).
> > > >> Now, if there are different keys for the same file, this does not work
> > > >> any longer. Sample: a trusted person uploads a CHK with a filename.
> > > >> Someone downloads the file and archives it, but he renamed it. After
> > > >> months the original trusted uploader is wasted, and the downloader
> > > >> reinserts the file because someone asked for it. If he uses thaw (for
> > > >> example), then the file gets another CHK key because of the different
> > > >> filename. Then the reinserter provides the key to the public, but
> > > >> no one can verify that the reinserted file is really the same file as
> > > >> uploaded by the trusted person...
> > > >
> > > >Why would the key be different? As long as he uses the same MIME type
> > > >and the same filename, it should be the same.
> > > >
> > > >> Maybe some checksum would help, but I don't know how and where to add
> > > >> it...
> > > >
> > > >I can add a checksum to the metadata, that's what I was suggesting.
> > > >>
> > > >> On 10/27/06, toad <toad at amphibian.dyndns.org> wrote:
> > > >> >On Fri, Oct 27, 2006 at 08:37:56AM +0200, bbackde at googlemail.com wrote:
> > > >> >> I heard that you want to make "CHK with filename" mandatory. What
> > > >> >> does this mean ...
> > > >> >
> > > >> >It means that IF you specify a filename when you insert a CHK, that
> > > >> >filename becomes a necessary part of the URI. If you don't, it
> > > >> >doesn't.
> > > >> >>
> > > >> >> I have to insert a file with URI=CHK@ and TargetFilename=abc. Then I
> > > >> >> must request this file with URI=CHK@bla/abc? the CHK key itself is
> > > >> >> not enough?
> > > >> >>
> > > >> >> The same file, but with different names, get another CHK key?
> > > >> >
> > > >> >Yes. Although beyond the top splitfile level they will share all
> > > >> >blocks.
> > > >> >>
> > > >> >> What is the sense of this? You said it would be easier to find the
> > > >> >> file in store, but why is this easier, the CHK is already unique?
> > > >> >> The filename just adds another abstraction layer and you have to
> > > >> >> deal with files of same name, but different content. Because of
> > > >> >> this you would always have to use key+filename for lookups, why
> > > >> >> isn't the key itself enough for this?
> > > >> >
> > > >> >The CHK is not unique. Here is the problem:
> > > >> >
> > > >> >CHK@blah,blah,blah - is invalid, or is a simple key
> > > >> >CHK@blah,blah,blah/something - could be a simple key, or could be
> > > >> >"fetch CHK@blah,blah,blah then look up the file called something in
> > > >> >the manifest"
> > > >> >CHK@blah,blah,blah/something/else - could be
> > > >> >a) a simple key (CHK@blah,blah,blah)
> > > >> >b) a single container lookup (lookup something in CHK@blah,blah,blah)
> > > >> >c) a double container lookup (lookup something in CHK@blah,blah,blah
> > > >> >then else in the returned manifest)
> > > >> >
> > > >> >You see the problem? A slash delimits a directory, in virtually all
> > > >> >URI schemes. You should not be able to add arbitrary subdirectories
> > > >> >to a URI while still returning the original file. Being able to do
> > > >> >so means that you cannot compare two URIs with any level of
> > > >> >confidence, as well as being counter-intuitive. Also, there is
> > > >> >additional ambiguity if we support implicit containers
> > > >> >(something.zip/filename-in-zip).
> > > >> >
> > > >> >Thus, there is a two-stage solution:
> > > >> >
> > > >> >1. If the user specifies a filename, insert the file as a single-file
> > > >> >manifest so that the filename is required to fetch the file.
> > > >> >2. Stop accepting superfluous path components.
> > > >> >
> > > >> >Then we have no ambiguity.
> > > >> >A slash always indicates a manifest lookup - or a part of an SSK
> > > >> >or USK url, which is much the same thing for our purposes (an SSK
> > > >> >always has one path component before the manifests, a USK always
> > > >> >has two).
> > > >> >>
> > > >> >> IMHO this concept leads to problems.
> > > >> >
> > > >> >The current situation leads to problems.
> > > >> >
> > > >> >> I know that same files with
> > > >> >> different names only have another key because the manifest is
> > > >> >> different, the (hidden) datablocks itself have the same key. But how
> > > >> >> should applications for file sharing and insert-on-demand work with
> > > >> >> this concept?
> > > >> >
> > > >> >Having read my explanation above, you really think the current (well,
> > > >> >older) situation is better?
> > > >> >
> > > >> >> We share the file with a unique SHA identifier. If 2
> > > >> >> users have the same file, but with different names, this works well.
> > > >> >> Now if there are the same files, but with different CHK keys because
> > > >> >> the filename is different, the application would have to maintain a
> > > >> >> list of all known CHK keys for a file with same SHA checksum, and it
> > > >> >> would have to try to download one key after the other until a
> > > >> >> download is successful.
> > > >> >
> > > >> >Or you could just not use a filename.
> > > >> >>
> > > >> >> As you see, this concept adds more complexity to FCP2, with a
> > > >> >> questionable benefit.
> > > >> >
> > > >> >It reduces ambiguity and complexity by making keys behave the same way
> > > >> >as any other URI for any other protocol.
> > > >> >
> > > >> >Really, what is the alternative? As far as I can see these are our
> > > >> >options:
> > > >> >
> > > >> >1. Make CHKs not have filenames at all.
> > > >> >PRO: Easy, unambiguous
> > > >> >CON: Not exactly user-friendly
> > > >> >
> > > >> >2. CHKs may have any filename, we ignore it.
> > > >> >
> > > >> >The first path component is always ignored.
> > > >> >PRO: Easy, unambiguous
> > > >> >CON: Breaks existing CHK freesites
> > > >> >CON: Counterintuitive: part of URI can be tampered with
> > > >> >
> > > >> >3. If a filename is specified on insert, it is enforced.
> > > >> >(Currently working towards this)
> > > >> >
> > > >> >PRO: Easy, unambiguous
> > > >> >CON: See above
> > > >> >
> > > >> >4. Make CHKs have optional filenames.
> > > >> >
> > > >> >PRO: Makes Frost work
> > > >> >CON: Ambiguous: any number of bogus pathname components can be
> > > >> >appended to a URI and it still works
> > > >> >
> > > >> >Unless you have any better ideas?
> > > >> >
> > > >> >Now, I would be willing to include an enforced checksum in a key, if
> > > >> >that helps you. I might even be willing to always insert a separate
> > > >> >block for the data itself and the MIME type, and then have a redirect
> > > >> >to this to add on the filename. However this would reduce performance
> > > >> >by introducing an extra block fetch. So I'm not sure it's a good idea.
> > > >> >
> > > >> >Could you explain the usage scenario here in a bit more detail?
> > > >> >
> > > >> >> Simple applications that count on this concept
> > > >> >> would have problems later if there are a lot of files from many users
> > > >> >> floating around... they should respect that only the CHK key is the
> > > >> >> URI for a file. This worked great on 0.5, everyone understands this
> > > >> >> and it's easier for all...
> > > >> >
> > > >> >Everyone who has several years of experience with Freenet understood
> > > >> >this. Nobody else did. URIs should behave like URIs. I'd rather have
> > > >> >option 1 than option 4!
> > > >> >>
> > > >> >> Please don't make this mandatory.
> > > >> >
> > > >> >Why should Freenet URIs behave so radically differently to any other
> > > >> >kind of URI in the known universe?
> > > >> >>
> > > >> >> rgds, bback.
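The two-stage solution described in the thread above (a filename turns the insert into a single-file manifest, and superfluous path components are rejected) can be sketched with a toy in-memory store. Everything here is hypothetical: the store is a plain dict, keys are truncated SHA-256 hashes rather than real CHKs, and insert, resolve and ResolveError are names invented for illustration.

```python
import hashlib
import json
from typing import Optional


class ResolveError(Exception):
    """Raised when a URI does not resolve strictly to a file."""


def _key(payload: bytes) -> str:
    # Toy content hash key; real CHKs are structured differently.
    return "CHK@" + hashlib.sha256(payload).hexdigest()[:16]


def insert(store: dict, data: bytes, filename: Optional[str] = None) -> str:
    """Stage 1: with a filename, wrap the data in a single-file manifest."""
    data_key = _key(data)
    store[data_key] = data
    if filename is None:
        return data_key
    manifest = {filename: data_key}
    manifest_key = _key(json.dumps(manifest, sort_keys=True).encode())
    store[manifest_key] = manifest
    return f"{manifest_key}/{filename}"


def resolve(store: dict, uri: str) -> bytes:
    """Stage 2: every path component must be a manifest lookup."""
    key, *path = uri.split("/")
    node = store[key]
    for name in path:
        if not isinstance(node, dict):
            raise ResolveError(f"superfluous path component: {name!r}")
        if name not in node:
            raise ResolveError(f"no such manifest entry: {name!r}")
        node = store[node[name]]
    if isinstance(node, dict):
        raise ResolveError("URI names a manifest; a filename is required")
    return node


store = {}
plain = insert(store, b"hello")
named_a = insert(store, b"hello", "a.txt")
named_b = insert(store, b"hello", "b.txt")
print(resolve(store, named_a) == resolve(store, plain))
```

With this model the same bytes inserted as a.txt and b.txt get different top-level keys but share one data block, and appending /bogus to a plain key fails instead of silently returning the file.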
> > > >> >_______________________________________________
> > > >> >Tech mailing list
> > > >> >Tech at freenetproject.org
> > > >> >http://emu.freenetproject.org/cgi-bin/mailman/listinfo/tech