Interesting background, indeed. It's as in 0.5, but with multiple levels
of manifests...
So for ~50% of the files it is not necessary to insert an extra manifest
containing only the MIME type/filename, and for those files the extra
manifest would just be overhead. Agreed.
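
To make the numbers in the quoted message below concrete, here is a rough
sketch of the layering. The 1.5x factor is only inferred from the example
figures (700MB -> 1050MB, 30MB -> 45MB, 500kB -> 750kB), and the function
names are made up for illustration; none of this is the node's actual code.

```python
# Illustrative only: EXPANSION is inferred from the figures in the quoted
# message; the function names are hypothetical, not Freenet APIs.
EXPANSION = 1.5          # redundancy factor implied by the example numbers
MANIFEST_LIMIT_KB = 32   # limit on top-level metadata before compression


def encoded_size(size_kb):
    """Size after redundancy encoding, using the inferred 1.5x factor."""
    return size_kb * EXPANSION


def needs_extra_redirect(top_metadata_kb, has_filename):
    """True if a filename forces a separate redirect block on top."""
    return has_filename and top_metadata_kb > MANIFEST_LIMIT_KB


# Layer sizes before encoding, in kB, taken from the quoted example
# (treating 1MB as 1000kB to match its round numbers).
layers_kb = [700_000, 30_000, 500]
for size in layers_kb:
    print(f"{size} kB -> {encoded_size(size):.0f} kB inserted")

top_metadata_kb = 53  # before compression (31 kB after)
print(needs_extra_redirect(top_metadata_kb, has_filename=True))
print(needs_extra_redirect(top_metadata_kb, has_filename=False))
print(needs_extra_redirect(20, has_filename=True))
```

With the 53kB top-level metadata from the example, only the insert with a
filename needs the additional redirect block; a small file whose top-level
metadata stays under 32kB does not.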

On 10/27/06, toad <toad at amphibian.dyndns.org> wrote:
> Here's an important detail:
>
> Creating a splitfile:
>
> We have 700MB of data. We encode it to 1050MB. We insert these blocks.
>
> The metadata is maybe 30MB. We encode this to 45MB, and insert that.
>
> Then the metadata for the metadata is maybe 500kB. We encode this to
> 750kB and insert it.
>
> Then the metadata for the metadata for the metadata is say 53kB before
> compression, 31kB after. If we have a MIME type it can be included at
> this stage.
>
> However, if we have a filename, we must insert an additional redirect
> block, because the top level metadata is >32kB before compression. This
> is an arbitrary limit introduced to prevent huge manifests; manifests
> should normally consist of a bunch of filenames, each mapped to either
> a redirect or a small splitfile. If there is more than 32kB of metadata,
> then we insert it as a separate block.
>
> So the end result is that if the top level metadata (which by definition
> fits in a block) is more than 32kB before compression, we indirect it
> anyway. But if it isn't, we don't, because it fits in the manifest.
> (This is an important optimisation IMHO in the case of normal manifests
> pointing to multiple medium sized files big enough not to fit in
> containers).
>
> Some useful background for our discussion?
>
> On Fri, Oct 27, 2006 at 03:13:07PM +0100, toad wrote:
> > On Fri, Oct 27, 2006 at 03:55:16PM +0200, bbackde at googlemail.com wrote:
> > > >Why would the key be different? As long as he uses the same MIME type
> > > >and the same filename, it should be the same.
> > >
> > > I wrote he changed the filename because he wanted another name. If he
> > > > forgets the old name, or if he does not use the old name explicitly,
> > > then he will get a different key.
> > > This is a user error, I know. If he makes no mistake the key is the
> > > same. But this is also not exactly user friendly ;)
> > >
> > > But nevertheless, no problem for me. And I know what you meant with
> > > the checksum, you could do it, and it would be nice to use SHA-256, as
> > > Frost does *g*.
> >
> > Of course. It's been suggested for ease of integration with other
> > systems that we provide one or more enforced checksums in the top-level
> > metadata. If we can spare the CPU cycles we could provide the full range
> > - SHA-256, SHA-1, MD5. But realistically probably just SHA-256, which
> > hopefully others will use anyway. This isn't entirely trivial to
> > implement; how important is it?
> >
> > > If you want to implement it please tell me about this. The reason is
> > > that the new Frost will rely on the fact that the same file creates
> > > the same CHK key. I assume if you add a checksum to metadata the CHK
> > > > key changes... and yes I know it's alpha and everything could change
> > > later, but then I will have to reset the filebase for Frost and start
> > > with a new filelist. I think also thaw and freemule would have a hit
> > > when the CHK keys for same content change.
> >
> > Well ... the same data might yield a different key, but unless the top
> > level manifest has disappeared, it would still propagate the original
> > insert. I doubt it would affect Thaw. As a general rule, metadata
> > changes will always be back-compatible, and are likely to be infrequent,
> > but they may happen.
> > >
> > > You mentioned one thing that sounds really good to me. If it would be
> > > possible to have one CHK key for the plain complete binary file (octet
> > > stream, no mime type, no filename) and multiple different CHKs that
> > > > transport additional information (filename, MIME), this would be
> > > > really great. This offers the most flexibility, allows comparing files
> > > > to see if the content is the same, and more. If the only disadvantage is that you
> > > have to insert 1 additional metadata CHK, then please do it. For 100+
> > > MB files this makes no difference...
> > > What do you think? Could this be done?
> >
> > And for small files, it makes a huge difference. This requires an
> > arbitrary threshold in the node. How do we determine one?
> >
> > I don't see why MIME should vary from inserter to inserter really. Only
> > broken clients use application/octet-stream. :)
> >
> > You can force it to work this way if you want by inserting a CHK with no
> > filename, and then inserting a redirect to it. But you wouldn't at
> > present be able to ask the node for the redirect target.
> > >
> > > On 10/27/06, toad <toad at amphibian.dyndns.org> wrote:
> > > > >On Fri, Oct 27, 2006 at 02:51:25PM +0200, bbackde at googlemail.com wrote:
> > > >> I understand your points.
> > > >>
> > > > >> I was not on the tech list until 2 months ago, that's why I ask now.
> > > >
> > > >Ok.
> > > >>
> > > >> After reading your text I think Frost has no hit if this behaviour is
> > > >> changed. Frost will just provide an empty "TargetFilename=" and it
> > > >> will never use a filename for requests or inserts, Frost will only
> > > >> request the pure CHK@ key.
> > > >> As I understood this continues to work, an application can still use
> > > >> an empty TargetFilename??? And this will work forever, true?
> > > >
> > > >Sure.
> > > >>
> > > > >> As long as this works, that is fine for me...
> > > >>
> > > >> Thanks for your clarifications.
> > > >>
> > > >> >Why should Freenet URIs behave so radically differently to any other
> > > >> >kind of URI in the known universe?
> > > >>
> > > > >> IMHO this was a big advantage of Freenet. In Freenet all is free and
> > > > >> you can't trust anyone. But if someone gained some trust e.g. on Frost
> > > > >> or over a freesite, and this person provides a CHK key, everyone
> > > > >> always knows that the data behind this key is ok (theoretically).
> > > > >> Now, if there are different keys for the same file, this does not work
> > > > >> any longer. Sample: a trusted person uploads a CHK with a filename.
> > > > >> Someone downloads the file and archives it, but he renames it. After
> > > > >> months the original trusted uploader is gone, and the downloader
> > > > >> reinserts the file because someone asked for it. If he uses Thaw (for
> > > > >> example), then the file gets another CHK key because of the different
> > > > >> filename. Then the reinserter provides the key to the public, but
> > > > >> no one can verify that the reinserted file is really the same file as
> > > > >> uploaded by the trusted person...
> > > >
> > > >Why would the key be different? As long as he uses the same MIME type
> > > >and the same filename, it should be the same.
> > > >
> > > > >> Maybe some checksum would help, but I don't know how and where to
> > > > >> add it...
> > > >
> > > >I can add a checksum to the metadata, that's what I was suggesting.
> > > >>
> > > >> On 10/27/06, toad <toad at amphibian.dyndns.org> wrote:
> > > > >> >On Fri, Oct 27, 2006 at 08:37:56AM +0200, bbackde at googlemail.com wrote:
> > > > >> >> I heard that you want to make "CHK with filename" mandatory. What
> > > > >> >> does this mean ...
> > > >> >
> > > > >> >It means that IF you specify a filename when you insert a CHK, that
> > > > >> >filename becomes a necessary part of the URI. If you don't, it
> > > > >> >doesn't.
> > > >> >>
> > > > >> >> I have to insert a file with URI=CHK@ and TargetFilename=abc. Then I
> > > > >> >> must request this file with URI=CHK@bla/abc? The CHK key itself is
> > > > >> >> not enough?
> > > > >> >>
> > > > >> >> The same file, but with different names, gets another CHK key?
> > > >> >
> > > > >> >Yes. Although beyond the top splitfile level they will share all blocks.
> > > >> >>
> > > > >> >> What is the sense of this? You said it would be easier to find the
> > > > >> >> file in store, but why is this easier, the CHK is already unique?
> > > > >> >> The filename just adds another abstraction layer and you have to
> > > > >> >> deal with files of same name, but different content. Because of
> > > > >> >> this you would always have to use key+filename for lookups, why
> > > > >> >> isn't the key itself enough for this?
> > > >> >
> > > >> >The CHK is not unique. Here is the problem:
> > > >> >
> > > > >> >CHK@blah,blah,blah - is invalid, or is a simple key
> > > > >> >CHK@blah,blah,blah/something - could be a simple key, or could be
> > > > >> >"fetch CHK@blah,blah,blah then look up the file called something in
> > > > >> >the manifest"
> > > > >> >CHK@blah,blah,blah/something/else - could be
> > > > >> >a) a simple key (CHK@blah,blah,blah)
> > > > >> >b) a single container lookup (lookup something in CHK@blah,blah,blah)
> > > > >> >c) a double container lookup (lookup something in CHK@blah,blah,blah
> > > > >> >then else in the returned manifest)
> > > >> >
> > > > >> >You see the problem? A slash delimits a directory, in virtually all
> > > > >> >URI schemes. You should not be able to add arbitrary subdirectories
> > > > >> >to a URI while still returning the original file. Being able to do
> > > > >> >so means that you cannot compare two URIs with any level of
> > > > >> >confidence, as well as being counter-intuitive. Also, there is
> > > > >> >additional ambiguity if we support implicit containers
> > > > >> >(something.zip/filename-in-zip).
> > > >> >
> > > >> >Thus, there is a two-stage solution:
> > > >> >
> > > >> >1. If the user specifies a filename, insert the file as a single-file
> > > >> >manifest so that the filename is required to fetch the file.
> > > >> >2. Stop accepting superfluous path components.
> > > >> >
> > > > >> >Then we have no ambiguity. A slash always indicates a manifest lookup
> > > > >> >- or a part of an SSK or USK URL, which is much the same thing for
> > > > >> >our purposes (an SSK always has one path component before the
> > > > >> >manifests, a USK always has two).
> > > >> >>
> > > >> >> IMHO this concept leads to problems.
> > > >> >
> > > >> >The current situation leads to problems.
> > > >> >
> > > >> >> I know that same files with
> > > >> >> different names only have another key because the manifest is
> > > > >> >> different, the (hidden) datablocks themselves have the same keys. But how
> > > >> >> should applications for file sharing and insert-on-demand work with
> > > >> >> this concept?
> > > >> >
> > > >> >Having read my explanation above, you really think the current (well,
> > > >> >older) situation is better?
> > > >> >
> > > >> >> We share the file with a unique SHA identifier. If 2
> > > >> >> users have the same file, but with different names, this works well.
> > > >> >> Now if there are the same files, but with different CHK keys because
> > > >> >> the filename is different, the application would have to maintain a
> > > >> >> list of all known CHK keys for a file with same SHA checksum, and it
> > > > >> >> would have to try to download one key after the other until a
> > > > >> >> download is successful.
> > > >> >
> > > >> >Or you could just not use a filename.
> > > >> >>
> > > >> >> As you see, this concept adds more complexity to FCP2, with a
> > > >> >> questionable benefit.
> > > >> >
> > > >> >It reduces ambiguity and complexity by making keys behave the same way
> > > >> >as any other URI for any other protocol.
> > > >> >
> > > >> >Really, what is the alternative? As far as I can see these are our
> > > >> >options:
> > > >> >
> > > >> >1. Make CHKs not have filenames at all.
> > > >> >PRO: Easy, unambiguous
> > > >> >CON: Not exactly user-friendly
> > > >> >
> > > >> >2. CHKs may have any filename, we ignore it.
> > > >> >
> > > >> >The first path component is always ignored.
> > > >> >PRO: Easy, unambiguous
> > > >> >CON: Breaks existing CHK freesites
> > > >> >CON: Counterintuitive: part of URI can be tampered with
> > > >> >
> > > >> >3. If a filename is specified on insert, it is enforced.
> > > >> >(Currently working towards this)
> > > >> >
> > > >> >PRO: Easy, unambiguous
> > > >> >CON: See above
> > > >> >
> > > >> >4. Make CHKs have optional filenames.
> > > >> >
> > > >> >PRO: Makes Frost work
> > > > >> >CON: Ambiguous: any number of bogus pathname components can be
> > > > >> >appended to a URI and it still works
> > > >> >
> > > >> >Unless you have any better ideas?
> > > >> >
> > > >> >Now, I would be willing to include an enforced checksum in a key, if
> > > >> >that helps you. I might even be willing to always insert a separate
> > > > >> >block for the data itself and the MIME type, and then have a redirect
> > > > >> >to this to add on the filename. However this would reduce performance
> > > > >> >by introducing an extra block fetch. So I'm not sure it's a good idea.
> > > >> >
> > > >> >Could you explain the usage scenario here in a bit more detail?
> > > >> >
> > > > >> >> Simple applications that count on this concept
> > > > >> >> would have problems later if there are a lot of files from many users
> > > > >> >> floating around... they should respect that only the CHK key is the
> > > > >> >> URI for a file. This worked great on 0.5, everyone understands this
> > > > >> >> and it's easier for all...
> > > >> >
> > > >> >Everyone who has several years of experience with Freenet understood
> > > >> >this. Nobody else did. URIs should behave like URIs. I'd rather have
> > > >> >option 1 than option 4!
> > > >> >>
> > > >> >> Please don't make this mandatory.
> > > >> >
> > > >> >Why should Freenet URIs behave so radically differently to any other
> > > >> >kind of URI in the known universe?
> > > >> >>
> > > >> >> rgds, bback.
> > > >> >
> > > >> >
> > > >> >-----BEGIN PGP SIGNATURE-----
> > > >> >Version: GnuPG v1.4.5 (GNU/Linux)
> > > >> >
> > > >> >iD8DBQFFQe+nA9rUluQ9pFARAtHdAJ0VnNXyk/SVOpqI1NvjNNSaTPWkYwCeNZUi
> > > >> >baZto3+Gpm92nqZpUQ2rGQE=
> > > >> >=+aYc
> > > >> >-----END PGP SIGNATURE-----
> > > >> >
> > > >> >
> > > >> >_______________________________________________
> > > >> >Tech mailing list
> > > >> >Tech at freenetproject.org
> > > >> >http://emu.freenetproject.org/cgi-bin/mailman/listinfo/tech
> > > >> >
> > > >> >
> > > >> _______________________________________________
> > > >> Tech mailing list
> > > >> Tech at freenetproject.org
> > > >> http://emu.freenetproject.org/cgi-bin/mailman/listinfo/tech
> > > >>
> > > >
> > > >
> > > >-----BEGIN PGP SIGNATURE-----
> > > >Version: GnuPG v1.4.5 (GNU/Linux)
> > > >
> > > >iD8DBQFFQgx9A9rUluQ9pFARAivWAKCq1BU/AJ6DdS/RE7gwsDR0PPd/9gCfVmha
> > > >WcN6y9iqMi0RcLs/EoBodHE=
> > > >=T65Q
> > > >-----END PGP SIGNATURE-----
> > > >
> > > >
> > > >_______________________________________________
> > > >Tech mailing list
> > > >Tech at freenetproject.org
> > > >http://emu.freenetproject.org/cgi-bin/mailman/listinfo/tech
> > > >
> > > >
> > > _______________________________________________
> > > Tech mailing list
> > > Tech at freenetproject.org
> > > http://emu.freenetproject.org/cgi-bin/mailman/listinfo/tech
> > >
>
>
>
> > _______________________________________________
> > Tech mailing list
> > Tech at freenetproject.org
> > http://emu.freenetproject.org/cgi-bin/mailman/listinfo/tech
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.5 (GNU/Linux)
>
> iD8DBQFFQhZwA9rUluQ9pFARAj6aAJ97Psd8RYTAGqdJEUtBYQxghhUSoACfRq6o
> d8mdepuktRVVgcpwVv0X/pw=
> =9sw9
> -----END PGP SIGNATURE-----
>
>
> _______________________________________________
> Tech mailing list
> Tech at freenetproject.org
> http://emu.freenetproject.org/cgi-bin/mailman/listinfo/tech
>
>
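
The ambiguity toad lists in the quoted thread (a CHK followed by extra path
components) can be illustrated with a small sketch. This is only a toy
enumeration under the assumption that trailing path components may be
silently ignored, as with optional filenames; the function names are
hypothetical and this is not how the node parses keys.

```python
# Hypothetical sketch: enumerates the readings listed in the quoted example.
# Not Freenet code; the names are made up for illustration.

def possible_readings(uri):
    """List the ways a CHK URI could be read if extra components are ignored."""
    key, *path = uri.split("/")
    readings = []
    # Any prefix of the path may be real manifest lookups; the rest is ignored.
    for used in range(len(path) + 1):
        lookups = path[:used]
        ignored = path[used:]
        if not lookups:
            desc = f"simple key {key}"
        else:
            desc = f"look up {' then '.join(lookups)} in {key}"
        if ignored:
            desc += f" (ignoring /{'/'.join(ignored)})"
        readings.append(desc)
    return readings


def strict_reading(uri):
    """With enforced filenames, every path component is a manifest lookup."""
    key, *path = uri.split("/")
    return (key, path)


for reading in possible_readings("CHK@blah,blah,blah/something/else"):
    print(reading)
print(strict_reading("CHK@blah,blah,blah/something/else"))
```

With two path components the loose rule gives three candidate readings
(cases a, b and c in the quoted mail), while the strict rule gives exactly
one, which is what makes two URIs comparable.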
