[Tech] CHK with filename

toad Fri, 27 Oct 2006 12:38:15 +0100

On Fri, Oct 27, 2006 at 08:37:56AM +0200, bbackde at googlemail.com wrote:
> I heard that you want to make "CHK with filename" mandatory. What does
> this mean ...


It means that IF you specify a filename when you insert a CHK, that
filename becomes a necessary part of the URI. If you don't, it doesn't.
> 
> I have to insert a file with URI=CHK@ and TargetFilename=abc. Then I
> must request this file with URI=CHK@bla/abc? The CHK key itself is not
> enough?
> 
> The same file, but with different names, gets another CHK key?

Yes. Although beyond the top splitfile level they will share all blocks.
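Here is a deliberately simplified model of why the top-level keys differ while the data underneath is shared. It is not Freenet's real CHK derivation, which involves encryption and splitfiles. It assumes a "key" is just a truncated SHA-256 of the stored bytes, and that inserting with a filename wraps the data key in a one-entry manifest encoded as JSON. `data_key` and `insert_with_filename` are hypothetical helpers, not part of Freenet or FCP2.

```python
import hashlib
import json

def data_key(data: bytes) -> str:
    # Simplified stand-in for a CHK: a hash of the stored bytes only.
    return "CHK@" + hashlib.sha256(data).hexdigest()[:16]

def insert_with_filename(data: bytes, filename: str) -> tuple[str, str]:
    # Hypothetical single-file manifest: {filename: key of the data}.
    inner = data_key(data)
    manifest = json.dumps({filename: inner}, sort_keys=True).encode()
    return data_key(manifest), inner

content = b"the same file contents"
top_a, inner_a = insert_with_filename(content, "abc")
top_b, inner_b = insert_with_filename(content, "xyz")

print(top_a != top_b)      # the manifests differ, so the top keys differ
print(inner_a == inner_b)  # the data underneath has one shared key
```

In this model the file would be fetched as `top_a + "/abc"`, and both inserts reuse the same data block.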
> 
> What is the sense of this? You said it would be easier to find the
> file in store, but why is this easier, the CHK is already unique? The
> filename just adds another abstraction layer and you have to deal with
> files of same name, but different content. Because of this you would
> always have to use key+filename for lookups, why isn't the key itself
> enough for this?

The CHK is not unique. Here is the problem:

CHK@blah,blah,blah - is invalid, or is a simple key
CHK@blah,blah,blah/something - could be a simple key, or could be "fetch
CHK@blah,blah,blah then look up the file called something in the
manifest"
CHK@blah,blah,blah/something/else - could be
a) a simple key (CHK@blah,blah,blah)
b) a single container lookup (lookup something in CHK@blah,blah,blah)
c) a double container lookup (lookup something in CHK@blah,blah,blah
then else in the returned manifest)

You see the problem? A slash delimits a directory, in virtually all URI
schemes. You should not be able to add arbitrary subdirectories to a URI
while still returning the original file. Being able to do so means that
you cannot compare two URIs with any level of confidence, as well as
being counter-intuitive. Also, there is additional ambiguity if we
support implicit containers (something.zip/filename-in-zip).
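The readings listed earlier can be enumerated mechanically. This is a rough sketch that assumes the old, lenient rule: any number of trailing path components may be silently ignored. Under that rule a URI with n path components has n+1 possible meanings (0, 1, ..., n manifest lookups). `old_interpretations` is a hypothetical helper for illustration only.

```python
def old_interpretations(uri: str) -> list[str]:
    """List every reading of a CHK URI if trailing components may be ignored."""
    key, *path = uri.split("/")
    readings = []
    for lookups in range(len(path) + 1):
        used, ignored = path[:lookups], path[lookups:]
        if lookups == 0:
            desc = f"simple key {key}"
        else:
            desc = f"look up {'/'.join(used)} via manifests starting at {key}"
        if ignored:
            desc += f" (ignoring {'/'.join(ignored)})"
        readings.append(desc)
    return readings

for reading in old_interpretations("CHK@blah,blah,blah/something/else"):
    print(reading)
```

For `CHK@blah,blah,blah/something/else` this yields three readings, matching cases a), b) and c) above. Two URIs cannot be compared reliably when each one maps to several possible fetches.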

Thus, there is a two-stage solution:

1. If the user specifies a filename, insert the file as a single-file
manifest so that the filename is required to fetch the file.
2. Stop accepting superfluous path components.

Then we have no ambiguity. A slash always indicates a manifest lookup -
or a part of an SSK or USK URL, which is much the same thing for our
purposes (an SSK always has one path component before the manifests, a
USK always has two).
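Once superfluous components are rejected, parsing becomes unambiguous. This is a minimal sketch that assumes only the rule stated here: a CHK has no path components of its own, an SSK has one (the document name), and a USK has two (name and edition). Every remaining slash is a manifest lookup. `parse_strict` and `KEY_PATH_COMPONENTS` are hypothetical names, not Freenet's actual URI class.

```python
# Path components that belong to the key itself, per key type (assumption
# taken from the rule above: CHK none, SSK one, USK two).
KEY_PATH_COMPONENTS = {"CHK": 0, "SSK": 1, "USK": 2}

def parse_strict(uri: str) -> tuple[str, list[str]]:
    """Split a URI into (key, manifest lookups) with exactly one reading."""
    key_type, sep, _ = uri.partition("@")
    if not sep or key_type not in KEY_PATH_COMPONENTS:
        raise ValueError(f"unsupported key type in {uri!r}")
    parts = uri.split("/")
    n = KEY_PATH_COMPONENTS[key_type]
    if len(parts) < 1 + n:
        raise ValueError(f"{key_type} keys need {n} path component(s) first")
    return "/".join(parts[: 1 + n]), parts[1 + n :]

print(parse_strict("CHK@blah,blah,blah/something/else"))
print(parse_strict("USK@blah,blah,blah/site/5/index.html"))
```

The first call yields the key `CHK@blah,blah,blah` with two manifest lookups, and nothing is ever silently discarded.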
> 
> IMHO this concept leads to problems. 

The current situation leads to problems.

> I know that same files with
> different names only have another key because the manifest is
> different, the (hidden) data blocks themselves have the same key. But how
> should applications for file sharing and insert-on-demand work with
> this concept? 

Having read my explanation above, you really think the current (well,
older) situation is better?

> We share the file with a unique SHA identifier. If 2
> users have the same file, but with different names, this works well.
> Now if there are the same files, but with different CHK keys because
> the filename is different, the application would have to maintain a
> list of all known CHK keys for a file with same SHA checksum, and it
> would have to try to download one key after the other until a download
> is successful.

Or you could just not use a filename.
> 
> As you see, this concept adds more complexity to FCP2, with a
> questionable benefit. 

It reduces ambiguity and complexity by making keys behave the same way
as any other URI for any other protocol.

Really, what is the alternative? As far as I can see these are our
options:

1. Make CHKs not have filenames at all.
PRO: Easy, unambiguous
CON: Not exactly user-friendly

2. CHKs may have any filename; we ignore it.

The first path component is always ignored.
PRO: Easy, unambiguous
CON: Breaks existing CHK freesites
CON: Counterintuitive: part of URI can be tampered with

3. If a filename is specified on insert, it is enforced.
(Currently working towards this)

PRO: Easy, unambiguous
CON: See above

4. Make CHKs have optional filenames.

PRO: Makes Frost work
CON: Ambiguous: any number of bogus pathname components can be appended
to a URI and it will still work
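Options 3 and 4 can be contrasted with a toy resolver. This sketch assumes a block store where each key maps to either raw data or a manifest of name-to-key entries. With `enforce=True` (option 3), leftover path components and missing filenames are errors. With `enforce=False` (option 4), anything after the data key is ignored. The store layout and the `fetch` function are hypothetical.

```python
# Hypothetical block store: key -> ("data", bytes) or ("manifest", {name: key}).
Store = dict[str, tuple]

def fetch(store: Store, key: str, path: list[str], enforce: bool = True) -> bytes:
    kind, value = store[key]
    for name in path:
        if kind == "data":
            if enforce:
                # Option 3: a component left over after the data is an error.
                raise LookupError(f"superfluous path component {name!r}")
            # Option 4: leftover components are silently ignored.
            return value
        if name not in value:
            raise LookupError(f"{name!r} not found in manifest")
        kind, value = store[value[name]]
    if kind == "manifest":
        raise LookupError("key was inserted with a filename; it must be given")
    return value

# Option 3: inserting b"payload" as "abc" yields a one-entry manifest.
opt3 = {
    "CHK@top": ("manifest", {"abc": "CHK@inner"}),
    "CHK@inner": ("data", b"payload"),
}
# Option 4: the CHK points straight at the data; filenames are optional.
opt4 = {"CHK@inner": ("data", b"payload")}
```

Under option 3, only `CHK@top/abc` returns the data. Under option 4, `CHK@inner/abc/bogus` and `CHK@inner/anything/at/all` both return the same bytes, which is the ambiguity listed above.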

Unless you have any better ideas?

Now, I would be willing to include an enforced checksum in a key, if
that helps you. I might even be willing to always insert a separate
block for the data itself and the MIME type, and then have a redirect to
this to add on the filename. However, this would reduce performance by
introducing an extra block fetch. So I'm not sure it's a good idea.
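The redirect layout mentioned here can be sketched as follows. The sketch assumes one block fetch per key dereferenced, and it replaces real CHK derivation with a truncated SHA-256 of the block contents. Block 1 holds the data and MIME type and does not depend on the filename. Block 2 is a small redirect that adds the filename. `put`, `insert` and `fetch_counting` are hypothetical names.

```python
import hashlib

store: dict[str, dict] = {}

def put(block: dict) -> str:
    # Simplified content-hash key; NOT the real CHK derivation.
    digest = hashlib.sha256(repr(sorted(block.items())).encode()).hexdigest()
    key = "CHK@" + digest[:16]
    store[key] = block
    return key

def insert(data: bytes, mime: str, filename: str) -> tuple[str, str]:
    # Block 1: data plus MIME type, independent of the filename.
    bare = put({"data": data, "mime": mime})
    # Block 2: a redirect that adds the filename.
    named = put({"redirect": bare, "filename": filename})
    return named, bare

def fetch_counting(key: str, path: list[str]) -> tuple[bytes, int]:
    """Return (data, number of blocks fetched)."""
    block = store[key]
    fetched = 1
    if "redirect" in block:
        if path != [block["filename"]]:
            raise LookupError("the inserted filename must be given exactly")
        block = store[block["redirect"]]
        fetched += 1
    elif path:
        raise LookupError("superfluous path component")
    return block["data"], fetched

named_a, bare_a = insert(b"payload", "text/plain", "abc")
named_b, bare_b = insert(b"payload", "text/plain", "xyz")
```

In this model, every copy of the file shares one filename-free key (`bare_a == bare_b`), which suits SHA-keyed sharing. The cost is that each named fetch takes two blocks instead of one.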

Could you explain the usage scenario here in a bit more detail?

> Simple applications that count on this concept
> would have problems later if there are a lot of files from many users
> floating around... they should respect that only the CHK key is the
> URI for a file. This worked great on 0.5, everyone understood this and
> it's easier for all...

Everyone who has several years of experience with Freenet understood
this. Nobody else did. URIs should behave like URIs. I'd rather have
option 1 than option 4!
> 
> Please don't make this mandatory.

Why should Freenet URIs behave so radically differently to any other
kind of URI in the known universe?
> 
> rgds, bback.