On Thursday, 11. May 2006 07:23, you wrote:
> >> Disadv: possible collisions
> >
> > Could be avoided by having an MD5 checksum AND a SHA1 checksum.
> > Hitting collisions of two algorithms at the same time should be virtually
> > impossible. If not really impossible.
> Yes, or we take a SHA-256/SHA-512 or whatever. Doesn't matter.
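A rough sketch of the two-checksum comparison quoted above, in Python. The function names are made up for illustration and are not part of Subversion or fsvs; it only shows that two contents are treated as identical when both digests agree.

```python
import hashlib
from typing import Tuple


def digests(data: bytes) -> Tuple[str, str]:
    """Return the (MD5, SHA-1) hex digests of the given content."""
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()


def looks_identical(a: bytes, b: bytes) -> bool:
    """True only if BOTH digests match (hypothetical helper).

    A false match would need a simultaneous collision in MD5 and SHA-1.
    """
    return digests(a) == digests(b)
```

Swapping in SHA-256 or SHA-512 would only mean changing the `hashlib` constructors.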

My idea was that two different hashing algorithms would produce different
results for a change: if one algorithm produces the same checksum after a
change, the other would produce something completely different from its
previous value. To break such a construction, you need to find a change
that creates a collision for both algorithms at the same time.
Of course, this double check is quite expensive, but the repository could
do these checks while being idle.

I just wondered whether it would be enough to use the size of the file as a
security add-on. That would be much cheaper, but also a lot weaker than a
second hashing algorithm, as there's no guarantee that two files of the same
length but different content never lead to the same hash value...
It's a matter of balancing the probability of a hash collision against how
bad a hash collision could turn out. (Extremely bad, IMHO.)

> ... - the history and properties are not
> connected, only the storage space should be shared.
>
> >> Ad 1: as far as MD5 or SHA1 say they are identical.
> >> Maybe we should take SHA-256, which seems to be better.
> >> (There's a way to generate files with identical MD5).
> >>
> >> Ad 2: Would we copy the MD5/.../file to a new name, change it there,
> >> and copy it to the existing location? Then the previous and current data
> >> don't share history, as they're not directly connected.
> >> Or otherwise, if we'd change the "local" location, and copy to MD5/...,
> >> then every identical file copied from there has a bad history ...
> >
> > It would be great if both would be possible. The first method covers
> > files that become identical "by chance" and don't share the history (I
> > think, it should be the default behaviour) and the second case covers
> > the renaming. But telling the two cases apart requires IMHO different
> > commands for these operations. Hmm... it must be strictly avoided
> > that the user can have access to two "different" files with the same
> > content and the same history, I mean where the paths/names are
> > different but the contents are equal. This can happen and those
> > files must not share the history. So I think, the command that does
> > "rename with history" must either do a true rename within the database
> > or imply that the old file is removed.
> Did my previous sentences clear that up a bit? If not, I'll try to express
> that with a picture.

Not necessary :-) I think it's clear, thanks!

> >> Ad 3: There's a thread "Stage 1 of true rename support"
> >> http://marc.theaimsgroup.com/?t=111661193300004&r=1&w=2 starting
> >> 2005-05-20, where I mentioned issue 2286, and where it seemed to fit
> >> (at least IMO). I did get no answer to that, though.
> >> I don't know the current status of that.
> >
> > What if a whole directory is renamed?
> >
> > "------- Additional comments from Peter ... Wed May ... -------
> >
> > PUtting in unscheduled."
> >
> > What could "PUtting in unscheduled." possibly mean?!? This was a comment
> > here: http://subversion.tigris.org/issues/show_bug.cgi?id=2286
> They're waiting for patches?

Aha :-)

> > What if a whole directory is renamed?
> Currently: The directory is checked in as new, with all files as new.
> With shared space in the repository we'd just have duplicated space for
> the directory tree, the files should not need anything (as the storage
> space is shared).

Sounds good enough, spacewise. Directories should not take much space.

> Later: fsvs could detect that eg. the inode is the same and the contents
> are mostly the same, deduce a rename, and send a rename (or a copy/delete)
> to the repository ... But that's not even on my TODO currently.

Can fsvs see that the inode is the same? Then it should be easy.
If fsvs needs to deduce a rename, this would cause a lot of internal work.
IMHO, the cheapest thing for later would be: have a special renaming
command in the client.

> So please let me repeat myself :-)
> Any other ideas?

"Eg. for a file with MD5 of 8a04f87ad04f4a1d3c7e6ca12e07290d
  repository/
    ...
    db/
      ...
      md5index/
        8a/
          04/
            f87a.index
"

Hmm, how about:

  repository/
    ...
    db/
      ...
      md5index/
        8a/
          04/
            8a04f87ad04f4a1d3c7e6ca12e07290d

? I mean a file where the name is the full hash value.

Advantages:
- No need to open and read a file when you look for a specific hash value.
  Just try a stat() on the value you know, from a calculation you just did.
- No need to add checksums to an index file (read-modify-write).
  The "Con: FSFS cannot be append-only; the indices have to be written
  and re-written." wouldn't be a problem anymore :-)

Disadvantage:
- One hash-file for every stored real file. That might become a huge
  number of files.

> If not, I'll let that sink in a bit (about a week), and then maybe try to
> implement issue 2286.

In any case, you have much more insight than me.

Cheers
Dirk
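
To make the lookup with the full hash value as the file name more concrete, here is a small Python sketch. It is not how FSFS works today; the helper names are hypothetical, and using an empty marker file as the hash-file's content is just an assumption for illustration.

```python
import hashlib
import os


def hash_path(index_root: str, digest: str) -> str:
    """Map a hex digest to <index_root>/<2 chars>/<next 2 chars>/<full digest>."""
    return os.path.join(index_root, digest[:2], digest[2:4], digest)


def is_stored(index_root: str, data: bytes) -> bool:
    """Check for known content with a single stat(); no index file is read."""
    digest = hashlib.md5(data).hexdigest()
    # os.path.exists() does a stat() on the path and reports whether it succeeded.
    return os.path.exists(hash_path(index_root, digest))


def remember(index_root: str, data: bytes) -> str:
    """Record content by creating its hash-named file (empty marker, an assumption)."""
    path = hash_path(index_root, hashlib.md5(data).hexdigest())
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # Append mode creates the file if missing and never rewrites an existing one.
    with open(path, "ab"):
        pass
    return path
```

Adding a new checksum only creates a new file, so nothing already written needs a read-modify-write cycle; the price is one small file per stored content.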
