>> - For each (to be) committed file, create a file named
>>   REPOSROOT/MD5/x/x/x/x/xxxxxx... (where the x stand for
>>   the MD5-digits) (could be SHA1, too, of course), and
>>   do a copy from there into the "correct" location.
>>   Adv: repos-wide sharing of identical[1] files.
>>   Disadv: possible collisions
>
> Could be avoided by having an MD5 checksum AND a SHA1 checksum.
> Hitting collisions in both algorithms at the same time should be
> virtually impossible, if not truly impossible.
Yes, or we take a SHA-256/SHA-512 or whatever. Doesn't matter.
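
A minimal sketch of that first idea, assuming SHA-256 instead of MD5 and
a four-level fan-out as in the x/x/x/x/ layout above. The function names
(store_path, file_digest, commit_file) and the "SHA256" directory name
are made up for illustration; nothing like this exists in fsvs or
subversion.

```python
import hashlib
import os
import shutil


def store_path(repos_root, digest, fanout=4):
    """Return REPOSROOT/SHA256/x/x/x/x/xxxx... for a hex digest."""
    parts = list(digest[:fanout]) + [digest[fanout:]]
    return os.path.join(repos_root, "SHA256", *parts)


def file_digest(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def commit_file(repos_root, src, dest):
    """Store src once under its digest, then copy it to the "correct" place."""
    shared = store_path(repos_root, file_digest(src))
    if not os.path.exists(shared):
        # First time this content is seen: create the shared copy.
        os.makedirs(os.path.dirname(shared), exist_ok=True)
        shutil.copyfile(src, shared)
    dest_dir = os.path.dirname(dest)
    if dest_dir:
        os.makedirs(dest_dir, exist_ok=True)
    shutil.copyfile(shared, dest)
    return shared
```

Two commits of identical content end up at the same shared path, so the
data under the digest tree exists only once. A collision would silently
map different content to the same path; the sketch does not guard
against that.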

>>     have to configure MD5 root
>>     history of files very complicated[2]
>> - subversion issue 2286.
>>   http://subversion.tigris.org/issues/show_bug.cgi?id=2286
>>   Here the repository has the work to consolidate identical files.
>>   Adv: client doesn't have to change.
>
> I would rather have a deliberate change in the client, because
> sometimes I might want to copy&delete a file (and drop the history)
> or copy a file with history, making a copy of the history, and
> sometimes I want to change the name/path (and keep the history).
> I think this should be considered from the history-viewpoint.
> If the history of a certain file could be kept or dropped at will when
> moving/renaming, then no problem. The same goes for properties
> and whatnot.
That's what I want, too. The logical "paths" (history) of the files are
not connected, only the storage "below" is shared.

>> - Use a special property "fsvs:hardlink" =>
>> "/repospath/to/file:revision"
>>   On update fetch the mentioned file.
>>   Adv: can be done entirely in fsvs
>>   Disadv: incompatible with everything else
>>     History mangled - e.g. a file is identical, gets a change (results
>>       in a local copy with delta), then becomes identical to some other.
>>       Then a "diff" is meaningless.
> I think that's dangerous. One cannot trust the users... I know myself
> ;-P
I don't think it's the right solution, either.


>> To be honest, I'd prefer the solution with issue 2286 ...
>> It feels the simplest and cleanest to me.
>
> "When using branches it often happens that identical changes are
> done to copied files; this results in wasted storage space.
>
> Using eg. the MD5-hash as an index it should be possible to find such
> duplicates and, instead of storing a new delta or even fulltext, just
> saving the other "inode" in the repository (simplest case is [filename,
> revision], better some internal pointer for speed reasons)."
>
> If that applies to the storage of file data, minus history and minus
> properties, then fine. Otherwise a problem might arise when two
> files are or become identical but history and/or properties are not.
As I tried to express above - the history and properties are not
connected, only the storage space should be shared.
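
To make the separation concrete, here is a small in-memory sketch. It
assumes the repository keeps one table of file contents keyed by hash
and a separate per-path list of revisions. The class and its fields are
purely hypothetical and say nothing about how subversion stores data
internally.

```python
import hashlib


class Repo:
    """Toy model: contents are shared, history and properties are per path."""

    def __init__(self):
        self.blobs = {}     # digest -> bytes, each content stored once
        self.nodes = {}     # path -> list of (revision, digest, props)
        self.revision = 0

    def commit(self, path, data, props=None):
        digest = hashlib.sha256(data).hexdigest()
        # Identical content reuses the existing blob instead of a new copy.
        self.blobs.setdefault(digest, data)
        self.revision += 1
        entry = (self.revision, digest, dict(props or {}))
        self.nodes.setdefault(path, []).append(entry)
        return self.revision

    def history(self, path):
        """Revisions in which this path changed; never mixed with others."""
        return [rev for rev, _, _ in self.nodes[path]]
```

Committing the same bytes under trunk/a and branch/a gives one entry in
blobs but two independent histories, each with its own properties.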


>> Ad 1: as far as MD5 or SHA1 say they are identical.
>> Maybe we should take SHA-256, which seems to be better.
>> (There's a way to generate files with identical MD5).
>>
>> Ad 2: Would we copy the MD5/.../file to a new name, change it there, and
>> copy it to the existing location? Then the previous and current data
>> don't share history, as they're not directly connected.
>> Or otherwise, if we'd change the "local" location, and copy to MD5/...,
>> then every identical file copied from there has a bad history ...
>
> It would be great if both would be possible. The first method covers
> files that become identical "by chance" and don't share the history (I
> think, it should be the default behaviour) and the second case covers the
> renaming. But telling the two cases apart requires IMHO different
> commands for these operations. Hmm... it must be strictly avoided
> that the user can have access to two "different" files with the same
> content and the same history, I mean where the paths/names are
> different but the contents are equal. This can happen and those
> files must not share the history. So I think, the command that does
> "rename with history" must either do a true rename within the database
> or imply that the old file is removed.
Did my previous sentences clear that up a bit? If not, I'll try to express
that with a picture.


>> Ad 3: There's a thread "Stage 1 of true rename support"
>> http://marc.theaimsgroup.com/?t=111661193300004&r=1&w=2 starting
>> 2005-05-20, where I mentioned issue 2286, and where it seemed to fit
>> (at least IMO). I got no answer to that, though.
>> I don't know the current status of that.
>
> What if a whole directory is renamed?
>
> "------- Additional comments from Peter ... Wed May ... -------
>
> PUtting in unscheduled."
>
> What could "PUtting in unscheduled." possibly mean?!? This was a comment
> here: http://subversion.tigris.org/issues/show_bug.cgi?id=2286
They're waiting for patches?


> What if a whole directory is renamed?
Currently: The directory is checked in as new, with all files as new.
With shared space in the repository we'd just have duplicated space for
the directory tree, the files should not need anything (as the storage
space is shared).

Later: fsvs could detect that eg. the inode is the same and the contents
are mostly the same, deduce a rename, and send a rename (or a copy/delete)
to the repository ... But that's not even on my TODO currently.
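
Just to illustrate what such a detection could look like, a rough
sketch. It assumes two snapshots mapping path -> (inode, contents) and
uses difflib's similarity ratio as a stand-in for "mostly the same"; the
function name and the 0.8 threshold are arbitrary choices, not anything
fsvs does today.

```python
import difflib


def detect_renames(old, new, min_ratio=0.8):
    """old/new map path -> (inode, content bytes).

    Returns a list of (old_path, new_path) pairs that look like renames:
    the old path is gone, a new path has the same inode, and the
    contents are similar enough.
    """
    # Inodes of paths that disappeared between the two snapshots.
    gone = {ino: (p, data) for p, (ino, data) in old.items() if p not in new}
    renames = []
    for path, (ino, data) in new.items():
        if path in old or ino not in gone:
            continue
        old_path, old_data = gone[ino]
        ratio = difflib.SequenceMatcher(None, old_data, data).ratio()
        if ratio >= min_ratio:
            renames.append((old_path, path))
    return renames
```

A real implementation would have to avoid comparing whole file contents
for large files, and would need to cope with inode numbers being reused
after a delete.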


So please let me repeat myself :-)
Any other ideas?
If not, I'll let that sink in a bit (about a week), and then maybe try to
implement issue 2286.


Regards,

Phil

