On Wednesday, 10. May 2006 12:26, Ph. Marek wrote:
> > The tricky bit will be that I'll get a copy_path on update - and I'll have
> > yet to find out what to do about them ... the source might not be
> > available, or modified ...
> As I wrote - detecting duplicate files is easy. That'll be ready soon.
> 
> How do we store that in the repository?
> The following ideas come to my mind:
> 
> - For each (to be) committed file, create a file named
>   REPOSROOT/MD5/x/x/x/x/xxxxxx... (where the x stand for
>   the MD5-digits) (could be SHA1, too, of course), and
>   do a copy from there into the "correct" location.
>   Adv: repos-wide sharing of identical[1] files.
>   Disadv: possible collisions

That could be avoided by storing an MD5 checksum AND a SHA1 checksum.
Hitting a collision in both algorithms at the same time should be
virtually impossible, if not really impossible.
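
Roughly what I have in mind, as a minimal sketch in Python. The function
names and the idea of appending the SHA1 to the file name are my own
assumptions for illustration; only the REPOSROOT/MD5/x/x/x/x/... fan-out
comes from the proposal above.

```python
import hashlib


def content_key(data):
    """Return (md5_hex, sha1_hex) for the given file content."""
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()


def pool_path(data, root="REPOSROOT/MD5"):
    """Build a fanned-out pool path from both digests.

    The first four MD5 digits become directory levels (x/x/x/x), the
    rest of the MD5 plus the full SHA1 form the file name. Two contents
    only map to the same path if BOTH digests collide.
    """
    md5_hex, sha1_hex = content_key(data)
    fanout = "/".join(md5_hex[:4])
    return "%s/%s/%s-%s" % (root, fanout, md5_hex[4:], sha1_hex)
```

With this layout, identical content always lands on the same pool path,
and a collision in MD5 alone still yields two distinct paths.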

>     have to configure MD5 root
>     history of files very complicated[2]
> - subversion issue 2286.
>   http://subversion.tigris.org/issues/show_bug.cgi?id=2286
>   Here the repository has the work to consolidate identical files.
>   Adv: client doesn't have to change.

I would rather have a deliberate change in the client, because
sometimes I want to copy and delete a file (and drop the history),
sometimes I want to copy a file together with its history, and
sometimes I want to change the name/path (and keep the history).
I think this should be considered from the history viewpoint:
if the history of a file can be kept or dropped at will when
moving/renaming, then there is no problem. The same goes for
properties and whatnot.

>     Repos has the necessary information, as it has to reconstruct
>       the fulltext either way, and can easily check for identical data.
>   Disadv: Repos has to change[3]

If the repository could take care of that, it would be a nice feature.

> - Use a special property "fsvs:hardlink" => "/repospath/to/file:revision"
>   On update fetch the mentioned file.
>   Adv: can be done entirely in fsvs
>   Disadv: incompatible with everything else
>     History mangled - eg. File is identical, gets a change (results in
>       local copy with delta), gets identical to some other.
>       Then a "diff" is meaningless.

I think that's dangerous. One cannot trust the users... I know myself
;-P
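
To illustrate the worry: a property is free text, so whatever reads it
has to validate it before fetching anything. A small sketch, assuming
the "/repospath/to/file:revision" value format from the proposal; the
function name parse_hardlink is made up for this example and is not
part of fsvs.

```python
def parse_hardlink(value):
    """Split a "/repospath/to/file:revision" value into (path, revision).

    The split happens at the LAST colon, so paths that contain a colon
    themselves still parse. Anything malformed raises ValueError
    instead of being fetched blindly.
    """
    path, sep, rev = value.rpartition(":")
    if not sep or not path.startswith("/") or not rev.isdecimal():
        raise ValueError("malformed hardlink value: %r" % (value,))
    return path, int(rev)
```

Even with such a check, the property can still point at a path or
revision that does not exist, which is part of why I consider this
approach fragile.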

> To be honest, I'd prefer the solution with issue 2286 ...
> It feels the simplest and cleanest to me.

"When using branches it often happens that identical changes are 
done to copied files; this results in wasted storage space.

Using eg. the MD5-hash as an index it should be possible to find such 
duplicates and, instead of storing a new delta or even fulltext, just 
saving the other "inode" in the repository (simplest case is [filename,
revision], better some internal pointer for speed reasons)."

If that applies only to the storage of file data, minus history and
minus properties, then fine. Otherwise a problem might arise when two
files are or become identical but their history and/or properties are not.
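
A toy model of the separation I mean, not of how Subversion actually
stores things: file data is pooled by digest, while history and
properties stay attached to each path. The class and attribute names
are hypothetical, and SHA1 as the index is just one possible choice.

```python
import hashlib


class ContentStore:
    """Toy repository: shared file data, per-path history and properties."""

    def __init__(self):
        self.blobs = {}  # digest -> file data, stored once
        self.nodes = {}  # path -> {"blob", "props", "history"}

    def commit(self, path, data, props, rev):
        digest = hashlib.sha1(data).hexdigest()
        # Identical data is stored only once, whoever commits it.
        self.blobs.setdefault(digest, data)
        node = self.nodes.setdefault(
            path, {"blob": None, "props": {}, "history": []}
        )
        node["blob"] = digest
        node["props"] = dict(props)
        node["history"].append((rev, digest))
```

Two paths with the same content then point at one blob, but a "log" or
"proplist" on either path only ever sees that path's own entries.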
  
> Any other ideas?
> 
> 
> Regards,
> 
> Phil
> 
> 
> 
> Ad 1: as far as MD5 or SHA1 say they are identical.
> Maybe we should take SHA-256, which seems to be better.
> (There's a way to generate files with identical MD5).
> 
> Ad 2: Would we copy the MD5/.../file to a new name, change it there, and
> copy it to the existing location? Then the previous and current data don't
> share history, as they're not directly connected.
> Or otherwise, if we'd change the "local" location, and copy to MD5/...,
> then every identical file copied from there has a bad history ...

It would be great if both were possible. The first method covers
files that become identical "by chance" and don't share history (I
think it should be the default behaviour), and the second covers
renaming. But telling the two cases apart requires, IMHO, different
commands for these operations. Hmm... what must be strictly avoided
is that the user ends up with two "different" files, i.e. different
paths/names, that have the same content and the same history.
Identical content under different paths can happen, and such files
must not share history. So I think the command that does "rename
with history" must either do a true rename within the database or
imply that the old file is removed.
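
The two commands could behave roughly like this. It is only a sketch
of the semantics I am asking for; the class and method names are
invented for the example and do not correspond to any existing svn or
fsvs command.

```python
class Histories:
    """Toy model: each path owns exactly one history list."""

    def __init__(self):
        self.log = {}  # path -> list of log entries

    def commit(self, path, note):
        self.log.setdefault(path, []).append(note)

    def rename_with_history(self, old, new):
        """True rename: the history moves, the old path disappears."""
        if new in self.log:
            raise ValueError("target exists: %s" % new)
        self.log[new] = self.log.pop(old)
        self.log[new].append("renamed from " + old)

    def copy_without_history(self, old, new):
        """Same content under a new path, starting a fresh history."""
        if old not in self.log:
            raise KeyError(old)
        if new in self.log:
            raise ValueError("target exists: %s" % new)
        self.log[new] = ["added (content taken from %s)" % old]
```

Either way, no two paths ever end up sharing one history: a rename
removes the old path, and a plain copy starts over.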

> Ad 3: There's a thread "Stage 1 of true rename support"      
> http://marc.theaimsgroup.com/?t=111661193300004&r=1&w=2 starting
> 2005-05-20, where I mentioned issue 2286, and where it seemed to fit
> (at least IMO). I did get no answer to that, though.
> I don't know the current status of that.

What if a whole directory is renamed? 

"------- Additional comments from Peter ... Wed May ... -------

PUtting in unscheduled."

What could "PUtting in unscheduled." possibly mean?!? This was a comment
here: http://subversion.tigris.org/issues/show_bug.cgi?id=2286

Cheers
Dirk
