FWIW, when the dust settles from discussion on the Subversion dev list, I'll be integrating this information into the docs:
http://subversion.tigris.org/merge-tracking/

On Fri, 28 Apr 2006, Blair Zajac wrote:

> There's a merge tracking proposal put up by Daniel Berlin here:
>
> http://svn.haxx.se/dev/archive-2006-04/0916.shtml
>
> Blair
>
> --
> Blair Zajac, Ph.D.
> <[EMAIL PROTECTED]>
> Subversion training, consulting and support
> http://www.orcaware.com/svn/
>
> From: Daniel Berlin <[EMAIL PROTECTED]>
> To: dev <[EMAIL PROTECTED]>
> Date: Fri, 28 Apr 2006 09:34:04 -0700
> Subject: Merge tracking proposal
>
> Among other things I am working on at Google, I have been tasked
> full-time with implementing merge tracking.
>
> As part of this, I have come up with a design I plan on implementing for
> tracking what revisions have been merged where, in a manner that is
> suitable for use by various other operations (history sensitive merging,
> etc).
>
> In doing so, I reviewed the use cases that were kindly written up, and
> believe that most if not all of them can be accomplished with this
> design.
>
> Please remember that this design is *only* for tracking what changes are
> merged where. I expect this to be the easy part, compared to deciding
> exactly what algorithms our history sensitive merge uses, and how it
> proceeds.
>
> I have divided the design into four portions: "Goals", "information
> storage", "information updating", and "other prereqs to being able to
> implement the design".
>
> The "random questions and answers" section is there to answer common
> questions other developers I've talked to while coming up with this
> design have had, in the hopes that it will answer some common queries
> the list may have.
>
> Goals:
>
> The overarching goal here is to track the revision numbers being merged
> by a merge operation, and to keep this information in the right places
> as various operations (copy, delete, add, etc) are performed.
>
> The goals of this design are:
> 1. To be able to track this down to individual files in a working copy,
> and to determine which files have had which revisions merged into them.
>
> 2. To not need to contact the server more than we already do now to
> determine which revisions have been merged in a file or directory (ie
> some contact is acceptable, asking the server about each file is not).
>
> 3. To be able to edit merge information in a human editable form.
>
> 4. For the information to be stored in a space efficient manner, and to
> be able to determine the revisions merged into a given file/directory in
> a time efficient manner.
>
> 5. To still get a conservatively correct answer (not worse than what we
> have now) when no merge info is specified.
>
> 6. To be able to collect, transmit, and keep this information up to date
> as much as possible on the client side.
>
> 7. To be able to index this information in the future in order to answer
> queries.
>
> Specific Non-goals for *this design* include:
> 1. Doing actual history sensitive merging
> 2. Curing cancer (aka being all things to all people)
>
> When reading the design presented here, please remember that it is
> impossible to get something perfect in subversion on the first try, and
> attempting to nit pick this to death will not actually help anything,
> but it would be very annoying. This is not to dissuade people from
> suggesting design changes, but if you plan on suggesting a different
> revision list format because you believe colon doesn't have a good level
> of synergy with existing separators, or something, you may want to
> rethink whether it really matters.
>
> Some pre-notes:
> The one argument i continually have with myself is whether to store info
> in revprops, or just on dirs and files. If you want to try to
> convincingly argue one way or the other, go for it.
> Certainly, I think it makes certain semantics of what the operations
> below do clearer, and how to proceed easier; the question is whether it
> is efficient enough time wise when we go to retrieve merge info, and
> whether it complicates what merge has to do too much. It also removes
> all of the listed pre-reqs :).
>
> One could also try to argue that we should start with exactly the same
> cases svnmerge does (IE only allow merge info at the wc roots, only
> store it on that directory, etc), with a nicer integrated interface, and
> try to expand it from there. I am open to such an argument as well. :)
>
> Anyway, on with the design.
>
>
> Information storage
>
> The first question that many people ask is "where should we store the
> merge information" (what we store will be covered next).
>
> After a large amount of research, the design I have come up with is
> this:
> A merge info property, named SVN_MERGE_PROPERTY (not the real name, I
> have made it a constant so we can have a large bikeshed about what to
> really call it) stored in the revision properties, directory properties,
> and file properties.
> Each will store the *full, complete* list of current merged in changes,
> as far as it knows. This ensures that the merge algorithm and other
> consumers do not have to walk back revisions in order to get the
> transitive closure of the revision list.
>
> The way we choose which of file, dir, revprop merge info to use in case
> of conflicts is a simple system of inheritance[1] where the "most
> specific" place wins. This means that if the property is set on a file,
> that completely overrides the directory and revision level properties.
>
> The way we choose which to store to depends on how much and where you
> merge, and will be covered in the semantics.
>
> The reasoning for this system is to avoid having to either copy info
> everywhere, or crawl everywhere, in order to determine which revisions
> have been applied.
> At the same time, we want to be space and time
> efficient, so we can't just store the entire revision list everywhere.
>
> As for what is stored:
>
> For the large number of people i have talked to and heard about from
> others, it seems the human editable *format* of how svnmerge stores
> merge information (IE pathname and list of revisions) is fine. Binary
> storage of such information would buy, on average, a 2-3 byte decrease
> per revision/range in size over ascii[2], while making it not directly
> human editable.
>
> As such, i have chosen to represent the revisions we have merged *into*
> something as a path, a colon, and then a comma separated revision list,
> containing one or more revisions or revision ranges. Revision range end
> and beginning points are separated by "-".
>
> So the grammar looks something like this
>
> revisionrange -> REVISION "-" REVISION
>
> revisionelement -> revisionrange | REVISION
>
> revisionlist -> (revisionrange | REVISION)(COMMA revisionelement)*
>
> revisionline -> PATHNAME COLON revisionlist
>
> top -> revisionline (NEWLINE revisionline)*
>
> This list will *not* be stored in a canonicalized minimal form for a
> path (IE it may contain single revision numbers that could be ranges).
> This is chiefly because the benefit of such a canonical format (slightly
> easier *comparison*, but not indexing) is heavily outweighed by the fact
> that generating a canonical form may require groveling through a lot of
> information to determine what that minimal canonical form is. In
> particular, it may be that the revision list "5,7,9" is, in minimal
> canonical form, "5-9", because 6 and 8 do not have any effect on the
> pathname that 5 and 9 are from.
> Canonicalization could be done as a server side post pass because the
> information is stored in properties.
>
> Note that this revision format will not scale on its own if you have a
> list of a million revisions. None will easily.
> However, because it is
> stored in properties, one can change the wc and fs backends to simply do
> something different with this single property if they wanted to.
> Given the rates of change of various very active repositories, this will
> not be a problem we need to solve for many many years.
>
> Information updating:
> Each operation you can perform may update or copy the merge info
> associated with a path, file, or revision.
>
>
> svn add: No change to merge info
> svn delete: No direct change to merge info (indirectly, because the
> props go away, so does the merge info for the file)
> svn rename: No change to merge info
> svn copy: Copies the merge info from the source path to the destination
> path, if any.
>
> This includes copying info from revprops, if necessary, by determining
> if the merge info exists in a revprop for the last changed commit for
> the source path, and copying it to the new revprop if it does (someone
> probably needs to check if this is the right semantic :P)
>
> All copies are full-copies of the merge information.
>
> svn merge: Adds or subtracts to the merge info, according to the
> following:
>
> Where to put the info:
> 1. If the merge target is a single file, the merge info goes to the
> property SVN_MERGE_PROPERTY set on that file.
> 2. If the merge target is a non-wc-root directory, the merge info goes
> to the property SVN_MERGE_PROPERTY set on the directory.
> 3. If the merge target is a wc-root directory, the merge info goes to
> the property SVN_MERGE_PROPERTY set on the revprop.
>
> What info is put:
> 1. If you are merging in reverse, revisions are subtracted from the
> revision lines, but we never write out anti-revisions. Thus, if you
> subtract all the merged revisions, you just get an empty list, and if
> you do a reverse merge from there, you still get an empty list.
> 2. If you are merging forward, the revision(s) you are merging are added
> to the revision line in sorted order (such that all revisions and
> revision ranges in the list are monotonically increasing from left to
> right). The exact details of how the range is represented in terms of a
> list of single revs, or a revision range, are left as a quality of
> implementation detail. The only requirement is that the range be
> correct.
> 3. The path (known as PATHNAME in the grammar) used as the key to
> determine which revision line to change is the subdirectory path being
> merged from, relative to the repo root, with the repo url stripped from
> it.
>
> Thus a merge of revisions 1-9 from http://foo.bar.com/reposroot/trunk
> would produce "/trunk:1-9"
>
> cross-repo merging is a bridge we can cross if we ever get there :).
>
>
> pre-reqs for this design:
>
> 1. Need to be able to set a revprop to be stored on commit
> 2. Need to be able to say to copy a revprop from a particular revision
> and only contact the server at commit time.
>
> 3. Need to be able to have auth treat SVN_MERGE_PROPERTY revprop
> differently from other revprops (either by special casing the cases
> users do care about controlling, or special casing props users don't
> care about controlling, etc) so that people who don't have access to the
> revprops can still do history sensitive merges of directories they do
> have access to.
>
>
> Random questions and answers
>
> What happens if someone commits a merge with a non-merge tracking
> client?
> It simply means the next time you merge, you may receive conflicts that
> you would have received if you were using a non-history-sensitive
> client.
>
> Can we do without the revprop portion of this design?
> Technically yes, AFAIK, but it may require more crawling and querying at
> merge time.
>
> Can we do history sensitive wc<->wc merges without contacting the
> server?
> No.
> But you probably couldn't anyway, even if the revprop not being
> stored locally issue were not here.
>
> What happens if the info is not there?
> The same thing that happens if the info is not there now.
>
> What happens if a user edits merge info incorrectly?
> They get the results specified by their merge info.
>
> How does the revprop stay up to date?
> We copy it from revision to revision.
>
> What happens if a user manually edits a file and unmerges a revision (IE
> not using a "reverse merge" command), but doesn't update the merge info
> to match?
> The merge info will believe the change has still been merged.
>
> What happens if i svn move/rename a directory, and then merge it
> somewhere?
> This doesn't change history, only the future, thus we will simply add
> the merge info for that directory as if it was a new directory. We will
> not do something like attempt to modify all merge info to specify the
> new directory, as that would be wrong.
>
> I don't think that only copying info on svn copy is correct, what if you
> copy a dir with merge info into a dir where the dir has merge info,
> won't it get the info wrong now?
>
> No.
>
> Let's say you have
>
> a/foo (merge info: /trunk:5-9)
> a/branches/bar (merge info: /trunk:1-4)
>
> If you copy a/foo into a/branches/bar, we now have
>
> a/branches/bar (merge info: /trunk:1-4)
> a/branches/bar/foo (merge info: /trunk:5-9)
>
> This is strictly correct. The only changes which have been merged into
> a/branches/bar/foo are still 5-9. The only changes which have been
> merged into a/branches/bar are 1-4. No merges have been performed by
> your copy, only copies have been performed. If you perform a merge of
> revisions 1-9 into bar, one would expect that the history
> sensitive merge algorithm will skip revisions 5-9 for
> a/branches/bar/foo, and skip revisions 1-4 for a/branches/bar.
> The above information gives the algorithm the information necessary to
> do this.
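
As a rough illustration of the copy example just quoted (a sketch for readers, not part of the proposal and not Subversion code), the "most specific place wins" lookup and the resulting skipped revisions could look like this in Python. The `MERGE_INFO` table and both function names are hypothetical, and the revprop level of the inheritance chain is left out to keep it short.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for merge info properties after the copy:
# working-copy path -> {merge source path: revisions already merged}.
MERGE_INFO: Dict[str, Dict[str, Set[int]]] = {
    "a/branches/bar": {"/trunk": set(range(1, 5))},       # /trunk:1-4
    "a/branches/bar/foo": {"/trunk": set(range(5, 10))},  # /trunk:5-9
}


def effective_merge_info(path: str) -> Dict[str, Set[int]]:
    """Return the merge info of the most specific path that has any."""
    current = path
    while current:
        if current in MERGE_INFO:
            return MERGE_INFO[current]
        # Nothing set here, so fall back to the parent directory.
        current = current.rpartition("/")[0]
    return {}


def revisions_to_merge(path: str, source: str, first: int, last: int) -> List[int]:
    """List the revisions in first..last not yet recorded as merged."""
    merged = effective_merge_info(path).get(source, set())
    return [rev for rev in range(first, last + 1) if rev not in merged]


print(revisions_to_merge("a/branches/bar/foo", "/trunk", 1, 9))  # [1, 2, 3, 4]
print(revisions_to_merge("a/branches/bar", "/trunk", 1, 9))      # [5, 6, 7, 8, 9]
```

A path with no merge info of its own, such as a file directly under a/branches/bar, falls back to the directory's entry, which matches the inheritance rule described earlier in the quoted message.
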
>
> So if you want to argue svn copy has the wrong merge info semantics,
> it's not because of the above, AFAIK :)
>
>
> I'm sure that even in this long document, I've forgotten some things i
> did spec out.
> Apologies in advance.
>
>
> Footnotes:
> [1] This is not going to be a full blown design for property
> inheritance, nor should this design depend on such a system being
> implemented.
>
> [2] Assuming 4 byte revision numbers, and repos with revisions numbering
> in the hundreds of thousands. You could do slightly better by variable
> length encoding of integers, but even that will generally be 4 bytes for
> hundreds of thousands of revs. Thus, we have strings like "102341" vs 4
> byte numbers, meaning you save about 2 bytes for a 4 byte integer.
> Range lists in binary would need a distinguisher from single revisions,
> adding a single bit to both (meaning you'd get 31 bit integers), and
> thus, would require 8 bytes per range vs 12 bytes per range. While 30%
> is normally nothing to sneeze at space wise, it's also not significantly
> more efficient in time, as most of the time will not be spent parsing
> revision lists, but doing something with them. The space efficiency
> therefore does not seem to justify the cost you pay in not making them
> easily editable.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
> _______________________________________________
> Svnmerge mailing list
> [email protected]
> http://www.orcaware.com/mailman/listinfo/svnmerge

--
Daniel Rall
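
For readers who want to experiment with the revision-list format from the quoted proposal, here is a minimal Python sketch of parsing it and applying the forward and reverse merge rules. All function names are made up for illustration. Splitting each line at its last colon is an assumption of this sketch, and the range compaction only joins numerically adjacent revisions, so it is not the minimal canonical form the proposal deliberately avoids requiring.

```python
from typing import Dict, Set


def parse_merge_info(text: str) -> Dict[str, Set[int]]:
    """Parse 'PATHNAME:revisionlist' lines into {path: set of revisions}."""
    info: Dict[str, Set[int]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: split at the last colon so a path may contain colons.
        path, _, revision_list = line.rpartition(":")
        revisions: Set[int] = set()
        for element in revision_list.split(","):
            element = element.strip()
            if not element:
                continue  # tolerate the empty list a full reverse merge leaves
            start, dash, end = element.partition("-")
            if dash:
                revisions.update(range(int(start), int(end) + 1))
            else:
                revisions.add(int(start))
        info[path] = revisions
    return info


def format_revisions(revisions: Set[int]) -> str:
    """Write revisions in increasing order, joining adjacent ones into ranges."""
    ordered = sorted(revisions)
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        parts.append(str(ordered[i]) if i == j else f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return ",".join(parts)


def format_merge_info(info: Dict[str, Set[int]]) -> str:
    """Serialize back to one 'PATHNAME:revisionlist' line per source path."""
    return "\n".join(f"{path}:{format_revisions(info[path])}" for path in sorted(info))


def record_merge(info: Dict[str, Set[int]], source: str,
                 first: int, last: int, reverse: bool = False) -> None:
    """Add revisions on a forward merge, subtract them on a reverse merge."""
    revisions = info.setdefault(source, set())
    span = set(range(first, last + 1))
    if reverse:
        revisions -= span  # never records anti-revisions; the floor is empty
    else:
        revisions |= span


info = parse_merge_info("/trunk:1-4,7")
record_merge(info, "/trunk", 5, 6)
print(format_merge_info(info))  # /trunk:1-7
```

Storing each line as a set keeps the reverse-merge rule trivial: subtracting from an already empty set leaves it empty, which is the behaviour the proposal asks for.
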
