[Tailor] patch: import tags from cvs to darcs

Aaron Kaplan Tue, 04 Oct 2005 13:34:11 -0700

Here's a patch (against VersionOne this time) to import tags from cvs
to darcs.  In cvs (but not in darcs) it's possible to apply a tag to
only a subset of the files in a project; I only export a tag from cvs
if it's been applied to all (and only) the files that are currently
alive.  A couple of caveats:


  - If you go behind tailor's back and use darcs to pull changes into
  the hybrid repository, and then tailor pulls in a tag from cvs, the
  tag will be applied to the wrong version.  But tailor already didn't
  deal well with darcs going behind its back, so this patch doesn't
  really create a new problem.

  - If a tag is applied in cvs and then you sync from darcs->cvs, with
  no cvs->darcs sync in between, then the tag will never be propagated
  to darcs.  Once again, this is a situation that tailor already
  didn't handle well--if you have people working simultaneously on
  both ends, then conflicts can arise, and tailor doesn't deal well
  with conflicts.

  - CVS allows one to move an already existing tag from one revision
  to another.  If a tag is moved in CVS after the newly-tagged
  revision has already been imported into darcs, then the change won't
  be reflected in darcs.  However, if a tag is moved to a revision
  that has not yet been imported into darcs (and this is the more
  common situation--you create a new version and immediately move the
  tag to this new version), then the tag will be properly imported.
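
The "all (and only) the files that are currently alive" rule can be sketched as below. This is a minimal illustration in modern Python, not the code from the patch. The function name applicable_tags and the sample data are assumptions made for the example, and it uses sets where the patch avoids them for Python 2.3 compatibility.

```python
def applicable_tags(state, taginfo, expected_counts):
    """Return the tags that mark exactly the live files at their current revisions.

    state:           filename -> current (normalized) revision tuple, live files only
    taginfo:         filename -> revision -> list of tags on that revision
    expected_counts: tagname -> number of files in the whole log carrying that tag
    (All three names are hypothetical; they mirror the patch loosely.)
    """
    if not state:
        return []

    common = None
    for filename, revision in state.items():
        tags_here = set(taginfo.get(filename, {}).get(revision, []))
        # "all": keep only tags carried by every live file seen so far
        common = tags_here if common is None else common & tags_here
        if not common:
            return []

    # "only": a tag that also sits on some other file (for example a
    # deleted one) is carried by more files than there are live files.
    return sorted(t for t in common if expected_counts.get(t, 0) == len(state))


# Illustrative data: two live files plus one file that has been deleted.
state = {"a.py": (1, 3), "b.py": (1, 2)}
taginfo = {
    "a.py": {(1, 3): ["REL_1", "PARTIAL", "WIDE"]},
    "b.py": {(1, 2): ["REL_1", "WIDE"]},
    "gone.py": {(1, 1): ["WIDE"]},
}
expected_counts = {"REL_1": 2, "PARTIAL": 1, "WIDE": 3}

print(applicable_tags(state, taginfo, expected_counts))
```

Here PARTIAL is dropped because it is missing from b.py, and WIDE is dropped because it also covers a file that is no longer alive, so only REL_1 would be exported.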

Some code is in place for syncing tags in the other direction
(darcs->cvs) as well, but it's currently disabled because it can tag
the wrong version under certain circumstances.  I hope to fix that soon
and send another patch.

-Aaron

New patches:

[Import tags from cvs to darcs.
Aaron Kaplan <[EMAIL PROTECTED]>**20051002225015] {
hunk ./vcpx/changes.py 116
+        self.tags = other.get('tags',[])
hunk ./vcpx/cvs.py 19
-def compare_cvs_revs(rev1, rev2):
-    """Compare two CVS revision numerically, not alphabetically."""
-
-    if not rev1: rev1 = '0'
-    if not rev2: rev2 = '0'
+def normalize_cvs_rev(rev):
+    """Convert a revision string to a tuple of numbers, eliminating the
+    penultimate zero in a 'magic branch number' if there is one.
+    1.1.1.1 is converted to (1,1). """
+    if not rev: rev = '0'
hunk ./vcpx/cvs.py 27
-    rev1 = rev1.split(' ')[0]
-    rev2 = rev2.split(' ')[0]
-    r1 = [int(n) for n in rev1.split('.')]
-    r2 = [int(n) for n in rev2.split('.')]
+    rev = rev.split(' ')[0]
+
+    r = [int(n) for n in rev.split('.')]
+    # convert "magic branch numbers" like 1.2.0.2 to regular
+    # branch numbers like 1.2.2.
+    if len(r) > 2 and r[-2] == 0:
+        r = r[0:-2] + r[-1:]
+
+    if r == [1,1,1,1]:
+        r = [1,1]
+
+    return tuple(r)
+
+def compare_cvs_revs(revstr1, revstr2):
+    """Compare two CVS revision strings numerically, not alphabetically."""
hunk ./vcpx/cvs.py 43
+    r1 = normalize_cvs_rev(revstr1)
+    r2 = normalize_cvs_rev(revstr2)
+    
hunk ./vcpx/cvs.py 47
+
+def cvs_revs_same_branch(rev1, rev2):
+    """True iff the two normalized revision numbers are on the same branch."""
+
+    # Odd-length revisions are branch numbers, even-length ones
+    # are revision numbers.
+    
+    # Two branch numbers can't be on the same branch unless they're identical.
+    if len(rev1) % 2 and len(rev2) % 2:
+        return rev1 == rev2
+
+    # Two revision numbers are on the same branch if they
+    # agree up to the last number.
+    if len(rev1) % 2 == 0 and len(rev2) % 2 == 0:
+        return rev1[0:-1] == rev2[0:-1]
+
+    # One branch number, one revision number.  If by removing the last number
+    # of one you get the other, then they're on the same branch, regardless of
+    # which is longer.  E.g. revision 1.2 is the root of the branch 1.2.2;
+    # revision 1.2.2.2 is directly on the branch 1.2.2.
+    if rev1[0:-1] == rev2:
+        return True
+
+    if rev2[0:-1] == rev1:
+        return True
+
+    return False
+
+    
+def is_branch(rev):
+    """True iff the given (normalized) revision number is a branch number"""
+    # Odd-length tuples are branch numbers; always return a real bool.
+    return len(rev) % 2 == 1
+
+def rev2branch(rev):
+    """Return the branch on which this (normalized) revision lies"""
+    assert not is_branch(rev)
+    return rev[0:-1]
hunk ./vcpx/cvs.py 87
-def changesets_from_cvslog(log, module):
+def changesets_from_cvslog(log, module, branch, entries, since):
hunk ./vcpx/cvs.py 94
-    collected = ChangeSetCollector(log, module)
+    collected = ChangeSetCollector(log, module, branch, entries, since)
hunk ./vcpx/cvs.py 104
-                 if e.name in [n.name for n in last.entries]]):
+                 if e.name in [n.name for n in last.entries]] and
+            not last.tags):
hunk ./vcpx/cvs.py 148
-    def __init__(self, log, module):
+    def __init__(self, log, module, branch, entries, since):
hunk ./vcpx/cvs.py 164
-        self.__parseCvsLog()
+        self.__parseCvsLog(entries, since, branch)
hunk ./vcpx/cvs.py 285
-    def __parseCvsLog(self):
+    def __parseCvsLog(self, entries, since, branch):
hunk ./vcpx/cvs.py 288
+        from changes import Changeset
hunk ./vcpx/cvs.py 291
+        from datetime import timedelta
+        from time import strptime
+        from datetime import datetime
hunk ./vcpx/cvs.py 299
+        file2rev2tags = {}
+        tagcounts = {}
+        branchnum = None
hunk ./vcpx/cvs.py 314
+            while l and not l.startswith('head: '):
+                l = self.__readline()
+            assert l, "Missed 'head:' line"
+            if branch is None:
+                branchnum = normalize_cvs_rev(l[6:-1])
+                branchnum = rev2branch(branchnum)
+
+            while l and not l == 'symbolic names:\n':
+                l = self.__readline()
+
+            assert l, "Missed 'symbolic names:' line"
+
+            l = self.__readline()
+            rev2tags = {}
+            while l.startswith('\t'):
+                tag,revision = l[1:-1].split(': ')
+                tagcounts[tag] = tagcounts.get(tag,0) + 1
+                revision = normalize_cvs_rev(revision)
+                rev2tags.setdefault(revision,[]).append(tag)
+                if tag == branch:
+                    branchnum = revision
+
+                l = self.__readline()
+
+            # branchnum may still be None, if this file doesn't exist
+            # on the requested branch.
+
+            # filter out branch tags, and tags for revisions that are
+            # on other branches.
+            for revision in rev2tags.keys():
+                if is_branch(revision) or \
+                   not branchnum or \
+                   not cvs_revs_same_branch(revision,branchnum):
+                    del rev2tags[revision]
hunk ./vcpx/cvs.py 349
-            expected_revisions = None
-            while 1:
-                l = self.__readline()
-                if l in (self.inter_sep, self.intra_sep):
-                    break
+            file2rev2tags[entry] = rev2tags
hunk ./vcpx/cvs.py 351
-                m = revcount_regex.search(l)
-                if m is not None:
-                    expected_revisions = int(m.group(1))
+            expected_revisions = None
+            while l not in (self.inter_sep, self.intra_sep):
+                m = revcount_regex.search(l)
+                if m is not None:
+                    expected_revisions = int(m.group(1))
+                l = self.__readline()
hunk ./vcpx/cvs.py 390
+
+        # Determine the current revision of each live
+        # (i.e. non-deleted) entry.
+        state = dict(entries.fileversions())
+
+        # before stepping through changes, see if the initial state is
+        # taggable.  If so, add an initial changeset that does nothing
+        # but tag, using the date of the last revision tailor imported
+        # on its previous run.  There's no way to tell when the tag
+        # was really applied, so we don't know if it was seen on the
+        # last run or not.  Before applying the tag on the other end,
+        # we'll have to check whether it's already been applied.
+        tags = self.__applicable_tags(state, file2rev2tags, tagcounts)
+        if tags:
+            if since is None:
+                # I think this could only happen if the CVS repo was
+                # tagged before any files were added to it.  We could
+                # probably get a better date by looking at when the
+                # files were added, but who cares.
+                timestamp = datetime(1900,1,1)
+            else:
+                # "since" is a revision name read from the state file,
+                # which means it was originally generated by
+                # getGlobalCVSRevision.  The format string "%Y-%m-%d
+                # %H:%M:%S" matches the format generated by the implicit
+                # call to timestamp.__str__() in getGlobalCVSRevision.
+                y,m,d,hh,mm,ss,d1,d2,d3 = strptime(since, "%Y-%m-%d %H:%M:%S")
+                timestamp = datetime(y,m,d,hh,mm,ss)
+            author = "unknown tagger"
+            changelog = "tag %s %s" % (timestamp, tags)
+            key = (timestamp, author, changelog)
+            self.changesets[key] = \
+                   Changeset(_getGlobalCVSRevision(timestamp,\
+                                                   author),\
+                             timestamp,author,changelog,\
+                             tags=tags)
+
+        # Walk through the changesets, identifying ones that result in
+        # a state with a tag.  Add that info to the changeset.
+        for cs in self.__iter__():
+            self.__update_state(state, cs)
+            cs.tags = self.__applicable_tags(state, file2rev2tags, tagcounts)
+
hunk ./vcpx/cvs.py 435
+    def __applicable_tags(self,state,taginfo,expectedcounts):
+        # state:   a dictionary mapping filename->revision
+        #
+        # taginfo: a two-level dictionary mapping
+        #          filename->revision->list of tags.
+        #
+        # expectedcounts: a dictionary mapping tagname->number of
+        #                 files tagged with that name.
+        observedcounts = {}
+        possibletags = []
+        for filename, revno in state.iteritems():
+            filetags = taginfo[filename].get(revno,[])
+            if len(possibletags) == 0:
+                # first iteration of loop
+                possibletags = filetags
+
+            # Intersection of possibletags and filetags.  I'm
+            # avoiding using python sets to preserve python 2.3
+            # compatibility.
+            possibletags = [t for t in possibletags if t in filetags]
+            for t in filetags:
+                 observedcounts[t] = observedcounts.get(t,0) + 1
+
+            if len(possibletags) == 0:
+                break
+
+        # All currently existing files carry the tags in possibletags.
+        # But that doesn't mean that the tags correspond to this
+        # state--we might need to create additional files before
+        # tagging.
+        possibletags = [t for t in possibletags if \
+                        observedcounts[t] == expectedcounts[t]]
+        
+        return possibletags
+
+
+    def __update_state(self,state, changeset):
+        for e in changeset.entries:
+            if e.action_kind in (e.ADDED, e.UPDATED):
+                state[e.name] = normalize_cvs_rev(e.new_revision)
+            elif e.action_kind == e.DELETED:
+                if state.has_key(e.name):
+                    del state[e.name]
+            elif e.action_kind == e.RENAMED:
+                if state.has_key(e.old_name):
+                    del state[e.old_name]
+                state[e.name] = normalize_cvs_rev(e.new_revision)
hunk ./vcpx/cvs.py 507
-        branch = ''
+        branch = None
hunk ./vcpx/cvs.py 514
-        cmd = self.repository.command("-f", "-d", "%(repository)s", "rlog",
-                                      "-N")
+        cmd = self.repository.command("-f", "-d", "%(repository)s", "rlog")
hunk ./vcpx/cvs.py 559
-        return changesets_from_cvslog(log, self.repository.module)
+        return changesets_from_cvslog(log, self.repository.module, branch,\
+                                      CvsEntries(self.repository.rootdir), since)
hunk ./vcpx/cvs.py 697
+
+    def fileversions(self, prefix=''):
+        """Return a list of (entry name, version number) pairs."""
+
+        from os.path import join
+
+        pairs = [ (prefix+e.filename, normalize_cvs_rev(e.cvs_version)) \
+                  for e in self.files.values() ]
+
+        for (dirname, entries) in self.directories.iteritems():
+            pairs += [ (prefix+filename, version) \
+                       for filename, version in
+                           entries.fileversions("%s/" % dirname) ]
+
+        return pairs
hunk ./vcpx/cvsps.py 545
+    def _tag(self, tagname):
+        """
+        Apply a tag.
+        """
+
+        cmd=self.repository.command("tag")
+        c = ExternalCommand(cwd=self.basedir, command=cmd)
+        c.execute(tagname)
+        if c.exit_status:
+            raise ChangesetApplicationFailure("%s returned status %d" %
+                                              (str(c), c.exit_status))
+
hunk ./vcpx/darcs.py 30
+
+    Filters out the (currently incorrect) tag info from
+    changesets_from_darcschanges_unsafe.
hunk ./vcpx/darcs.py 35
+    csets = changesets_from_darcschanges_unsafe(changes, unidiff,\
+                                                repodir)
+    for cs in csets:
+        cs.tags = []
+    return csets
+
+
+def changesets_from_darcschanges_unsafe(changes, unidiff=False, repodir=None):
+    """
+    Do the real work of parsing the change log, including tags.
+    Warning: the tag information in the changesets returned by this
+    function is only correct if each darcs tag in the repo depends on
+    all of the patches that precede it.  This is not a valid
+    assumption in general--a tag that does not depend on patch P can
+    be pulled in from another darcs repo after P.  We collect the tag
+    info anyway because DarcsWorkingDir._currentTags() can use it
+    safely despite this problem.  Hopefully the problem will
+    eventually be fixed and this function can be renamed
+    changesets_from_darcschanges.
+    """
hunk ./vcpx/darcs.py 110
-                                 self.current['entries'])
+                                 self.current['entries'],
+                                 tags=self.current.get('tags',[]))
hunk ./vcpx/darcs.py 120
-                self.current[name] = ''.join(self.current_field)
+                val = ''.join(self.current_field)
+                if val[:4] == 'TAG ':
+                    self.current.setdefault('tags',[]).append(val[4:])
+                self.current[name] = val
hunk ./vcpx/darcs.py 236
-                changesets.append(cset)
+                
+                if name.startswith('tagged'):
+                    print ("Warning: skipping tag %s because I don't "
+                           "propagate tags from darcs." % name)
+                else:
+                    changesets.append(cset)
hunk ./vcpx/darcs.py 608
+
+    def _tag(self, tag):
+        """
+        Apply the given tag to the repository, unless it has already
+        been applied to the current state. (If it has been applied to
+        an earlier state, do apply it; the later tag overrides the
+        earlier one.)
+        """
+        if tag not in self._currentTags():
+            cmd = self.repository.command("tag", "--author", "Unknown tagger")
+            ExternalCommand(cwd=self.basedir, command=cmd).execute(tag)
+
+    def _currentTags(self):
+        """
+        Return a list of tags that refer to the repository's current
+        state.  Does not consider tags themselves to be part of the
+        state, so if the repo was tagged with T1 and then T2, then
+        both T1 and T2 are considered to refer to the current state,
+        even though 'darcs get --tag=T1' and 'darcs get --tag=T2'
+        would have different results (the latter creates a repo that
+        contains tag T2, but the former does not).
+
+        This function assumes that a tag depends on all patches that
+        precede it in the "darcs changes" list.  This assumption is
+        valid if tags only come into the repository via tailor; if the
+        user applies a tag by hand in the hybrid repository, or pulls
+        in a tag from another darcs repository, then the assumption
+        could be violated and mistagging could result.
+        """
+        cmd = self.repository.command("changes", "--from-match=not name ^TAG",\
+                                      "--xml-output")
+        changes =  ExternalCommand(cwd=self.basedir, command=cmd)
+        output = changes.execute(stdout=PIPE, stderr=STDOUT)[0]
+        if changes.exit_status:
+            raise ChangesetApplicationFailure(
+                "%s returned status %d saying \"%s\"" %
+                (str(changes), changes.exit_status, output.read()))
+        
+        tags = []
+        for cs in changesets_from_darcschanges_unsafe(output):
+            for tag in cs.tags:
+                if tag not in tags:
+                    tags.append(tag)
+        return tags
hunk ./vcpx/target.py 113
+        for tag in changeset.tags:
+            self._tag(tag)
hunk ./vcpx/target.py 444
+        for tag in changeset.tags:
+            self._tag(tag)
hunk ./vcpx/target.py 460
+
+    def _tag(self, tagname):
+        """
+        Tag the current version, if the VC type supports it, otherwise
+        do nothing.
+        """
+        pass
}

Context:

[Collapse multiple deletions on a single file under cvs
[EMAIL PROTECTED]
 This should fix #7, that reported a strange case where cvs log says
 a single file has two revisions, by the same author and with the same
 timestamp, both deleting the file.
] 
[bzrng-doc-fix
Jelmer Vernooij <[EMAIL PROTECTED]>**20050929220532] 
[bzrng-initial-rev-fix
Jelmer Vernooij <[EMAIL PROTECTED]>**20050929211300
 Fix handling of initial revisions when using bzrng source repository
] 
[bzrng-source-support
Jelmer Vernooij <[EMAIL PROTECTED]>**20050929205637
 Add initial support for Bzr as source repository. 
] 
[Fix an off by one error in hg rename
[EMAIL PROTECTED] 
[Update to work with bzr 0.0.9
[EMAIL PROTECTED] 
[Hg's .remove() wants a sequence of filename
[EMAIL PROTECTED] 
[Normalize manifest path and handle spaced filenames
[EMAIL PROTECTED] 
[Import PIPE needed for the manifest command
[EMAIL PROTECTED] 
[Use getpreferredencoding() when encoding is None or empty
[EMAIL PROTECTED] 
[Catch also ValueError
[EMAIL PROTECTED] 
[Expose the reason of svn update failure, that may be related to authentication
[EMAIL PROTECTED] 
[Renormalize paths under hglib, before looking them up in the filesystem
[EMAIL PROTECTED] 
[Under hg, the rename must walk over old directory contents
[EMAIL PROTECTED] 
[tla: allow multiple updates of the same revision
Robin Farine <[EMAIL PROTECTED]>**20050927200408
 When tailor resumes an interrupted tailorization, the source tree
 is updated even if it is already at the correct revision. With tla,
 replay causes conflicts in this case, let's use update instead.
] 
[Use getpreferredencoding() instead of getdefaultencoding()
[EMAIL PROTECTED]
 Accordingly to documentation, the former tells the encoding selected
 by the user in his environment LANG variable, while the latter is what
 Python uses internally to map non ascii strings to unicode.
] 
[Use dashes instead of underscores in option name
[EMAIL PROTECTED] 
[Use dashes, not underscores, in option name
[EMAIL PROTECTED] 
[New option to introduce a delay before each changeset
[EMAIL PROTECTED] 
[bzrng-nicer-revid
Jelmer Vernooij <[EMAIL PROTECTED]>**20050924221947
 Attempt to create a revision id similar to the ones Bazaar-NG creates itself, 
 rather then making the current local user "owner" of all revisions.
] 
[Use mercurial's own normpath()
[EMAIL PROTECTED]
 At the API layer, mercurial always expects UNIX style pathnames.
] 
[Normalize the pathnames of the entries
[EMAIL PROTECTED]
 This should fix #4. caused by the mixture of "/" and "\" in the pathnames.
] 
[Cosmetic changes
[EMAIL PROTECTED] 
[M-x whitespace-cleanup
[EMAIL PROTECTED] 
[svn-simplify-startswith
Jelmer Vernooij <[EMAIL PROTECTED]>**20050924202917] 
[svn-copyfrom-remove-fix
Jelmer Vernooij <[EMAIL PROTECTED]>**20050924200801
 Fix issues with files that were part of a merge but were later removed
] 
[svn-copyfrom-and-remove-test
Jelmer Vernooij <[EMAIL PROTECTED]>**20050924183749
 Add unit test that exposes a bug in the SVN log parser; SVN shows 'D' for 
 files that were removed after a (larger) merge and never committed.
] 
[bzrng-fix-rename
Jelmer Vernooij <[EMAIL PROTECTED]>**20050924140258
 Fix rename support in bzrng backend
] 
[Use a more interesting patch name format by default
[EMAIL PROTECTED]
 Instead of "Tailorized XX" now the default for patch-name-format is
 "[ProjectName @ XX]".
] 
[Allow 'project' in the patch name format
[EMAIL PROTECTED] 
[Don't swallow keyboard interrupts
[EMAIL PROTECTED]
 Effectively terminate the loop over configured projects when user
 hits Ctrl-C.
] 
[Use --stop-on-copy to determine the initial revision on a Subversion branch
[EMAIL PROTECTED] 
[Fix a cut&paste error
[EMAIL PROTECTED] 
[Explain subdir role in project sections
[EMAIL PROTECTED] 
[Expand ~user on project's root-directory and make it absolute anyway
[EMAIL PROTECTED] 
[TAG Version 0.9.16
[EMAIL PROTECTED] 
Patch bundle hash:
4c6e4e6eaa6f38d596b82162fddd59e325243e80

_______________________________________________
Tailor mailing list
[email protected]
http://lists.zooko.com/mailman/listinfo/tailor
