Hi,
I recently downloaded Tailor to convert a very large (~2 million lines of code
with 6 years of revision history and about 100,000 discrete CVS patches)
CVS repository to Mercurial. I used the cvs backend for CVS and the hg
backend for Mercurial, with Tailor version 0.9.22.
Firstly, thanks for an excellent tool!
I tried two different processes:
1. I converted our HEAD branch. This was relatively straightforward to do
(after following the instructions).
2. I converted a branch in our repository. This was more difficult (see
below).
I had a couple of small problems with number 1:
- There is a copy-and-paste typo in the mercurial backend:
diff -r tailor-0.9.22/vcpx/hg.py ../tailor-0.9.22/vcpx/hg.py
20c20
< class HglibWorkingDir(UpdatableSourceWorkingDir, SyncronizableTargetWorkingDir):
---
> class HgWorkingDir(UpdatableSourceWorkingDir, SyncronizableTargetWorkingDir):
- We happen to have a bunch of $Id:$-type tags in our CVS repository for which
there are a lot of commits in the repository. If we try to merge changes
between the version on the branch and the HEAD, then we get mostly spurious
conflicts (files that changed in both, and can be merged, except for the ID
field).
To avoid this, I added the "-kk" flag to the CVS update and checkout commands,
which normalizes these fields:
diff -U 3 -r tailor-0.9.22/vcpx/cvsps.py ../tailor-0.9.22/vcpx/cvsps.py
--- tailor-0.9.22/vcpx/cvsps.py 2006-05-01 17:54:33.000000000 -0400
+++ ../tailor-0.9.22/vcpx/cvsps.py 2006-05-27 22:19:16.000000000 -0400
@@ -223,7 +223,7 @@
rmtree(join(self.basedir, e.name))
else:
cmd = self.repository.command("-d", "%(repository)s",
- "-q", "update", "-d",
+ "-q", "update", "-kk", "-d",
"-r", e.new_revision)
cvsup = ExternalCommand(cwd=self.basedir, command=cmd)
retry = 0
@@ -297,7 +297,7 @@
parentdir, subdir = split(self.basedir)
cmd = self.repository.command("-q",
"-d", self.repository.repository,
- "checkout",
+ "checkout", "-kk",
"-d", subdir)
if revision:
cmd.extend(["-r", revision])
- CVS waits for the clock to tick over to the next second before it exits
(which means only one operation per second, which is frustrating with a
hundred thousand CVS changesets). To avoid this, I rsync'd a new copy of the
repository (so that there was no possibility of corruption) and then
modified CVS to not sleep at the end:
--- BUILD/cvs-1.11.19/src/update.c 2005-01-31 17:18:01.000000000 -0500
+++ BUILD/cvs-1.11.19-nowait/src/update.c 2006-05-27 21:52:27.621272500 -0400
@@ -525,11 +525,11 @@
#endif
/* see if we need to sleep before returning to avoid time-stamp races */
if (last_register_time)
{
- sleep_past (last_register_time);
+ /* sleep_past (last_register_time); */
}
return err;
}
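To illustrate what the "-kk" flag mentioned above does to the keyword fields,
here is a rough Python sketch. It is not how CVS implements it, just an
approximation of the visible effect: expanded keywords lose their values, so
two revisions that differ only in the $Id$ line compare equal. The keyword
list and the function name collapse_keywords are my own assumptions for
illustration.

```python
import re

# RCS keywords that CVS expands (list assumed from the CVS manual).
_KEYWORDS = ("Author", "Date", "Header", "Id", "Locker", "Log",
             "Name", "RCSfile", "Revision", "Source", "State")

# Matches an expanded keyword such as "$Id: main.cc,v 1.5 ... $".
_KEYWORD_RE = re.compile(r"\$(%s):[^$\n]*\$" % "|".join(_KEYWORDS))


def collapse_keywords(text):
    """Replace expanded keywords with the bare form, e.g. '$Id$'."""
    return _KEYWORD_RE.sub(r"$\1$", text)


if __name__ == "__main__":
    head = "/* $Id: main.cc,v 1.5 2006/05/01 21:54:33 jeremy Exp $ */"
    branch = "/* $Id: main.cc,v 1.2.106.1 2006/03/02 10:00:00 jeremy Exp $ */"
    # After collapsing, the two lines no longer conflict.
    print(collapse_keywords(head) == collapse_keywords(branch))
```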
I had more problems with number 2:
- I had accidentally put a "/" on the end of the CVS module name, which meant
the first character of the path was truncated (eg, I got
"pplications/main.cc" instead of "applications/main.cc"). I fixed this by
removing the "/", but it would be good to put this in the doco or
(preferably) check for it and remove it.
- CVS has a bug in the rlog command when using the -r:branch syntax, which
leads to changesets on the wrong branch being listed in the log. For
example:
cvs -f -d /export/cvsroot-copy rlog -r:prod algorithms/machine_learning/mlp/random.cc
RCS file: /export/cvsroot-copy/algorithms/machine_learning/mlp/random.cc,v
head: 1.2
branch:
locks: strict
access list:
symbolic names:
prod: 1.2.0.106
mingw32-branch: 1.2.0.20
keyword substitution: kv
total revisions: 3; selected revisions: 1
description:
----------------------------
revision 1.2.20.1
date: 2004/05/31 22:45:58; author: jeremy; state: Exp; lines: +8 -2
* Initial mingw32 work; most things compile now
Here revision 1.2.20.1 belongs to mingw32-branch (1.2.0.20), not to prod
(1.2.0.106), yet rlog selects it. This would lead to a "Something went wrong:
unable to determine the exact upstream revision of the checked out tree" error,
since the date of the incorrect revision was much earlier than the date of the
branch (earlier this year).
I fixed this by filtering the revisions as they were parsed (ie, by not
trusting CVS):
diff -U 3 -r tailor-0.9.22/vcpx/cvsps.py ../tailor-0.9.22/vcpx/cvsps.py
--- tailor-0.9.22/vcpx/cvsps.py 2006-05-01 17:54:33.000000000 -0400
+++ ../tailor-0.9.22/vcpx/cvsps.py 2006-05-27 21:11:58.000000000 -0400
@@ -401,8 +425,20 @@
cs = self.__parseRevision(entry)
if cs is None:
break
+
date,author,changelog,e,rev,state,newentry = cs
+ # CVS seems to sometimes mess up what it thinks the branch is...
+ if not cvs_revs_same_branch(normalize_cvs_rev(rev), branchnum):
+ self.log.warning("skipped revision %s on entry %s "
+ "as revision didn't match branch revision %s "
+ "for branch %s"
+ % (str(normalize_cvs_rev(rev)), entry, str(branchnum),
+ str(branch)))
+ expected_revisions -= 1
+ continue
+
+
# Skip spurious entries added in a branch
if not (rev == '1.1' and state == 'dead' and
changelog.startswith('file ') and
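The hunk above calls two helpers, normalize_cvs_rev and cvs_revs_same_branch,
whose bodies are not included in it. The following is only a guess at what
such helpers could look like, assuming revisions are compared as tuples of
integers and that the "magic" zero in a branch tag (1.2.0.106) is dropped to
get the real branch number (1.2.106). The actual helpers may well handle more
cases, such as trunk revisions before the branch point.

```python
def normalize_cvs_rev(rev):
    """Turn '1.2.0.106' or '1.2.106' into (1, 2, 106).

    Hypothetical reconstruction: a magic branch number has an even
    number of fields with a 0 in the second-to-last position.
    """
    parts = [int(p) for p in rev.split(".")]
    if len(parts) >= 4 and len(parts) % 2 == 0 and parts[-2] == 0:
        parts = parts[:-2] + parts[-1:]
    return tuple(parts)


def cvs_revs_same_branch(rev, branchnum):
    """True if revision tuple `rev` sits directly on branch `branchnum`."""
    return rev[:-1] == branchnum


if __name__ == "__main__":
    prod = normalize_cvs_rev("1.2.0.106")
    # The revision rlog wrongly reported for -r:prod in the example.
    print(cvs_revs_same_branch(normalize_cvs_rev("1.2.20.1"), prod))
    # A revision that really is on the prod branch.
    print(cvs_revs_same_branch(normalize_cvs_rev("1.2.106.1"), prod))
```

With the rlog output shown earlier, the first check rejects 1.2.20.1 (it is
on mingw32-branch) and the second accepts a revision on prod.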
I hope that this can save somebody some frustration! As for the patch to filter
the CVS revisions: I think that this would be a good thing to add to the
official distribution; from Googling I see that I'm not the only person to think
that Tailor didn't work on branches (when it was in fact a CVS bug)!
Cheers,
Jeremy