Many thanks for your rapid response. It is certainly looking quite
promising.
Toby Johnson wrote:
> Due to the way vss2svn retrieves data, it often doesn't know until later
> that a data item refers to a "destroyed" item. Even after you use
> "destroy" in VSS, there are traces left of the file.
Fair enough. There are plenty of items that were destroyed in this
database, particularly from when we were first starting to adopt
SourceSafe during the Windows port, or in temporary test projects. It's
just that they dominate the output so much that there is quite a lot to
trawl through to find the real errors, although I could easily pipe the
output through grep -v.
However, on further investigation, it seems some items are being
incorrectly identified as orphaned. See below.
> > ERROR -- No more active itempath to commit to 'IYAAAAAA':
> This is a bit trickier and I'm not sure I've ever actually seen this
> error, but it probably also has to do with destroyed or corrupted files.
This appears to be related to a project that was moved around and
renamed several times. I'm not sure of the best way to extract the
relevant debug output, but the approximate sequence of actions is as
follows (depending on what counts as a "related" action). It ended up
imported in "the wrong place".
ADD /Project/Subproject/
RENAME /Project/Subproject/ to Sub/
MOVE /Project/Sub/ to /Sub/
ADD /Project/Sub/ (different physical name)
RENAME /Project/Sub/ to Sub1/
MOVE /Sub/ to /Project/Sub/
DELETE /Project/Sub1/
vss2svn seems to have actually imported this at /Sub/, so it is also
missing from all labelled copies of /Project/.
Other subprojects that had been moved around and renamed in similar ways
ended up in the Orphaned folder, and were likewise missing from the
relevant labels.
Also, for individual files that have been shuffled around in similar
ways, it seems (at least sometimes) to lose track of which file is which,
leaving projects with missing files.
For example, for one file name (a header file, the final version of
which [GCDAAAAA] is used and shared between several projects), the
physical action file (first few columns, sorted by date and filtered by
filename) shows:
7840 FPBAAAAA \N FNCAAAAA SHARE
15492 MTCAAAAA \N KTCAAAAA ADD
15749 MTCAAAAA \N KTCAAAAA DELETE
9315 GCDAAAAA 1 \N ADD
15824 GCDAAAAA \N KTCAAAAA ADD
7927 FPBAAAAA \N FNCAAAAA DELETE
7930 GCDAAAAA \N FNCAAAAA SHARE
33639 GCDAAAAA 3 YQFAAAAA SHARE
(the above excludes COMMIT actions which do not mention the filename)
KTCAAAAA is a project that remains in the final version of the
SourceSafe database. Interestingly, FPBAAAAA seems to have "appeared from
nowhere". Note that FPBAAAAA, MTCAAAAA and GCDAAAAA all have the same
filename, and have been present in (some of) the same projects as each
other.
In VssAction, I see the following operations on that file:
5794 _GCDAAAAA GCDAAAAA 1 ADD
5837 \N GCDAAAAA 2 COMMIT
6395 FNCAAAAA GCDAAAAA 2 SHARE
8505 \N GCDAAAAA 3 COMMIT
9863 YQFAAAAA GCDAAAAA 3 SHARE
... more shares later, etc.
The file is added as an orphan when it should have been added to
KTCAAAAA. Indeed, the adjacent VssActions operate on KTCAAAAA:
5793 KTCAAAAA FCDAAAAA 1 ADD
5794 _GCDAAAAA GCDAAAAA 1 ADD
5795 KTCAAAAA HCDAAAAA 1 ADD
PhysicalAction (sorted by timestamp) shows the following for these:
7208 FCDAAAAA 1 \N ADD
15823 FCDAAAAA \N KTCAAAAA ADD
9315 GCDAAAAA 1 \N ADD
10544 HCDAAAAA 1 \N ADD
15824 GCDAAAAA \N KTCAAAAA ADD
15825 HCDAAAAA \N KTCAAAAA ADD
This file, like probably many other files and projects, seems to be
incorrectly identified as an orphan. Possibly the script is confusing
GCDAAAAA with FPBAAAAA?
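To look for more cases like GCDAAAAA, one option is to cross-check the two outputs: an item that VssAction ADDs under an orphan parent, but that PhysicalAction shows being ADDed into a real project, is suspect. Below is a minimal Python sketch using the rows quoted above. It assumes the column orders implied by those listings (id, physname, version, parent, action for PhysicalAction; id, parent, physname, version, action for VssAction) and that a leading underscore marks an orphan parent, as with _GCDAAAAA. The function names are hypothetical and are not part of vss2svn.

```python
# Rows copied from the PhysicalAction and VssAction listings above.
# Assumed column orders (inferred from the listings, not from vss2svn docs):
#   PhysicalAction: id physname version parent action
#   VssAction:      id parent physname version action
PHYSICAL = r"""
7208 FCDAAAAA 1 \N ADD
15823 FCDAAAAA \N KTCAAAAA ADD
9315 GCDAAAAA 1 \N ADD
10544 HCDAAAAA 1 \N ADD
15824 GCDAAAAA \N KTCAAAAA ADD
15825 HCDAAAAA \N KTCAAAAA ADD
"""

VSS = r"""
5793 KTCAAAAA FCDAAAAA 1 ADD
5794 _GCDAAAAA GCDAAAAA 1 ADD
5795 KTCAAAAA HCDAAAAA 1 ADD
"""

NULL = r"\N"  # how the listings show a missing parent


def known_parents(physical_text):
    """Map each physname to the projects that PhysicalAction ADDs it into."""
    parents = {}
    for line in physical_text.strip().splitlines():
        _id, physname, _version, parent, action = line.split()
        if action == "ADD" and parent != NULL:
            parents.setdefault(physname, set()).add(parent)
    return parents


def suspicious_orphans(vss_text, parents):
    """Return VssAction ADDs under an orphan parent (assumed to be marked
    by a leading underscore) for items that do have a real parent."""
    flagged = []
    for line in vss_text.strip().splitlines():
        _id, parent, physname, _version, action = line.split()
        if action == "ADD" and parent.startswith("_") and physname in parents:
            flagged.append((physname, sorted(parents[physname])))
    return flagged


if __name__ == "__main__":
    for physname, real in suspicious_orphans(VSS, known_parents(PHYSICAL)):
        print(f"{physname} added as an orphan, but PhysicalAction adds it to {real}")
```

With the rows above, only GCDAAAAA is reported, with KTCAAAAA as its expected parent. FCDAAAAA and HCDAAAAA have the same PhysicalAction pattern but were placed correctly.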
> Yes, that seems to be the problem. It's probably another item for the
> sanity checker: preventing the same item from being added twice. I
> thought we were already checking for that, and making sure the same
> file wasn't touched twice in the same delete, but maybe we're not doing
> that for adds, or maybe the label logic is bypassing it.
The sequence for this file seems to be:
ADD (at /Project1/filename)
SHARE (to /Project2/filename)
DELETE (/Project2/filename)
COMMIT (at /Project1/filename)
SHARE (to /Project2/filename)
[... other actions ... COMMIT and SHARE to other projects ...]
LABEL
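To illustrate the kind of sanity check mentioned above, here is a minimal Python sketch, not vss2svn's actual code. It replays the ADD/SHARE/DELETE steps of that sequence and refuses to make an item active twice in the same project. The class and function names and the placeholder item name "FILE" are hypothetical. COMMIT and LABEL are omitted because they do not change which projects the item is in.

```python
class ActivePathChecker:
    """Hypothetical sanity check (not vss2svn code): tracks the projects
    in which each physical item is currently active and rejects any ADD
    or SHARE that would make it active twice in the same project."""

    def __init__(self):
        self.active = {}  # physical name -> set of project paths

    def add(self, physname, project):
        projects = self.active.setdefault(physname, set())
        if project in projects:
            raise ValueError(f"{physname} is already active in {project}")
        projects.add(project)

    # A SHARE makes an existing item active in one more project,
    # so it gets the same duplicate check as an ADD.
    share = add

    def delete(self, physname, project):
        projects = self.active.get(physname, set())
        if project not in projects:
            raise ValueError(f"{physname} is not active in {project}")
        projects.remove(project)


def replay(actions):
    """Apply (operation, physname, project) tuples in order and return
    the final mapping of item -> active projects."""
    checker = ActivePathChecker()
    for op, physname, project in actions:
        getattr(checker, op)(physname, project)
    return checker.active


if __name__ == "__main__":
    sequence = [
        ("add", "FILE", "/Project1"),
        ("share", "FILE", "/Project2"),
        ("delete", "FILE", "/Project2"),
        ("share", "FILE", "/Project2"),
    ]
    print(replay(sequence))
```

With the DELETE included, the replay ends with the item active once in each of /Project1 and /Project2. If the DELETE is dropped or missed, the second SHARE to /Project2 raises a ValueError. That is the duplicate which would otherwise end up in the label.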
Another label that had a similar problem (this time for a whole project)
went through the sequence:
ADD (at location1)
MOVE (to location2)
MOVE (to location1)
LABEL
I note that, as you say, duplicates seem to be silently swallowed for the
subsequent COMMIT actions on the first file.
> Yes, it's possible to modify the datacache files and then use the
> "resume" feature to restart the conversion at a particular point.
I couldn't get this to work. If I resume from the BUILDACTIONHIST or
GETPHYSHIST phase, the datacache files are recreated, but if I resume
from the phases immediately following those (MERGEPARENTDATA or
IMPORTSVN), the datacache files seem to be ignored, presumably because
they have already been incorporated into the vss_data.db database.
As you point out, there are countless other ways to trim the sections
out of the file if this proves necessary, and I've now knocked up a
quick filter to pipe the output through, which removes these known
duplicates. With that in place, it all imported OK.
> b: not too surprising; the pin/unpin logic can be tricky to follow.
It seems that there are only about 2 or 3 dozen files pinned to the
wrong version, so these shouldn't be too arduous to fix up manually if
this proves necessary.
It does look like I'll have a fair bit of work to recover old versions
of orphaned projects and copy them to the appropriate label folders,
unless those other problems get fixed in the meantime. It may be better
just not to bother with labels, and to keep SourceSafe handy for
maintaining older versions.
> Since you will likely need to change your development
> model to not rely on these features anyway, I would suggest running the
> conversion as you did, checking it out from SVN, then doing one final
> export from VSS "on top of" the SVN checkout directory and committing to
> ensure you have the latest of everything. Then rearrange your project as
> necessary to not rely on those VSS features.
I was thinking of doing part of this the other way round: while the
various heavily duplicated header files are still "shared", it is easier
to replace all of them with a stub that #includes a central copy than to
hunt them down in SVN after the conversion.
The number of things that may need fixing up manually afterwards is
potentially concerning, though, particularly files that are missing
from, or incorrectly [un]pinned in, labelled versions. I recognise that
this is an alpha release and not expected to be perfect, so if anyone
would like more detailed debug info, or would like me to run additional
checks, please let me know.
> I filed a ticket for the duplicate-label and uninitialized-value issues,
> but again there is likely not much to be done regarding the
> destroyed/corrupted items.
From the above, there does seem to be some kind of problem in the
vss2svn script that lets the PhysicalAction and VssAction output get out
of sync.
It's about time I did what I should have done in the first place: check
the outstanding problem logs for anything that correlates, and work out
the appropriate form for a minimal test case.
--
Stephen Lee
_______________________________________________
vss2svn-users mailing list
Project homepage:
http://www.pumacode.org/projects/vss2svn/
Subscribe/Unsubscribe/Admin:
http://lists.pumacode.org/mailman/listinfo/vss2svn-users-lists.pumacode.org
Mailing list web interface (with searchable archives):
http://dir.gmane.org/gmane.comp.version-control.subversion.vss2svn.user