Many thanks for your rapid response. It is certainly looking quite promising.

Toby Johnson wrote:

> Due to the way vss2svn retrieves data, it often doesn't know until later
> that a data item refers to a "destroyed" item. Even after you use
> "destroy" in VSS, there are traces left of the file.

Fair enough. There are plenty of items that were destroyed in this database, particularly from when we were first starting to adopt SourceSafe during the Windows port, or in temporary test projects. It's just that they dominate the output so much that it takes a lot of trawling to find the real errors... I could easily pipe the output through grep -v, though.
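grep -v is the quickest option. If several patterns need to be dropped, a small script is easier to extend. The sketch below assumes the destroyed-item warnings contain the word "destroyed". That pattern is hypothetical, so check it against the real log lines before relying on it.

```python
import re
import sys

# Hypothetical pattern: the exact wording of vss2svn's destroyed-item
# warnings is assumed here and should be adjusted to the real output.
DESTROYED = re.compile(r"destroyed", re.IGNORECASE)


def filter_lines(lines, pattern=DESTROYED):
    """Yield only the lines that do NOT match pattern (like grep -v)."""
    for line in lines:
        if not pattern.search(line):
            yield line


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    """Copy stream_in to stream_out, dropping the matching lines."""
    stream_out.writelines(filter_lines(stream_in))
```

To use it as a pipe filter, save it as a script and call main() at the bottom, then pipe the conversion output (stderr included) through it.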

However, after further investigation, it seems that some items are being incorrectly identified as orphaned. See below.

>> ERROR -- No more active itempath to commit to 'IYAAAAAA':
> This is a bit trickier and I'm not sure I've ever actually seen this
> error, but it probably also has to do with destroyed or corrupted files.

This appears to be related to a project that was moved and renamed several times. I'm not sure of the best way to extract the relevant debug information, but the approximate sequence of actions is as follows (depending on what counts as a "related" action). It ended up imported in "the wrong place".

ADD /Project/Subproject/
RENAME /Project/Subproject/ to Sub/
MOVE /Project/Sub/ to /Sub/
ADD /Project/Sub/ (different physical name)
RENAME /Project/Sub/ to Sub1/
MOVE /Sub/ to /Project/Sub/
DELETE /Project/Sub1/
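To check where the project should have ended up, the sequence above can be replayed with each project tracked by its physical name rather than its path, since the second ADD uses a different physical name. The sketch below is a minimal, hypothetical replay. The physical names P1 and P2 are placeholders, and it only tracks the project itself, not its children.

```python
def replay(actions):
    """Replay (op, physical_name, arg) tuples and return physical name -> path.

    ADD and MOVE take a full path, RENAME takes the new last path
    component, and DELETE takes no argument.
    """
    location = {}  # physical name -> current path
    for op, phys, arg in actions:
        if op in ("ADD", "MOVE"):
            location[phys] = arg
        elif op == "RENAME":
            parent = location[phys].rsplit("/", 1)[0]
            location[phys] = f"{parent}/{arg}"
        elif op == "DELETE":
            del location[phys]
    return location


# The sequence from the message, with hypothetical physical names.
ACTIONS = [
    ("ADD", "P1", "/Project/Subproject"),
    ("RENAME", "P1", "Sub"),
    ("MOVE", "P1", "/Sub"),
    ("ADD", "P2", "/Project/Sub"),   # different physical name
    ("RENAME", "P2", "Sub1"),
    ("MOVE", "P1", "/Project/Sub"),
    ("DELETE", "P2", None),
]
```

Replaying all seven steps leaves P1 at /Project/Sub. That is where labels of /Project/ would expect it, rather than at /Sub/, where it sits only after the first MOVE.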

vss2svn seems to have actually imported this at /Sub/.
It is therefore also missing from all labelled copies of /Project/.

Other subprojects that had been moved and renamed in similar ways ended up in the Orphaned folder, and were likewise missing from the relevant labels.

Also, for individual files that have been moved around in similar ways, it seems (at least sometimes) to lose track of which file is which, leaving projects with missing files.

For example, for one file name (a header file whose final version, GCDAAAAA, is shared between several projects), the physical action file (first few columns, sorted by date and filtered by filename) shows:
7840    FPBAAAAA    \N    FNCAAAAA    SHARE
15492    MTCAAAAA    \N    KTCAAAAA    ADD
15749    MTCAAAAA    \N    KTCAAAAA    DELETE
9315    GCDAAAAA    1    \N    ADD
15824    GCDAAAAA    \N    KTCAAAAA    ADD
7927    FPBAAAAA    \N    FNCAAAAA    DELETE
7930    GCDAAAAA    \N    FNCAAAAA    SHARE
33639    GCDAAAAA    3    YQFAAAAA    SHARE

(the above excludes COMMIT actions which do not mention the filename)

KTCAAAAA is a project that remains in the final version of the SourceSafe database. Interestingly, FPBAAAAA seems to have "appeared from nowhere". Note that FPBAAAAA, MTCAAAAA and GCDAAAAA all have the same filename, and have been present in (some of) the same projects as each other.

In the VssAction output, I see the following operations on that file:
5794    _GCDAAAAA    GCDAAAAA    1    ADD
5837    \N    GCDAAAAA    2    COMMIT
6395    FNCAAAAA    GCDAAAAA    2    SHARE
8505    \N    GCDAAAAA    3    COMMIT
9863    YQFAAAAA    GCDAAAAA    3    SHARE
... more shares later, etc.

The file is added as an orphan when it should have been added to KTCAAAAA. Indeed, the adjacent VssActions are working on KTCAAAAA:

5793    KTCAAAAA    FCDAAAAA    1    ADD
5794    _GCDAAAAA    GCDAAAAA    1    ADD
5795    KTCAAAAA    HCDAAAAA    1    ADD

PhysicalAction (sorted by timestamp) shows the following for these:
7208    FCDAAAAA    1    \N    ADD
15823    FCDAAAAA    \N    KTCAAAAA    ADD
9315    GCDAAAAA    1    \N    ADD
10544    HCDAAAAA    1    \N    ADD
15824    GCDAAAAA    \N    KTCAAAAA    ADD
15825    HCDAAAAA    \N    KTCAAAAA    ADD

This file, and probably many other files and projects, seem to be incorrectly identified as orphans. Possibly the script is getting confused between GCDAAAAA and FPBAAAAA?
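One way to find more cases like this is to cross-check the two dumps. The idea is to flag every VssAction ADD that went to an orphan parent when PhysicalAction records an ADD of the same item into a real project. The sketch below uses the rows quoted above. It assumes the column meanings inferred from those rows, treats "\N" as a null parent, and assumes that a leading "_" on the parent marks an orphan. These are assumptions, not the documented vss2svn schema.

```python
# VssAction rows: (action_id, parent, physname, version, action)
VSS_ACTIONS = [
    (5793, "KTCAAAAA", "FCDAAAAA", 1, "ADD"),
    (5794, "_GCDAAAAA", "GCDAAAAA", 1, "ADD"),
    (5795, "KTCAAAAA", "HCDAAAAA", 1, "ADD"),
]

# PhysicalAction rows: (action_id, physname, version, parent, action);
# None stands for the \N (null) values in the dump.
PHYSICAL_ACTIONS = [
    (7208, "FCDAAAAA", 1, None, "ADD"),
    (15823, "FCDAAAAA", None, "KTCAAAAA", "ADD"),
    (9315, "GCDAAAAA", 1, None, "ADD"),
    (10544, "HCDAAAAA", 1, None, "ADD"),
    (15824, "GCDAAAAA", None, "KTCAAAAA", "ADD"),
    (15825, "HCDAAAAA", None, "KTCAAAAA", "ADD"),
]


def suspicious_orphans(vss_actions, physical_actions):
    """Return (action_id, physname, real_parents) for orphan ADDs that
    PhysicalAction says were really added to a named project."""
    real_parents = {}
    for _id, phys, _ver, parent, action in physical_actions:
        if action == "ADD" and parent is not None:
            real_parents.setdefault(phys, []).append(parent)

    flagged = []
    for action_id, parent, phys, _ver, action in vss_actions:
        # Assumption: a leading underscore marks an orphaned parent.
        if action == "ADD" and parent.startswith("_") and phys in real_parents:
            flagged.append((action_id, phys, real_parents[phys]))
    return flagged
```

With the rows above, only action 5794 is flagged, with KTCAAAAA as the parent that PhysicalAction recorded.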

> Yes, that seems to be the problem. Probably another item for the
> sanity-checker to prevent adding the same item twice. I thought we were
> already checking for that, and making sure the same file wasn't touched
> twice in the same delete, but maybe we're not doing that for adds, or
> maybe the label logic is bypassing it.
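A check like this needs to track which (item, project) pairs exist at each point, not which pairs have ever been seen. Otherwise the legitimate re-SHARE after a DELETE in the sequence below would be rejected as a duplicate. The following is a hypothetical sketch of such a state-based check. It is not the vss2svn sanity-checker (which is written in Perl), and the event format is made up for illustration.

```python
def check_membership(events):
    """Return (index, message) for each event that contradicts the current state.

    events is a list of (action, item, parent) tuples, in time order.
    """
    present = set()  # (item, parent) pairs that currently exist
    problems = []
    for index, (action, item, parent) in enumerate(events):
        key = (item, parent)
        if action in ("ADD", "SHARE"):
            if key in present:
                problems.append((index, f"{item} already in {parent}"))
            present.add(key)
        elif action == "DELETE":
            if key not in present:
                problems.append((index, f"{item} not in {parent}"))
            present.discard(key)
        # Other actions (e.g. COMMIT) do not change membership here.
    return problems


# The per-file sequence from this message, with COMMIT included.
SEQUENCE = [
    ("ADD", "file", "/Project1"),
    ("SHARE", "file", "/Project2"),
    ("DELETE", "file", "/Project2"),
    ("COMMIT", "file", "/Project1"),
    ("SHARE", "file", "/Project2"),
]
```

SEQUENCE produces no problems. A second ADD of the same item into the same project, with no DELETE in between, is reported.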

The sequence for this file seems to be:
ADD (at /Project1/filename)
SHARE (to /Project2/filename)
DELETE (/Project2/filename)
COMMIT (at /Project1/filename)
SHARE (to /Project2/filename)
[... other actions ... COMMIT and SHARE to other projects ...]
LABEL

Another label had a similar problem, this time for a whole project, which went through the sequence:
ADD (at location1)
MOVE (to location2)
MOVE (to location1)
LABEL

I note that, as you say, duplicates seem to be silently swallowed for the subsequent "COMMIT" actions on the first file.

> Yes, it's possible to modify the datacache files and then use the
> "resume" feature to restart the conversion at a particular point.

I couldn't get this to work. If I resume from the BUILDACTIONHIST or GETPHYSHIST phase, the datacache files are recreated, but if I resume from the immediately following phases (MERGEPARENTDATA or IMPORTSVN), the datacache files seem to be ignored, presumably because they have already been incorporated into the vss_data.db database.

As you point out, there are countless other ways to trim these sections out of the file if necessary, and I've now knocked up a quick filter that removes these known duplicates from the output piped through it. After that, it all imported OK.
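The filter itself isn't shown here. The sketch below is a hypothetical reconstruction of the idea. It passes a text stream through unchanged, except that it drops repeat occurrences of lines listed as known duplicates. Restricting deduplication to a known list avoids accidentally removing lines that legitimately repeat, such as blank lines. The contents of the known set are up to the user.

```python
def drop_known_duplicates(lines, known):
    """Pass lines through, keeping only the first occurrence of each line
    whose text (without the trailing newline) is listed in known."""
    emitted = set()
    for line in lines:
        key = line.rstrip("\n")
        if key in known:
            if key in emitted:
                continue  # a repeat of a known duplicate: drop it
            emitted.add(key)
        yield line
```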


> b: not too surprising; the pin/unpin logic can be tricky to follow.

It seems that only about two or three dozen files are pinned to the wrong version, so these shouldn't be too arduous to fix manually if necessary.

It does look like I'll have a fair bit of work to recover old versions of orphaned projects and copy them to the appropriate label folders, unless those other problems get fixed in the meantime. It may be better simply not to bother with labels, and to keep SourceSafe handy for maintaining older versions.

> Since you will likely need to change your development
> model to not rely on these features anyway, I would suggest running the
> conversion as you did, checking it out from SVN, then doing one final
> export from VSS "on top of" the SVN checkout directory and committing to
> ensure you have the latest of everything. Then rearrange your project as
> necessary to not rely on those VSS features.

I was thinking of doing part of this the other way round. While the various highly duplicated header files are still "shared" in VSS, it is easier to replace all of them with a stub that #includes a central copy than to hunt them down in SVN after the conversion.
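A hypothetical sketch of that stub replacement follows. It only overwrites copies whose contents are byte-for-byte identical to the central copy. Copies that differ, for example ones pinned to an older version, are returned untouched for manual review. The include name is a placeholder, and the sketch assumes the central header's directory will be on the compiler's include path.

```python
from pathlib import Path


def make_stub(include_name: str) -> str:
    """Return the one-line stub that forwards to the central header."""
    return f'#include "{include_name}"\n'


def replace_identical_copies(central, copies, include_name):
    """Overwrite each copy identical to central with a stub.

    Returns the copies that differ from central and were left alone.
    """
    reference = Path(central).read_bytes()
    skipped = []
    for copy in map(Path, copies):
        if copy.read_bytes() == reference:
            copy.write_text(make_stub(include_name))
        else:
            skipped.append(copy)
    return skipped
```

Returning the skipped copies rather than overwriting everything means the pinned headers stay intact, and they can then be compared against the labelled versions.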

The number of things that may need fixing up manually afterwards is looking potentially concerning, though, particularly files that are missing from, or incorrectly [un]pinned in, labelled versions. I recognise that this is an alpha release that isn't expected to be perfect, so if anyone would like more detailed debug info, or would like me to run additional checks, please let me know.

> I filed a ticket for the duplicate-label and uninitialized-value issues,
> but again there is likely not much to be done regarding the
> destroyed/corrupted items.

From the above, there does seem to be some kind of problem where the vss2svn script gets out of sync between the PhysicalAction and VssAction output.

It's about time I did what I should have done in the first place: check the outstanding problem logs for anything that correlates, and work out the appropriate form for a minimal test case...

--
Stephen Lee

_______________________________________________
vss2svn-users mailing list
Project homepage:
http://www.pumacode.org/projects/vss2svn/
Subscribe/Unsubscribe/Admin:
http://lists.pumacode.org/mailman/listinfo/vss2svn-users-lists.pumacode.org
Mailing list web interface (with searchable archives):
http://dir.gmane.org/gmane.comp.version-control.subversion.vss2svn.user
