Kirit Sælensminde wrote:
Is there more information available in the actual SourceSafe database than is exposed through SS.EXE? If so then it may be that replacing SS.EXE with a database reader could simplify some of the heuristics used for ordering etc. in the tool. It would only be worthwhile though if my tool has better modeling of how the SourceSafe repository changed over time than the existing vss2svn tool does. How is this handled?

Reading the database directly, as you can probably imagine, solves many of the problems of reading ss.exe (not the least of which is how to parse the output, which as you said can be tricky) but also raises new ones. Of course we -- and by "we", I mean Dirk :) -- had to reverse-engineer the database format through trial-and-error, and there are likely still some places where we're doing it wrong.

However, this approach also exposes some data that is simply impossible to retrieve using ss.exe or the OLE API, such as recovering child items from a deleted project, or correctly recovering the history of a renamed item, especially if different items of the same name existed in the repository at multiple points in time.

Unfortunately, the bottom line is that the VSS database structure is rather cumbersome, incomplete, and fragile, and regardless of how the data is retrieved there's a good chance some information is lost. For example, there is no sort of auto-incremented counter in any of the database files to give even the correct order of actions (although the ss.exe output gives the illusion of ordered version numbers, these are derived at runtime and aren't actually stored anywhere). So this means we must rely entirely on timestamps, and since VSS is a file-based system that has only the system clocks of the various client machines that connect (sometimes even in different time zones!), this information is very unreliable -- especially, as Dirk mentioned, when an archive/restore cycle is performed, because then the timestamps are overwritten with the time of the restore, and not the time of the original commit!!

    Since we worked also very hard on getting things "right" during the
    conversion, there are a few concepts that are not easily mapped between
    the two tools. Esp. the archive and restore cycles are the most
    problematic one. Have you solved this problem domain and how did you
    solve it?

I'm not 100% sure what you mean here. SourceSafe has no concept of transactions - each file submission is handled separately, so the migration doesn't attempt to guess where transactions might be valid. In practice each file version that is sent to Subversion is a separate transaction (revision number).

Dirk was referring here to the act of using the VSS "Archive" command followed by a later "Restore"; as I mentioned above, this really screws with the timestamps. However, since you mention transactions, I should point out that we try to deduce atomic transactions in VSS by assuming that if consecutive VSS commits have the same author and comment, they are part of the same logical transaction, and are recreated in Subversion that way. We keep track of any files that are modified in a given transaction, and "commit" that transaction whenever the same file is about to be modified twice (there are also other cases where we always immediately commit, such as after a rename).
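
To make the grouping rule concrete, here is a minimal sketch in Python. It is not the actual vss2svn code; the names `Action`, `Transaction`, and `group_actions` are made up for illustration, and it assumes the actions can already be put in a usable order by timestamp. It also leaves out the special cases that force an immediate commit, such as renames.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One VSS commit to a single file (hypothetical record layout)."""
    timestamp: int
    author: str
    comment: str
    path: str


@dataclass
class Transaction:
    """A group of actions to be recreated as one Subversion revision."""
    author: str
    comment: str
    paths: list = field(default_factory=list)


def group_actions(actions):
    """Group consecutive actions that share author and comment.

    A new transaction starts when the author or comment changes, or when
    the same file is about to be modified a second time.
    """
    transactions = []
    current = None
    # sorted() is stable, so actions with equal timestamps keep input order.
    for action in sorted(actions, key=lambda a: a.timestamp):
        starts_new = (
            current is None
            or action.author != current.author
            or action.comment != current.comment
            or action.path in current.paths
        )
        if starts_new:
            current = Transaction(action.author, action.comment)
            transactions.append(current)
        current.paths.append(action.path)
    return transactions
```

With four actions where the first three share an author and comment but the third touches the same file as the first, this yields three transactions: the first two files together, then the repeated file, then the action with the different author.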


Better handling of shared files is the main thing the tool offers. If you have a simple situation where a file is developed and then shared to each location where it is used, then this tool will handle that much better than other tools I've seen, i.e. it will not put multiple versions of that file into Subversion until after the share occurs.

What does vss2svn do in this situation? I've been thinking of putting together a single page with all of the tools I can find with a short description of what they actually import in terms of the SourceSafe history into Subversion.

I believe we are doing the same thing here; specifically, when an item was shared in VSS we treat that as a Subversion "cheap copy". We keep track of all shares during the migration, and after a share occurs, then any commits which are made to any of the various logical locations which point to the same physical file are propagated to each file in Subversion. So when foo.txt is shared to bar.txt, that is treated as an "svn copy" action. Then if a commit is made to foo.txt, that change will be made to both foo.txt and bar.txt in the same transaction.
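
Roughly, the bookkeeping looks like the following Python sketch. Again, this is an illustration and not the real vss2svn code: the `ShareTracker` class, its method names, and the "AAAA" physical identifier in the usage note are all hypothetical, and the returned tuples simply stand in for whatever would actually be written to Subversion.

```python
from collections import defaultdict


class ShareTracker:
    """Tracks which logical paths point at the same physical VSS file."""

    def __init__(self):
        self._physical_of = {}                # logical path -> physical id
        self._logical_of = defaultdict(list)  # physical id -> logical paths

    def add(self, path, physical_id):
        """Register a logical path for a physical file."""
        self._physical_of[path] = physical_id
        self._logical_of[physical_id].append(path)

    def share(self, source, target):
        """Record a share; it is emitted as a cheap copy in Subversion."""
        self.add(target, self._physical_of[source])
        return ("copy", source, target)

    def commit(self, path):
        """Propagate a commit to every logical location of the file."""
        physical_id = self._physical_of[path]
        return [("modify", p) for p in self._logical_of[physical_id]]
```

After `add("foo.txt", "AAAA")` and `share("foo.txt", "bar.txt")`, a `commit` on either path returns a modification for both foo.txt and bar.txt, which would then go into the same transaction.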

Unfortunately, as you can imagine, all of this is rather complex, and the learning curve for just getting familiar with the code is very steep. Couple that with the fact that most people will only use such a tool once, and you can see that it is very difficult to keep innovating on such a project! I doubt I will ever need to perform another VSS migration (I hope to live the rest of my life without ever actually using the tool for real source control again :) so the "scratch your itch" motivation of most open source projects quickly diminishes.

toby

_______________________________________________
vss2svn-users mailing list
Project homepage:
http://www.pumacode.org/projects/vss2svn/
Subscribe/Unsubscribe/Admin:
http://lists.pumacode.org/mailman/listinfo/vss2svn-users-lists.pumacode.org
Mailing list web interface (with searchable archives):
http://dir.gmane.org/gmane.comp.version-control.subversion.vss2svn.user
