[fossil-users] Reconstructing a corrupted Fossil repository
The other day I converted a Subversion repository to Fossil via the script presented at http://www.fossil-scm.org/index.html/wiki?name=Cookbook#SVN . It mostly worked, but the imported commits were not on the trunk branch. (By the way, said Subversion repository did not utilize branching.) I corrected that by adding propagating tags branch=trunk and sym-trunk. However, at first I mistakenly put the tags somewhere in the middle of the timeline, but I remedied this by putting them at the first non-trunk commit, i.e. the first imported commit.

A day or two later, while dangerously bored, I experimented with shunning, and I removed the erroneous tag edits in the middle of the timeline. I'm not 100% sure how, and I haven't succeeded in completely reproducing the damage with a test repository, but this somehow fractured the timeline with several commit manifests having P cards naming nonexistent predecessors. Very bad. Plus this broke the edit link in the web UI for the affected commits.

I couldn't fix the repository in place without database editing beyond my comfort level (zero), plus it's (currently...) impossible to generate manifests having a predetermined SHA1 sum. It would have been okay to let all the checksums change after the point of my edit, but still it seemed like too much work.

I exported and reimported the repository using [fossil export] and [fossil import], but the tree remained fractured. I tried editing the exported file before reimporting it, but I still couldn't work out how to make it do what I wanted.

My solution was to transfer the contents of each commit to a new repository. Since the repository has fewer than 150 commits; no branches; no special tags, users, or configuration; no tickets, wiki pages, or events; nothing special at all; and no requirement to preserve the checksums, this was acceptable.
For this repository, the transfer process takes about four minutes on my computer, almost all of it spent inside [fossil commit], presumably doing checksums.

The strange thing I do is open two repositories simultaneously within a single directory, shuffling multiple copies of .fslckout. Then [fossil update] performs the edits that are committed with [fossil delete], [fossil add], and [fossil commit], and I use [fossil changes] to see what needs to be deleted.

Since someone (me?) might find this script useful in the future, perhaps as the foundation for a more comprehensive database regeneration procedure, or for a stress test, I'm pasting it below:

#!/usr/bin/env tclsh

set repo1 CORRUPT.fossil
set repo2 REBUILT.fossil

# Echo each fossil command before running it.
proc fossil {args} {
    puts [concat fossil $args]
    exec fossil {*}$args
}

set pwd [pwd]
file mkdir tmp
cd tmp

# Collect {timestamp version comment} for every check-in, newest first.
# A "=== date ===" line sets the current date; other lines are check-ins.
fossil open [file join $pwd $repo1]
foreach line [split [fossil timeline -t ci -n 0 -W 0] \n] {
    if {![regexp {^=== (\d{4}-\d\d-\d\d) ===$} $line _ date]
     && [regexp {(?x)^(\d\d:\d\d:\d\d)\ \[([[:xdigit:]]+)\]
            \ (?:\*CURRENT\*\ )?(.*)$}\
            $line _ time version comment]} {
        lappend history [list "$date $time" $version $comment]
    }
}

# Date the new repository one day before the oldest check-in.
set date [clock format [clock add [clock scan $date] -1 day]\
        -format %Y-%m-%d]
fossil new --date-override "$date 00:00:00" [file join $pwd $repo2]

# Park each repository's checkout database outside the working directory.
file rename .fslckout [file join $pwd repo1.fslckout]
fossil open [file join $pwd $repo2]
file rename .fslckout [file join $pwd repo2.fslckout]

# Replay the check-ins from oldest to newest.
foreach checkin [lreverse $history] {
    lassign $checkin timestamp version comment

    # Let the old repository rewrite the working files.
    file rename [file join $pwd repo1.fslckout] .fslckout
    fossil update $version
    file rename .fslckout [file join $pwd repo1.fslckout]

    # Record the resulting tree in the new repository.
    file rename [file join $pwd repo2.fslckout] .fslckout
    foreach change [split [fossil changes] \n] {
        if {[regexp {^MISSING +(\S.*)$} $change _ name]} {
            fossil delete $name
        }
    }
    fossil add .
    fossil commit --allow-empty --no-warnings\
            --date-override $timestamp --comment $comment
    file rename .fslckout [file join $pwd repo2.fslckout]
}

# Close both checkouts and clean up.
file rename [file join $pwd repo1.fslckout] .fslckout
fossil close
file rename [file join $pwd repo2.fslckout] .fslckout
fossil close
cd $pwd
file delete -force tmp

# vim: set sts=4 sw=4 tw=80 et ft=tcl:

-- 
Andy Goth | andrew.m.goth/at/gmail/dot/com
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
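For readers who want to see the timeline-parsing step of the script in isolation, here is a minimal Python sketch of the same idea. The sample text is invented, and its layout (a "=== date ===" header followed by "time [hash] comment (user: ... tags: ...)" lines) is an assumption inferred from the regular expressions in the thread, not something checked against a particular Fossil version. The function name parse_timeline is hypothetical.

```python
import re

# Patterns modeled on the regular expressions in the Tcl script; the exact
# timeline layout is an assumption, not verified against Fossil itself.
DATE_RE = re.compile(r"^=== (\d{4}-\d\d-\d\d) ===$")
ENTRY_RE = re.compile(
    r"^(\d\d:\d\d:\d\d) \[([0-9a-fA-F]+)\] "
    r"(?:\*CURRENT\* )?(.*) \(user: .* tags: .*\)$"
)


def parse_timeline(text):
    """Return (timestamp, version, comment) tuples in the order listed."""
    history = []
    date = None
    for line in text.splitlines():
        header = DATE_RE.match(line)
        if header:
            date = header.group(1)  # applies to the entries that follow
            continue
        entry = ENTRY_RE.match(line)
        if entry and date is not None:
            time, version, comment = entry.groups()
            history.append((f"{date} {time}", version, comment))
    return history


# Invented sample data, newest check-in first.
SAMPLE = """\
=== 2014-03-25 ===
18:42:10 [1a2b3c4d5e] *CURRENT* Fix typo (user: andy tags: trunk)
09:00:00 [0f9e8d7c6b] Add parser (note: draft) (user: andy tags: trunk)
=== 2014-03-24 ===
12:30:00 [abcdef0123] initial import (user: andy tags: trunk)
"""

if __name__ == "__main__":
    # Replay order is oldest first, so reverse the parsed list.
    for entry in reversed(parse_timeline(SAMPLE)):
        print(entry)
```

Because the comment group is greedy, a comment that itself contains parentheses is kept whole and only the trailing "(user: ... tags: ...)" suffix is dropped.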
Re: [fossil-users] Reconstructing a corrupted Fossil repository
On Tue, Mar 25, 2014 at 6:42 PM, Andy Goth andrew.m.g...@gmail.com wrote:
> ...but this somehow fractured the timeline with several commit manifests having P cards naming nonexistent predecessors.

i was hopeful until you said that :/. Once an artifact referenced by other artifacts is gone, if you have no way of 100% accurately reproducing it then... well... perhaps Richard can offer some hope, but i can't :/.

BTW: i'm impressed by your casual use of the word P-card in everyday speech ;). i don't get to do that very often ;).

> I couldn't fix the repository in place without database editing beyond my comfort level (zero), plus it's (currently...) impossible to generate manifests having a predetermined SHA1 sum. It would have been okay to let all the checksums change after the point of my edit, but still it seemed like too much work.

i suspect a re-import from svn is the most expedient route here.

> My solution was to transfer the contents of each commit to a new repository. Since the repository has fewer than 150 commits; no branches; no special tags, users, or configuration; no tickets, wiki pages, or events; nothing special at all; and no requirement to preserve the checksums, this was acceptable. For this repository, the transfer process takes about four minutes on my computer, almost all of it spent inside [fossil commit], presumably doing checksums.

i hope you've posted that later in this mail :).

> The strange thing I do is open two repositories simultaneously within a single directory, shuffling multiple copies of .fslckout. Then [fossil update] performs the edits that are committed with [fossil delete], [fossil add], and [fossil commit], and I use [fossil changes] to see what needs to be deleted.

That sounds dangerous, but i don't inherently see a specific problem with it if it's done carefully.

An alternate algorithm which might suit you better (but i have never tried): check out the first svn version, use (svn export) to set up your initial fossil version. Then incrementally check out svn versions, export them to the same fossil checkout dir, use (fossil addremove; fossil ci -m 'revision r'). That should be relatively performant if you don't have to go over the network for the svn (otherwise woe possibly awaits you ;).

> Since someone (me?) might find this script useful in the future, perhaps as the foundation for a more comprehensive database regeneration procedure, or for a stress test, I'm pasting it below:

Indeed you did :). Thanks.

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do. -- Bigby Wolf
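The export-and-commit loop suggested above can be sketched as a dry run that only builds the command lines. This is a minimal Python sketch under stated assumptions: the function name plan_svn_replay, the repository URL, and the checkout directory are all hypothetical, and nothing here has been tried against a real repository. It also assumes the fossil commands would be run from inside the checkout directory, and it does not handle files deleted between svn revisions, which an export over an existing directory would leave behind.

```python
def plan_svn_replay(revisions, svn_url, checkout_dir):
    """Return the argv lists to run, in order, without executing anything.

    For each revision: export that svn revision over the Fossil checkout,
    let Fossil pick up added and missing files, then commit.
    """
    commands = []
    for rev in revisions:
        # --force lets svn export write into a directory that already exists.
        commands.append(
            ["svn", "export", "--force", "-r", str(rev), svn_url, checkout_dir]
        )
        commands.append(["fossil", "addremove"])
        commands.append(["fossil", "ci", "-m", f"revision r{rev}"])
    return commands


if __name__ == "__main__":
    # Hypothetical local repository URL and checkout directory.
    plan = plan_svn_replay(range(1, 4), "file:///path/to/svn/repo", "checkout")
    for argv in plan:
        print(" ".join(argv))
```

Printing the plan first makes it easy to review the sequence before handing each argv list to something like subprocess.run.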
Re: [fossil-users] Reconstructing a corrupted Fossil repository
On 3/25/2014 1:06 PM, Stephan Beal wrote:
> i suspect a re-import from svn is the most expedient route here.

I had considered that, but the Subversion repository is hard to get to. It's on a private network my laptop cannot directly connect to without the aid of janky security software. Once I've done that, I need to construct a huge tar archive of the repository and manually transfer it from one computer to the next to the next to the next, since no one bothered to set up routing. It takes over an hour, and once was enough. And even once that was done, I would still have to re-import the commits I had made since the initial import, so I'd still be facing my original problem.

> incrementally check out svn versions, export them to the same fossil checkout dir, use (fossil addremove; fossil ci -m 'revision r').

I didn't know [fossil addremove] existed. That would have eliminated the need for [fossil changes]! How convenient.

-- 
Andy Goth | andrew.m.goth/at/gmail/dot/com
Re: [fossil-users] Reconstructing a corrupted Fossil repository
On 3/25/2014 1:23 PM, Andy Goth wrote:
> I didn't know [fossil addremove] existed. That would have eliminated the need for [fossil changes]! How convenient.

Here's an updated version of the script that uses [fossil addremove]. It also fixes a bug which included the user and tags in the comments.

#!/usr/bin/env tclsh

set repo1 CORRUPT.fossil
set repo2 REBUILT.fossil

# Echo each fossil command before running it.
proc fossil {args} {
    puts [concat fossil $args]
    exec fossil {*}$args
}

set pwd [pwd]
file mkdir tmp
cd tmp

# Collect {timestamp version comment} for every check-in, newest first.
# The trailing "(user: ... tags: ...)" is stripped from each comment.
fossil open [file join $pwd $repo1]
foreach line [split [fossil timeline -t ci -n 0 -W 0] \n] {
    if {![regexp {^=== (\d{4}-\d\d-\d\d) ===$} $line _ date]
     && [regexp {(?x)^(\d\d:\d\d:\d\d)\ \[([[:xdigit:]]+)\]
            \ (?:\*CURRENT\*\ )?(.*)\ \(user:\ .*\ tags:\ .*\)$}\
            $line _ time version comment]} {
        lappend history [list "$date $time" $version $comment]
    }
}

# Date the new repository one day before the oldest check-in.
set date [clock format [clock add [clock scan $date] -1 day]\
        -format %Y-%m-%d]
fossil new --date-override "$date 00:00:00" [file join $pwd $repo2]

# Park each repository's checkout database outside the working directory.
file rename .fslckout [file join $pwd repo1.fslckout]
fossil open [file join $pwd $repo2]
file rename .fslckout [file join $pwd repo2.fslckout]

# Replay the check-ins from oldest to newest.
foreach checkin [lreverse $history] {
    lassign $checkin timestamp version comment

    # Let the old repository rewrite the working files.
    file rename [file join $pwd repo1.fslckout] .fslckout
    fossil update $version
    file rename .fslckout [file join $pwd repo1.fslckout]

    # Record the resulting tree in the new repository.
    file rename [file join $pwd repo2.fslckout] .fslckout
    fossil addremove
    fossil commit --allow-empty --no-warnings\
            --date-override $timestamp --comment $comment
    file rename .fslckout [file join $pwd repo2.fslckout]
}

# Close both checkouts and clean up.
file rename [file join $pwd repo1.fslckout] .fslckout
fossil close
file rename [file join $pwd repo2.fslckout] .fslckout
fossil close
cd $pwd
file delete -force tmp

# vim: set sts=4 sw=4 tw=80 et ft=tcl:

-- 
Andy Goth | andrew.m.goth/at/gmail/dot/com
[fossil-users] Partial hash collision
Fossil uses unique prefixes of checksums as identifiers. What does it do when a previously-unique prefix becomes ambiguous due to a new commit?

Also, what happens when an existing comment (or ticket or wiki page or whatever) references a no-longer-unique prefix? Fossil can't rewrite the old manifest without changing every checksum forward, so its only hope is to change the display, but that leads to more problems.

It's quite likely this has already been discussed and resolved, but I haven't been able to track down any emails or documentation on the matter.

I wish I could give you a test case, but the SHA1 function is thankfully difficult to invert, even for prefixes. :^)

-- 
Andy Goth | andrew.m.goth/at/gmail/dot/com
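To make the question concrete, here is a minimal Python sketch of how a prefix lookup can go from unique to ambiguous when a new artifact arrives. It is an illustration only and says nothing about how Fossil actually implements prefix resolution; the function name resolve_prefix and the artifact IDs are invented.

```python
def resolve_prefix(prefix, artifact_ids):
    """Classify a hex prefix against a collection of full artifact IDs.

    Returns a (status, matches) pair where status is "missing", "unique",
    or "ambiguous" and matches is the sorted list of matching IDs.
    """
    prefix = prefix.lower()
    matches = sorted(i for i in artifact_ids if i.startswith(prefix))
    if not matches:
        return "missing", matches
    if len(matches) == 1:
        return "unique", matches
    return "ambiguous", matches


if __name__ == "__main__":
    # Invented 40-character IDs; real ones would be SHA1 sums.
    ids = {"3f2a9c" + "0" * 34, "b71e04" + "1" * 34}
    print(resolve_prefix("3f2a", ids))  # one match at this point

    # A later commit happens to share the first four hex digits.
    ids.add("3f2a77" + "2" * 34)
    print(resolve_prefix("3f2a", ids))  # now two matches
```

The stored prefix "3f2a" never changes; only the set of artifacts it is compared against grows, which is why the ambiguity can appear long after the reference was written.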
Re: [fossil-users] Partial hash collision
On 3/25/2014 4:40 PM, Andreas Kupries wrote:
> On Tue, Mar 25, 2014 at 2:28 PM, Andy Goth andrew.m.g...@gmail.com wrote:
>> Fossil uses unique prefixes of checksums as identifiers.
>
> No, it does not. Fossil stores full identifiers

I was referring only to the display. Full identifiers are usually shown only in detail pages and the actual manifests.

> and allows you to search for commits by prefix. IOW the prefix thing is a pure convenience to reduce the amount of stuff to enter.

Understood, but this convenience feature feeds back into the database when the user enters a prefix into a commit comment or a ticket or a wiki page.

-- 
Andy Goth | andrew.m.goth/at/gmail/dot/com
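One way to picture the feedback problem is a check that scans stored text for prefix references and reports those that no longer match exactly one artifact. The Python sketch below is hypothetical throughout: it assumes references are written as bracketed hex prefixes of 4 to 40 digits, the function name stale_references is invented, and it does not reflect any existing Fossil feature.

```python
import re

# Assumed reference syntax: a bracketed hex prefix such as [3f2a].
REF_RE = re.compile(r"\[([0-9a-fA-F]{4,40})\]")


def stale_references(comments, artifact_ids):
    """List (comment_index, prefix, match_count) for every bracketed
    prefix that does not match exactly one artifact ID."""
    ids = [i.lower() for i in artifact_ids]
    stale = []
    for index, text in enumerate(comments):
        for prefix in REF_RE.findall(text):
            count = sum(1 for i in ids if i.startswith(prefix.lower()))
            if count != 1:
                stale.append((index, prefix, count))
    return stale


if __name__ == "__main__":
    # Invented artifact IDs and comments.
    ids = ["3f2a9c" + "0" * 34, "3f2a77" + "2" * 34, "b71e04" + "1" * 34]
    comments = [
        "Reverts [3f2a]",      # matches two artifacts
        "See also [b71e04]",   # still unique
        "No references here",
        "Follow-up to [dead]", # matches nothing
    ]
    for item in stale_references(comments, ids):
        print(item)
```

A report like this could flag affected comments without rewriting any manifest, which sidesteps the checksum problem raised earlier in the thread.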