[fossil-users] Reconstructing a corrupted Fossil repository

2014-03-25 Thread Andy Goth
The other day I converted a Subversion repository to Fossil via the 
script presented at 
http://www.fossil-scm.org/index.html/wiki?name=Cookbook#SVN .  It mostly 
worked, but the imported commits were not on the trunk branch.  (By the 
way, said Subversion repository did not utilize branching.)


I corrected that by adding propagating tags branch=trunk and sym-trunk. 
 However, at first I mistakenly put the tags somewhere in the middle of 
the timeline, but I remedied this by putting them at the first non-trunk 
commit, i.e. the first imported commit.


A day or two later, while dangerously bored, I experimented with 
shunning, and I removed the erroneous tag edits in the middle of the 
timeline.  I'm not 100% sure how, and I haven't succeeded in completely 
reproducing the damage with a test repository, but this somehow 
fractured the timeline with several commit manifests having P cards 
naming nonexistent predecessors.  Very bad.  Plus this broke the edit 
link in the web UI for the affected commits.
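
This kind of damage can at least be detected mechanically. Below is a minimal Tcl sketch, assuming the manifest texts have already been pulled out of the repository into a dict keyed by artifact hash (the extraction step is not shown). The name danglingParents and the sample hashes are made up for illustration; it is not a Fossil command. It relies only on a P card being a line that starts with "P" followed by space-separated parent hashes.

```tcl
# Hypothetical helper, not part of Fossil: report P-card parents that are
# missing from the set of known artifacts.
# manifests: dict mapping artifact hash -> manifest text
proc danglingParents {manifests} {
    set missing {}
    dict for {hash text} $manifests {
        foreach line [split $text \n] {
            # A P card looks like: P <parent-hash> ?<merge-parent-hash> ...?
            if {![regexp {^P (.+)$} $line _ parents]} continue
            foreach parent [split $parents " "] {
                if {![dict exists $manifests $parent]} {
                    lappend missing [list $hash $parent]
                }
            }
        }
    }
    return $missing
}

# Made-up hashes, shortened for readability.
set manifests [dict create \
    aaa111 "C initial\nD 2014-03-01T00:00:00\n" \
    bbb222 "C second\nP aaa111\n" \
    ccc333 "C third\nP dead00\n"]
puts [danglingParents $manifests]
```

Each element of the result pairs a manifest with a predecessor it names but that is not present, which is the symptom described above.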


I couldn't fix the repository in place without database editing beyond 
my comfort level (zero), plus it's (currently...) impossible to generate 
manifests having a predetermined SHA1 sum.  It would have been okay to 
let all the checksums change after the point of my edit, but still it 
seemed like too much work.


I exported and reimported the repository using [fossil export] and 
[fossil import], but the tree remained fractured.  I tried editing the 
exported file before reimporting it, but I still couldn't work out how 
to make it do what I wanted.


My solution was to transfer the contents of each commit to a new 
repository.  Since the repository has fewer than 150 commits; no 
branches; no special tags, users, or configuration; no tickets, wiki 
pages, or events; nothing special at all; and no requirement to preserve 
the checksums, this was acceptable.  For this repository, the transfer 
process takes about four minutes on my computer, almost all of it spent 
inside [fossil commit], presumably doing checksums.


The strange thing I do is open two repositories simultaneously within a 
single directory, shuffling multiple copies of .fslckout.  Then [fossil 
update] performs the edits that are committed with [fossil delete], 
[fossil add], and [fossil commit], and I use [fossil changes] to see 
what needs to be deleted.


Since someone (me?) might find this script useful in the future, perhaps 
as the foundation for a more comprehensive database regeneration 
procedure, or for a stress test, I'm pasting it below:


#!/usr/bin/env tclsh
set repo1 CORRUPT.fossil
set repo2 REBUILT.fossil
# Echo each fossil command before running it.
proc fossil {args} {
    puts [concat fossil $args]
    exec fossil {*}$args
}
set pwd [pwd]
file mkdir tmp
cd tmp
fossil open [file join $pwd $repo1]
# Collect {"date time" version comment} for every check-in, newest first.
foreach line [split [fossil timeline -t ci -n 0 -W 0] \n] {
    if {![regexp {^=== (\d{4}-\d\d-\d\d) ===$} $line _ date]
     && [regexp {(?x)^(\d\d:\d\d:\d\d)\ \[([[:xdigit:]]+)\]
            \ (?:\*CURRENT\*\ )?(.*)$} \
            $line _ time version comment]} {
        lappend history [list "$date $time" $version $comment]
    }
}
# Date the new repository one day before the oldest check-in.
set date [clock format [clock add [clock scan $date] -1 day] \
        -format %Y-%m-%d]
fossil new --date-override "$date 00:00:00" [file join $pwd $repo2]
file rename .fslckout [file join $pwd repo1.fslckout]
fossil open [file join $pwd $repo2]
file rename .fslckout [file join $pwd repo2.fslckout]
# Replay the check-ins, oldest first.
foreach checkin [lreverse $history] {
    lassign $checkin timestamp version comment
    # Update the working files from the corrupt repository.
    file rename [file join $pwd repo1.fslckout] .fslckout
    fossil update $version
    file rename .fslckout [file join $pwd repo1.fslckout]
    # Commit the resulting tree to the rebuilt repository.
    file rename [file join $pwd repo2.fslckout] .fslckout
    foreach change [split [fossil changes] \n] {
        if {[regexp {^MISSING +(\S.*)$} $change _ name]} {
            fossil delete $name
        }
    }
    fossil add .
    fossil commit --allow-empty --no-warnings \
            --date-override $timestamp --comment $comment
    file rename .fslckout [file join $pwd repo2.fslckout]
}
file rename [file join $pwd repo1.fslckout] .fslckout
fossil close
file rename [file join $pwd repo2.fslckout] .fslckout
fossil close
cd $pwd
file delete -force tmp
# vim: set sts=4 sw=4 tw=80 et ft=tcl:
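
The repeated renaming of .fslckout is the fragile part: if a fossil command fails midway, the wrong checkout database is left in place. One possible way to tidy that up is a small wrapper, sketched below. The name withCheckout is made up and is not used by the script above, and the sketch assumes Tcl 8.6 for [try]. The demonstration uses an empty file as a stand-in for a real checkout database, so it does not need fossil at all.

```tcl
# Hypothetical helper: run a body with the saved checkout database swapped
# in as .fslckout, and swap it back out afterwards even if the body raises
# an error.  Needs Tcl 8.6 for [try].
proc withCheckout {saved body} {
    file rename $saved .fslckout
    try {
        uplevel 1 $body
    } finally {
        file rename .fslckout $saved
    }
}

# Demonstration with a plain file standing in for a real checkout database.
set dir [file join [pwd] withCheckout-demo]
file mkdir $dir
cd $dir
close [open repo1.fslckout w]
withCheckout repo1.fslckout {
    puts "inside: [file exists .fslckout]"
}
puts "after: [file exists repo1.fslckout]"
cd ..
file delete -force $dir
```

In the loop, a call such as withCheckout [file join $pwd repo1.fslckout] {fossil update $version} would then stand in for each rename, command, rename triple.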

--
Andy Goth | andrew.m.goth/at/gmail/dot/com
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users


Re: [fossil-users] Reconstructing a corrupted Fossil repository

2014-03-25 Thread Stephan Beal
On Tue, Mar 25, 2014 at 6:42 PM, Andy Goth andrew.m.g...@gmail.com wrote:

> ...but this somehow fractured the timeline with several commit manifests
> having P cards naming nonexistent predecessors.


i was hopeful until you said that :/. Once an artifact referenced by other
artifacts is gone, if you have no way of 100% accurately reproducing it
then... well... perhaps Richard can offer some hope, but i can't :/.

BTW: i'm impressed by your casual use of the word P-card in everyday
speech ;). i don't get to do that very often ;).

> I couldn't fix the repository in place without database editing beyond my
> comfort level (zero), plus it's (currently...) impossible to generate
> manifests having a predetermined SHA1 sum.  It would have been okay to let
> all the checksums change after the point of my edit, but still it seemed
> like too much work.


i suspect a re-import from svn is the most expedient route here.


> My solution was to transfer the contents of each commit to a new
> repository.  Since the repository has fewer than 150 commits; no branches;
> no special tags, users, or configuration; no tickets, wiki pages, or
> events; nothing special at all; and no requirement to preserve the
> checksums, this was acceptable.  For this repository, the transfer process
> takes about four minutes on my computer, almost all of it spent inside
> [fossil commit], presumably doing checksums.


i hope you've posted that later in this mail :).


> The strange thing I do is open two repositories simultaneously within a
> single directory, shuffling multiple copies of .fslckout.  Then [fossil
> update] performs the edits that are committed with [fossil delete],
> [fossil add], and [fossil commit], and I use [fossil changes] to see what
> needs to be deleted.


That sounds dangerous, but i don't inherently see a specific problem with
it if it's done carefully. An alternate algorithm which might suit you
better (but i have never tried): check out the first svn version, use (svn
export) to set up your initial fossil version. Then incrementally check out
svn versions, export them to the same fossil checkout dir, use (fossil
addremove; fossil ci -m 'revision r'). That should be relatively
performant if you don't have to go over the network for the svn (otherwise
woe possibly awaits you ;).
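
A dry-run sketch of that loop in Tcl, for anyone who wants to try it. It only builds the list of commands and does not run anything. The proc name replayPlan, the placeholder URL, the revision range, and the exact commit message are all assumptions made for illustration.

```tcl
# Dry-run sketch: build (but do not run) the commands for replaying svn
# revisions into an open fossil checkout in the current directory.
# svnUrl and the revision range are placeholders supplied by the caller.
proc replayPlan {svnUrl firstRev lastRev} {
    set plan {}
    for {set rev $firstRev} {$rev <= $lastRev} {incr rev} {
        # --force lets svn export write into the existing directory.
        lappend plan [list svn export --force -r $rev $svnUrl .]
        lappend plan [list fossil addremove]
        lappend plan [list fossil ci -m "revision r$rev"]
    }
    return $plan
}

foreach cmd [replayPlan file:///path/to/svnrepo/trunk 1 3] {
    puts [join $cmd " "]
}
```

One caveat with running such a plan for real: svn export only writes files, so a file deleted in a later svn revision would stay on disk and fossil addremove would not notice it. The working files (but not .fslckout) would probably need to be cleared before each export.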


> Since someone (me?) might find this script useful in the future, perhaps
> as the foundation for a more comprehensive database regeneration procedure,
> or for a stress test, I'm pasting it below:


Indeed you did :). Thanks.


-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do. -- Bigby Wolf


Re: [fossil-users] Reconstructing a corrupted Fossil repository

2014-03-25 Thread Andy Goth

On 3/25/2014 1:06 PM, Stephan Beal wrote:

> i suspect a re-import from svn is the most expedient route here.


I had considered that, but the subversion repository is hard to get to.
It's on a private network my laptop cannot directly connect to without
the aid of janky security software, then once I've done that, I need
to construct a huge tar archive of the repository and manually transfer
it from one computer to the next to the next to the next since no one
bothered to set up routing.  It takes over an hour, and once was enough.

But once that was done, I would have to re-import the commits I had done
since the initial import, so I'd still be facing my original problem.


> incrementally check out svn versions, export them to the same fossil
> checkout dir, use (fossil addremove; fossil ci -m 'revision r').


I didn't know [fossil addremove] existed.  That would have eliminated
the need for [fossil changes]!  How convenient.

--
Andy Goth | andrew.m.goth/at/gmail/dot/com


Re: [fossil-users] Reconstructing a corrupted Fossil repository

2014-03-25 Thread Andy Goth

On 3/25/2014 1:23 PM, Andy Goth wrote:

> I didn't know [fossil addremove] existed.  That would have eliminated
> the need for [fossil changes]!  How convenient.


Here's an updated version of the script that uses [fossil addremove].
It also fixes a bug which included the user and tags in the comments.

#!/usr/bin/env tclsh
set repo1 CORRUPT.fossil
set repo2 REBUILT.fossil
# Echo each fossil command before running it.
proc fossil {args} {
    puts [concat fossil $args]
    exec fossil {*}$args
}
set pwd [pwd]
file mkdir tmp
cd tmp
fossil open [file join $pwd $repo1]
# Collect {"date time" version comment} for every check-in, newest first.
# The trailing "(user: ... tags: ...)" is kept out of the comment.
foreach line [split [fossil timeline -t ci -n 0 -W 0] \n] {
    if {![regexp {^=== (\d{4}-\d\d-\d\d) ===$} $line _ date]
     && [regexp {(?x)^(\d\d:\d\d:\d\d)\ \[([[:xdigit:]]+)\]
            \ (?:\*CURRENT\*\ )?(.*)\ \(user:\ .*\ tags:\ .*\)$} \
            $line _ time version comment]} {
        lappend history [list "$date $time" $version $comment]
    }
}
# Date the new repository one day before the oldest check-in.
set date [clock format [clock add [clock scan $date] -1 day] \
        -format %Y-%m-%d]
fossil new --date-override "$date 00:00:00" [file join $pwd $repo2]
file rename .fslckout [file join $pwd repo1.fslckout]
fossil open [file join $pwd $repo2]
file rename .fslckout [file join $pwd repo2.fslckout]
# Replay the check-ins, oldest first.
foreach checkin [lreverse $history] {
    lassign $checkin timestamp version comment
    # Update the working files from the corrupt repository.
    file rename [file join $pwd repo1.fslckout] .fslckout
    fossil update $version
    file rename .fslckout [file join $pwd repo1.fslckout]
    # Commit the resulting tree to the rebuilt repository.
    file rename [file join $pwd repo2.fslckout] .fslckout
    fossil addremove
    fossil commit --allow-empty --no-warnings \
            --date-override $timestamp --comment $comment
    file rename .fslckout [file join $pwd repo2.fslckout]
}
file rename [file join $pwd repo1.fslckout] .fslckout
fossil close
file rename [file join $pwd repo2.fslckout] .fslckout
fossil close
cd $pwd
file delete -force tmp
# vim: set sts=4 sw=4 tw=80 et ft=tcl:
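
The timeline parsing can be tried without a repository. The sketch below lifts the two patterns from the script into a proc (parseTimeline is a made-up name) and feeds it a few sample lines. The sample is invented to fit the patterns and is not captured fossil output, so the real layout may differ between Fossil versions.

```tcl
# Parse [fossil timeline] text into {"date time" version comment} entries,
# newest first, using the same two patterns as the script above.
proc parseTimeline {text} {
    set history {}
    set date ""
    foreach line [split $text \n] {
        # A date header applies to every check-in line that follows it.
        if {[regexp {^=== (\d{4}-\d\d-\d\d) ===$} $line _ date]} continue
        if {[regexp {(?x)^(\d\d:\d\d:\d\d)\ \[([[:xdigit:]]+)\]
                \ (?:\*CURRENT\*\ )?(.*)\ \(user:\ .*\ tags:\ .*\)$} \
                $line _ time version comment]} {
            lappend history [list "$date $time" $version $comment]
        }
    }
    return $history
}

# Invented sample, shaped to match the patterns; not real fossil output.
set sample [join {
    {=== 2014-03-25 ===}
    {18:40:12 [0123abcd45] *CURRENT* Fix typo (user: andy tags: trunk)}
    {=== 2014-03-24 ===}
    {09:15:00 [fedc987654] Initial import (user: andy tags: trunk)}
} \n]
foreach entry [parseTimeline $sample] {
    puts $entry
}
```

Each entry carries the combined "date time" string that is later passed to --date-override, plus the version and the comment with the user and tags stripped.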

--
Andy Goth | andrew.m.goth/at/gmail/dot/com


[fossil-users] Partial hash collision

2014-03-25 Thread Andy Goth
Fossil uses unique prefixes of checksums as identifiers.  What does it 
do when a previously-unique prefix becomes ambiguous due to a new commit?


Also, what happens when an existing comment (or ticket or wiki page or 
whatever) references a no-longer-unique prefix?  Fossil can't rewrite 
the old manifest without changing every checksum forward, so its only 
hope is to change the display, but that leads to more problems.


It's quite likely this has already been discussed and resolved, but I 
haven't been able to track down any emails or documentation on the matter.


I wish I could give you a test case, but the SHA1 function is thankfully 
difficult to invert, even for prefixes. :^)
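
Short of a real test case, the situation can at least be illustrated with made-up identifiers. The Tcl sketch below is only an illustration of how a prefix stops being unique; it is not Fossil's lookup code and says nothing about what Fossil actually does. The proc name resolvePrefix and the hex strings are invented.

```tcl
# Hypothetical illustration, not Fossil's lookup code: resolve a prefix
# against a list of full identifiers.
# Returns {unique <id>}, {ambiguous <id> <id> ...}, or {unknown}.
proc resolvePrefix {ids prefix} {
    set matches {}
    foreach id $ids {
        if {[string equal -length [string length $prefix] $prefix $id]} {
            lappend matches $id
        }
    }
    switch [llength $matches] {
        0       { return [list unknown] }
        1       { return [list unique {*}$matches] }
        default { return [list ambiguous {*}$matches] }
    }
}

# Made-up hex strings, far shorter than real SHA1 sums.
set ids {3f2a9c71 b81d04e6}
puts [resolvePrefix $ids 3f2a]   ;# unique 3f2a9c71
lappend ids 3f2a11d0             ;# a later commit happens to share the prefix
puts [resolvePrefix $ids 3f2a]   ;# ambiguous 3f2a9c71 3f2a11d0
puts [resolvePrefix $ids 3f2a9]  ;# unique 3f2a9c71
```

The second call shows the case I am asking about: the same four characters that once named a single commit now match two.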


--
Andy Goth | andrew.m.goth/at/gmail/dot/com


Re: [fossil-users] Partial hash collision

2014-03-25 Thread Andy Goth

On 3/25/2014 4:40 PM, Andreas Kupries wrote:

> On Tue, Mar 25, 2014 at 2:28 PM, Andy Goth andrew.m.g...@gmail.com wrote:
>
>> Fossil uses unique prefixes of checksums as identifiers.


> No, it does not. Fossil stores full identifiers


I was referring only to the display.  Full identifiers are usually shown
only in detail pages and the actual manifests.


> and allows you to search for commits by prefix. IOW the prefix thing
> is a pure convenience to reduce the amount of stuff to enter.


Understood, but this convenience feature feeds back into the database
when the user enters a prefix into a commit comment or a ticket or a
wiki page.
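
To make the concern concrete, here is a rough Tcl sketch of a check one could run over stored text. It assumes references are written as a hex prefix in square brackets, and it works on a plain list of identifiers rather than a real repository. The proc name ambiguousRefs and the sample data are invented; this is not an existing Fossil feature.

```tcl
# Hypothetical check, not a Fossil feature: list the bracketed hex prefixes
# in a piece of text that match more than one known identifier.
proc ambiguousRefs {ids text} {
    set result {}
    # regexp -all -inline yields pairs: whole match, then the captured prefix.
    foreach {_ prefix} [regexp -all -inline {\[([[:xdigit:]]+)\]} $text] {
        set matches [lsearch -all -inline -glob $ids $prefix*]
        if {[llength $matches] > 1} {
            lappend result $prefix
        }
    }
    return $result
}

# Made-up identifiers and comment text.
set ids {3f2a9c71 b81d04e6 3f2a11d0}
set comment {Reverts [3f2a] and builds on [b81d].}
puts [ambiguousRefs $ids $comment]
```

With these sample values the only prefix reported is 3f2a, since b81d still matches a single identifier.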

--
Andy Goth | andrew.m.goth/at/gmail/dot/com