Greetings,

First some background: I've been working on converting a rather large (~19 GB) VSS repository to SVN. Yes, it's a mess. The source repository is full of gunk, but the other issues can be addressed separately.
The issue here is memory usage. With such a large repository (the resulting SVN repository has about 22000 revisions), and with a few files in it that are several hundred megabytes in size (why, I don't know), memory usage became a huge problem. Note that I've never used VSS in my life, and my background in Perl is fairly limited. I'm just the poor sucker who got tasked with converting this repository.

This patch implements something more or less like the recommendation made by Toby here:

http://thread.gmane.org/gmane.comp.version-control.subversion.vss2svn.user/1652/focus=1654

It adds a 'file' attribute to the Node class, associating a node with the physical file that the node's contents are read from. It also keeps the 'text' attribute around, though nothing uses it now.

In Dumpfile.pm it changes get_export_contents to get_export_file, which sets the 'file' attribute but doesn't actually open the file yet. Finally, in output_content, if $node->{text} is undefined, it checks for $node->{file}, and if that is set it uses File::Copy to copy that file's contents into the dump file.

I've tested this with a couple of smaller repositories and with our huge one, and it works as advertised. Previously my memory usage would slowly increase until the system slowed to a crawl; now it stays flat, since File::Copy by default buffers at most 2 MB at a time.

So assuming I haven't made any glaring errors (and please point them out if I have), enjoy :)

Erik
diff -ur vss2svn/script/Vss2Svn/Dumpfile/Node.pm vss2svn-working/script/Vss2Svn/Dumpfile/Node.pm
--- vss2svn/script/Vss2Svn/Dumpfile/Node.pm  2007-06-11 17:19:29.000000000 -0400
+++ vss2svn-working/script/Vss2Svn/Dumpfile/Node.pm  2007-06-11 17:20:59.000000000 -0400
@@ -32,6 +32,7 @@
         copypath => undef,
         props => undef,
         hideprops => 0,
+        file => undef,
         text => undef,
     };

diff -ur vss2svn/script/Vss2Svn/Dumpfile.pm vss2svn-working/script/Vss2Svn/Dumpfile.pm
--- vss2svn/script/Vss2Svn/Dumpfile.pm  2007-06-11 17:15:42.000000000 -0400
+++ vss2svn-working/script/Vss2Svn/Dumpfile.pm  2007-06-11 17:20:59.000000000 -0400
@@ -9,6 +9,8 @@
 use warnings;
 use strict;

+use File::Copy;
+
 our %gHandlers =
     (
      ADD => \&_add_handler,
@@ -215,7 +217,7 @@
     $node->{action} = 'add';

     if ($data->{itemtype} == 2) {
-        $self->get_export_contents($node, $data, $expdir);
+        $self->get_export_file($node, $data, $expdir);
     }

 #    $self->track_modified($data->{physname}, $data->{revision_id}, $itempath);
@@ -243,7 +245,7 @@
     $node->{action} = 'change';

     if ($data->{itemtype} == 2) {
-        $self->get_export_contents($node, $data, $expdir);
+        $self->get_export_file($node, $data, $expdir);
     }

 #    $self->track_modified($data->{physname}, $data->{revision_id}, $itempath);
@@ -706,9 +708,9 @@
 } # End last_deleted_rev_path

 ###############################################################################
-# get_export_contents
+# get_export_file
 ###############################################################################
-sub get_export_contents {
+sub get_export_file {
     my($self, $node, $data, $expdir) = @_;

     if (!defined($expdir)) {
@@ -719,23 +721,10 @@
         return 0;
     }

-    my $file = "$expdir/$data->{physname}.$data->{version}";
-
-    if (!open EXP, "$file") {
-        $self->add_error("Could not open export file '$file'");
-        return 0;
-    }
-
-    binmode(EXP);
-
-#    $node->{text} = join('', <EXP>);
-    $node->{text} = do { local( $/ ) ; <EXP> } ;
-
-    close EXP;
-
+    $node->{file} = "$expdir/$data->{physname}.$data->{version}";
     return 1;

-} # End get_export_contents
+} # End get_export_file

 ###############################################################################
 # output_node
@@ -747,18 +736,18 @@
     my $string = $node->get_headers();
     print $fh $string;
     $self->output_content($node->{hideprops}? undef : $node->{props},
-                          $node->{text});
+                          $node->{text}, $node->{file});
 } # End output_node

 ###############################################################################
 # output_content
 ###############################################################################
 sub output_content {
-    my($self, $props, $text) = @_;
+    my($self, $props, $text, $file) = @_;

     my $fh = $self->{fh};

-    $text = '' unless defined $text;
+    $text = '' unless defined $text || defined $file;

     my $proplen = 0;
     my $textlen = 0;
@@ -780,7 +769,11 @@
         $proplen = length($propout);
     }

-    $textlen = length($text);
+    if(!defined $text && defined $file) {
+        $textlen = -s $file;
+    } else {
+        $textlen = length($text);
+    }
     return if ($textlen + $proplen == 0);

     if ($proplen > 0) {
@@ -792,7 +785,14 @@
     }

     print $fh "Content-length: " . ($proplen + $textlen)
-        . "\n\n$propout$text\n";
+        . "\n\n$propout";
+
+    if(!defined $text && defined $file) {
+        copy($file, $fh);
+        print $fh "\n";
+    } else {
+        print $fh "$text\n";
+    }

 } # End output_content

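For anyone who wants to try the streaming idea outside of vss2svn, here is a minimal standalone sketch of what output_content does in the file case: take the length from the file system, print the header, then let File::Copy move the bytes in chunks. The helper name stream_file_with_length is hypothetical and not part of vss2svn, and the header is reduced to a single Content-length line for illustration. The explicit flush is an addition that the patch does not contain.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use IO::Handle;    # provides the flush() method on file handles

# Hypothetical helper (not part of vss2svn): write a Content-length header
# followed by the contents of $path to the already-open handle $out, without
# ever holding the whole file in memory.
sub stream_file_with_length {
    my ($out, $path) = @_;

    # Take the size from the file system instead of length() on a string.
    my $len = (stat $path)[7];
    defined $len or die "Cannot stat '$path': $!";

    binmode($out);
    print {$out} "Content-length: $len\n\n";

    # File::Copy writes to a handle with syswrite, which bypasses Perl's
    # output buffer, so push the buffered header out first to keep the order.
    $out->flush() or die "Flush failed: $!";

    # copy() accepts a file handle as its destination and returns false on
    # failure.
    copy($path, $out) or die "Copy of '$path' failed: $!";

    print {$out} "\n";
    return $len;
}
```

Whether the patch itself needs the same flush depends on how vss2svn opens its dump file handle (for example, whether autoflush is enabled on it), which is not visible in the diff. Checking the return value of copy() is likewise optional, but it turns a missing export file into an error instead of a dump record that is silently too short.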