----- Original Message ----
From: Toby Johnson <[EMAIL PROTECTED]>
To: Vss2Svn Users <vss2svn-users@lists.pumacode.org>
Sent: Thursday, April 19, 2007 3:31:42 PM
Subject: Re: OOM issues: large files AND many commits in SS databases
> Hi Kenneth, thanks for writing. You're losing me a bit here though. The
> issue in Dumpfile.pm in which it pulls the entire file into memory is
> certainly an issue (as documented in ticket 25 which you mention) and
> one that I would like to fix. Doing a buffered read isn't terribly
> difficult, but rewriting it to write the data directly from source to
> target file will take some more coding.

Please take everything I say here with a grain of salt; I've been looking at Perl (and your script) for about a week.

I'm not certain that the file write-out in Dumpfile.pm is at the core of my issue. As I read it, the routine performing the unbuffered read is Dumpfile::get_export_contents, which is only called from Dumpfile::add_handler and commit_handler. So each ADD or COMMIT consumes more memory until we get rid of those nodes, which means the memory consumed by the unbuffered read should be released when each revision gets written out. Right?

Along those lines: aren't all data structures in Dumpfile flushed after each revision, except for those in SanityChecker?

> However, I don't understand where you're getting to SanityChecker
> The number of commits (~3000) you're dealing with is certainly not huge ...
> but if you have numerous multi-hundred-megabyte files
> then that could definitely be a problem. But again, that would be
> solved by fixing Dumpfile.pm.

I mis-typed. According to the SvnRevisionVssAction table in vss_data.db, I have 70,344 VSS actions and 4,901 SVN revisions for this database. However, I don't have many multi-hundred-megabyte files in this database: out of 34,593 files in the latest revision, six are larger than 100 MB and 34 are larger than 10 MB.

> Is there a reason you compiled Perl from source, instead of using
> ActiveState's binary version?

When I ran the binary version two days ago, it used the OS's native memory-allocation routines.
So when Perl fails to allocate memory, I get a crash that's handled by the Windows error-reporting mechanisms, not Perl's. This means I have no idea what line of the script caused the interpreter to fail: the backtrace I get from Windows starts me off deep in the C code for Perl's memory-allocation routines and works up to the C code for *starting* the interpreter. When using Perl's own memory-allocation routines in the same situation, I get a message from Perl that says something like "Invalid request for memory on line 300 in file.pl". That is much more informative!

> The second issue:
> I have a patch for DumpFile::get_export_contents() that (for me) works
> better than the one here:
>
> http://www.pumacode.org/projects/vss2svn/ticket/25
> (I'll submit the patch to the list within 24 hrs.)

The patch is attached. It does two things:

1) It patches output_node to take a reference to the incoming node, and output_content to take a reference to the data it's going to write out.
2) It uses syswrite instead of print to write out that data.

Both changes reduce the memory footprint, and they enabled me to process another database that previously required 1 GB of RAM very early in the IMPORTSVN phase.

> My time to work on this project is very rare these days.

Aye, I read as much from the archives. Thanks for your input!

> but I would definitely be interested in reducing the overall memory
> footprint.

As would I. I can't convert this particular DB until the footprint is reduced!

-Kenneth
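[For readers following the buffered-read discussion above: a minimal sketch of what streaming from source to target in chunks could look like, instead of slurping the whole file. This is a hypothetical illustration, not the actual Dumpfile.pm code; copy_buffered, the handle names, and the 64 KiB chunk size are all assumptions.]

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy from an open source handle to an open target handle in
# fixed-size chunks, so peak memory use stays at one buffer's worth
# regardless of file size. Returns the number of bytes copied.
sub copy_buffered {
    my ($in, $out, $bufsize) = @_;
    $bufsize ||= 64 * 1024;    # 64 KiB chunks instead of one big slurp
    my ($total, $buf) = (0, '');
    while (my $read = sysread($in, $buf, $bufsize)) {
        my $off = 0;
        # syswrite may write fewer bytes than asked; loop until done
        while ($off < $read) {
            my $wrote = syswrite($out, $buf, $read - $off, $off)
                or die "syswrite failed: $!";
            $off += $wrote;
        }
        $total += $read;
    }
    return $total;
}

# Usage sketch: stream an exported file straight into the dumpfile
# open(my $src, '<:raw', $export_path) or die $!;
# copy_buffered($src, $dumpfh);
```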
DumpfileLargeFile.patch
Description: Binary data
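[For readers without the patch in hand: the two changes it describes — passing a reference rather than a copy of the data, and using syswrite instead of print — can be illustrated with a toy Perl sketch. The subroutine names below are hypothetical stand-ins, not the real Dumpfile.pm routines.]

```perl
use strict;
use warnings;

# Copying version: "my ($data) = @_" duplicates the entire string into
# a new lexical, so a 200 MB blob briefly occupies 400 MB.
sub output_content_copy {
    my ($fh, $data) = @_;
    print {$fh} $data;
}

# Reference version: only a small scalar reference is passed and
# assigned; the blob itself is never copied. syswrite sends the data
# to the OS directly, bypassing perlio buffering.
sub output_content_ref {
    my ($fh, $dataref) = @_;
    syswrite($fh, $$dataref) or die "syswrite failed: $!";
}

# Usage sketch:
# open(my $fh, '>:raw', $dumpfile) or die $!;
# output_content_ref($fh, \$big_blob);   # no copy of $big_blob made
```

One caveat worth noting: syswrite bypasses Perl's I/O buffering, so a handle written with syswrite should not also be written with print without care about ordering.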
_______________________________________________
vss2svn-users mailing list
Project homepage: http://www.pumacode.org/projects/vss2svn/
Subscribe/Unsubscribe/Admin: http://lists.pumacode.org/mailman/listinfo/vss2svn-users-lists.pumacode.org
Mailing list web interface (with searchable archives): http://dir.gmane.org/gmane.comp.version-control.subversion.vss2svn.user