----- Original Message ----
From: Toby Johnson <[EMAIL PROTECTED]>
To: Vss2Svn Users <vss2svn-users@lists.pumacode.org>
Sent: Thursday, April 19, 2007 3:31:42 PM
Subject: Re: OOM issues: large files AND many commits in SS databases
> Hi Kenneth, thanks for writing. You're losing me a bit here though. The
> issue in Dumpfile.pm in which it pulls the entire file into memory is
> certainly an issue (as documented in ticket 25 which you mention) and
> one that I would like to fix. Doing a buffered read isn't terribly
> difficult, but rewriting it to write the data directly from source to
> target file will take some more coding.

Please take everything I say here with a grain of salt; I've been looking at Perl (and your script) for about a week.

I'm not certain that the file write-out in Dumpfile.pm is at the core of my issue. As I read it, the routine performing the unbuffered read is Dumpfile::get_export_contents, which is only called from Dumpfile::add_handler and commit_handler. So each ADD or COMMIT consumes more memory until we get rid of those nodes, which means the memory consumed by the unbuffered read should be released when each revision gets written out. Right?

Along those lines: aren't all data structures in Dumpfile flushed after each revision, except for those in SanityChecker?

> However, I don't understand where you're getting to SanityChecker
> The number of commits (~3000) you're dealing with is certainly not huge ...
> but if you have numerous multi-hundred-megabyte files
> then that could definitely be a problem. But again, that would be
> solved by fixing Dumpfile.pm.

I mis-typed. According to the SvnRevisionVssAction table in vss_data.db, I have 70,344 VSS actions and 4,901 SVN revisions for this database. However, I don't have many multi-hundred-megabyte files in this database: out of 34,593 files in the latest revision, six are larger than 100 MB and 34 are larger than 10 MB.

> Is there a reason you compiled Perl from source, instead of using
> ActiveState's binary version?

When I ran the binary version two days ago, it used the OS's native memory-allocation routines.
So when Perl fails to allocate memory, I get a crash that's handled by the Windows error-reporting mechanisms, not Perl's. This means I have no idea what line of the script caused the interpreter to fail: the backtrace I get from Windows starts me off deep in the C code for Perl's memory-allocation routines and works up to the C code for *starting* the interpreter. When using Perl's own memory-allocation routines in the same situation, I get a message from Perl that says something like "Invalid request for memory on line 300 in file.pl". That is much more informative!

> The second issue:
> I have a patch for DumpFile::get_export_contents() that (for me) works
> better than the one here:
>
> http://www.pumacode.org/projects/vss2svn/ticket/25
> (I'll submit the patch to the list within 24 hrs.)

The patch is attached. It does two things:

1) It patches output_node to take a reference to the incoming node, and output_content to take a reference to the data it's going to write out.
2) It uses syswrite instead of print to write out that data.

Both changes reduce the memory footprint, and they enabled me to process another database that previously required 1 GB of RAM very early in the IMPORTSVN phase.

> My time to work on this project is very rare these days.

Aye, I read as much from the archives. Thanks for your input!

> but I would definitely be interested in reducing the overall memory
> footprint.

As would I. I can't convert this particular DB until the footprint is reduced!

-Kenneth
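[For readers following the buffered-read discussion above: a minimal sketch of what streaming from source to target in chunks could look like, instead of slurping the whole file. This is a hypothetical illustration, not the actual Dumpfile.pm code; copy_buffered, the handle names, and the 64 KiB chunk size are all assumptions.]

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy from an open source handle to an open target handle in
# fixed-size chunks, so peak memory use stays at one buffer's worth
# regardless of file size. Returns the number of bytes copied.
sub copy_buffered {
    my ($in, $out, $bufsize) = @_;
    $bufsize ||= 64 * 1024;    # 64 KiB chunks instead of one big slurp
    my ($total, $buf) = (0, '');
    while (my $read = sysread($in, $buf, $bufsize)) {
        my $off = 0;
        # syswrite may write fewer bytes than asked; loop until done
        while ($off < $read) {
            my $wrote = syswrite($out, $buf, $read - $off, $off)
                or die "syswrite failed: $!";
            $off += $wrote;
        }
        $total += $read;
    }
    return $total;
}

# Usage sketch: stream an exported file straight into the dumpfile
# open(my $src, '<:raw', $export_path) or die $!;
# copy_buffered($src, $dumpfh);
```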
DumpfileLargeFile.patch
Description: Binary data
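[For readers without the patch in hand: the two changes it describes — passing a reference rather than a copy of the data, and using syswrite instead of print — can be illustrated with a toy Perl sketch. The subroutine names below are hypothetical stand-ins, not the real Dumpfile.pm routines.]

```perl
use strict;
use warnings;

# Copying version: "my ($data) = @_" duplicates the entire string into
# a new lexical, so a 200 MB blob briefly occupies 400 MB.
sub output_content_copy {
    my ($fh, $data) = @_;
    print {$fh} $data;
}

# Reference version: only a small scalar reference is passed and
# assigned; the blob itself is never copied. syswrite sends the data
# to the OS directly, bypassing perlio buffering.
sub output_content_ref {
    my ($fh, $dataref) = @_;
    syswrite($fh, $$dataref) or die "syswrite failed: $!";
}

# Usage sketch:
# open(my $fh, '>:raw', $dumpfile) or die $!;
# output_content_ref($fh, \$big_blob);   # no copy of $big_blob made
```

One caveat worth noting: syswrite bypasses Perl's I/O buffering, so a handle written with syswrite should not also be written with print without care about ordering.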
_______________________________________________
vss2svn-users mailing list
Project homepage: http://www.pumacode.org/projects/vss2svn/
Subscribe/Unsubscribe/Admin: http://lists.pumacode.org/mailman/listinfo/vss2svn-users-lists.pumacode.org
Mailing list web interface (with searchable archives): http://dir.gmane.org/gmane.comp.version-control.subversion.vss2svn.user