Further investigation points the finger at Perl DBI and/or DBD::Pg. They now appear to cache the entire result set before returning any rows. My reading of the documentation suggests that this was always the case. However, I was able to dump 2.7 million records with Perl 5.14 on a server with 8GB of RAM without running out of memory. With Perl 5.18+ this no longer appears to be possible. YMMV.
It looks like a comprehensive fix could be found in teaching DBD::Pg to use row caching: https://rt.cpan.org/Public/Bug/Display.html?id=93266

On 03/10/2017 11:11 AM, Jason Stephenson wrote:
> Hi, all.
>
> NOTE: This is not https://bugs.launchpad.net/evergreen/+bug/1671845. It
> may be related or have a similar cause, but the experience/symptoms are
> completely different.
>
> At this point, consider this a head's up, as well as a problem
> description that I don't yet think I have enough information to file as
> a bug report. It is also a request for anyone who wants to double check
> this report and to help with debugging.
>
> I've noticed some bizarre behavior with DBI, MARC::Record, and writing
> to a file with Perl version 5.20 and 5.22. (These versions of Perl ship
> with Debian 8 Jessie and Ubuntu 16.04 Xenial Xerus, respectively.)
>
> I have a script (https://github.com/Dyrcona/boopsie) that I use to make
> a weekly extract of records to send to Boopsie, Inc. on behalf of our
> member libraries that use their app.
>
> What I have seen is that when run with the aforementioned versions of
> Perl, the program consumes all of the RAM on the server and gets killed
> by OOM killer. No output ever reaches the file. This suggests to me that
> the problem occurs in the main loop with extract and converting the
> MARCXML from the database, though it could be the Perl output buffer run
> amok.
>
> The main loop of my program is similar, though less complicated, than
> that of marc_export. I tried marc_export to see if it would have the
> same problem. When extracting my whole database, it does:
>
> marc_export -a -e UTF-8 > all.mrc
>
> It also crashes if fed the output of an equivalent psql query to extract
> all of the record ids, or if a file of all record ids is piped into
> marc_export. It makes no difference if the output format is USMARC or
> MARCXML.
>
> I can split this up into batches of 50,000 or so records (quite possibly
> more) and all is well. I figured this out by dumping records for a
> branch with around 51,000 items and that worked. My whole database has
> just over 2.7 million, non-deleted bib records.
>
> This worked on Perl version 5.14 on Debian 7 Wheezy and on Ubuntu 14.04
> Trusty Tahr.
>
> I hope to run marc_export with the Perl debugger to figure out the exact
> cause. Until this is fixed, I'm using a work around in my scripts of
> dumping MARCXML batches and converting them to USMARC and putting them
> into 1 file with yaz-marcdump. This seems to work in light of the Lp bug
> mentioned in the NOTE.
>
> Any and all information, contradictory or otherwise, from those using
> Debian 8 or Ubuntu 16.04 is most welcome.
>
> Jason
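
For anyone who needs the batching workaround in the meantime, here is a rough sketch of it as a shell function. It is not the script I actually run. The function name export_in_batches, the ids file, and the EXPORT_CMD and CONVERT_CMD variables are made up for illustration, and the "-f XML" flag for marc_export is an assumption that should be checked against marc_export --help on your installation. It relies on marc_export reading record ids from standard input, as described above.

```shell
#!/bin/sh
# Sketch of the batching workaround: export records in fixed-size batches
# so that no single run has to pull every record from the database.
#
# Assumptions (hypothetical, adjust for your site):
#   - the ids file holds one bib record id per line
#   - EXPORT_CMD reads ids on stdin and writes MARCXML to stdout
#     (the "-f XML" flag is assumed, not confirmed)
#   - CONVERT_CMD takes a MARCXML file name and writes USMARC to stdout

EXPORT_CMD=${EXPORT_CMD:-"marc_export -f XML -e UTF-8"}
CONVERT_CMD=${CONVERT_CMD:-"yaz-marcdump -i marcxml -o marc"}

export_in_batches() {
    ids_file=$1
    out_file=$2
    batch_size=${3:-50000}

    work_dir=$(mktemp -d) || return 1

    # split names the chunks batch.aa, batch.ab, ... so the glob below
    # visits them in the same order as the ids file.
    split -l "$batch_size" "$ids_file" "$work_dir/batch." || return 1

    : > "$out_file"
    for chunk in "$work_dir"/batch.*; do
        [ -e "$chunk" ] || continue   # an empty ids file yields no chunks
        # Dump one batch as MARCXML, then append it to the output as USMARC.
        $EXPORT_CMD < "$chunk" > "$chunk.xml" || return 1
        $CONVERT_CMD "$chunk.xml" >> "$out_file" || return 1
    done

    rm -rf "$work_dir"
}
```

Something like "export_in_batches ids.txt all.mrc 50000" would then build a single USMARC file. Appending works because binary MARC records can simply be concatenated, whereas each MARCXML batch is its own document and has to be converted separately.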