Quoting Dan Scott <[email protected]>:
On Thu, Jun 06, 2013 at 10:48:32AM -0400, Joe Thornton wrote:

The problem is that the database can get pretty bloated with a huge transaction. Smaller transactions might give autovacuum a chance to run in parallel with the import. Also, have you tweaked the database configuration at all for things like maintenance_work_mem and work_mem?
Dan makes an excellent point above. Have you tried running pg_tune on your database server? It will suggest settings that take better advantage of your hardware than the "factory" defaults. Just using pg_tune's output as your new Postgres configuration should make a huge improvement in performance.
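For a sense of what pg_tune touches, here is an illustrative postgresql.conf fragment. These values are a sketch for a hypothetical dedicated server with 8 GB of RAM, not pg_tune's actual output; run the tool against your own hardware for real numbers.

```
# postgresql.conf -- illustrative values for a hypothetical 8 GB dedicated
# DB server; pg_tune's real output depends on your hardware and workload.
shared_buffers = 2GB            # roughly 25% of RAM
effective_cache_size = 6GB      # what the OS page cache can realistically hold
maintenance_work_mem = 512MB    # helps VACUUM and index builds during loads
work_mem = 32MB                 # per-sort/hash memory; raise cautiously
checkpoint_segments = 32        # fewer checkpoints during bulk writes (pre-9.5)
```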
Parallelizing the import, as Jason suggested, certainly can help. I haven't tried this for quite a while, but a really simplistic way of parallelizing the import would be to take that script with its many 1,000-record chunks, split it into n-1 separate scripts (where "n" is the number of cores on your database server--again, credit due to Jason), then run each script from a separate psql session. You might run into deadlocks this way, but it's worth a shot.
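As a rough illustration of that split-and-run idea (a sketch, not anyone's exact procedure): the snippet below fabricates a stand-in load script, splits it into one chunk per worker, and prints, rather than runs, the psql command each session would use. The file names, worker count, user, and database name are all assumptions.

```shell
#!/bin/sh
# Sketch of splitting a big load script and running the pieces in
# parallel psql sessions. Everything here is a dry run with dummy data.

WORKERS=3                                 # e.g. DB server cores minus one
seq 1 9 | sed 's/^/-- INSERT batch /' > load.sql   # stand-in for the real script

LINES=$(wc -l < load.sql)
CHUNK=$(( (LINES + WORKERS - 1) / WORKERS ))       # ceiling division

split -l "$CHUNK" load.sql chunk_         # produces chunk_aa, chunk_ab, chunk_ac

for f in chunk_*; do
    # For a real run: drop the echo, append '&' to background each session,
    # and add a 'wait' after the loop so the script blocks until all finish.
    echo psql -U evergreen -d evergreen -f "$f"
done
```

Keeping each chunk's transactions small, as Dan notes above, also gives autovacuum room to run alongside the import; watch for deadlocks between the concurrent sessions.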
I found that setting the three internal flags mentioned in my previous email was all it took to avoid deadlocks during a load. Things may have changed in the two years since, however.
After doing a load with those flags set, you will need to run two scripts that still come with Evergreen:
Open-ILS/src/extras/import/quick_metarecord_map.sql
Open-ILS/src/support-scripts/authority_control_fields.pl

The latter gets installed in $PREFIX/bin after Evergreen is installed.

--
Jason Stephenson
Assistant Director for Technology Services
Merrimack Valley Library Consortium
