Quoting Dan Scott <[email protected]>:

On Thu, Jun 06, 2013 at 10:48:32AM -0400, Joe Thornton wrote:
The problem is that the database can get pretty bloated with a huge
transaction. Smaller transactions might give autovacuum a chance to run
in parallel with the import. Also, have you tweaked the database
configuration at all for things like maintenance_work_mem and work_mem?

Dan makes an excellent point above? Have you tried running pg_tune on your database server? That will suggest various settings for the database server to take advantage of the hardware better than "factory" defaults. Just using pg_tune's output as your new configuration for Postgres should make a huge improvement in performance.

Parallelizing the import, as Jason suggested, certainly can help. I
haven't tried this for quite a while, but a really simplistic way of
parallelizing the import would be to take that script with many 1,000
record chunks and splitting it into n-1 separate scripts (where "n" is
the number of cores on your database server--again, credit due to
Jason), then run each script from a separate psql session. You might run
into deadlocks this way, but it's worth a shot.

I found that setting the three internal flags as mentioned in my previous email was all it took to avoid deadlocks during a load. Things may have changed in the two years since, however.

After doing a load with those flags set, you will need to run two scripts that still come with Evergreen:

Open-ILS/src/extras/import/quick_metarecord_map.sql
Open-ILS/src/support-scripts/authority_control_fields.pl

The latter gets installed in $PREFIX/bin after Evergreen is installed.


--
Jason Stephenson
Assistant Director for Technology Services
Merrimack Valley Library Consortium

Reply via email to