Quoting Chris Roosendaal <[email protected]>:

[Snip]

First of all we've started import on the testing server and production
server at the same time to compare the performance.
The postgresql configuration and the source MARC XML data are the same
in both cases (it's important, I guess).

You'll want to run pgtune on both servers to get maximum performance. The production server should NOT be configured the same as the testing server, or it will not perform at full capacity.

[Snip]

The times you reported are not unusual for a serial load on your testing server. Given that your production server isn't optimized, I wouldn't expect it to perform much better than testing. However, taking 3 days longer just doesn't seem right, unless it really is running on a base configuration.

[Snip]

Can it be true that import performance can be decreased to 2-3 times
with LOCALE settings, as tsearch2 needs to find the character with
diacritics and replace it by the same character without?

Yes. You really want to use C collation and C locale for best performance. The createdb statement in the README is your best bet.
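
For reference, a database whose locale differs from the cluster default has to be created from template0, since PostgreSQL does not allow changing collation or ctype when copying any other template. Below is a minimal Python sketch of such a createdb call. The database name "evergreen" and the UTF8 encoding are assumptions for illustration; if the README's statement differs, follow the README.

```python
import subprocess


def createdb_command(dbname="evergreen"):
    """Build a createdb argument list for a C-collation, C-ctype database.

    The default database name and the UTF8 encoding are assumptions.
    """
    return [
        "createdb",
        "--template=template0",  # required when the locale differs from the template's
        "--lc-collate=C",
        "--lc-ctype=C",
        "--encoding=UTF8",
        dbname,
    ]


def create_database(dbname="evergreen"):
    # Runs createdb as the current OS user; raises CalledProcessError on failure.
    subprocess.run(createdb_command(dbname), check=True)
```

Calling createdb_command() only builds the argument list, so it can be inspected before create_database() actually runs it.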


Something else you might want to do is set the following config.internal_flag entries to true while loading records:

ingest.metarecord_mapping.skip_on_insert
ingest.disable_authority_linking
ingest.assume_inserts_only

Remember to set them back to FALSE after the load. You'll also need to run the quick_metarecord SQL script after the load.
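
Toggling the flags can be scripted so that the reset is not forgotten. The sketch below only generates the SQL text. It assumes that config.internal_flag has a text column "name" and a boolean column "enabled", so check the table definition in your installation before running the output through psql.

```python
INGEST_FLAGS = (
    "ingest.metarecord_mapping.skip_on_insert",
    "ingest.disable_authority_linking",
    "ingest.assume_inserts_only",
)


def flag_statements(enabled):
    """Return one UPDATE statement per ingest flag.

    Assumes config.internal_flag has "name" and "enabled" columns.
    """
    state = "TRUE" if enabled else "FALSE"
    return [
        f"UPDATE config.internal_flag SET enabled = {state} WHERE name = '{name}';"
        for name in INGEST_FLAGS
    ]


if __name__ == "__main__":
    print("-- run before the load")
    print("\n".join(flag_statements(True)))
    print("-- run after the load")
    print("\n".join(flag_statements(False)))
```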

If you can split your files up further into batches of about 10,000 records each and load them in parallel, with the number of loaders equal to the number of cores on the database server minus one, you will likely load them more quickly.
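
The batching rule can be sketched in Python as follows. The function names are made up for illustration, and the records can be any iterable (for example, MARC XML record strings you have already split out of the source file). The core count is passed in explicitly because it refers to the database server, which is not necessarily the machine running the loaders.

```python
from itertools import islice


def batches(records, size=10_000):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def loader_count(db_server_cores):
    """Number of parallel loaders: one fewer than the database server's cores."""
    return max(1, db_server_cores - 1)
```

For example, loader_count(8) gives 7, and batches(range(25), 10) yields chunks of 10, 10 and 5 items.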

I was able to load 900 000 bibs in 7 hours on a server with 8 cores and 24GB of RAM using batches of 10 000 records, 7 load threads, C collations and the above settings. NB: This was NOT using the parallel_pg_loader script from extras. It was using a custom bib loader that would need a lot of work to be made generically useful to others.
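
To put those figures in perspective, the arithmetic works out as below. This is only a back-of-the-envelope calculation from the numbers above; throughput on other hardware will differ.

```python
import math

records = 900_000
hours = 7
batch_size = 10_000
threads = 7

batch_count = records // batch_size        # 90 batches
rate_per_hour = records / hours            # about 128,571 records per hour
rate_per_second = rate_per_hour / 3600     # about 35.7 records per second
rounds = math.ceil(batch_count / threads)  # 13 batches for the busiest thread
```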

HtH,
Jason Stephenson
Merrimack Valley Library Consortium

