Thanks, Dan (and also Mike). Great tip!
I think documenting this is a good idea, for sure. Is there any
reason we wouldn't also want to include it in the default SQL
generated by pg_loader/parallel_pg_loader?
If we're concerned about it being run automatically without
checking the data, we could include it as a comment in
pg_loader_output.sql, just as we currently do with the commit, as a
visual reminder.
~B
Quoting Dan Scott <[EMAIL PROTECTED]>:
Hey Brandon:
The full text indexes are absolutely the key - check out this thread
from July 2nd:
http://list.georgialibraries.org/pipermail/open-ils-dev/2008-July/003265.html
- I think it addresses your questions for the most part.
And yeah, as Mike notes, we really should document that in the
appropriate section of the wiki. Especially as I'm about to embark on
a refresh of our several million records :0
Dan
2008/8/6 Brandon W. Uhlman <[EMAIL PROTECTED]>:
I have about 960 000 bibliographic records I need to import into an
Evergreen system. The database server is dual quad-core Xeons with 24GB of
RAM.
Currently, I've split the bibliographic records into 8 batches of ~120K
records each and did the marc_bre/direct_ingest/parallel_pg_loader dance, but
one of those files has been chugging along in psql now for more than 16
hours. How long should I expect these files to take? Would more, smaller
files load more quickly in terms of total time for the same full record set?
I notice that the insert into metabib.full_rec seems to be taking by far the
longest. It does have more records than any of the other pieces to import,
but the time taken still seems disproportionate.
I notice that metabib.full_rec has this trigger --
zzz_update_materialized_simple_record_tgr AFTER INSERT OR DELETE OR UPDATE
ON metabib.full_rec FOR EACH ROW EXECUTE PROCEDURE
reporter.simple_rec_sync().
Is the COPY calling this trigger for every record I copy in? If
so, can I remove the trigger to defer this update and do it en masse
afterward? Would that be quicker?
Just looking for any tips I can use to increase the loading speed of
huge-ish datasets.
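(For what it's worth, here's the sort of generic PostgreSQL session-level tuning often suggested for big loads like this — the values are illustrative assumptions for a 24GB box, not recommendations from anyone on this thread:)

```sql
-- Give sorts and index maintenance more memory for this session only
-- (values are illustrative; tune to your hardware and concurrency).
SET maintenance_work_mem = '1GB';
SET work_mem = '256MB';

-- Run the generated COPY statements inside one transaction so the
-- commit-time WAL flush happens once instead of per statement.
BEGIN;
-- \i pg_loader_output.sql
COMMIT;
```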
Cheers,
Brandon
======================================
Brandon W. Uhlman, Systems Consultant
Public Library Services Branch
Ministry of Education
Government of British Columbia
850-605 Robson Street
Vancouver, BC V6B 5J3
Phone: (604) 660-2972
E-mail: [EMAIL PROTECTED]
[EMAIL PROTECTED]
--
Dan Scott
Laurentian University
======================================
Brandon W. Uhlman, Systems Consultant
Public Library Services Branch
Ministry of Education
Government of British Columbia
605 Robson Street, 5th Floor
Vancouver, BC V6B 5J3
Phone: (604) 660-2972
E-mail: [EMAIL PROTECTED]
[EMAIL PROTECTED]