Excellent suggestion, Brandon! It has now been implemented in both parallel_pg_loader and pg_loader.
2008/8/6 Brandon W. Uhlman <[EMAIL PROTECTED]>:
> Thanks, Dan (and also Mike). Great tip!
>
> I think documenting this is a good idea, for sure. Is there any reason we
> also wouldn't want to include it in the default SQL generated by
> pg_loader/parallel_pg_loader?
>
> If we're concerned about it automatically being called without checking the
> data, we could include it as a comment in pg_loader_output.sql, just as we
> currently do with the commit, as a visual reminder.
>
> ~B
>
> Quoting Dan Scott <[EMAIL PROTECTED]>:
>
>> Hey Brandon:
>>
>> The full-text indexes are absolutely the key - check out this thread
>> from July 2nd:
>>
>> http://list.georgialibraries.org/pipermail/open-ils-dev/2008-July/003265.html
>>
>> I think it addresses your questions for the most part.
>>
>> And yeah, as Mike notes, we really should document that in the
>> appropriate section of the wiki, especially as I'm about to embark on
>> a refresh of our several million records :0
>>
>> Dan
>>
>> 2008/8/6 Brandon W. Uhlman <[EMAIL PROTECTED]>:
>>>
>>> I have about 960,000 bibliographic records I need to import into an
>>> Evergreen system. The database server is dual quad-core Xeons with
>>> 24GB of RAM.
>>>
>>> Currently, I've split the bibliographic records into 8 batches of ~120K
>>> records each and did the marc_bre/direct_ingest/parallel_pg_loader
>>> dance, but one of those files has been chugging along in psql now for
>>> more than 16 hours. How long should I expect these files to take? Would
>>> more, smaller files load more quickly in terms of total time for the
>>> same full record set?
>>>
>>> I notice that the insert into metabib.full_rec seems to be taking by
>>> far the longest. It does have more records than any of the other pieces
>>> to import, but the time taken still seems disproportionate.
>>>
>>> I notice that metabib.full_rec has this trigger:
>>>
>>>   zzz_update_materialized_simple_record_tgr AFTER INSERT OR DELETE OR
>>>   UPDATE ON metabib.full_rec FOR EACH ROW EXECUTE PROCEDURE
>>>   reporter.simple_rec_sync()
>>>
>>> Is the COPY calling this trigger every time I copy in a new record? If
>>> so, can I remove the trigger to defer this update, and do it en masse
>>> afterward? Would that be quicker?
>>>
>>> Just looking for any tips I can use to increase the loading speed of
>>> huge-ish datasets.
>>>
>>> Cheers,
>>>
>>> Brandon
>>>
>>> ======================================
>>> Brandon W. Uhlman, Systems Consultant
>>> Public Library Services Branch
>>> Ministry of Education
>>> Government of British Columbia
>>> 850-605 Robson Street
>>> Vancouver, BC V6B 5J3
>>>
>>> Phone: (604) 660-2972
>>> E-mail: [EMAIL PROTECTED]
>>>         [EMAIL PROTECTED]
>>
>> --
>> Dan Scott
>> Laurentian University
>
> ======================================
> Brandon W. Uhlman, Systems Consultant
> Public Library Services Branch
> Ministry of Education
> Government of British Columbia
> 605 Robson Street, 5th Floor
> Vancouver, BC V6B 5J3
>
> Phone: (604) 660-2972
> E-mail: [EMAIL PROTECTED]
>         [EMAIL PROTECTED]

--
Dan Scott
Laurentian University
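[Editor's note: the trigger-deferral approach discussed above can be sketched in SQL roughly as follows. This is a sketch only, not the exact statements emitted by pg_loader or parallel_pg_loader: the trigger and table names are taken from the thread, PostgreSQL's ALTER TABLE ... DISABLE TRIGGER is used here as an alternative to dropping and re-creating the trigger, and the bulk-rebuild step is left as a placeholder because its exact statements depend on the Evergreen schema version.]

```sql
-- Sketch: defer the per-row materialized-record maintenance during a bulk
-- load, then rebuild en masse afterward.
BEGIN;

-- Stop COPY from invoking reporter.simple_rec_sync() on every inserted row.
-- (Requires table-owner privileges; DROP TRIGGER / CREATE TRIGGER is the
-- equivalent alternative mentioned in the thread.)
ALTER TABLE metabib.full_rec
  DISABLE TRIGGER zzz_update_materialized_simple_record_tgr;

-- Bulk-load the data here, e.g. the COPY statements from
-- pg_loader_output.sql:
-- COPY metabib.full_rec (...) FROM stdin;

-- Restore the trigger for normal operation.
ALTER TABLE metabib.full_rec
  ENABLE TRIGGER zzz_update_materialized_simple_record_tgr;

-- Placeholder: rebuild the reporter materialized simple-record data en
-- masse here. The exact statements depend on the Evergreen schema; see the
-- July 2008 thread linked above for the details.

COMMIT;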
