Hi,

I'm going to try to talk my way through the current implementation of materialized views in Evergreen, specifically the one for simple record extracts that hangs off of metabib.full_rec, and then posit an alternative. In my head, it should result in no change in general performance, and some improvement in performance in some cases (the marc2bre/direct_ingest/pg_loader dance).

Currently on an insert or update to metabib.full_rec, zzz_update_materialized_simple_record_tgr is called, which materializes the changes from the full record to the simple record extracts. This makes importing a very large set or sets of bib data incredibly slow.
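For reference, the shape of that arrangement is roughly this (a sketch only -- the actual trigger and function definitions live in Evergreen's schema and differ in detail; the procedure name below is illustrative):

```sql
-- Sketch of the current arrangement; the real Evergreen definitions differ.
-- Every row-level change to the full record fires the materialization
-- trigger, which re-derives the simple record extract for that row.
CREATE TRIGGER zzz_update_materialized_simple_record_tgr
    AFTER INSERT OR UPDATE ON metabib.full_rec
    FOR EACH ROW EXECUTE PROCEDURE refresh_simple_record();  -- name illustrative
```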

The solution as discussed briefly on this list yesterday, and in more detail last month (http://list.georgialibraries.org/pipermail/open-ils-dev/2008-July/003265.html), is to use disable_materialized_simple_record_trigger() and enable_materialized_simple_record_trigger(), which:

- remove the trigger
- truncate the materialized view table (that is, empty it), refresh the data, and replace the trigger

respectively.
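In practice the bulk-load dance looks something like this (I'm assuming these functions live in the reporter schema; adjust the schema qualification if your install differs):

```sql
-- Disable per-row materialization before a large import...
SELECT reporter.disable_materialized_simple_record_trigger();

-- ... run the marc2bre / direct_ingest / pg_loader imports here ...

-- ... then truncate, rebuild, and reinstate the trigger in one go.
SELECT reporter.enable_materialized_simple_record_trigger();
```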

For continually growing datasets, however, this means the amount of data to rematerialize on reactivating the view is proportional not to the size of the dataset just added, but to the entire dataset, including everything previously materialized. For very large datasets, the computational cost of that rebuild could be significant; in addition, the functionality backed by this data is unavailable from the moment the table is truncated until the rebuild finishes (*).

My proposal is to make the materialization process a little lazier. Or to quote Montgomery Scott, chief engineer of the Enterprise, to lock the rematerialization subroutine directly into the transporter's pattern buffer.

First, let's add a staleness bit/boolean to metabib.full_rec. The staleness bit gets set to true if the row is inserted or updated while changes to the simple record view are not being materialized -- that is, while disable_materialized_simple_record_trigger() is in effect. On re-enabling the trigger, instead of truncating the entire table and rebuilding from scratch, rebuild only those entries that have the staleness bit set.
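A rough sketch of what that could look like -- all names here (the column, the flagging function, and the per-row rebuild helper) are mine for illustration, not committed code:

```sql
-- Hypothetical sketch of the lazy rebuild; all names are illustrative.
ALTER TABLE metabib.full_rec
    ADD COLUMN stale BOOL NOT NULL DEFAULT FALSE;

-- While materialization is disabled, swap in a lightweight trigger
-- function that only flags the row instead of rebuilding the extract:
CREATE OR REPLACE FUNCTION mark_full_rec_stale() RETURNS TRIGGER AS $$
BEGIN
    NEW.stale := TRUE;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- On re-enable, rematerialize only the flagged rows, then clear the flags.
-- (rebuild_simple_record() stands in for whatever per-row refresh logic
-- the real enable function uses.)
SELECT rebuild_simple_record(id) FROM metabib.full_rec WHERE stale;
UPDATE metabib.full_rec SET stale = FALSE WHERE stale;
```

The win is that the re-enable step touches only the rows changed during the import window, rather than the whole table.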

If we're feeling particularly irreverent, we could call the column in the database that says whether something is waiting to be rematerialized 'in_pattern_buffer'.

OK, so I just re-enabled the trigger on a dataset of about 920,000 records, and it took only 15 minutes -- I had imagined it would take longer. Nonetheless, I am philosophically opposed to doing work I don't think I have to, so I'm putting this idea out to gather moss.

Thoughts?

~B

(*) I guess the data would stay available if you wrapped the whole thing in one big transaction, but the whole reason I'm thinking about this is that I disabled the trigger outside a transaction, because I had multiple imports to do. I'm not quite sure how transaction costs work in PostgreSQL. Do I lose any performance by having a single "gigantically enormous" transaction instead of a bunch of "just pretty big" ones?

======================================
Brandon W. Uhlman, Systems Consultant
Public Library Services Branch
Ministry of Education
Government of British Columbia
605 Robson Street, 5th Floor
Vancouver, BC  V6B 5J3

Phone: (604) 660-2972
E-mail: [EMAIL PROTECTED]
        [EMAIL PROTECTED]
