On Wed, Jul 2, 2008 at 8:59 AM, Dan Wells <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> Thanks to Reuben and Todd for the suggestions. I am somewhat embarrassed
> to say that it turned out that a thorough vacuum/analyze was all that was
> needed. I had tried vacuuming the tables which I felt were relevant to
> searching, but I certainly had not done them all, and the analyze step may
> have been the most important part anyway. Either way, I vote that a
> vacuum/analyze final step be added to the import wiki page, as this has
> proven to be absolutely necessary for a load of this size.
>
> The full vacuum did reveal another oddity, so here is some hopefully
> interesting and useful information. The table which ended up needing the
> most cleaning out was reporter.materialized_simple_record. This happened
> because there is a trigger on metabib.full_rec which syncs this reporter
> table on every update of full_rec. That keeps the data coherent under
> normal circumstances, but it is a lot of extra (and, I think, unnecessary)
> work when doing a basic bulk load. The end result, in my case, was a table
> with 100,000 good rows (one for each record) and 5,825,699 dead rows (one
> for every single field in my dataset, once the 100,000 are added back in).
> This also chewed up around 2GB of disk space (which mattered to me but
> maybe not to many).
>
> Here is my solution to this issue:
>
> Before running pg_loader-output.sql, run the following SQL statement:
>
> ALTER TABLE metabib.full_rec DISABLE TRIGGER
>     zzz_update_materialized_simple_record_tgr;
>
> After running pg_loader-output.sql, run the following SQL statements:
>
> INSERT INTO reporter.materialized_simple_record
>     SELECT DISTINCT ON (id) * FROM reporter.old_super_simple_record
>     WHERE id >= 1;
> ALTER TABLE metabib.full_rec ENABLE TRIGGER
>     zzz_update_materialized_simple_record_tgr;
>
> Doing this not only saved me from needing to reclaim the 2GB of space, but
> it also significantly reduced the time needed to run pg_loader-output.sql.
> I know there are currently other, much slower steps in the load process,
> but every bit of time savings helps. These steps assume you are loading
> into a completely empty database, but I think they would work with live
> data provided you emptied reporter.materialized_simple_record before
> filling it up again.
>
> What do others think?
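Dan's steps above combine into a single psql session roughly as follows.
This is an untested sketch: the trigger, table, and file names are exactly
the ones he gives, the \i line merely stands in for however you actually
run the loader output, and the closing VACUUM ANALYZE is the final step he
votes for (a full-database "vacuumdb --all --full --analyze", as Reuben
suggests below, also covers it):

    -- Skip per-row maintenance of the pseudo-materialized view during the load.
    ALTER TABLE metabib.full_rec DISABLE TRIGGER
        zzz_update_materialized_simple_record_tgr;

    -- Load the records.
    \i pg_loader-output.sql

    -- Rebuild the materialized table in a single pass, then re-arm the trigger.
    INSERT INTO reporter.materialized_simple_record
        SELECT DISTINCT ON (id) * FROM reporter.old_super_simple_record
        WHERE id >= 1;
    ALTER TABLE metabib.full_rec ENABLE TRIGGER
        zzz_update_materialized_simple_record_tgr;

    -- Reclaim dead space and refresh planner statistics.
    VACUUM ANALYZE reporter.materialized_simple_record;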
I think great minds think alike :) ...

The database upgrade script for 1.2.2.0 to 1.2.2.1 includes three functions
meant to address the issues surrounding that pseudo-materialized view:

  reporter.disable_materialized_simple_record_trigger()
      -- removes the trigger; does not remove the data
  reporter.enable_materialized_simple_record_trigger()
      -- truncates the mat-view, adds the data back, re-adds the trigger
  reporter.refresh_materialized_simple_record()
      -- calls disable_ and then enable_

I agree that these should be integrated into the installation (and
upgrade/batch-loading) instructions.

--miker

> Thanks,
> DW
>
>>>> "Reuben Pasquini" <[EMAIL PROTECTED]> 6/30/2008 11:28 AM >>>
> Try vacuuming the database:
>
>   $ vacuumdb -U postgres --all --full --analyze
>
> http://www.postgresql.org/docs/8.2/static/maintenance.html
>
> You may also want to increase the default number of shared-memory buffers
> postgres can use to stash data in memory:
>
> http://www.postgresql.org/docs/8.2/static/runtime-config.html
>
> Good luck!
> Reuben
>
>>>> "Dan Wells" <[EMAIL PROTECTED]> 6/30/2008 9:40 AM >>>
> Hello all,
>
> I have been spending the last few weeks trying out different methods for
> loading a batch of around 100,000 records. I have had varying degrees of
> success in getting the load to "work," including copy-level information,
> but the last few loads seem to have somehow developed a search speed
> problem. I say "developed" because I am pretty sure that it wasn't
> horribly slow the first few times I did the load, but it has gotten
> progressively worse, to the point that any keyword search which returns
> several thousand results no longer completes in the OPAC and takes from
> 45 seconds to several minutes to process in postgres. I am not certain
> that it has gotten worse, but I am sure that it is currently quite bad.
>
> In between each load I have been running build-db.sh, and I even dropped
> and recreated my entire evergreen database before a recent load. Is there
> something else I need to do in order to get an entirely clean slate? This
> project has been my first experience with postgres, so I may be missing
> something rather obvious.
>
> My server specs are somewhat modest but by no means slow; I am currently
> running a RedHat ES 5 virtual machine on a 2.4GHz Opteron with 1GB of
> dedicated RAM.
>
> Any suggestions? Naturally, one might suspect an index problem. Is there
> any way to verify that all relevant indexes are working, and/or to simply
> rebuild them all?
>
> Thanks,
> DW

--
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: [EMAIL PROTECTED]
 | web: http://www.esilibrary.com
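Assuming the three functions miker names are callable like any other stored
procedure (the upgrade script itself is not quoted here, so treat the call
syntax below as a sketch), Dan's batch-load recipe would collapse to:

    -- requires the reporter.* functions from the 1.2.2.0-to-1.2.2.1
    -- upgrade script
    SELECT reporter.disable_materialized_simple_record_trigger();
    \i pg_loader-output.sql
    SELECT reporter.enable_materialized_simple_record_trigger();

or, to rebuild the mat-view on a database that is already loaded, simply:

    SELECT reporter.refresh_materialized_simple_record();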

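On Reuben's shared-memory point: for PostgreSQL 8.2 that means raising
shared_buffers in postgresql.conf and restarting the server (the kernel's
SHMMAX may need to be raised to match). The figure below is purely
illustrative for a box with 1GB of RAM, not a value taken from this thread:

    # postgresql.conf -- illustrative sizing only, not a recommendation
    shared_buffers = 256MB    # the stock 8.2 default is 32MB or less,
                              # depending on what initdb could allocate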