On Wed, Jul 2, 2008 at 8:59 AM, Dan Wells <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> Thanks to Reuben and Todd for the suggestions.  I am somewhat embarrassed to 
> say that it turned out that a thorough vacuum and analyze was all that was 
> needed.  I had tried vacuuming the tables which I felt were relevant to 
> searching, but I certainly had not done them all, and the analyze step may 
> have been the most important part anyway.  Either way, I vote that a 
> vacuum/analyze final step be added to the import wiki page, as this has 
> proven to be absolutely necessary for a load of this size.
>
> The full vacuum did reveal another oddity, so here is some hopefully 
> interesting and useful information.  The table which needed the most 
> cleaning out turned out to be reporter.materialized_simple_record.  The 
> reason this happened is that there is a trigger on metabib.full_rec which 
> syncs this reporter table on every update of full_rec.  This of course keeps 
> the data coherent under normal circumstances, but is a lot of extra (and I 
> think unnecessary) work when doing a basic bulk load.  The end result, in my 
> case, was a table with 100,000 good rows (one for each record) and 5,825,699 
> dead rows (one for every single field in my dataset, when adding the 100,000 
> back in).  This also chewed up around 2GB of disk space (which mattered to me 
> but maybe not to many).
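>
> (As a side note: one way to see how many dead rows a table is carrying, 
> before deciding whether a full vacuum is worth it, is a verbose vacuum, 
> which reports removed and remaining row versions.  The table name here is 
> just the one from my case:
>
>     VACUUM VERBOSE reporter.materialized_simple_record;
>
> The "dead row versions" figures in its output correspond to the counts 
> above.)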
>
> Here is my solution to this issue:
>
> Before running pg_loader-output.sql, run the following SQL statement:
>
> ALTER TABLE metabib.full_rec DISABLE TRIGGER 
> zzz_update_materialized_simple_record_tgr;
>
> After running pg_loader-output.sql, run the following SQL statements:
>
> INSERT INTO reporter.materialized_simple_record SELECT DISTINCT ON (id) * 
> FROM reporter.old_super_simple_record WHERE id >= 1;
> ALTER TABLE metabib.full_rec ENABLE TRIGGER 
> zzz_update_materialized_simple_record_tgr;
>
>
> Doing this not only saved me from needing to reclaim the 2GB of space, but it 
> also significantly reduced the time needed to run pg_loader-output.sql.  I 
> know there are currently other much slower steps in the load process, but 
> every bit of time savings helps.  These steps assume you are loading into a 
> completely empty database, but I think they would work with live data 
> provided you emptied reporter.materialized_simple_record before filling it up 
> again.
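>
> For reference, the whole sequence can be wrapped into a single psql script 
> (the file, user, and database names here are placeholders -- adjust for 
> your site):
>
>     -- bulk_load.sql -- run as: psql -U evergreen -d evergreen -f bulk_load.sql
>     ALTER TABLE metabib.full_rec DISABLE TRIGGER
>         zzz_update_materialized_simple_record_tgr;
>     \i pg_loader-output.sql
>     INSERT INTO reporter.materialized_simple_record
>         SELECT DISTINCT ON (id) * FROM reporter.old_super_simple_record
>         WHERE id >= 1;
>     ALTER TABLE metabib.full_rec ENABLE TRIGGER
>         zzz_update_materialized_simple_record_tgr;
>     VACUUM ANALYZE;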
>
> What do others think?
>

I think great minds think alike :) ... The database upgrade script for
1.2.2.0 to 1.2.2.1 includes three functions meant to address the
issues surrounding that pseudo-materialized view:

 reporter.disable_materialized_simple_record_trigger() -- removes the
trigger, does not remove the data
 reporter.enable_materialized_simple_record_trigger() -- truncates the
mat-view, adds data back, adds trigger
 reporter.refresh_materialized_simple_record() -- calls disable_ and
then enable_
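
So for a bulk load against a 1.2.2.1 database the pattern would be
(assuming the upgrade script has been applied, so the functions exist):

    SELECT reporter.disable_materialized_simple_record_trigger();
    -- ... run the bulk load here ...
    SELECT reporter.enable_materialized_simple_record_trigger();

or, to rebuild the mat-view in one step on an already-loaded database:

    SELECT reporter.refresh_materialized_simple_record();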

I agree that these should be integrated into the installation (and
upgrade/batch loading) instructions.

--miker

> Thanks,
> DW
>
>
>>>> "Reuben Pasquini" <[EMAIL PROTECTED]> 6/30/2008 11:28 AM >>>
> Try vacuuming the database.
>     $ vacuumdb -U postgres --all --full --analyze
>
> http://www.postgresql.org/docs/8.2/static/maintenance.html
>
> You may also want to increase the default number
> of shared-memory buffers postgres can use to
> stash data in memory.
>    http://www.postgresql.org/docs/8.2/static/runtime-config.html
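>
> For example, on a machine with 1GB of RAM, something along these lines in 
> postgresql.conf is a reasonable starting point (the exact values are 
> illustrative, not tuned recommendations):
>
>     shared_buffers = 256MB          # roughly 25% of RAM
>     effective_cache_size = 512MB    # what the OS is likely to cache
>
> Postgres must be restarted for shared_buffers to take effect.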
>
> Good luck!
> Reuben
>
>
>
>>>> "Dan Wells" <[EMAIL PROTECTED]> 6/30/2008 9:40 AM >>>
> Hello all,
>
> I have been spending the last few weeks trying out different methods
> for loading a batch of around 100,000 records.  I have had varying
> degrees of success with getting the load to "work," including copy level
> information, but the last few loads seem to have somehow developed a
> search speed problem.  I say "developed" because I am pretty sure that
> it wasn't horribly slow the first few times I did the load, but it has
> gotten progressively worse, to the point that any keyword search which
> returns several thousand results no longer completes in the OPAC, and
> takes from 45 seconds to several minutes to process in postgres.  I am
> not certain that it has gotten worse, but I am sure that it is currently
> quite bad.
>
> In between each load I have been running build-db.sh, and even dropped
> and recreated my entire evergreen database before a recent load.  Is
> there something else I need to do in order to get an entirely clean
> slate?  This project has been my first experience with postgres, so I
> may be missing something rather obvious.
>
> My server specs are somewhat modest but by no means slow; I am
> currently running a RedHat ES 5 virtual machine on a 2.4GHz Opteron
> with 1GB of dedicated RAM.
>
> Any suggestions?  Naturally one might suspect an index problem.  Is
> there any way to verify that all relevant indexes are working, and/or to
> simply rebuild them all?
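>
> (For the record, the indexes on a single table can be rebuilt with, e.g.:
>
>     REINDEX TABLE metabib.full_rec;
>
> or for the whole database -- the name "evergreen" is a placeholder:
>
>     REINDEX DATABASE evergreen;
>
> though as the top of this thread shows, the fix here turned out to be 
> vacuum/analyze rather than indexes.)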
>
> Thanks,
> DW
>
>
>
>



-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: [EMAIL PROTECTED]
 | web: http://www.esilibrary.com
