Op 11/04/2013 10:25, Pierre Casenove schreef:
Hello,
I have a setup with postgresql 8.4 and 150 clients.
My DB dump (using pg_dump, with -Fc option) is about 1.1 GB.
I still haven't noticed slower operations.
Should I planify VACUUM ANALYZE operations once a month? Could it lower the size of the dump?
Would the script be like:
spacewalk-service stop
su --command "psql -c 'VACUUM VERBOSE ANALYZE;' -d spaceschema" postgres
spacewalk-service start
?

Thanks in advance for your help,

Pierre


On CentOS 6 there's also a command /usr/bin/*vacuumdb *available with a bunch of options that basically does that without you having the know the exact SQL commands. (for 8.4 at least, don't know about 9+)

You can run the basic (lazy) vacuum while spacewalk is running, I run it daily in my database dump script before the dump itself. (/usr/bin/vacuumdb --analyze -h localhost -U postgres spaceschema, needs a .pgpass file to work if you don't add the password to the command) My gzipped database is about 300M now for ~50 clients and ~15.500 packages in repos. Can't say if it has a big impact on size as I configured it this way when I set up the database and I've no further postgres experience.

For the full vacuum you'd stop spacewalk and add --full. (Or add "FULL" to the sql command you mentioned)

Regards,
Jeremy


2013/4/11 Anton Pritchard-Meaker <[email protected] <mailto:[email protected]>>

    Thanks I really appreciate this, I'll definitely look into these
    actions. Downtime is not an issue for my Spacewalk implementation.

    I'm pretty new to PostreSQL, so I was completely unaware of all of
    maintenance tools available which actually sound quite necessary.

    Anton Pritchard-Meaker | Unix Engineer

    ------------------------------------------------------------------------
    *From:* [email protected]
    <mailto:[email protected]>
    [[email protected]
    <mailto:[email protected]>] on behalf of Paul
    Robert Marino [[email protected] <mailto:[email protected]>]
    *Sent:* 10 April 2013 22:39

    *To:* [email protected] <mailto:[email protected]>
    *Subject:* Re: [Spacewalk-list] API calls for new hosts

    In PostgreSQL 8.x auto vacuuming was first being introduced and
    the default settings weren't Ideal. Further more it wasn't a
    complete implementation so standard vacuuming is still necessary
    in 8.x

    In PostgreSQL 9.x  auto vacuuming matured quite a bit a and manual
    vacuuming is needed far less often, but still a good idea to do
    occasionally.

    There are two kinds of vacuuming a lazy vacuum and a full vacuum.
    There are also two other table maintenance task which need to be
    done periodically as well.

    A lazy vacuum does not require an exclusive table lock so in many
    cases may be executed while the database is actively in use;
    however there tend to be tables in spacewalk that constantly have
    lock which may hang the process so its best to schedule occasional
    downtime for this operation. The good new is if you do it on a
    regular basis a lazy vacuum is quick. In addition in PostgreSQL
    9.x the auto vacuum process fairly effectively opportunistically
    tries to do this for you as needed with as little impact as possible.

    A full vacuum requires an exclusive table lock but does a few
    things a lazy vacuum can't. The first thing it does is it flattens
    the MVCC ( MVCC is version control for rows it provides rollback
    capabilities and allows long running queries to complete without
    the results being tainted by data added or deleted after the long
    running query was started). the MVCC needs to be occasionally
    flattened on high volume tables to prevent the version numbers
    from wrapping around (which can potentially cause a sort of data
    corruption); however this is rare and may databases run for years
    without having to worry about this. The major advantage is that a
    Full vacuum can reclaim all of the disk space being used by old
    row versions. the lazy vacuum can only mark the space into a pool
    for recycling (Oracle had the same thing literally called it the
    trash bin last time I worked with it) unless they are at the end
    of the last table file, also in PostgreSQL 8.x the developers
    realized the maximum size limit of recycle pool was too small for
    modern databases so it was increased significantly in 9.x.


    NOTE: a dump and load has the same effect as a full vacuum

    ANALYZE

    Analyzing updates your table statistics. the statistics are used
    by the query planner. what the query planner does is it takes the
    queries you run on the tables and re-optimizes them based on the
    table structure, the fragmentation level of the table, the types
    of sorts, filters the query has, the indexes available and how
    efficient they, are more. the statistics tell the planer how
    efficient different types of operations are based on a series of
    test queries it executed the last time they were updated.
    Analyzing is a non blocking operation however just like lazy
    vacuuming it can get hung up by other queries from spacewalk
    indefinitely, so its best to do it occasionally with spacewalk
    offline.
    Analyzing can be done as part of a vacuum or independently. If
    done independently you can control it to the level where you can
    even tell it just to analyze a specific column; however its
    usually best to do an analyze with a vacuum for most people, only
    very experienced DBAs should consider doing more advanced versions
    of the ANALYZE command .

    NOTE: a dump and load does not do an ANALYZE on the tables.


    REINDEX

    Vacuuming cleans up the table but not cleanup, defragment, or
    resort the indexes so it is important to at least once a year do a
    REINDEX on standard indexes to maintain performance, and more
    often for ordered indexes. A REINDEX can not be done as part of a
    vacuum it is an independent operation. A REINDEX is an exclusive
    locking operation and as such can not be done at the same time as
    any thing else is accessing the table, as such spacewalk should be
    offline during this operation. reindexing is the slowest
    maintenance operation and should only be done after a full vacuum.
    You should also do an ANALYZE after a REINDEX.

    NOTE: a dump and load has the same effect as a REINDEX.


    All of these operations are at the table level except the ANALYZE
    which may be done down to the column level. a REINDEX can also be
    done in the specific index level I think; however its usually most
    efficient to do the whole table at once unless you have an
    unusually large table.

    Finally there are command line tools for vaccum and reindex that
    can operate by sequentially cycling through the tables in the
    database; however if your disks ram and CPU can handle it you can
    run these operations in parallel on different tables to speed
    things up via multiple SQL connections.


**** DISCLAIMER ****
http://www.schaubroeck.be/maildisclaimer.htm
_______________________________________________
Spacewalk-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/spacewalk-list

Reply via email to