I'm just wondering if there's a more efficient way of handling a certain 
periodic data migration.

We have a pair of tables with this structure:

        table_a__live
                column_1 INT
                column_2 INT
                record_timestamp TIMESTAMP

        table_a__archive
                column_1 INT
                column_2 INT
                record_timestamp TIMESTAMP

Periodically, we must migrate 'stale' rows from `table_a__live` to 
`table_a__archive`. The rows are copied over to the archive, then deleted 
from the live table.

Staleness is calculated by age, so we need to use an INTERVAL against the 
record timestamp. The "live" table can hold anywhere from 100k to 20MM records.

The primary key on `table_a__live` is a composite of `column_1` and `column_2`.
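For reference, here's a minimal DDL sketch reconstructed from the description 
above (the archive table is assumed to mirror the live one; whether it carries 
the same primary key isn't stated):

        CREATE TABLE table_a__live (
                column_1 INT NOT NULL,
                column_2 INT NOT NULL,
                record_timestamp TIMESTAMP,
                PRIMARY KEY (column_1, column_2)
        );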

To avoid repeatedly scanning the whole table, we opted to flag rows for 
migration with a dedicated column:

        ALTER TABLE table_a__live ADD is_migrate BOOLEAN DEFAULT NULL;
        CREATE INDEX idx_table_a__live_migrate
                ON table_a__live (is_migrate)
                WHERE is_migrate IS NOT NULL;
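One caveat worth checking here: the migration statements filter on 
`is_migrate IS TRUE`, while the index predicate is `is_migrate IS NOT NULL`, 
so it's worth confirming the planner actually proves the implication and uses 
the partial index. A quick, illustrative check (plan output will of course 
vary):

        EXPLAIN
        SELECT column_1, column_2, record_timestamp
        FROM table_a__live
        WHERE is_migrate IS TRUE;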

Our migration is then driven by that `is_migrate` column:

        BEGIN;

        -- flag the stale rows
        UPDATE table_a__live
        SET is_migrate = TRUE
        WHERE record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC' - INTERVAL '1 month';

        -- copy the flagged rows into the archive
        INSERT INTO table_a__archive (column_1, column_2, record_timestamp)
        SELECT column_1, column_2, record_timestamp
        FROM table_a__live
        WHERE is_migrate IS TRUE;

        -- then remove them from the live table
        DELETE FROM table_a__live WHERE is_migrate IS TRUE;

        COMMIT;

The inserts and deletes are blazing fast, but the UPDATE is a bit slow, since 
Postgres' MVCC means it writes a new version of every flagged row.

Can anyone suggest a better approach?

I considered copying everything into a temp table and then inserting/deleting 
based on that table (see the sketch below), but there's a lot of disk I/O in 
that approach too.
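For what it's worth, here's a minimal sketch of that temp-table variant, under 
the assumption that the stale rows are selected once up front and the delete 
joins back on the composite primary key (the `tmp_migrate` name is 
hypothetical):

        BEGIN;

        -- select the stale rows once, into a transaction-scoped temp table
        CREATE TEMP TABLE tmp_migrate ON COMMIT DROP AS
        SELECT column_1, column_2, record_timestamp
        FROM table_a__live
        WHERE record_timestamp < transaction_timestamp() AT TIME ZONE 'UTC' - INTERVAL '1 month';

        INSERT INTO table_a__archive (column_1, column_2, record_timestamp)
        SELECT column_1, column_2, record_timestamp
        FROM tmp_migrate;

        -- delete by joining back on the composite primary key
        DELETE FROM table_a__live AS live
        USING tmp_migrate AS t
        WHERE live.column_1 = t.column_1
          AND live.column_2 = t.column_2;

        COMMIT;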


FWIW, we're on Postgres 9.6.1.
