Marc Mamin wrote:
> Hello,
I'm not an expert, but I'll give some suggestions.
> I'd like to tune Postgres for large data imports (using COPY FROM).
I believe that COPY FROM <file> is supposed to be faster than COPY FROM
STDIN, but <file> must be readable by the backend process. If you can
arrange that, it is worth doing, as it eliminates the communication
between the client and the backend.
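Concretely, the two forms look like this (table and file names are made
up for the example; the server-side file must be readable by the
postgres user):

-- server side: the backend opens the file itself
COPY mytable FROM '/var/lib/pgsql/import/batch.txt';

-- client side, from psql, which goes through COPY FROM STDIN under the hood
\copy mytable from '/home/marc/batch.txt'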
> Here are a few steps already done:
> 1) use 3 different disks for:
>    -1: source data
>    -2: index tablespaces
>    -3: data tablespaces
Make sure pg_xlog is on its own filesystem. It contains the
write-ahead log, and putting it by itself keeps the number of seeks
down. If you are constrained, I think pg_xlog is more important than
moving the index tablespaces.
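If it helps, the tablespace half of that looks like the following
(mount points are hypothetical; tablespaces need PostgreSQL 8.0).
pg_xlog itself is moved at the OS level: stop the server, move the
directory to its own disk, and symlink it back into $PGDATA.

CREATE TABLESPACE data_space LOCATION '/disk2/pg_data';
CREATE TABLESPACE index_space LOCATION '/disk3/pg_index';

-- then place objects explicitly:
CREATE TABLE mytable (id integer, payload text) TABLESPACE data_space;
CREATE INDEX mytable_id_idx ON mytable (id) TABLESPACE index_space;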
> 2) define all foreign keys as initially deferred
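For reference, a deferred FK is declared like this (tables invented for
the example); the point is that the check is postponed to COMMIT instead
of firing after every row, so load order inside a big COPY transaction
stops mattering:

CREATE TABLE parent (id integer PRIMARY KEY);
CREATE TABLE child (
    id        integer PRIMARY KEY,
    parent_id integer REFERENCES parent (id)
              DEFERRABLE INITIALLY DEFERRED
);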
> 3) tune some parameters:
>    max_connections = 20
>    shared_buffers = 30000
>    work_mem = 8192
>    maintenance_work_mem = 32768
>    checkpoint_segments = 12
> (I also modified the kernel accordingly)
Don't forget to increase your free space map if you are going to be
doing deletes frequently.
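That is the max_fsm_pages / max_fsm_relations pair in postgresql.conf
(changing them needs a restart). The values below are only illustrative;
the summary at the end of a database-wide VACUUM VERBOSE tells you what
you actually need:

# postgresql.conf
max_fsm_pages = 200000      # should cover the pages freed between vacuums
max_fsm_relations = 1000    # one slot per table/index with free space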
> 4) run VACUUM regularly
> The server runs RedHat and has 1GB RAM.
> In production (which may run on a better server), I plan to:
> - import a few million rows per day,
> - keep up to ca. 100 million rows in the db,
> - delete older data.
> I've seen a few postings on hash/btree indexes which say that hash
> indexes do not work very well in Postgres; currently, I only use btree
> indexes. Could I gain performance by using hash indexes as well?
I doubt it.
> How does Postgres handle concurrent COPY FROM on the same table / on
> different tables?
I think it is better with different tables. If you use the same table
and it has indexes, each backend has to grab a lock to update the index,
which causes contention between the two processes writing to the same table.
> I'd be glad of any further suggestions on how to increase my
> performance.
Since you are deleting data often, and copying often, I might recommend
using a partition scheme with a view to bind everything together. That
way you can just drop the old table rather than doing a delete. I don't
know how this would affect foreign key references.
But basically you can create a new table, and do a copy without having
any indexes, then build the indexes, analyze, update the view.
And when deleting you can update the view, and drop the old table.
Something like this ("blah" stands in for columns I don't know;
CREATE TABLE ... (LIKE ...) needs 8.0, and "table" must be quoted
because TABLE is a reserved word):

CREATE TABLE table_2005_05_11 (LIKE table_2005_05_10);  -- same structure as the previous day
COPY table_2005_05_11 FROM ... ;
-- give the index a per-table name, since index names must be unique within a schema
CREATE INDEX table_2005_05_11_blah_idx ON table_2005_05_11 (blah);
CREATE OR REPLACE VIEW "table" AS
    SELECT * FROM table_2005_05_10
    UNION ALL
    SELECT * FROM table_2005_05_11;
VACUUM ANALYZE table_2005_05_11;
...
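The deletion side is then just a view swap plus a DROP (same
hypothetical tables as above); CREATE OR REPLACE VIEW works here because
removing a UNION ALL branch leaves the column list unchanged:

CREATE OR REPLACE VIEW "table" AS
    SELECT * FROM table_2005_05_11;
DROP TABLE table_2005_05_10;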
John
=:->
> Marc