Marc Mamin wrote:
Hello,


I'm not an expert, but I'll give some suggestions.


I'd like to tune Postgres for large data imports (using COPY FROM).


I believe that COPY FROM <file> is supposed to be faster than COPY FROM STDIN, but <file> must be readable by the backend process. If you can arrange that, it is worth doing, as it eliminates the communication between the client and the backend.
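For example (the table and file names here are made up): the server-side form reads the file directly on the server, while psql's \copy streams it over the connection from the client.

-- Server-side: the file must be readable by the backend (typically the postgres user).
COPY mytable FROM '/var/load/mytable.csv' WITH CSV;

-- Client-side equivalent in psql, streamed over the connection:
-- \copy mytable FROM 'mytable.csv' WITH CSV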


Here are a few steps I have already taken:



1) use 3 different disks for:

        - disk 1: source data
        - disk 2: index tablespaces
        - disk 3: data tablespaces

Make sure pg_xlog is on its own filesystem. It contains the write-ahead log, and putting it by itself keeps the number of seeks down. If you are constrained, I think pg_xlog is more important than moving the index tablespaces.
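For the tablespace part of point 1, a minimal sketch (paths are hypothetical; tablespaces need 8.0 or later). Moving pg_xlog itself is done at the filesystem level, e.g. with a symlink while the server is stopped.

-- The directories must exist and be owned by the postgres user.
CREATE TABLESPACE idx_space  LOCATION '/disk2/pg_idx';
CREATE TABLESPACE data_space LOCATION '/disk3/pg_data';

-- Then place objects explicitly:
CREATE TABLE mytable (blah) TABLESPACE data_space;
CREATE INDEX mytable_blah_idx ON mytable (blah) TABLESPACE idx_space;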

        
2) define all foreign keys as initially deferred
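For point 2, a sketch with made-up table and column names; the constraint is then only checked at COMMIT, which helps bulk loads:

ALTER TABLE child_table
        ADD CONSTRAINT child_parent_fk
        FOREIGN KEY (parent_id) REFERENCES parent_table (id)
        DEFERRABLE INITIALLY DEFERRED;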


3) tune some parameters:



        max_connections = 20
        shared_buffers = 30000            # 8kB pages, so ~234MB
        work_mem = 8192                   # in kB: 8MB per sort/hash operation
        maintenance_work_mem = 32768      # in kB: 32MB
        checkpoint_segments = 12          # 16MB WAL segments each

        (I also modified the kernel accordingly)


Don't forget to increase your free space map if you are going to be doing deletes frequently.
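In an 8.0-era postgresql.conf that means raising these two settings; the values below are only placeholders, to be sized against how many pages your deletes touch between vacuums:

        max_fsm_pages = 200000      # default 20000; one slot per page with reusable space
        max_fsm_relations = 1000    # default 1000; raise if you have many tables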




4) run VACUUM regularly
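Since 8.0 has no built-in autovacuum, that usually means scheduling something like the following, e.g. nightly from cron (frequency depends on your delete rate):

VACUUM ANALYZE;   -- or per-table after each day's load: VACUUM ANALYZE table_2005_05_11;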


The server runs Red Hat and has 1GB of RAM.

In production (which may run on a better server), I plan to:

- import a few million rows per day,
- keep up to ca. 100 million rows in the db,
- delete older data.




I've seen a few postings on hash vs. btree indexes which say that hash indexes do not work very well in Postgres; currently, I only use btree indexes. Could I gain performance by using hash indexes as well?

I doubt it.
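For completeness, the syntax is below (names are made up), but hash indexes at this point are not WAL-logged and are generally no faster than btree, so I would stay with btree:

-- Generally not recommended here; shown only for reference.
CREATE INDEX mytable_col_hash ON mytable USING hash (col);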

How does Postgres handle concurrent COPY FROM into the same table / different tables?


I think it is better with different tables. If you use the same table and there are indexes, each backend has to grab a lock to update the index, which causes contention between two processes writing to the same table.


I'd be glad of any further suggestions on how to increase performance.


Since you are deleting data often and copying often, I might recommend using a partitioning scheme with a view to bind everything together. That way you can just drop the old table rather than doing a DELETE. I don't know how this would affect foreign key references.

But basically you can create a new table, do the COPY without any indexes in place, then build the indexes, run ANALYZE, and update the view.

And when deleting, you update the view and then drop the old table.

Something like this:

CREATE TABLE table_2005_05_11 (blah);          -- same columns as the other day-tables
COPY table_2005_05_11 FROM ... ;               -- load while the table has no indexes
CREATE INDEX table_2005_05_11_blah_idx ON table_2005_05_11 (blah);
CREATE OR REPLACE VIEW all_data AS             -- "table" is a reserved word, so the view needs another name
        SELECT * FROM table_2005_05_10
        UNION ALL
        SELECT * FROM table_2005_05_11;
VACUUM ANALYZE table_2005_05_11;
...
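And the deletion side, as a sketch in the same naming scheme: redefine the view without the oldest day first, so the DROP does not fail on the view's dependency.

-- Later, to age out the oldest day:
CREATE OR REPLACE VIEW all_data AS
        SELECT * FROM table_2005_05_11;        -- oldest day no longer included
DROP TABLE table_2005_05_10;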

John
=:->









