PFC wrote:
Unfortunately it's not fast enough; it needs to be done in no more than 1-2 seconds (and in production it will be maybe 20-50 columns of data, perhaps divided over 5-10 tables). Additionally, it needs to scale to perhaps three times as many columns and perhaps 2-3 times as many rows in some situations, still within 1 second. Further, it needs to allow for about 20-50 clients reading much of that data before the next batch of data arrives.

    Wow. What is the application?

Geological surveys, where they perform real-time geophone/hydrophone shots over areas of about 10x10 km every 3-15 seconds.


test=> CREATE OR REPLACE FUNCTION test_insert( )
    RETURNS VOID
    LANGUAGE plpgsql
    AS
$$
DECLARE
    _i INTEGER;
BEGIN
    -- insert one row at a time, to measure per-row INSERT overhead
    FOR _i IN 0..100000 LOOP
        INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
    END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms

test=> SELECT test_insert();
 test_insert
-------------

(1 row)

Time: 1885.382 ms

I tested this one and it took 4 seconds, compared to the JDBC insert which took 14 seconds, so it's a lot faster, but not as fast as the SELECT INTO.
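
For reference, a minimal sketch of what that SELECT INTO variant looks like; both "test_copy" and the source table "staging" are made-up names for illustration, not from the actual test:

-- create and fill a new table in one set-based statement
SELECT a, b, c, d, e, f
INTO test_copy
FROM staging;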

I also tested an INSERT INTO ... SELECT, which took 1.8 seconds; now we are starting to talk about real performance.
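
A minimal sketch of that kind of statement; the source table "staging" is an assumption for illustration:

-- one set-based INSERT instead of 100,000 individual statements
INSERT INTO test (a, b, c, d, e, f)
SELECT a, b, c, d, e, f
FROM staging;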


However, COPY is much faster, because parsing and de-escaping of the data have far less overhead; COPY is optimized for throughput.

    So, advice:

For optimum throughput, have your application build chunks of data into text files and load them with COPY. Or, if your client library supports the COPY interface, use that instead.
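
A minimal sketch of the file-based approach; the file path is hypothetical, and tab-delimited text is COPY's default format:

-- server-side COPY from a text file written by the application
COPY test (a, b, c, d, e, f) FROM '/tmp/test_batch.txt';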

I did test COPY, i.e. the JDBC COPY patch for pg 8.1; it performs at approx 1.8 seconds :) That test was done with text input; I am going to test it with binary input, which I expect will increase the performance by 20-50%.
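
The server-side form of the binary variant would look roughly like this; the path is again hypothetical, and the file has to be written in PostgreSQL's binary COPY format:

-- COPY from a file in PostgreSQL's binary format
COPY test (a, b, c, d, e, f) FROM '/tmp/test_batch.dat' WITH BINARY;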

All these tests have been performed on a laptop with the Kubuntu 6.10 packaging of pg 8.1, without any special pg performance tuning. So I expect that compiling the latest pg, doing some tuning on it, and testing it on a representative server will give it an additional boost in performance.

The key here is that with an abundance of performance, I can experiment with the solution in a completely different way than if I had any "artificial" restrictions.

You will need a fast disk system with the xlog and data on separate disks, several CPU cores (one insert thread will max out one core; use the others for selects), lots of RAM so index updates don't need to seek, and tuning of the bgwriter and checkpoints to avoid load spikes.
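
A rough sketch of the postgresql.conf knobs meant here, for an 8.1-era server; the values are placeholders to show which settings are involved, not a recommendation:

# illustrative values only - tune for the actual hardware
shared_buffers = 50000          # buffer cache, in 8 kB pages
wal_buffers = 64                # more WAL buffering for a heavy insert stream
checkpoint_segments = 32        # fewer, larger checkpoints
checkpoint_timeout = 600        # seconds between forced checkpoints
bgwriter_delay = 200            # ms between background-writer rounds
bgwriter_lru_maxpages = 100     # spread dirty-page writes out over time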

I will have a look at it.

regards

thomas
