PFC wrote:
Unfortunately it's not fast enough; it needs to be done in no more than 1-2 seconds (and in production it will be maybe 20-50 columns of data, perhaps divided over 5-10 tables). Additionally, it needs to scale to perhaps three times as many columns and perhaps 2-3 times as many rows in some situations, still within 1 second. Further, it needs to allow for about 20-50 clients reading much of that data before the next batch of data arrives.

    Wow. What is the application?

Geological surveys, where they perform real-time geophone/hydrophone shots over areas of about 10x10 km every 3-15 seconds.


test=> CREATE OR REPLACE FUNCTION test_insert( )
    RETURNS VOID
    LANGUAGE plpgsql
    AS
$$
DECLARE
    _i INTEGER;
BEGIN
    -- insert one row at a time, to measure per-row INSERT overhead
    FOR _i IN 0..100000 LOOP
        INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
    END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms

test=> SELECT test_insert();
 test_insert
-------------

(1 row)

Time: 1885.382 ms

I tested this one and it took 4 seconds, compared to the JDBC insert which took 14 seconds, so it's a lot faster, but not as fast as the SELECT INTO.
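
For reference, a minimal sketch of what that SELECT INTO variant looks like; both "test_copy" and the source table "staging" are made-up names for illustration, not from the actual test:

-- create and fill a new table in one set-based statement
SELECT a, b, c, d, e, f
INTO test_copy
FROM staging;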

I also tested an INSERT INTO ... SELECT, which took 1.8 seconds; now we are starting to talk about real performance.
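
A minimal sketch of that kind of statement; the source table "staging" is an assumption for illustration:

-- one set-based INSERT instead of 100,000 individual statements
INSERT INTO test (a, b, c, d, e, f)
SELECT a, b, c, d, e, f
FROM staging;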


However, COPY is much faster, because parsing and de-escaping of the data have far less overhead; COPY is optimized for throughput.

    So, advice:

For optimum throughput, have your application build chunks of data into text files and load them with COPY. Or, if your client library supports the COPY interface, use that instead.
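
A minimal sketch of the file-based approach; the file path is hypothetical, and tab-delimited text is COPY's default format:

-- server-side COPY from a text file written by the application
COPY test (a, b, c, d, e, f) FROM '/tmp/test_batch.txt';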

I did test COPY, i.e. the JDBC COPY patch for pg 8.1; it performs at approx 1.8 seconds :) That test was done with text input; I am going to test it with binary input, which I expect will increase the performance by 20-50%.
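
The server-side form of the binary variant would look roughly like this; the path is again hypothetical, and the file has to be written in PostgreSQL's binary COPY format:

-- COPY from a file in PostgreSQL's binary format
COPY test (a, b, c, d, e, f) FROM '/tmp/test_batch.dat' WITH BINARY;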

All these tests have been performed on a laptop with the Kubuntu 6.10 packaging of pg 8.1, without any special pg performance tuning. So I expect that compiling the latest pg, doing some tuning on it, and testing it on a representative server will give it an additional boost in performance.

The key here is that with an abundance of performance, I can experiment with the solution in a completely different way than if I had any "artificial" restrictions.

You will need a fast disk system with the xlog and data on separate disks, several CPU cores (one insert thread will max out one core; use the others for selects), lots of RAM so index updates don't need to seek, and tuning of the bgwriter and checkpoints to avoid load spikes.
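
A rough sketch of the postgresql.conf knobs meant here, for an 8.1-era server; the values are placeholders to show which settings are involved, not a recommendation:

# illustrative values only - tune for the actual hardware
shared_buffers = 50000          # buffer cache, in 8 kB pages
wal_buffers = 64                # more WAL buffering for a heavy insert stream
checkpoint_segments = 32        # fewer, larger checkpoints
checkpoint_timeout = 600        # seconds between forced checkpoints
bgwriter_delay = 200            # ms between background-writer rounds
bgwriter_lru_maxpages = 100     # spread dirty-page writes out over time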

I will have a look at it.

regards

thomas
