Re: [PERFORM] insert vs select into performance

2007-07-23 Thread Michael Stone

On Wed, Jul 18, 2007 at 09:13:14PM +0200, Thomas Finneid wrote:

Michael Stone wrote:

I don't understand how the insert you described is table to table?


SELECT INTO is table to table, so is INSERT INTO SELECT FROM.


I could have sworn that at least one of the examples you gave didn't 
have any select. Doesn't really matter.


Mike Stone

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] insert vs select into performance

2007-07-18 Thread PFC


It's the time to parse statements, plan, execute, roundtrips with  
the client, context switches, time for your client library to escape  
the data and encode it and for postgres to decode it, etc. In a word :  
OVERHEAD.


I know there is some overhead, but that much when running it batched...?


Well, yeah ;)

Unfortunately it's not fast enough, it needs to be done in no more than 
1-2 seconds, ( and in production it will be maybe 20-50 columns of data, 
perhaps divided over 5-10 tables.)
Additionally it needs to scale to perhaps three times as many columns 
and perhaps 2-3 times as many rows in some situations within 1 second.
Furthermore it needs to allow for about 20-50 clients reading much of 
that data before the next batch of data arrives.


Wow. What is the application ?

	Test run on a desktop PC, Athlon 64 3200+, 2 IDE disks in RAID1 (pretty  
slow) :


test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 11.463 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;

INSERT 0 100000
Time: 721.579 ms

	OK, so you see, insert speed is pretty fast. With a better CPU and faster  
disks, you can get a lot more.


test=> TRUNCATE TABLE test;
TRUNCATE TABLE
Time: 30.010 ms

test=> ALTER TABLE test ADD PRIMARY KEY (f);
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "test_pkey" for table "test"

ALTER TABLE
Time: 100.577 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;

INSERT 0 100000
Time: 1915.928 ms

This includes the time to update the index.

test=> DROP TABLE test;
DROP TABLE
Time: 28.804 ms

test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 1.626 ms

test=> CREATE OR REPLACE FUNCTION test_insert( )
RETURNS VOID
LANGUAGE plpgsql
AS
$$
DECLARE
_i INTEGER;
BEGIN
FOR _i IN 0..100000 LOOP
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms

test=> SELECT test_insert();
 test_insert
-------------
 
(1 row)

Time: 1885.382 ms

	Now you see, performing 100K individual inserts inside a plpgsql function  
is also fast.
	The postgres engine is pretty damn fast ; it's the communication overhead  
that you feel, especially switching between client and server processes.


Another example :

test=> INSERT INTO test (a,b,c,d,e,f) VALUES (... 100000 integer tuples)
INSERT 0 100000
Time: 1836.458 ms

	VALUES is actually pretty fast. Here, there is no context switch,  
everything is done in 1 INSERT.


	However COPY is much faster because the parsing overhead and de-escaping  
of data is faster. COPY is optimized for throughput.


So, advice :

	For optimum throughput, have your application build chunks of data into  
text files and use COPY. Or if your client lib supports the copy  
interface, use it.
	You will need a fast disk system with xlog and data on separate disks,  
several CPU cores (1 insert thread will max out 1 core, use the others for  
selects), lots of RAM so index updates don't need to seek, and tuning of  
bgwriter and checkpoints to avoid load spikes.
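	The chunk-building step of that advice can be sketched without a live server. The sketch below is a hypothetical illustration in Python: it renders rows into the tab-separated text format COPY consumes; the table name "test", the column names, and the psycopg2 `copy_from` call in the comment are assumptions, not something from this thread.

```python
# Build an in-memory tab-separated chunk suitable for COPY ... FROM STDIN.
import io

def build_copy_chunk(rows):
    """Render rows as the tab-separated text format COPY expects."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(str(v) for v in row) + "\n")
    buf.seek(0)
    return buf

chunk = build_copy_chunk([(1, 2, 3, 4, 5, i) for i in range(3)])
# With a live connection (not available here), psycopg2 could load it with:
#   cur.copy_from(chunk, "test", columns=("a", "b", "c", "d", "e", "f"))
print(chunk.getvalue())
```

	Batching many rows into one chunk like this is what moves the cost from per-row round-trips to raw throughput.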




Re: [PERFORM] insert vs select into performance

2007-07-18 Thread Michael Stone

On Tue, Jul 17, 2007 at 10:58:01PM +0200, Thomas Finneid wrote:
I am not sure I understand you correctly here: are you saying that 
SELECT INTO in 8.1 disables WAL logging and uses just a single fsync at 
the end? In that case it means that I could disable WAL as well and 
achieve the same performance, does it not?


Yes. The difference is that the select into optimization just means that 
if the system crashes the data you're inserting is invalid (and is 
properly cleaned up), and disabling the WAL means that if the system 
crashes everything is invalid (and can't be cleaned up). 


Mike Stone



Re: [PERFORM] insert vs select into performance

2007-07-18 Thread Michael Stone

On Tue, Jul 17, 2007 at 11:01:15PM +0200, Thomas Finneid wrote:

Arjen van der Meijden wrote:
Have you also tried the COPY-statement? Afaik select into is similar to 
what happens in there.


No, because it only works on file to db or vice versa, not table to table.


I don't understand how the insert you described is table to table?

Mike Stone



Re: [PERFORM] insert vs select into performance

2007-07-18 Thread Thomas Finneid


PFC wrote:
Unfortunately it's not fast enough, it needs to be done in no more than 
1-2 seconds, ( and in production it will be maybe 20-50 columns of 
data, perhaps divided over 5-10 tables.)
Additionally it needs to scale to perhaps three times as many columns 
and perhaps 2-3 times as many rows in some situations within 1 second.
Furthermore it needs to allow for about 20-50 clients reading much of 
that data before the next batch of data arrives.


Wow. What is the application ?


Geological surveys, where they perform realtime geophone/hydrophone shots of 
areas on the order of 10x10 km every 3-15 seconds.




test=> CREATE OR REPLACE FUNCTION test_insert( )
RETURNS VOID
LANGUAGE plpgsql
AS
$$
DECLARE
_i INTEGER;
BEGIN
FOR _i IN 0..100000 LOOP
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms

test=> SELECT test_insert();
 test_insert
-------------
 
(1 row)

Time: 1885.382 ms


I tested this one and it took 4 seconds, compared to the jdbc insert 
which took 14 seconds, so it's a lot faster, but not as fast as the 
SELECT INTO.


I also tested an INSERT INTO ... SELECT FROM, which took 1.8 seconds; now 
we are starting to talk about real performance.



However COPY is much faster because the parsing overhead and 
de-escaping of data is faster. COPY is optimized for throughput.


So, advice :

For optimum throughput, have your application build chunks of data 
into text files and use COPY. Or if your client lib supports the copy 
interface, use it.


I did test COPY, i.e. the jdbc COPY patch for pg 8.1; it performs at 
approx 1.8 seconds :) The test was done with text input; I am going to 
test it with binary input, which I expect will improve performance 
by 20-50%.


All these tests have been performed on a laptop with a Kubuntu 6.10 
version of pg 8.1, without any special pg performance tuning. So I expect 
that compiling the latest pg, doing some tuning on it and testing it on 
a representative server will give it an additional boost in performance.


The key here is that with an abundance of performance, I can experiment 
with the solution in a completely different way than if I had any 
artificial restrictions.


You will need a fast disk system with xlog and data on separate 
disks, several CPU cores (1 insert thread will max out 1 core, use the 
others for selects), lots of RAM so index updates don't need to seek, 
and tuning of bgwriter and checkpoints to avoid load spikes.


will have a look at it.

regards

thomas



Re: [PERFORM] insert vs select into performance

2007-07-18 Thread Thomas Finneid

Michael Glaesemann wrote:


As they're individual inserts, I think what you're seeing is overhead 
from calling this statement 100,000 times, not just on the server but 
also the overhead through JDBC. For comparison, try


CREATE TABLE ciu_data_type_copy (LIKE ciu_data_type);

INSERT INTO ciu_data_type_copy (id, loc_id, value3, value5, value8, 
value9, value10, value11)

SELECT id, loc_id, value3, value5, value8, value9, value10, value11
FROM ciu_data_type;

I think this would be more comparable to what you're seeing.


This is much faster than my previous solution, but I also tested two 
other solutions:

- a stored function with array arguments, which performed 3 times better.
- jdbc with the COPY patch, which performed 8.4 times faster with text 
input; I expect binary input to be even faster.


regards

thomas



Re: [PERFORM] insert vs select into performance

2007-07-18 Thread Thomas Finneid



Michael Stone wrote:

On Tue, Jul 17, 2007 at 11:01:15PM +0200, Thomas Finneid wrote:

Arjen van der Meijden wrote:
Have you also tried the COPY-statement? Afaik select into is similar 
to what happens in there.


No, because it only works on file to db or vice versa, not table to table.


I don't understand how the insert you described is table to table?


SELECT INTO is table to table, so is INSERT INTO SELECT FROM.

regards

thomas



Re: [PERFORM] insert vs select into performance

2007-07-18 Thread Heikki Linnakangas
Adriaan van Os wrote:
 So, how does one (temporarily) disable WAL logging ? Or, for example,
 disable WAL logging for a temporary table ?

Operations on temporary tables are never WAL logged. Operations on other
tables are, and there's no way to disable it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Michael Glaesemann


On Jul 17, 2007, at 14:38 , Thomas Finneid wrote:

I was doing some testing on insert compared to select into. I  
inserted 100 000 rows (with 8 column values) into a table, which  
took 14 seconds, compared to a select into, which took 0.8 seconds.
(fyi, the inserts were batched, autocommit was turned off and it  
all happened on the local machine)


Now I am wondering why the select into is that much faster?


It would be helpful if you included the actual queries you're using,  
as there are a number of variables:


1) If there are any constraints on the original table, the INSERT  
will be checking those constraints. AIUI, SELECT INTO does not  
generate any table constraints.


2a) Are you using INSERT INTO foo (foo1, foo2, foo2) SELECT foo1,  
foo2, foo3 FROM pre_foo or individual inserts for each row? The  
former would be faster than the latter.


2b) If you are doing individual inserts, are you wrapping them in a  
transaction? The latter would be faster.
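Points 2a and 2b can be illustrated engine-agnostically. The sketch below uses Python's sqlite3 (so it runs without a PostgreSQL server; absolute timings will differ from Postgres, and the table names are made up for the example): one set-based INSERT ... SELECT versus row-by-row inserts inside a single transaction.

```python
# Illustrate 2a (set-based INSERT ... SELECT) vs 2b (individual inserts
# in one transaction) on an in-memory sqlite3 database.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE pre_foo (foo1 INT, foo2 INT, foo3 INT)")
cur.executemany("INSERT INTO pre_foo VALUES (?,?,?)",
                [(i, i * 2, i * 3) for i in range(1000)])

# 2a) one set-based statement moving all rows at once
cur.execute("CREATE TABLE foo_bulk (foo1 INT, foo2 INT, foo3 INT)")
cur.execute("INSERT INTO foo_bulk SELECT foo1, foo2, foo3 FROM pre_foo")

# 2b) individual inserts, wrapped in a single transaction
cur.execute("CREATE TABLE foo_rows (foo1 INT, foo2 INT, foo3 INT)")
for row in cur.execute("SELECT foo1, foo2, foo3 FROM pre_foo").fetchall():
    cur.execute("INSERT INTO foo_rows VALUES (?,?,?)", row)
conn.commit()

bulk = cur.execute("SELECT COUNT(*) FROM foo_bulk").fetchone()[0]
rows = cur.execute("SELECT COUNT(*) FROM foo_rows").fetchone()[0]
```

Both end with the same data; the set-based form simply avoids issuing one statement per row.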


Michael Glaesemann
grzm seespotcode net





Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Arjen van der Meijden
Have you also tried the COPY-statement? Afaik select into is similar to 
what happens in there.


Best regards,

Arjen

On 17-7-2007 21:38 Thomas Finneid wrote:

Hi

I was doing some testing on insert compared to select into. I 
inserted 100 000 rows (with 8 column values) into a table, which took 14 
seconds, compared to a select into, which took 0.8 seconds.
(fyi, the inserts were batched, autocommit was turned off and it all 
happened on the local machine)


Now I am wondering why the select into is that much faster?
Does the select into translate into a specially optimised function in C 
that can cut corners which an insert can not do (e.g. lazy copying), or 
is it some other reason?


The reason I am asking is that select into shows that a number of rows 
can be inserted into a table quite a lot faster than one would think was 
possible with ordinary sql. If that is the case, it means that if I 
write a pl/pgsql insert function in C instead of sql, then I can have 
my db perform an order of magnitude faster.


Any comments?

regards

thomas



Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Tom Lane
Michael Glaesemann [EMAIL PROTECTED] writes:
 It would be helpful if you included the actual queries you're using,  
 as there are a number of variables:

Not to mention which PG version he's testing.  Since (I think) 8.1,
SELECT INTO knows that it can substitute one fsync for WAL-logging
the individual row inserts, since if there's a crash the new table
will disappear anyway.

regards, tom lane



Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Thomas Finneid



Tom Lane wrote:

Michael Glaesemann [EMAIL PROTECTED] writes:
It would be helpful if you included the actual queries you're using,  
as there are a number of variables:


Not to mention which PG version he's testing. 


It's pg 8.1 for now; I'll be upgrading to a compile-optimised 8.2 when I 
do the real test on the real server.


(It's on Kubuntu 6.10 running on a Thinkpad T60 with dual core, 1.5GB RAM 
and 100GB SATA, just in case anybody feels that is of any interest.)




Since (I think) 8.1,
SELECT INTO knows that it can substitute one fsync for WAL-logging
the individual row inserts, since if there's a crash the new table
will disappear anyway.


I am not sure I understand you correctly here: are you saying that 
SELECT INTO in 8.1 disables WAL logging and uses just a single fsync at 
the end? In that case it means that I could disable WAL as well and 
achieve the same performance, does it not?


regards

thomas




Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Michael Stone

On Tue, Jul 17, 2007 at 10:50:22PM +0200, Thomas Finneid wrote:
I haven't done this test in a stored function yet, nor have I tried it 
with a C client so far, so there is the chance that it is java/jdbc that 
makes the insert so slow. I'll get to that test soon if there is any 
chance my theory makes sense.


What you're seeing is perfectly normal. Switch to COPY for fast inserts. 
(When you use inserts you need to wait for a round-trip for each row, 
instead of sending data to the server as fast as possible.)


Mike Stone



Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Mark Lewis
If you're performing via JDBC, are you using addBatch/executeBatch, or
are you directly executing each insert?  If you directly execute each
insert, then your code will wait for a server round-trip between each
insert.

That still won't get you to the speed of select into, but it should
help.  You could also look at the pgsql-jdbc archives for the JDBC
driver patches which allow you to use COPY-style bulk loading, which
should get you to the performance level of COPY, which should be
reasonably close to the performance of select into.
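The addBatch/executeBatch idea has a direct analog in other client libraries: send many parameter sets for one prepared statement instead of one statement per row. The sketch below illustrates it with Python's sqlite3 `executemany` (chosen only because it runs without a database server; the table name and data are made up):

```python
# Per-row execute vs batched executemany on an in-memory sqlite3 database.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (a INT, b INT)")

params = [(i, i * i) for i in range(10_000)]

t0 = time.perf_counter()
for p in params:                      # one statement call per row
    cur.execute("INSERT INTO t VALUES (?,?)", p)
per_row = time.perf_counter() - t0

cur.execute("DELETE FROM t")

t0 = time.perf_counter()
cur.executemany("INSERT INTO t VALUES (?,?)", params)   # batched
batched = time.perf_counter() - t0

count = cur.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(f"per-row: {per_row:.3f}s  batched: {batched:.3f}s")
```

With a remote server the gap widens further, because batching also removes the per-row network round-trip.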

-- Mark Lewis

On Tue, 2007-07-17 at 22:50 +0200, Thomas Finneid wrote:
 
 Michael Glaesemann wrote:
  
  On Jul 17, 2007, at 14:38 , Thomas Finneid wrote:
  
  I was doing some testing on insert compared to select into. I 
  inserted 100 000 rows (with 8 column values) into a table, which took 
  14 seconds, compared to a select into, which took 0.8 seconds.
  (fyi, the inserts were batched, autocommit was turned off and it all 
  happened on the local machine)
 
  Now I am wondering why the select into is that much faster?
  
  It would be helpful if you included the actual queries you're using, as 
  there are a number of variables:
 
 create table ciu_data_type
 (
   id  integer,
   loc_id  integer,
   value1  integer,
   value2  real,
   value3  integer,
   value4  real,
   value5  real,
   value6  char(2),
   value7  char(3),
   value8  bigint,
   value9  bigint,
   value10 real,
   value11 bigint,
   value12 smallint,
   value13 double precision,
   value14 real,
   value15 real,
   value16 char(1),
   value17 varchar(18),
   value18 bigint,
   value19 char(4)
 );
 
 performed with JDBC
 
 insert into ciu_data_type (id, loc_id, value3, value5, value8, value9, 
 value10, value11 ) values (?,?,?,?,?,?,?,?)
 
 select * into ciu_data_type_copy from ciu_data_type
 
  1) If there are any constraints on the original table, the INSERT will 
  be checking those constraints. AIUI, SELECT INTO does not generate any 
  table constraints.
 
 No constraints in this test.
 
  2a) Are you using INSERT INTO foo (foo1, foo2, foo2) SELECT foo1, foo2, 
  foo3 FROM pre_foo or individual inserts for each row? The former would 
  be faster than the latter.
  
  2b) If you are doing individual inserts, are you wrapping them in a 
  transaction? The latter would be faster.
 
 disabling autocommit, but nothing more than that
 
 
 I haven't done this test in a stored function yet, nor have I tried it 
 with a C client so far, so there is the chance that it is java/jdbc that 
 makes the insert so slow. I'll get to that test soon if there is any 
 chance my theory makes sense.
 
 regards
 
 thomas
 
 



Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Thomas Finneid



Arjen van der Meijden wrote:
Have you also tried the COPY-statement? Afaik select into is similar to 
what happens in there.


No, because it only works on file to db or vice versa, not table to table.

regards

thomas



Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Michael Glaesemann


On Jul 17, 2007, at 15:50 , Thomas Finneid wrote:


Michael Glaesemann wrote:


2a) Are you using INSERT INTO foo (foo1, foo2, foo2) SELECT foo1,  
foo2, foo3 FROM pre_foo or individual inserts for each row? The  
former would be faster than the latter.



performed with JDBC

insert into ciu_data_type (id, loc_id, value3, value5, value8,  
value9, value10, value11 ) values (?,?,?,?,?,?,?,?)


As they're individual inserts, I think what you're seeing is overhead  
from calling this statement 100,000 times, not just on the server but  
also the overhead through JDBC. For comparison, try


CREATE TABLE ciu_data_type_copy (LIKE ciu_data_type);

INSERT INTO ciu_data_type_copy (id, loc_id, value3, value5, value8,  
value9, value10, value11)

SELECT id, loc_id, value3, value5, value8, value9, value10, value11
FROM ciu_data_type;

I think this would be more comparable to what you're seeing.

I haven't done this test in a stored function yet, nor have I tried 
it with a C client so far, so there is the chance that it is 
java/jdbc that makes the insert so slow. I'll get to that test soon if 
there is any chance my theory makes sense.


Just testing in psql with \timing should be fairly easy.

Michael Glaesemann
grzm seespotcode net





Re: [PERFORM] insert vs select into performance

2007-07-17 Thread PFC


I was doing some testing on insert compared to select into. I  
inserted 100 000 rows (with 8 column values) into a table, which took 14  
seconds, compared to a select into, which took 0.8 seconds.
(fyi, the inserts were batched, autocommit was turned off and it all  
happened on the local machine)


Did you use prepared statements ?
	Did you use INSERT INTO ... VALUES () with a long list of values, or just  
100K insert statements ?


	It's the time to parse statements, plan, execute, roundtrips with the  
client, context switches, time for your client library to escape the data  
and encode it and for postgres to decode it, etc. In a word : OVERHEAD.


By the way which language and client library are you using ?

	FYI 14s / 100k = 140 microseconds per individual SQL query. That ain't  
slow at all.
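	That arithmetic is easy to check:

```python
# Sanity check: 100,000 statements in 14 seconds.
total_seconds = 14.0
statements = 100_000
per_statement_us = total_seconds / statements * 1_000_000
print(per_statement_us)  # 140.0 microseconds per statement
```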


Does the select into translate into a specially optimised function in C  
that can cut corners which an insert can not do (e.g. lazy copying), or  
is it some other reason?


	Yeah : instead of your client having to encode 100K * 8 values, send it  
over a socket, and postgres decoding it, INSERT INTO SELECT just takes the  
data, and writes the data. Same thing as writing a file a byte at a time  
versus using a big buffer.


The reason I am asking is that select into shows that a number of rows  
can be inserted into a table quite a lot faster than one would think was  
possible with ordinary sql. If that is the case, it means that if I  
write a pl/pgsql insert function in C instead of sql, then I can have  
my db perform an order of magnitude faster.


	Fortunately this is already done for you : there is the PREPARE  
statement, which will remove the parsing overhead. If you must insert many  
rows, use VALUES (),(),()...
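	Generating one multi-row VALUES statement, as suggested above, can be sketched client-side. This is an illustrative helper (the table and column names reuse the "test" example; real code should bind parameters or escape values rather than interpolate them):

```python
# Build a single multi-row INSERT ... VALUES statement instead of
# issuing one single-row statement per tuple. Integer values only;
# no escaping is done, so this is illustration, not production code.
def multirow_insert(table, columns, rows):
    values = ",".join(
        "(" + ",".join(str(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({','.join(columns)}) VALUES {values}"

sql = multirow_insert("test", ["a", "b", "c", "d", "e", "f"],
                      [(1, 2, 3, 4, 5, i) for i in range(3)])
print(sql)
```

	Sending this one statement replaces N parse/plan/round-trip cycles with one.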




Re: [PERFORM] insert vs select into performance

2007-07-17 Thread Thomas Finneid



PFC wrote:


I was doing some testing on insert compared to select into. I 
inserted 100 000 rows (with 8 column values) into a table, which took 
14 seconds, compared to a select into, which took 0.8 seconds.
(fyi, the inserts were batched, autocommit was turned off and it all 
happened on the local machine)


Did you use prepared statements ?
Did you use INSERT INTO ... VALUES () with a long list of values, or 
just 100K insert statements ?


It was prepared statements and I tried it both batched and non-batched 
(not much difference on a local machine)


It's the time to parse statements, plan, execute, roundtrips with 
the client, context switches, time for your client library to escape the 
data and encode it and for postgres to decode it, etc. In a word : 
OVERHEAD.


I know there is some overhead, but that much when running it batched...?


By the way which language and client library are you using ?

FYI 14s / 100k = 140 microseconds per individual SQL query. That 
ain't slow at all.


Unfortunately it's not fast enough, it needs to be done in no more than 
1-2 seconds, ( and in production it will be maybe 20-50 columns of data, 
perhaps divided over 5-10 tables.)
Additionally it needs to scale to perhaps three times as many columns 
and perhaps 2-3 times as many rows in some situations within 1 second.
Furthermore it needs to allow for about 20-50 clients reading much of 
that data before the next batch of data arrives.


I know the computer is going to be a much faster one than the one I am 
testing with, but I need to make sure the solution scales well.



regards

thomas
