Re: [PERFORM] INSERT times - same storage space but more fields -> much slower inserts

2009-04-15 Thread Matthew Wakeling

On Wed, 15 Apr 2009, Matthew Wakeling wrote:
> If anyone needs this code in Java, we have a version at
> http://www.intermine.org/
>
> Download source code: http://www.intermine.org/wiki/SVNCheckout
>
> Javadoc: http://www.intermine.org/api/


Sorry, that should be http://www.flymine.org/api/

Matthew

--
What goes up must come down. Ask any system administrator.



Re: [PERFORM] INSERT times - same storage space but more fields -> much slower inserts

2009-04-15 Thread Matthew Wakeling

On Tue, 14 Apr 2009, Stephen Frost wrote:

> What does your test harness currently look like, and what would you like
> to see to test the binary-format COPY?  I'd be happy to write up the
> code necessary to implement binary-format COPY for this.


If anyone needs this code in Java, we have a version at 
http://www.intermine.org/


Download source code: http://www.intermine.org/wiki/SVNCheckout

Javadoc: http://www.intermine.org/api/

The code is contained in the org.intermine.sql.writebatch package, in the 
intermine/objectstore/main/src/org/intermine/sql/writebatch directory in 
the source.


The public interface is org.intermine.sql.writebatch.Batch.

The Postgres-specific binary COPY code is in 
org.intermine.sql.writebatch.BatchWriterPostgresCopyImpl.


The implementation unfortunately relies on a very old modified version of 
the Postgres JDBC driver, which is in the intermine/objectstore/main/lib 
directory.


The code is released under the LGPL, and we would appreciate notification 
if it is used.


The code implements quite a sophisticated system for writing rows to 
database tables very quickly. It batches multiple writes together into 
COPY statements, and writes them out in the background in another thread, 
while fully honouring flush calls. The points at which it uses the 
database connection are well-defined. I hope someone can find it useful.
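
For a rough idea of the shape of that design, here is a minimal sketch 
(not the InterMine API; the class and the 10000-row threshold are invented 
for illustration, and it uses the CopyManager API from current versions of 
the stock JDBC driver rather than the modified driver mentioned above):

import java.io.StringReader;
import java.sql.Connection;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class BackgroundCopyBatch {
    private final BlockingQueue<String> rows = new LinkedBlockingQueue<String>();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final CopyManager copy;
    private final String table;

    private final Callable<Void> drainTask = new Callable<Void>() {
        public Void call() throws Exception {
            StringBuilder buf = new StringBuilder();
            String row;
            while ((row = rows.poll()) != null) {
                buf.append(row).append('\n');
            }
            if (buf.length() > 0) {
                // Text-format COPY for brevity; binary COPY works the same way.
                copy.copyIn("COPY " + table + " FROM STDIN",
                            new StringReader(buf.toString()));
            }
            return null;
        }
    };

    public BackgroundCopyBatch(Connection conn, String table) throws Exception {
        this.copy = conn.unwrap(PGConnection.class).getCopyAPI();
        this.table = table;
    }

    // Queue one tab-separated row; a background write is kicked off once
    // enough rows have accumulated, so callers are never blocked on I/O.
    public void addRow(String tsvRow) {
        rows.add(tsvRow);
        if (rows.size() >= 10000) {
            writer.submit(drainTask);
        }
    }

    // Block until everything queued so far has reached the server.  The
    // connection is only ever touched from the single writer thread, which
    // is what makes the points of connection use well-defined.
    public void flush() throws Exception {
        writer.submit(drainTask).get();
    }
}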


Matthew

--
-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-. .-.   .-.
||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ /|||
|/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X|||/
'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'   `-' `-'



Re: [PERFORM] INSERT times - same storage space but more fields -> much slower inserts

2009-04-14 Thread Stephen Frost
Craig,

* Craig Ringer (cr...@postnewspapers.com.au) wrote:
> In other cases, binary-format COPY would be unsafe without some way to
> determine remote endianness and sizeof(various types).

As Tom mentioned already, the binary protocol is actually pretty well
defined, and it's in network byte order, aka big-endian.  The only thing
you need to know about the server that I can think of off-hand is whether
it's using 64-bit integers or floats for date-times.  That's a simple
check to do, specifically with:

show integer_datetimes;
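
For anyone doing the check from JDBC rather than libpq, a minimal sketch 
(connection details are placeholders; the stock driver passes SHOW through 
as an ordinary query):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://localhost:5432/bacula", "bacula", "");  // placeholders
Statement st = conn.createStatement();
ResultSet rs = st.executeQuery("SHOW integer_datetimes");
boolean int64Timestamps = rs.next() && "on".equals(rs.getString(1));
// on  -> timestamps on the wire are int64 microsecond counts
// off -> timestamps on the wire are 8-byte doubles of seconds
rs.close();
st.close();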

It's also a lot cheaper to do the necessary byte-flipping to go from
whatever-endian to network byte order than to do the whole printf/atoi
conversion.  Handling timestamps takes a bit more magic, but you can pull
the appropriate code/#defines from the server backend; I don't think
that's even an issue for this particular set of columns.
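
In Java the flipping effectively comes for free, since java.nio buffers
default to network byte order; the only real magic for timestamps is the
epoch shift.  A hedged sketch, assuming integer_datetimes is on and
ignoring time zones:

import java.nio.ByteBuffer;

// PostgreSQL timestamps count from 2000-01-01; Java counts from 1970-01-01.
// 946684800000 ms = 30 years including 7 leap days.
static final long PG_EPOCH_MILLIS = 946684800000L;

static byte[] pgBinaryTimestamp(long unixMillis) {
    long pgMicros = (unixMillis - PG_EPOCH_MILLIS) * 1000L;
    return ByteBuffer.allocate(8).putLong(pgMicros).array();  // big-endian by default
}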

What does your test harness currently look like, and what would you like
to see to test the binary-format COPY?  I'd be happy to write up the
code necessary to implement binary-format COPY for this.

Thanks,

Stephen




Re: [PERFORM] INSERT times - same storage space but more fields -> much slower inserts

2009-04-14 Thread Tom Lane
Craig Ringer writes:
> Unlike PQexecPrepared(...), binary-format COPY doesn't handle endian and
> type size issues for you. You need to convert the data to the database
> server's endianness and type sizes, but I don't think the PostgreSQL
> protocol provides any way to find those out.

The on-the-wire binary format is much better specified than you think.
(The documentation of it sucks, however.)  It's big-endian in all cases
and the datatype sizes are well defined.
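
For reference, a sketch of that layout for integer columns, as described
in the COPY documentation (Java's DataOutputStream writes big-endian, so
it matches the wire format as-is):

import java.io.DataOutputStream;
import java.io.IOException;

// 11-byte signature, then an int32 flags word, then an int32
// header-extension length (zero here).
static void writeCopyHeader(DataOutputStream out) throws IOException {
    out.write(new byte[] {'P','G','C','O','P','Y','\n',(byte) 0377,'\r','\n',0});
    out.writeInt(0);          // flags
    out.writeInt(0);          // no header extension
}

// Each tuple: int16 field count, then per field an int32 byte length
// (-1 for NULL) followed by the value in network byte order.
static void writeTuple(DataOutputStream out, int... values) throws IOException {
    out.writeShort(values.length);
    for (int v : values) {
        out.writeInt(4);      // length of an int4
        out.writeInt(v);
    }
}

// The stream ends with an int16 of -1.
static void writeCopyTrailer(DataOutputStream out) throws IOException {
    out.writeShort(-1);
}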

regards, tom lane



Re: [PERFORM] INSERT times - same storage space but more fields -> much slower inserts

2009-04-14 Thread Craig Ringer
Stephen Frost wrote:
> * Matthew Wakeling (matt...@flymine.org) wrote:
>> On Tue, 14 Apr 2009, Stephen Frost wrote:
>>> Bacula should be using COPY for the batch data loads, so hopefully won't
>>> suffer too much from having the fields split out.  I think it would be
>>> interesting to try doing PQexecPrepared with binary-format data instead
>>> of using COPY though.  I'd be happy to help you implement a test setup
>>> for doing that, if you'd like.
>> You can always do binary-format COPY.
> 
> I've never played with binary-format COPY actually.  I'd be happy to
> help test that too though.

I'd have to check the source/a protocol dump to be sure, but I think
PQexecPrepared(...), while it takes binary arguments, actually sends
them over the wire in text form. PostgreSQL does have a binary protocol
as well, but it suffers from the same issues as binary-format COPY:

Unlike PQexecPrepared(...), binary-format COPY doesn't handle endian and
type size issues for you. You need to convert the data to the database
server's endianness and type sizes, but I don't think the PostgreSQL
protocol provides any way to find those out.

It's OK if we're connected via a UNIX socket (and thus are on the same
host), though I guess a sufficiently perverse individual could install a
32-bit bacula+libpq, and run a 64-bit PostgreSQL server, or even vice versa.

It should also be OK when connected to `localhost' (127.0.0.0/8).

In other cases, binary-format COPY would be unsafe without some way to
determine remote endianness and sizeof(various types).

--
Craig Ringer



Re: [PERFORM] INSERT times - same storage space but more fields -> much slower inserts

2009-04-14 Thread Stephen Frost
* Matthew Wakeling (matt...@flymine.org) wrote:
> On Tue, 14 Apr 2009, Stephen Frost wrote:
>> Bacula should be using COPY for the batch data loads, so hopefully won't
>> suffer too much from having the fields split out.  I think it would be
>> interesting to try doing PQexecPrepared with binary-format data instead
>> of using COPY though.  I'd be happy to help you implement a test setup
>> for doing that, if you'd like.
>
> You can always do binary-format COPY.

I've never played with binary-format COPY actually.  I'd be happy to
help test that too though.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PERFORM] INSERT times - same storage space but more fields -> much slower inserts

2009-04-14 Thread Matthew Wakeling

On Tue, 14 Apr 2009, Stephen Frost wrote:

> Bacula should be using COPY for the batch data loads, so hopefully won't
> suffer too much from having the fields split out.  I think it would be
> interesting to try doing PQexecPrepared with binary-format data instead
> of using COPY though.  I'd be happy to help you implement a test setup
> for doing that, if you'd like.


You can always do binary-format COPY.
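
For the Java users in the thread, a hedged sketch of pushing such a
stream through the stock JDBC driver's CopyManager (the table and column
names are only examples, and the byte[] is assumed to already be in the
binary COPY layout sketched earlier in the thread):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.sql.Connection;
import java.sql.SQLException;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

static void binaryCopy(Connection conn, byte[] copyData)
        throws SQLException, IOException {
    CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
    copy.copyIn("COPY file (jobid, pathid, filenameid) FROM STDIN BINARY",
                new ByteArrayInputStream(copyData));
}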

Matthew

--
An ant doesn't have a lot of processing power available to it. I'm not trying
to be speciesist - I wouldn't want to detract you from such a wonderful
creature, but, well, there isn't a lot there, is there?
   -- Computer Science Lecturer



Re: [PERFORM] INSERT times - same storage space but more fields -> much slower inserts

2009-04-14 Thread Stephen Frost
Craig,

* Craig Ringer (cr...@postnewspapers.com.au) wrote:
> I've been doing some testing for the Bacula project, which uses
> PostgreSQL as one of the databases in which it stores backup catalogs.

We also use Bacula with a PostgreSQL backend.

> I've been evaluating a schema change for Bacula that takes a field
> that's currently stored as a gruesome-to-work-with base64-encoded
> representation of a binary blob, and expands it into a set of integer
> fields that can be searched, indexed, etc.

This would be extremely nice.

> The table size of the expanded form is marginally smaller than for the
> base64-encoded string version. However, INSERT times are *CONSIDERABLY*
> greater for the version with more fields. It takes 1011 seconds to
> insert the base64 version, vs 1290 seconds for the expanded-fields
> version. That's a difference of 279 seconds, or 27%.
> 
> Despite that, the final table sizes are the same.
> 
> If I use tab-separated input and COPY, the original-format file is
> 1300MB and the expanded-structure format is 1618MB. The performance hit
> on COPY-based insert is not as bad, at 161s vs 182s (13%), but still
> quite significant.
> 
> Any ideas about what I might be able to do to improve the efficiency of
> inserting records with many integer fields?

Bacula should be using COPY for the batch data loads, so hopefully won't
suffer too much from having the fields split out.  I think it would be
interesting to try doing PQexecPrepared with binary-format data instead
of using COPY though.  I'd be happy to help you implement a test setup
for doing that, if you'd like.
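
PQexecPrepared is a libpq call; for a Java-based harness a rough analogue
(plainly a substitute, not what Bacula itself does) would be a re-used
PreparedStatement with JDBC batching, which at least avoids re-parsing
the statement for every row:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

static void batchInsert(Connection conn, int[][] rows) throws SQLException {
    PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO file (jobid, pathid, filenameid) VALUES (?, ?, ?)");
    try {
        for (int[] r : rows) {
            ps.setInt(1, r[0]);
            ps.setInt(2, r[1]);
            ps.setInt(3, r[2]);
            ps.addBatch();    // parsed once, executed many times
        }
        ps.executeBatch();
    } finally {
        ps.close();
    }
}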

Thanks,

Stephen




[PERFORM] INSERT times - same storage space but more fields -> much slower inserts

2009-04-14 Thread Craig Ringer
Hi

I've been doing some testing for the Bacula project, which uses
PostgreSQL as one of the databases in which it stores backup catalogs.

Insert times are critical in this environment, as the app may insert
millions of records a day.

I've been evaluating a schema change for Bacula that takes a field
that's currently stored as a gruesome-to-work-with base64-encoded
representation of a binary blob, and expands it into a set of integer
fields that can be searched, indexed, etc.

The table size of the expanded form is marginally smaller than for the
base64-encoded string version. However, INSERT times are *CONSIDERABLY*
greater for the version with more fields. It takes 1011 seconds to
insert the base64 version, vs 1290 seconds for the expanded-fields
version. That's a difference of 279 seconds, or 27%.

Despite that, the final table sizes are the same.

The SQL dump for the base64 version is 1734MB and the expanded one is
2189MB, about a 25% increase. Given that the final table sizes are the
same, is the slowdown likely to just be the cost of parsing the extra
SQL, converting the textual representations of the numbers, etc?

If I use tab-separated input and COPY, the original-format file is
1300MB and the expanded-structure format is 1618MB. The performance hit
on COPY-based insert is not as bad, at 161s vs 182s (13%), but still
quite significant.

Any ideas about what I might be able to do to improve the efficiency of
inserting records with many integer fields?


In case it's of interest, the base64 and expanded schemas are:


CREATE TABLE file (
fileid bigint NOT NULL,
fileindex integer DEFAULT 0 NOT NULL,
jobid integer NOT NULL,
pathid integer NOT NULL,
filenameid integer NOT NULL,
markid integer DEFAULT 0 NOT NULL,
lstat text NOT NULL,
md5 text NOT NULL
);



CREATE TABLE file (
fileid bigint,
fileindex integer,
jobid integer,
pathid integer,
filenameid integer,
markid integer,
st_dev integer,
st_ino integer,
st_mod integer,
st_nlink integer,
st_uid integer,
st_gid integer,
st_rdev bigint,
st_size integer,
st_blksize integer,
st_blocks integer,
st_atime integer,
st_mtime integer,
st_ctime integer,
linkfi integer,
md5 text
);


( Yes, those are the fields of a `struct stat', as filled in by lstat() ).

--
Craig Ringer




Re: [PERFORM] Insert Times

2004-01-27 Thread Alan Stange
PC Drew wrote:

> I tested this out and saw no improvement:

I'd still suspect that class loading and HotSpot compilation issues are 
polluting your numbers.  Try using a PreparedStatement against another 
table first, to make sure the relevant bytecode has been loaded.  There 
are command line options to make the JVM print status information as it 
loads classes and compiles methods; you might want to turn those on as 
well.
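
A hedged sketch of that warm-up (the scratch table name is hypothetical);
-verbose:class and -XX:+PrintCompilation are the standard HotSpot options
for watching class loading and compilation:

// Run as: java -verbose:class -XX:+PrintCompilation ...
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

static void warmUp(Connection conn) throws SQLException {
    // warmup_scratch is a hypothetical throwaway table.
    PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO warmup_scratch (id) VALUES (?)");
    try {
        ps.setInt(1, 0);
        ps.executeUpdate();   // loads and JIT-compiles the same driver code paths
    } finally {
        ps.close();
    }
}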

-- Alan



Re: [PERFORM] Insert Times

2004-01-27 Thread PC Drew
I tested this out and saw no improvement:

EXPLAIN ANALYZE SELECT * FROM one;
Seq Scan on one  (cost=0.00..20.00 rows=1000 width=404) (actual time=0.04..0.50 rows=51 loops=1)
Total runtime: 0.75 msec
EXPLAIN ANALYZE SELECT * FROM one;
Seq Scan on one  (cost=0.00..20.00 rows=1000 width=404) (actual time=0.06..0.50 rows=51 loops=1)
Total runtime: 0.64 msec
EXPLAIN ANALYZE SELECT * FROM one;
Seq Scan on one  (cost=0.00..20.00 rows=1000 width=404) (actual time=0.04..0.40 rows=51 loops=1)
Total runtime: 0.54 msec
EXPLAIN ANALYZE SELECT * FROM one;
Seq Scan on one  (cost=0.00..20.00 rows=1000 width=404) (actual time=0.04..0.41 rows=51 loops=1)
Total runtime: 0.54 msec
EXPLAIN ANALYZE SELECT * FROM one;
Seq Scan on one  (cost=0.00..20.00 rows=1000 width=404) (actual time=0.04..0.41 rows=51 loops=1)
Total runtime: 0.53 msec
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.02 rows=1 loops=1)
Total runtime: 0.85 msec
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.02..0.02 rows=1 loops=1)
Total runtime: 0.15 msec
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.02..0.02 rows=1 loops=1)
Total runtime: 0.14 msec
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.02..0.02 rows=1 loops=1)
Total runtime: 0.12 msec
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.02 rows=1 loops=1)
Total runtime: 0.12 msec





Re: [PERFORM] Insert Times

2004-01-27 Thread Leeuw van der, Tim
Hi,

My personal feeling on this is, that the long time taken for the first query
is for loading all sorts of libraries, JVM startup overhead etc.

What if you first do some SELECT (whatever), on a different table, to warm
up the JVM and the database?

regards,

--Tim



Re: [PERFORM] Insert Times

2004-01-26 Thread Brad Gawne
Drew,

Just a guess, because I don't know Postgres that well, but it could be the
SQL parse time: once the statement is parsed, it may be pinned in the parse
cache for subsequent calls.

Regards,
Brad Gawne




[PERFORM] Insert Times

2004-01-26 Thread PC Drew
I'm trying to track down some performance issues where time (in 
milliseconds) is critical.  One thing that I've noticed is that it 
seems like the first insert takes an inordinate amount of time.  
Inserts after the first one are acceptable.  My production environment 
is like this:

Solaris 9
JBoss
Java 1.4.2
PostgreSQL 7.3 JDBC drivers
PostgreSQL 7.3.2 database

I've isolated the problem in order to more accurately recreate it by 
using the following table definition and insert statement:

gregqa=# \d one
         Table "public.one"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 id     | integer                |
 msg    | character varying(255) |

explain analyze insert into one (id, msg) values (1, 'blah');

I'm currently using Jython (Python written in Java) and the JDBC 
drivers to recreate the problem with this code:

from java.sql import *
from java.lang import Class

Class.forName("org.postgresql.Driver")
db = DriverManager.getConnection("jdbc:postgresql://localhost:5432/blah", "blah", "")

for i in range(5):
    query = "EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');"
    print query

    st = db.createStatement()
    rs = st.executeQuery(query)
    rsmd = rs.getMetaData()
    cols = rsmd.getColumnCount()
    cols_range = range(1, cols + 1)

    while rs.next():
        for col in cols_range:
            print rs.getString(col)

    rs.close()
    st.close()

db.close()

When I run this code (which will execute the query 5 times before 
finishing), here's the output I get:

[bert:~/tmp] > env CLASSPATH=pg73jdbc3.jar jython dbquery.py
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.02..0.02 rows=1 loops=1)
Total runtime: 0.59 msec
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.02..0.02 rows=1 loops=1)
Total runtime: 0.17 msec
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1)
Total runtime: 0.12 msec
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1)
Total runtime: 0.12 msec
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1)
Total runtime: 0.12 msec

[bert:~/tmp] > env CLASSPATH=pg73jdbc3.jar jython dbquery.py
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.02..0.02 rows=1 loops=1)
Total runtime: 0.55 msec
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.02..0.02 rows=1 loops=1)
Total runtime: 0.15 msec
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.01 rows=1 loops=1)
Total runtime: 0.13 msec
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.02 rows=1 loops=1)
Total runtime: 0.13 msec
EXPLAIN ANALYZE INSERT INTO one (id, msg) VALUES (1, 'blah');
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.01..0.02 rows=1 loops=1)
Total runtime: 0.17 msec

(I ran it twice to show that it is consistently repeatable)

Now, of course this query isn't very interesting and shouldn't take 
very long, but it does illustrate the time difference between the first 
query and the last four.  On my bigger insert query, it's taking 79msec 
for the first query and ~0.9msec for the last four.  Any ideas as to 
why the first query would take so long?  I can provide more information 
if necessary, but I think the example is pretty simple as is.

--
PC Drew