Re: [PERFORM] TRUNCATE TABLE

2007-07-18 Thread Adriaan van Os

Tom Lane wrote:


Thus the timing issue (at least as exhibited by this script) has nothing
whatever to do with the time to delete a file, but with the time to
create one.  Since the part of DROP being timed has probably got no I/O
involved at all (the tuples being touched are almost surely still in
shared buffers), it's unsurprising that it is consistently fast.


In my original profiling, CREATE TEMPORARY TABLE/DROP TABLE wasn't much faster than TRUNCATE TABLE. 
When I try it again now, I see that DROP TABLE is consistently fast, while the timings of CREATE 
TEMPORARY TABLE vary as much as those of TRUNCATE TABLE. Your observations on the time needed to 
open a file confirm that, I think.
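
For reference, a minimal psql sketch of the kind of timing comparison involved (made-up table name, not the original script):

\timing
CREATE TEMPORARY TABLE scratch (a int, b int);
TRUNCATE TABLE scratch;
DROP TABLE scratch;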


In my test databases, autovacuum is off.

Regards,

Adriaan van Os



Re: [PERFORM] importance of fast disks with pg

2007-07-18 Thread Heikki Linnakangas
Thomas Finneid wrote:
 During the tests I did I noticed that it does not necessarily seem to be
 true that one needs the fastest disks to have a pg system that is fast.
 
 It seems to me that it's more important to:
 - choose the correct methods to use for the operation
 - tune the pg memory settings
 - tune/disable pg xlog/wal etc
 
 It also seems to me that fast disks are more important for db systems of
 the OLTP type applications with real concurrency of both readers and
 writers across many, possibly larger, tables etc.
 
 Are the above statements close to having any truth in them?

It depends.

The key to performance is to identify the bottleneck. If your CPU is
running at 50%, and spends 50% of the time waiting for I/O, a faster
disk will help. But only up to a point. After you add enough I/O
capability that the CPU is running at 100%, getting faster disks doesn't
help anymore. At that point you need to get more CPU power.

Here's the algorithm for increasing application throughput:

while throughput is not high enough
{
  identify bottleneck
  resolve bottleneck, by faster/more hardware, or by optimizing application
}

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] ionice to make vacuum friendier?

2007-07-18 Thread Heikki Linnakangas
Ron Mayer wrote:
 Seems Linux has IO scheduling through a program called ionice.
 
 Has anyone here experimented with using it rather than
 vacuum sleep settings?

I looked at that briefly for smoothing checkpoints, but it was
unsuitable for that purpose because it only prioritizes reads, not writes.

It may be worth trying for vacuum, though vacuum too can do a lot of
writes. In the worst case, the OS cache is saturated with dirty pages,
which blocks all writes in the system.

If it did prioritize writes as well, that would be *excellent*. Any
kernel hackers out there looking for a project?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] insert vs select into performance

2007-07-18 Thread PFC


It's the time to parse statements, plan, execute, roundtrips with  
the client, context switches, time for your client library to escape  
the data and encode it and for postgres to decode it, etc. In a word:
OVERHEAD.


I know there is some overhead, but that much when running it batched...?


Well, yeah ;)

Unfortunately it's not fast enough; it needs to be done in no more than
1-2 seconds (and in production it will be maybe 20-50 columns of data,
perhaps divided over 5-10 tables).
Additionally it needs to scale to perhaps three times as many columns
and perhaps 2-3 times as many rows in some situations, within 1 second.
Further on, it needs to allow for about 20-50 clients reading much of
that data before the next batch of data arrives.


Wow. What is the application ?

	Test run on a desktop PC, Athlon 64 3200+, 2 IDE disks in RAID1 (pretty  
slow) :


test= CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 11.463 ms

test= INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;

INSERT 0 100000
Time: 721.579 ms

	OK, so you see, insert speed is pretty fast. With a better CPU and faster  
disks, you can get a lot more.


test= TRUNCATE TABLE test;
TRUNCATE TABLE
Time: 30.010 ms

test= ALTER TABLE test ADD PRIMARY KEY (f);
INFO:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "test_pkey" for table "test"

ALTER TABLE
Time: 100.577 ms

test= INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;

INSERT 0 100000
Time: 1915.928 ms

This includes the time to update the index.

test= DROP TABLE test;
DROP TABLE
Time: 28.804 ms

test= CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 1.626 ms

test= CREATE OR REPLACE FUNCTION test_insert( )
RETURNS VOID
LANGUAGE plpgsql
AS
$$
DECLARE
_i INTEGER;
BEGIN
FOR _i IN 0..100000 LOOP
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms

test= SELECT test_insert();
 test_insert
-------------

(1 row)

Time: 1885.382 ms

	Now you see, performing 100K individual inserts inside a plpgsql function  
is also fast.
	The postgres engine is pretty damn fast; it's the communication overhead
that you feel, especially switching between client and server processes.


Another example:

test= INSERT INTO test (a,b,c,d,e,f) VALUES (... 100000 integer tuples)
INSERT 0 100000
Time: 1836.458 ms

	VALUES is actually pretty fast. Here, there is no context switch,  
everything is done in 1 INSERT.


	However, COPY is much faster still, because parsing and de-escaping the
data costs less. COPY is optimized for throughput.


So, advice:

	For optimum throughput, have your application build chunks of data into  
text files and use COPY. Or if your client lib supports the copy  
interface, use it.
	You will need a fast disk system with xlog and data on separate disks,  
several CPU cores (1 insert thread will max out 1 core, use the others for  
selects), lots of RAM so index updates don't need to seek, and tuning of  
bgwriter and checkpoints to avoid load spikes.
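
	A minimal sketch of the COPY route, reusing the test table from above
and a made-up file path (default COPY text format, tab-separated):

COPY test (a, b, c, d, e, f) FROM '/tmp/test_batch.txt';

-- or, from the client side, psql's copy interface:
-- \copy test (a, b, c, d, e, f) from 'test_batch.txt'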



Re: [PERFORM] insert vs select into performance

2007-07-18 Thread Michael Stone

On Tue, Jul 17, 2007 at 10:58:01PM +0200, Thomas Finneid wrote:
I am not sure I understand you correctly here; are you saying that
SELECT INTO in 8.1 disables WAL logging and uses just a single fsync at
the end? In that case it means that I could disable WAL as well and
achieve the same performance, does it not?


Yes. The difference is that the select into optimization just means that 
if the system crashes the data you're inserting is invalid (and is 
properly cleaned up), and disabling the WAL means that if the system 
crashes everything is invalid (and can't be cleaned up). 


Mike Stone



Re: [PERFORM] insert vs select into performance

2007-07-18 Thread Michael Stone

On Tue, Jul 17, 2007 at 11:01:15PM +0200, Thomas Finneid wrote:

Arjen van der Meijden wrote:
Have you also tried the COPY-statement? Afaik select into is similar to 
what happens in there.


No, because it only works file to db or vice versa, not table to table.


I don't understand how the insert you described is table to table?

Mike Stone



[PERFORM] How to use a trigger to write rows to a remote server

2007-07-18 Thread Michael Dengler

Hi All,

I am trying to find out how to use a trigger function on a table to copy any
inserted row to a remote PG server.

ie:

Row X is inserted into TableX in DB1 on server1. TableX's trigger function
fires and contacts DB2 on server2 and inserts the row into TableY on
server2.

I've looked around and can't seem to find this. Essentially I need to know
how to write to a remote DB server from within a trigger function.

This is not replication, I'm not interested in a full blown trigger based
replication solution.

Any Help is greatly appreciated!

Thanks

Mike


Re: [PERFORM] How to use a trigger to write rows to a remote server

2007-07-18 Thread Csaba Nagy
On Wed, 2007-07-18 at 15:36, Michael Dengler wrote:
 Row X is inserted into TableX in DB1 on server1. TableX's trigger
 function fires and contacts DB2 on server2 and inserts the row into
 TableY on server2.

This kind of problem is usually solved more robustly by inserting the
change into a local table and letting the remote server (or some external
program) poll it periodically and make the necessary changes on the
remote server. This method does not have the problems Heikki mentions in
his reply with disconnections and transaction rollbacks, as the external
program/remote server will only see committed transactions, and it can
apply the accumulated changes after the connection is recovered in case of
failure, without blocking the activity on the master.

This is also covered in a few past posts on the postgres lists (I guess
you should look in the general list for that), in particular you could
be interested in the possibility of notifications if you want your
poller to be notified immediately when a change occurs.
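
A minimal sketch of that pattern (table and column names made up), where
the trigger only writes locally and a NOTIFY wakes up the poller:

-- local change queue; the poller reads it and deletes what it has applied
CREATE TABLE tablex_changes (
    change_id serial PRIMARY KEY,
    row_id    integer NOT NULL,
    info      text
);

CREATE OR REPLACE FUNCTION log_tablex_change() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO tablex_changes (row_id, info) VALUES (NEW.id, NEW.info);
    NOTIFY tablex_changed;   -- a poller that has done LISTEN tablex_changed wakes up
    RETURN NEW;
END;
$$;

CREATE TRIGGER tablex_change_queue
    AFTER INSERT ON tablex
    FOR EACH ROW EXECUTE PROCEDURE log_tablex_change();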

Cheers,
Csaba.





Re: [PERFORM] How to use a trigger to write rows to a remote server

2007-07-18 Thread Tom Lane
Michael Dengler [EMAIL PROTECTED] writes:
 I am trying to find out how to use a trigger function on a table to copy any
 inserted row to a remote PG server.
 ...
 This is not replication, I'm not interested in a full blown trigger based
 replication solution.

To be blunt, you're nuts.  You *are* building a trigger based
replication system, and the fact that you think you can cut corners
just shows how little you know about the problems involved.

Use Slony, or some other solution that someone else has already gotten
the bugs out of.

regards, tom lane



Re: [PERFORM] How to use a trigger to write rows to a remote server

2007-07-18 Thread Adam Tauno Williams
On Wed, 2007-07-18 at 16:02 +0200, Csaba Nagy wrote:
 On Wed, 2007-07-18 at 15:36, Michael Dengler wrote:
  Row X is inserted into TableX in DB1 on server1. TableX's trigger
  function fires and contacts DB2 on server2 and inserts the row into
  TableY on server2.
 This kind of problem is usually solved more robustly by inserting the
 change into a local table and letting the remote server (or some external

If you don't want to build your own push/pull system [actually hard to
do well] then use something like xmlBlaster or some other MOM
(message-oriented middleware).  You get logging, transactions, and other
features thrown in.

http://www.xmlblaster.org/xmlBlaster/doc/requirements/contrib.replication.html

-- 
Adam Tauno Williams, Network & Systems Administrator
Consultant - http://www.whitemiceconsulting.com
Developer - http://www.opengroupware.org




[PERFORM] Parallel query execution for UNION ALL Queries

2007-07-18 Thread Benjamin Arai
Hi,

If I have a query such as:

SELECT * FROM (SELECT * FROM A) UNION ALL (SELECT * FROM B) WHERE
blah='food';

Assuming tables A and B both have the same attributes and the data
between the tables is not partitioned in any special way, does Postgresql
execute WHERE blah='food' on both tables simultaneously or what?  If not, is
there a way to execute the query on both in parallel and then aggregate the
results?

To give some context, I have a very large amount of new data being loaded
each week.  Currently I am partitioning the data into a new table every
month, which is working great from an indexing standpoint.  But I want to
parallelize searches if possible to reduce the performance loss of having
multiple tables.

Benjamin




Re: [PERFORM] Parallel query execution for UNION ALL Queries

2007-07-18 Thread Scott Marlowe

On 7/18/07, Benjamin Arai [EMAIL PROTECTED] wrote:

Hi,

If I have a query such as:

SELECT * FROM (SELECT * FROM A) UNION ALL (SELECT * FROM B) WHERE
blah='food';

Assuming tables A and B both have the same attributes and the data
between the tables is not partitioned in any special way, does Postgresql
execute WHERE blah='food' on both tables simultaneously or what?  If not, is
there a way to execute the query on both in parallel and then aggregate the
results?

To give some context, I have a very large amount of new data being loaded
each week.  Currently I am partitioning the data into a new table every
month, which is working great from an indexing standpoint.  But I want to
parallelize searches if possible to reduce the performance loss of having
multiple tables.


Most of the time, the real issue would be the I/O throughput for such
queries, not the CPU capability.

If you have only one disk for your data storage, you're likely to get
WORSE performance if you have two queries running at once, since the
heads would then be going back and forth from one data set to the
other.

EnterpriseDB, a commercially enhanced version of PostgreSQL, can do
query parallelization, but it comes at a cost, and that cost is making
sure you have enough spindles / I/O bandwidth that you won't actually
be slowing your system down.



Re: [PERFORM] How to use a trigger to write rows to a remote server

2007-07-18 Thread Michael Dengler

Hmm... I was hoping to avoid personal insults.

Anyway, nuts or not... what I am attempting is to simply have rows from one
table inserted into another server's DB. I don't see it as replication
because:

a) The destination table will have a trigger that modifies the arriving data
to fit its table schema.
b) It is not critical that the data be synchronous (i.e. a lost row on the
destination DB is not a big deal).
c) I see it as more of a provision of data to the destination DB, NOT A
REPLICATION OF DATA.

Essentially the remote server just wants to know when some record arrives at
the source server and wants to know some of the info contained in the new
record.

And yes, it may be that I know little about the myriad of problems involved
with replication... but I do know how to carry on a civil, adult
conversation... maybe we can have a knowledge exchange.

Cheers

Mike



On 7/18/07, Tom Lane [EMAIL PROTECTED] wrote:


Michael Dengler [EMAIL PROTECTED] writes:
 I am trying to find out how to use a trigger function on a table to copy
any
 inserted row to a remote PG server.
 ...
 This is not replication, I'm not interested in a full blown trigger
based
 replication solution.

To be blunt, you're nuts.  You *are* building a trigger based
replication system, and the fact that you think you can cut corners
just shows how little you know about the problems involved.





Use Slony, or some other solution that someone else has already gotten

the bugs out of.

regards, tom lane



Re: [PERFORM] How to use a trigger to write rows to a remote server

2007-07-18 Thread Erik Jones


On Jul 18, 2007, at 11:30 AM, Michael Dengler wrote:


Hmm..I was hoping to avoid personal insults

Anyway, nuts or not... what I am attempting is to simply have rows
from one table inserted into another server's DB. I don't see it as
replication because:

a) The destination table will have a trigger that modifies the
arriving data to fit its table schema.
b) It is not critical that the data be synchronous (i.e. a lost row
on the destination DB is not a big deal).
c) I see it as more of a provision of data to the destination DB, NOT A
REPLICATION OF DATA.


Essentially the remote server just wants to know when some record  
arrives at the source server and wants to know some of the info  
contained in the new record.


And yes, it may be that I know little about the myriad of problems
involved with replication... but I do know how to carry on a civil,
adult conversation... maybe we can have a knowledge exchange.


Cheers

Mike


Mike,

If all you need is for your trigger to make a simple query on another  
db then you can use dblink or an untrusted version of one of the  
available procedural languages such as plperlu or plpythonu.
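
A rough sketch of the dblink route, assuming contrib/dblink is installed
in the source database and using made-up connection parameters, table and
column names (keep in mind that a slow or unreachable remote server will
delay or abort the local insert):

CREATE OR REPLACE FUNCTION push_row_to_server2() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- build and run the remote INSERT; quote_literal() guards the text value
    PERFORM dblink_exec(
        'host=server2 dbname=db2 user=copier password=secret',
        'INSERT INTO tabley (id, info) VALUES ('
            || NEW.id::text || ', ' || quote_literal(NEW.info) || ')');
    RETURN NEW;
END;
$$;

CREATE TRIGGER tablex_push
    AFTER INSERT ON tablex
    FOR EACH ROW EXECUTE PROCEDURE push_row_to_server2();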


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com





[PERFORM] When/if to Reindex

2007-07-18 Thread Steven Flatt

We're using Postgres 8.2.4.

I'm trying to decide whether it's worthwhile to implement a process that
does periodic reindexing.  In a few ad hoc tests, where I've tried to set up
data similar to how our application does it, I've noticed decent performance
increases after doing a reindex as well as the planner being more likely to
choose an index scan.

Some background: we make extensive use of partitioned tables.  In fact, I'm
really only considering reindexing partitions that have just closed.  In
our simplest/most general case, we have a table partitioned by a timestamp
column, each partition 24 hours wide.  The partition will have an index on
the timestamp column as well as a few other indexes including a primary key
index (all b-tree).  Is there a programmatic way I can decide, upon the
closing of a partition, which, if any, of these indexes will benefit from
a reindex?  Can I determine things like average node density, node depth, or
any other indication as to the quality of an index?  Will pg_class.relpages
be any help here?
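
For example, something along these lines (with a made-up partition name) to
snapshot index sizes, which could then be compared before and after a reindex:

SELECT c.relname, c.relpages, pg_relation_size(c.oid::regclass) AS bytes
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE i.indrelid = 'measurements_2007_07_18'::regclass;
-- relpages is only as fresh as the last VACUUM/ANALYZE on the partition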

Is it a simple matter of running some queries, reindexing the table, then
running the queries again to determine overall performance change?  If so,
what queries would exercise this best?

Just trying to determine if the extra cost of reindexing newly closed
partitions will be worth the performance benefit of querying the data.
Reindexing a table with a day's worth of data is taking on the order of a
few hours (10s of millions of rows).

The docs say that:

...for B-tree indexes a freshly-constructed index is somewhat faster to
access than one that has been updated many times, because logically adjacent
pages are usually also physically adjacent in a newly built index... It
might be worthwhile to reindex periodically just to improve access speed.

Thanks,
Steve


Re: [PERFORM] ionice to make vacuum friendier?

2007-07-18 Thread Jim C. Nasby
On Wed, Jul 18, 2007 at 10:03:00AM +0100, Heikki Linnakangas wrote:
 Ron Mayer wrote:
  Seems Linux has IO scheduling through a program called ionice.
  
  Has anyone here experimented with using it rather than
  vacuum sleep settings?
 
 I looked at that briefly for smoothing checkpoints, but it was
 unsuitable for that purpose because it only prioritizes reads, not writes.
 
 It may be worth trying for vacuum, though vacuum too can do a lot of
 writes. In the worst case, the OS cache is saturated with dirty pages,
 which blocks all writes in the system.
 
 If it did prioritize writes as well, that would be *excellent*. Any
 kernel hackers out there looking for a project?

My understanding is that FreeBSD will prioritize IO based on process
priority, though I have no idea how it's actually accomplished or how
effective it is. But if we put in special support for this for Linux we
should consider FBSD as well.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] Parallel query execution for UNION ALL Queries

2007-07-18 Thread Jim C. Nasby
On Wed, Jul 18, 2007 at 11:30:48AM -0500, Scott Marlowe wrote:
 EnterpriseDB, a commercially enhanced version of PostgreSQL can do
 query parallelization, but it comes at a cost, and that cost is making
 sure you have enough spindles / I/O bandwidth that you won't be
 actually slowing your system down.

I think you're thinking ExtendDB. :)
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] When/if to Reindex

2007-07-18 Thread Jim C. Nasby
On Wed, Jul 18, 2007 at 01:08:30PM -0400, Steven Flatt wrote:
 We're using Postgres 8.2.4.
 
 I'm trying to decide whether it's worthwhile to implement a process that
 does periodic reindexing.  In a few ad hoc tests, where I've tried to set up
 data similar to how our application does it, I've noticed decent performance
 increases after doing a reindex as well as the planner being more likely to
 choose an index scan.
 
 Some background: we make extensive use of partitioned tables.  In fact, I'm
 really only considering reindexing partitions that have just closed.  In
 our simplest/most general case, we have a table partitioned by a timestamp
 column, each partition 24 hours wide.  The partition will have an index on
 the timestamp column as well as a few other indexes including a primary key
 index (all b-tree).  Is there a programmatic way I can decide, upon the
 closing of a partition, which, if any, of these indexes will benefit from
 a reindex?  Can I determine things like average node density, node depth, or
 any other indication as to the quality of an index?  Will pg_class.relpages
 be any help here?

Looking at that stuff will help determine if the index is bloated, or if
it's just bigger than optimal. Once you're done writing to an index, it
might be worth reindexing with a fillfactor of 100% to shrink things
down a bit.
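
Something like this, with a hypothetical index name (in 8.2 the default
b-tree fillfactor is 90, so 100 packs the leaf pages completely):

ALTER INDEX measurements_2007_07_18_ts_idx SET (fillfactor = 100);
REINDEX INDEX measurements_2007_07_18_ts_idx;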

 Is it a simple matter of running some queries, reindexing the table, then
 running the queries again to determine overall performance change?  If so,
 what queries would exercise this best?
 
 Just trying to determine if the extra cost of reindexing newly closed
 partitions will be worth the performance benefit of querying the data.
 Reindexing a table with a day's worth of data is taking on the order of a
 few hours (10s of millions of rows).
 
 The docs say that:
 
 ...for B-tree indexes a freshly-constructed index is somewhat faster to
 access than one that has been updated many times, because logically adjacent
 pages are usually also physically adjacent in a newly built index... It
 might be worthwhile to reindex periodically just to improve access speed.

That's the other consideration, though if you're seeing a big difference
I suspect it's an issue of indexes fitting in cache or not.
-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [PERFORM] insert vs select into performance

2007-07-18 Thread Thomas Finneid


PFC wrote:
Unfortunately it's not fast enough; it needs to be done in no more than
1-2 seconds (and in production it will be maybe 20-50 columns of
data, perhaps divided over 5-10 tables).
Additionally it needs to scale to perhaps three times as many columns
and perhaps 2-3 times as many rows in some situations, within 1 second.
Further on, it needs to allow for about 20-50 clients reading much of
that data before the next batch of data arrives.


Wow. What is the application ?


Geological surveys, where they perform realtime geo/hydro-phone shots of 
areas of the size of 10x10km every 3-15 seconds.




test= CREATE OR REPLACE FUNCTION test_insert( )
RETURNS VOID
LANGUAGE plpgsql
AS
$$
DECLARE
_i INTEGER;
BEGIN
FOR _i IN 0..100000 LOOP
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms

test= SELECT test_insert();
 test_insert
-------------

(1 row)

Time: 1885.382 ms


I tested this one and it took 4 seconds, compared to the jdbc insert
which took 14 seconds, so it's a lot faster, but not as fast as the
SELECT INTO.


I also tested an INSERT INTO ... SELECT FROM, which took 1.8 seconds; now we
are starting to talk about real performance.



However, COPY is much faster still, because parsing and
de-escaping the data costs less. COPY is optimized for throughput.


So, advice :

For optimum throughput, have your application build chunks of data 
into text files and use COPY. Or if your client lib supports the copy 
interface, use it.


I did test COPY, i.e. the jdbc COPY patch for pg 8.1; it performs at
approx 1.8 seconds :) The test was done with text input. I am going to
test it with binary input, which I expect will increase the performance
by 20-50%.


All these tests have been performed on a laptop with a Kubuntu 6.10
version of pg 8.1 without any special pg performance tuning. So I expect
that compiling the latest pg, doing some tuning on it and testing it on
a representative server will give it an additional boost in performance.


The key here is that with an abundance of performance, I can experiment
with the solution in a completely different way than if I had any
artificial restrictions.


You will need a fast disk system with xlog and data on separate 
disks, several CPU cores (1 insert thread will max out 1 core, use the 
others for selects), lots of RAM so index updates don't need to seek, 
and tuning of bgwriter and checkpoints to avoid load spikes.


will have a look at it.

regards

thomas



Re: [PERFORM] insert vs select into performance

2007-07-18 Thread Thomas Finneid

Michael Glaesemann wrote:


As they're individual inserts, I think what you're seeing is overhead 
from calling this statement 100,000 times, not just on the server but 
also the overhead through JDBC. For comparison, try


CREATE TABLE ciu_data_type_copy (LIKE ciu_data_type);

INSERT INTO ciu_data_type_copy (id, loc_id, value3, value5, value8, 
value9, value10, value11)

SELECT id, loc_id, value3, value5, value8, value9, value10, value11
FROM ciu_data_type;

I think this would be more comparable to what you're seeing.


This is much faster than my previous solution, but I also tested two
other solutions:

- a stored function with array arguments, which performed 3 times better
(see the sketch below).
- jdbc with the COPY patch, which performed 8.4 times faster with text
input; I expect binary input to be even faster.
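
The array-argument approach looks roughly like this (a trimmed sketch with
made-up argument names and only three columns, not the actual function):

CREATE OR REPLACE FUNCTION bulk_insert(ids INT[], locs INT[], vals INT[])
RETURNS VOID
LANGUAGE plpgsql
AS $$
BEGIN
    -- a single INSERT ... SELECT expands all the array elements server-side
    INSERT INTO ciu_data_type_copy (id, loc_id, value3)
    SELECT ids[i], locs[i], vals[i]
    FROM generate_series(1, array_upper(ids, 1)) AS i;
END;
$$;

-- e.g. SELECT bulk_insert(ARRAY[1,2,3], ARRAY[10,20,30], ARRAY[5,6,7]);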


regards

thomas



Re: [PERFORM] insert vs select into performance

2007-07-18 Thread Thomas Finneid



Michael Stone wrote:

On Tue, Jul 17, 2007 at 11:01:15PM +0200, Thomas Finneid wrote:

Arjen van der Meijden wrote:
Have you also tried the COPY-statement? Afaik select into is similar 
to what happens in there.


No, because it only works file to db or vice versa, not table to table.


I don't understand how the insert you described is table to table?


SELECT INTO is table to table, and so is INSERT INTO ... SELECT FROM.
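
For example (made-up table names):

-- creates new_tab and fills it in one statement:
SELECT a, b INTO new_tab FROM old_tab;

-- fills an already existing table:
INSERT INTO existing_tab (a, b)
SELECT a, b FROM old_tab;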

regards

thomas



Re: [PERFORM] improvement suggestions for performance design

2007-07-18 Thread Thomas Finneid


Kalle Hallivuori wrote:

Hi.

2007/7/8, Thomas Finneid [EMAIL PROTECTED]:


Kalle Hallivuori wrote:

  COPY is plentitudes faster than INSERT:
  http://www.postgresql.org/docs/8.1/interactive/sql-copy.html
 
  If you can't just push the data straight into the final table with
  COPY, push it into a temporary table that you go through with the
  database procedure.
 
  Shameless plug: If you use Java and miss COPY functionality in the
  driver, it's available at
 
  http://kato.iki.fi/sw/db/postgresql/jdbc/copy/
 
  I was able to practically nullify time spent inserting with that.

Interresting, I will definately have a look at it.
What is the maturity level of the code at this point? and what is
potentially missing to bring it up to production quality? (stability is
of the utmost importance in my usage scenario.)


It's my third implementation, based on earlier work by Kris Jurka, a
maintainer of the JDBC driver. (It is really quite short so it's easy
to keep it clear.) I consider it mature enough to have accommodated it
as part of an upcoming large application, but I'd really like to hear
others' opinions. Documentation I should add one of these days, maybe
even rewrite the javadoc.


Hi, I have tested your COPY patch (actually I tested
postgresql-jdbc-8.2-505-copy-20070716.jdbc3.jar) and it is really fast,
actually just as fast as serverside COPY (both tests were performed on
the local machine).


This means I am interested in using it in my project, but I have some
concerns that need to be addressed (and I am prepared to help in any
way I can). The following are the concerns I have:


- While testing I got some errors, which need to be fixed (detailed below).
- The patch must be of production grade quality
- I would like the patch to be part of the official pg JDBC driver.


The error I got most often is:

This command runs a single run, single thread and generates 1 rows 
of data


[EMAIL PROTECTED]:~/svn/pores$ java -server -Xms20m -Xmx256m -cp 
/usr/java/jdk1.5.0_06/jre/lib/rt.jar:.:src/:test/:conf/:lib/postgresql-jdbc-8.2-505-copy-20070716.jdbc3.jar 
wg.daemon.Main -m SINGLE_WRITE -t 1 -r 1 -c 1 -p CPBulk

Initialising connection...
Performing insert...
Build bulk data time: 0s 211ms
toString() bulk data time: 0s 4ms
time: 0s 205ms
org.postgresql.util.PSQLException: Unexpected command completion while 
copying: COPY
at 
org.postgresql.core.v3.QueryExecutorImpl.executeCopy(QueryExecutorImpl.java:706)

at org.postgresql.copy.CopyManager.copyIntoDB(CopyManager.java:50)
at org.postgresql.copy.CopyManager.copyIntoDB(CopyManager.java:37)
at wg.storage.pores1.CPBulk.addAttributes(CPBulk.java:72)
at wg.daemon.Daemon.run(Daemon.java:57)
[EMAIL PROTECTED]:~/svn/pores$ ls -al lib/


regards

thomas




Re: [PERFORM] insert vs select into performance

2007-07-18 Thread Heikki Linnakangas
Adriaan van Os wrote:
 So, how does one (temporarily) disable WAL logging ? Or, for example,
 disable WAL logging for a temporary table ?

Operations on temporary tables are never WAL logged. Operations on other
tables are, and there's no way to disable it.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[PERFORM] Optimal tags design?

2007-07-18 Thread lists
I am planning to add a tags (as in the web 2.0 thing) feature to my web
based application. I would like some feedback from the experts here on
what the best database design for that would be.

The possibilities I have come up with are:
* A tags table containing the tag and id number of what it links to.
select pid from tags where tag='bla'
select tag from tags where pid=xxx.

* a tags table where each tag exists only once, and a table with the tag
ID and picture ID to link them together.

select pid from tags inner join picture_tags using(tag_id) where tag='bla'
select tag from tags inner join picture_tags using(tag_id) where pid='xxx'

* A full text index in the picture table containing the tags

select pid from pictures where tags @@ to_tsquery('bla')
(or the non-fti version)
select pid from pictures where tags ~* '.*bla.*'

select tags from pictures where pid=xxx;



Re: [PERFORM] Optimal tags design?

2007-07-18 Thread Bryan Murphy

We store tags on our items like this:

Tag.ID INT NOT NULL PRIMARY KEY
Tag.Value TEXT LCASE NOT NULL UNIQUE

Item.ID INT NOT NULL PRIMARY KEY

ItemTagBinding.ItemID INT NOT NULL REFERENCES Item.ID
ItemTagBinding.TagID INT NOT NULL REFERENCES Tag.ID
ItemTagBinding.ItemID + ItemTagBinding.TagID UNIQUE

with appropriate indexes on the columns we need to frequently query.
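
In PostgreSQL DDL that's roughly the following (there is no LCASE column
constraint, so the lower-casing is left to the application or a trigger):

CREATE TABLE tag (
    id    SERIAL PRIMARY KEY,
    value TEXT NOT NULL UNIQUE   -- stored lower-cased by the application
);

CREATE TABLE item (
    id SERIAL PRIMARY KEY
);

CREATE TABLE item_tag_binding (
    item_id INT NOT NULL REFERENCES item(id),
    tag_id  INT NOT NULL REFERENCES tag(id),
    PRIMARY KEY (item_id, tag_id)
);

-- reverse lookup: which items carry a given tag
CREATE INDEX item_tag_binding_tag_idx ON item_tag_binding (tag_id);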

We have about 3 million tag bindings right now, and have not run into any
performance issues related to tagging other than generating tag clouds
(which we pre-calculate anyway).

I'll have to get back to you when we get up to 10's, or even 100's of
millions and let you know how it scaled.

Bryan

On 7/18/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:


I am planning to add a tags (as in the web 2.0 thing) feature to my web
based application. I would like some feedback from the experts here on
what the best database design for that would be.

The possibilities I have come up with are:
* A tags table containing the tag and id number of what it links to.
select pid from tags where tag='bla'
select tag from tags where pid=xxx.

* a tags table where each tag exists only once, and a table with the tag
ID and picture ID to link them together.

select pid from tags inner join picture_tags using(tag_id) where tag='bla'
select tag from tags inner join picture_tags using(tag_id) where pid='xxx'

* A full text index in the picture table containing the tags

select pid from pictures where tags @@ to_tsquery('bla')
(or the non-fti version)
select pid from pictures where tags ~* '.*bla.*'

select tags from pictures where pid=xxx;
