Re: [PERFORM] TRUNCATE TABLE
Tom Lane wrote:
> Thus the timing issue (at least as exhibited by this script) has nothing whatever to do with the time to delete a file, but with the time to create one. Since the part of DROP being timed has probably got no I/O involved at all (the tuples being touched are almost surely still in shared buffers), it's unsurprising that it is consistently fast.

In my original profiling, CREATE TEMPORARY TABLE/DROP TABLE wasn't much faster than TRUNCATE TABLE. When I try it again now, I see that DROP TABLE is consistently fast, while the timings of CREATE TEMPORARY TABLE vary as much as those of TRUNCATE TABLE. Your observations on the time needed to open a file confirm that, I think. In my test databases, autovacuum is off.

Regards,

Adriaan van Os
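For readers following along, this is roughly the micro-benchmark being discussed, shown as a hedged sketch (table names are placeholders and no timings are claimed); run it under psql's \timing to reproduce the comparison:

\timing
CREATE TEMPORARY TABLE scratch (x int);   -- file creation cost shows up here
DROP TABLE scratch;                       -- consistently fast: mostly catalog work in shared buffers
CREATE TABLE keeper (x int);
TRUNCATE TABLE keeper;                    -- also creates a new relation file, so its timing varies like CREATE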
Re: [PERFORM] importance of fast disks with pg
Thomas Finneid wrote:
> During the tests I did, I noticed that it does not necessarily seem to be true that one needs the fastest disks to have a pg system that is fast. It seems to me that it's more important to:
> - choose the correct methods to use for the operation
> - tune the pg memory settings
> - tune/disable pg xlog/wal etc
> It also seems to me that fast disks are more important for db systems of the OLTP type: applications with real concurrency of both readers and writers across many, possibly larger, tables etc.
> Are the above statements close to having any truth in them?

It depends. The key to performance is to identify the bottleneck. If your CPU is running at 50%, and spends 50% of the time waiting for I/O, a faster disk will help. But only up to a point: after you add enough I/O capability that the CPU is running at 100%, getting faster disks doesn't help anymore. At that point you need to get more CPU power.

Here's the algorithm for increasing application throughput:

while throughput is not high enough {
    identify bottleneck
    resolve bottleneck, by faster/more hardware or by optimizing the application
}

-- Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] ionice to make vacuum friendier?
Ron Mayer wrote:
> Seems Linux has IO scheduling through a program called ionice. Has anyone here experimented with using it rather than vacuum sleep settings?

I looked at that briefly for smoothing checkpoints, but it was unsuitable for that purpose because it only prioritizes reads, not writes. It may be worth trying for vacuum, though vacuum too can do a lot of writes. In the worst case, the OS cache is saturated with dirty pages, which blocks all writes in the system.

If it did prioritize writes as well, that would be *excellent*. Any kernel hackers out there looking for a project?

-- Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
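For reference, the "vacuum sleep settings" Ron mentions are PostgreSQL's cost-based vacuum delay parameters, which throttle vacuum inside the database rather than at the kernel. A minimal sketch; the values and table name are illustrative, not recommendations:

SET vacuum_cost_delay = 20;     -- milliseconds to sleep each time the cost limit is reached
SET vacuum_cost_limit = 200;    -- accumulated page cost that triggers the sleep
VACUUM ANALYZE some_big_table;  -- placeholder table name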
Re: [PERFORM] insert vs select into performance
It's the time to parse statements, plan, execute, round-trips with the client, context switches, time for your client library to escape the data and encode it and for postgres to decode it, etc. In a word: OVERHEAD.

I know there is some overhead, but that much when running it batched...?

Well, yeah ;)

Unfortunately it's not fast enough, it needs to be done in no more than 1-2 seconds (and in production it will be maybe 20-50 columns of data, perhaps divided over 5-10 tables). Additionally it needs to scale to perhaps three times as many columns and perhaps 2-3 times as many rows in some situations within 1 second. Further on it needs to allow for about 20-50 clients reading much of that data before the next batch of data arrives.

Wow. What is the application?

Test run on a desktop PC, Athlon 64 3200+, 2 IDE disks in RAID1 (pretty slow):

test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 11.463 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;
INSERT 0 100000
Time: 721.579 ms

OK, so you see, insert speed is pretty fast. With a better CPU and faster disks, you can get a lot more.

test=> TRUNCATE TABLE test;
TRUNCATE TABLE
Time: 30.010 ms

test=> ALTER TABLE test ADD PRIMARY KEY (f);
INFO: ALTER TABLE / ADD PRIMARY KEY will create implicit index "test_pkey" for table "test"
ALTER TABLE
Time: 100.577 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;
INSERT 0 100000
Time: 1915.928 ms

This includes the time to update the index.

test=> DROP TABLE test;
DROP TABLE
Time: 28.804 ms

test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 1.626 ms

test=> CREATE OR REPLACE FUNCTION test_insert( ) RETURNS VOID LANGUAGE plpgsql AS
$$
DECLARE
    _i INTEGER;
BEGIN
    FOR _i IN 0..100000 LOOP
        INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
    END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms

test=> SELECT test_insert();
 test_insert
-------------

(1 row)
Time: 1885.382 ms

Now you see, performing 100K individual inserts inside a plpgsql function is also fast. The postgres engine is pretty damn fast; it's the communication overhead that you feel, especially switching between client and server processes.

Another example:

test=> INSERT INTO test (a,b,c,d,e,f) VALUES (... 100000 integer tuples)
INSERT 0 100000
Time: 1836.458 ms

VALUES is actually pretty fast. Here, there is no context switch, everything is done in 1 INSERT.

However COPY is much faster because the parsing overhead and de-escaping of data is faster. COPY is optimized for throughput.

So, advice: for optimum throughput, have your application build chunks of data into text files and use COPY. Or if your client lib supports the copy interface, use it. You will need a fast disk system with xlog and data on separate disks, several CPU cores (1 insert thread will max out 1 core, use the others for selects), lots of RAM so index updates don't need to seek, and tuning of bgwriter and checkpoints to avoid load spikes.
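To make that last piece of advice concrete, here is a hedged sketch of the COPY route; the file path is an assumption for illustration and the data file is tab-separated text in COPY's default format:

-- Server-side COPY reads a file visible to the postgres server process:
COPY test (a, b, c, d, e, f) FROM '/tmp/test_chunk.tsv';

-- From a remote client, psql's \copy streams the same file over the connection instead:
-- \copy test (a,b,c,d,e,f) from 'test_chunk.tsv'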
Re: [PERFORM] insert vs select into performance
On Tue, Jul 17, 2007 at 10:58:01PM +0200, Thomas Finneid wrote:
> I am not sure I understand you correctly here, are you saying that SELECT INTO in 8.1 disables WAL logging and uses just a single fsync at the end? In that case it means that I could disable WAL as well and achieve the same performance, does it not?

Yes. The difference is that the select into optimization just means that if the system crashes the data you're inserting is invalid (and is properly cleaned up), while disabling the WAL means that if the system crashes everything is invalid (and can't be cleaned up).

Mike Stone
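For readers skimming the thread, the statement under discussion looks like this; the names are placeholders, and whether the WAL-skip optimization applies depends on the server version and archiving configuration, as discussed above:

-- Creates day_snapshot and fills it from the query in one statement.
SELECT * INTO day_snapshot FROM source_table WHERE batch_id = 42;   -- made-up filter, purely illustrative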
Re: [PERFORM] insert vs select into performance
On Tue, Jul 17, 2007 at 11:01:15PM +0200, Thomas Finneid wrote:
> Arjen van der Meijden wrote:
>> Have you also tried the COPY-statement? Afaik select into is similar to what happens in there.
> No, because it only works on file to db or vice versa, not table to table.

I don't understand how the insert you described is table to table?

Mike Stone
[PERFORM] How to use a trigger to write rows to a remote server
Hi All,

I am trying to find out how to use a trigger function on a table to copy any inserted row to a remote PG server. I.e.: Row X is inserted into TableX in DB1 on server1; TableX's trigger function fires and contacts DB2 on server2 and inserts the row into TableY on server2. I've looked around and can't seem to find this. Essentially I need to know how to write to a remote DB server from within a trigger function. This is not replication; I'm not interested in a full-blown trigger-based replication solution.

Any help is greatly appreciated!

Thanks

Mike
Re: [PERFORM] How to use a trigger to write rows to a remote server
On Wed, 2007-07-18 at 15:36, Michael Dengler wrote:
> Row X is inserted into TableX in DB1 on server1; TableX's trigger function fires and contacts DB2 on server2 and inserts the row into TableY on server2.

This kind of problem is usually solved more robustly by inserting the change into a local table and letting the remote server (or some external program) poll that periodically and make the necessary changes on the remote server. This method does not have the problems Heikki mentions in his reply with disconnections and transaction rollbacks, as the external program/remote server will only see committed transactions, and it can apply the accumulated changes after the connection is recovered in case of failure, without blocking activity on the master. This is also covered in a few past posts on the postgres lists (I guess you should look in the general list for that); in particular you could be interested in the possibility of notifications if you want your poller to be notified immediately when a change occurs.

Cheers,

Csaba.
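A minimal sketch of that queue-table idea, with made-up object names (tablex, tablex_changes) and columns; the remote poller that drains the queue is not shown:

-- Local queue table that accumulates committed changes.
CREATE TABLE tablex_changes (
    change_id  serial PRIMARY KEY,
    queued_at  timestamptz NOT NULL DEFAULT now(),
    row_id     integer NOT NULL,
    payload    text
);

-- The trigger only writes locally, so an unreachable remote server never blocks inserts.
CREATE OR REPLACE FUNCTION log_tablex_insert() RETURNS trigger LANGUAGE plpgsql AS
$$
BEGIN
    INSERT INTO tablex_changes (row_id, payload) VALUES (NEW.id, NEW.some_column);  -- assumed columns
    RETURN NEW;
END;
$$;

CREATE TRIGGER tablex_insert_log
    AFTER INSERT ON tablex
    FOR EACH ROW EXECUTE PROCEDURE log_tablex_insert();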
Re: [PERFORM] How to use a trigger to write rows to a remote server
Michael Dengler [EMAIL PROTECTED] writes:
> I am trying to find out how to use a trigger function on a table to copy any inserted row to a remote PG server. ... This is not replication; I'm not interested in a full-blown trigger-based replication solution.

To be blunt, you're nuts. You *are* building a trigger-based replication system, and the fact that you think you can cut corners just shows how little you know about the problems involved. Use Slony, or some other solution that someone else has already gotten the bugs out of.

regards, tom lane
Re: [PERFORM] How to use a trigger to write rows to a remote server
On Wed, 2007-07-18 at 16:02 +0200, Csaba Nagy wrote:
> On Wed, 2007-07-18 at 15:36, Michael Dengler wrote:
>> Row X is inserted into TableX in DB1 on server1; TableX's trigger function fires and contacts DB2 on server2 and inserts the row into TableY on server2.
> This kind of problem is usually solved more robustly by inserting the change into a local table and letting the remote server (or some external program) poll that periodically ...

If you don't want to build your own push/pull system [actually hard to do well] then use something like xmlBlaster or some other MOM. You get logging, transactions, and other features thrown in.

http://www.xmlblaster.org/xmlBlaster/doc/requirements/contrib.replication.html

-- Adam Tauno Williams, Network Systems Administrator
Consultant - http://www.whitemiceconsulting.com
Developer - http://www.opengroupware.org
[PERFORM] Parallel query execution for UNION ALL Queries
Hi,

If I have a query such as:

SELECT * FROM (SELECT * FROM A) UNION ALL (SELECT * FROM B) WHERE blah='food';

Assuming the tables A and B both have the same attributes and the data between the tables is not partitioned in any special way, does Postgresql execute WHERE blah='food' on both tables simultaneously, or what? If not, is there a way to execute the query on both in parallel and then aggregate the results?

To give some context, I have a very large amount of new data being loaded each week. Currently I am partitioning the data into a new table every month, which is working great from an indexing standpoint. But I want to parallelize searches if possible to reduce the performance loss of having multiple tables.

Benjamin
Re: [PERFORM] Parallel query execution for UNION ALL Queries
On 7/18/07, Benjamin Arai [EMAIL PROTECTED] wrote:
> Assuming the tables A and B both have the same attributes and the data between the tables is not partitioned in any special way, does Postgresql execute WHERE blah='food' on both tables simultaneously, or what? If not, is there a way to execute the query on both in parallel and then aggregate the results?

Most of the time, the real issue would be the I/O throughput for such queries, not the CPU capability. If you have only one disk for your data storage, you're likely to get WORSE performance if you have two queries running at once, since the heads would be going back and forth from one data set to the other.

EnterpriseDB, a commercially enhanced version of PostgreSQL, can do query parallelization, but it comes at a cost, and that cost is making sure you have enough spindles / I/O bandwidth that you won't actually be slowing your system down.
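As an aside, within stock PostgreSQL 8.x the usual way to keep a monthly-partition layout cheap is constraint exclusion rather than parallel execution; it only helps when the WHERE clause constrains the partitioning column. A hedged sketch with made-up table names:

-- Parent plus CHECK-constrained child tables; with constraint_exclusion = on,
-- the planner skips partitions whose range cannot match the query.
CREATE TABLE data_parent (blah text, created timestamp NOT NULL);
CREATE TABLE data_2007_06 (CHECK (created >= '2007-06-01' AND created < '2007-07-01')) INHERITS (data_parent);
CREATE TABLE data_2007_07 (CHECK (created >= '2007-07-01' AND created < '2007-08-01')) INHERITS (data_parent);

SET constraint_exclusion = on;
SELECT * FROM data_parent WHERE created >= '2007-07-10' AND created < '2007-07-11';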
Re: [PERFORM] How to use a trigger to write rows to a remote server
Hmm... I was hoping to avoid personal insults. Anyway, nuts or not, what I am attempting is simply to have a row from one table inserted into another server's DB. I don't see it as replication because:

a) The destination table will have a trigger that modifies the arriving data to fit its table scheme.
b) It is not critical that the data be synchronous (i.e. a lost row on the destination DB is not a big deal).
c) I see it as more of a provision of data to the destination DB, NOT A REPLICATION OF DATA.

Essentially the remote server just wants to know when some record arrives at the source server and wants to know some of the info contained in the new record.

And yes, it may be that I know little about the myriad of problems involved with replication... but I do know how to carry on a civil, adult conversation... maybe we can have a knowledge exchange.

Cheers

Mike

On 7/18/07, Tom Lane [EMAIL PROTECTED] wrote:
> To be blunt, you're nuts. You *are* building a trigger-based replication system, and the fact that you think you can cut corners just shows how little you know about the problems involved. Use Slony, or some other solution that someone else has already gotten the bugs out of.
Re: [PERFORM] How to use a trigger to write rows to a remote server
On Jul 18, 2007, at 11:30 AM, Michael Dengler wrote:
> Nuts or not, what I am attempting is simply to have a row from one table inserted into another server's DB. ... Essentially the remote server just wants to know when some record arrives at the source server and wants to know some of the info contained in the new record.

Mike,

If all you need is for your trigger to make a simple query on another db, then you can use dblink or an untrusted version of one of the available procedural languages such as plperlu or plpythonu.

Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate and market in style.
Visit us online at http://www.myemma.com
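A rough sketch of the dblink route Erik mentions, using contrib/dblink's dblink_exec(); the connection string, table names, and columns are all assumptions, and note that the insert on server1 will stall or fail if server2 is unreachable, which is one of the problems raised earlier in the thread:

-- contrib/dblink must be installed in the source database.
CREATE OR REPLACE FUNCTION push_row_to_server2() RETURNS trigger LANGUAGE plpgsql AS
$$
BEGIN
    PERFORM dblink_exec(
        'host=server2 dbname=db2 user=pusher password=secret',          -- assumed connection string
        'INSERT INTO tabley (id, info) VALUES ('
            || quote_literal(NEW.id::text) || ', ' || quote_literal(NEW.info) || ')');
    RETURN NEW;
END;
$$;

CREATE TRIGGER tablex_push
    AFTER INSERT ON tablex
    FOR EACH ROW EXECUTE PROCEDURE push_row_to_server2();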
[PERFORM] When/if to Reindex
We're using Postgres 8.2.4. I'm trying to decide whether it's worthwhile to implement a process that does periodic reindexing. In a few ad hoc tests, where I've tried to set up data similar to how our application does it, I've noticed decent performance increases after doing a reindex, as well as the planner being more likely to choose an index scan.

Some background: we make extensive use of partitioned tables. In fact, I'm really only considering reindexing partitions that have just closed. In our simplest/most general case, we have a table partitioned by a timestamp column, each partition 24 hours wide. The partition will have an index on the timestamp column as well as a few other indexes, including a primary key index (all b-tree).

Is there a programmatic way I can decide, upon the closing of a partition, which, if any, of these indexes will benefit from a reindex? Can I determine things like average node density, node depth, or any other indication as to the quality of an index? Will pg_class.relpages be any help here?

Is it a simple matter of running some queries, reindexing the table, then running the queries again to determine overall performance change? If so, what queries would exercise this best?

Just trying to determine if the extra cost of reindexing newly closed partitions will be worth the performance benefit of querying the data. Reindexing a table with a day's worth of data is taking on the order of a few hours (10s of millions of rows).

The docs say that "...for B-tree indexes a freshly-constructed index is somewhat faster to access than one that has been updated many times, because logically adjacent pages are usually also physically adjacent in a newly built index... It might be worthwhile to reindex periodically just to improve access speed."

Thanks,
Steve
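One hedged example of the kind of catalog query that can feed this decision; pg_class.relpages and pg_relation_size() are standard in 8.2, but the partition name below is a placeholder, and relpages is only as fresh as the last VACUUM/ANALYZE:

SELECT c.relname AS index_name,
       c.relpages,
       pg_size_pretty(pg_relation_size(c.oid)) AS on_disk
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE i.indrelid = 'partition_20070718'::regclass   -- placeholder partition name
ORDER BY c.relpages DESC;

Comparing these numbers before and after a REINDEX of one representative closed partition gives a rough measure of how much slack the build-up phase leaves behind.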
Re: [PERFORM] ionice to make vacuum friendier?
On Wed, Jul 18, 2007 at 10:03:00AM +0100, Heikki Linnakangas wrote:
> Ron Mayer wrote:
>> Seems Linux has IO scheduling through a program called ionice. Has anyone here experimented with using it rather than vacuum sleep settings?
> I looked at that briefly for smoothing checkpoints, but it was unsuitable for that purpose because it only prioritizes reads, not writes. It may be worth trying for vacuum, though vacuum too can do a lot of writes. In the worst case, the OS cache is saturated with dirty pages, which blocks all writes in the system. If it did prioritize writes as well, that would be *excellent*. Any kernel hackers out there looking for a project?

My understanding is that FreeBSD will prioritize IO based on process priority, though I have no idea how it's actually accomplished or how effective it is. But if we put in special support for this for Linux we should consider FBSD as well.

--
Jim Nasby [EMAIL PROTECTED]
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [PERFORM] Parallel query execution for UNION ALL Queries
On Wed, Jul 18, 2007 at 11:30:48AM -0500, Scott Marlowe wrote:
> EnterpriseDB, a commercially enhanced version of PostgreSQL, can do query parallelization, but it comes at a cost, and that cost is making sure you have enough spindles / I/O bandwidth that you won't actually be slowing your system down.

I think you're thinking ExtendDB. :)

--
Jim Nasby [EMAIL PROTECTED]
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [PERFORM] When/if to Reindex
On Wed, Jul 18, 2007 at 01:08:30PM -0400, Steven Flatt wrote:
> Is there a programmatic way I can decide, upon the closing of a partition, which, if any, of these indexes will benefit from a reindex? Can I determine things like average node density, node depth, or any other indication as to the quality of an index? Will pg_class.relpages be any help here?

Looking at that stuff will help determine if the index is bloated, or if it's just bigger than optimal. Once you're done writing to an index, it might be worth reindexing with a fillfactor of 100% to shrink things down a bit.

> Is it a simple matter of running some queries, reindexing the table, then running the queries again to determine overall performance change? If so, what queries would exercise this best? Just trying to determine if the extra cost of reindexing newly closed partitions will be worth the performance benefit of querying the data. ...

That's the other consideration, though if you're seeing a big difference I suspect it's an issue of indexes fitting in cache or not.

--
Jim Nasby [EMAIL PROTECTED]
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
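A sketch of that fillfactor-then-reindex step; ALTER INDEX ... SET (fillfactor) and REINDEX are standard in 8.2, and the index name below is a placeholder:

-- Once a partition has stopped receiving writes, pack its indexes tightly.
ALTER INDEX partition_20070718_ts_idx SET (fillfactor = 100);
REINDEX INDEX partition_20070718_ts_idx;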
Re: [PERFORM] insert vs select into performance
PFC wrote:
>> Unfortunately it's not fast enough, it needs to be done in no more than 1-2 seconds (and in production it will be maybe 20-50 columns of data, perhaps divided over 5-10 tables). Additionally it needs to scale to perhaps three times as many columns and perhaps 2-3 times as many rows in some situations within 1 second. Further on it needs to allow for about 20-50 clients reading much of that data before the next batch of data arrives.
> Wow. What is the application?

Geological surveys, where they perform realtime geo/hydro-phone shots of areas of the size of 10x10 km every 3-15 seconds.

> test=> CREATE OR REPLACE FUNCTION test_insert( ) RETURNS VOID LANGUAGE plpgsql AS $$ ... $$;
> test=> SELECT test_insert();
> Time: 1885.382 ms

I tested this one and it took 4 seconds, compared to the jdbc insert which took 14 seconds, so it's a lot faster, but not as fast as the SELECT INTO. I also tested an INSERT INTO ... SELECT, which took 1.8 seconds; now we are starting to talk about real performance.

> However COPY is much faster because the parsing overhead and de-escaping of data is faster. COPY is optimized for throughput. So, advice: for optimum throughput, have your application build chunks of data into text files and use COPY. Or if your client lib supports the copy interface, use it.

I did test COPY, i.e. the jdbc COPY patch for pg 8.1; it performs at approx 1.8 seconds :) The test was done with text input, I am going to test it with binary input, which I expect will increase the performance by 20-50%.

All these tests have been performed on a laptop with a Kubuntu 6.10 version of pg 8.1 without any special pg performance tuning. So I expect that compiling the latest pg and doing some tuning on it and testing it on a representative server will give it an additional boost in performance. The key here is that with abundance in performance, I can experiment with the solution in a completely different way than if I had any artificial restrictions.

> You will need a fast disk system with xlog and data on separate disks, several CPU cores (1 insert thread will max out 1 core, use the others for selects), lots of RAM so index updates don't need to seek, and tuning of bgwriter and checkpoints to avoid load spikes.

will have a look at it.

regards

thomas
Re: [PERFORM] insert vs select into performance
Michael Glaesemann wrote:
> As they're individual inserts, I think what you're seeing is overhead from calling this statement 100,000 times, not just on the server but also the overhead through JDBC. For comparison, try
>
> CREATE TABLE ciu_data_type_copy (LIKE ciu_data_type);
>
> INSERT INTO ciu_data_type_copy (id, loc_id, value3, value5, value8, value9, value10, value11)
> SELECT id, loc_id, value3, value5, value8, value9, value10, value11
> FROM ciu_data_type;
>
> I think this would be more comparable to what you're seeing.

This is much faster than my previous solution, but I also tested two other solutions:

- a stored function with array arguments, which performed 3 times better;
- jdbc with the COPY patch, which performed 8.4 times faster with text input; I expect binary input to be even faster.

regards

thomas
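Thomas doesn't show his array-argument function, but the shape of that approach looks roughly like this hypothetical sketch (function name, argument types, and target columns are all made up); the point is that a single client/server round trip carries many rows and the loop runs server-side:

CREATE OR REPLACE FUNCTION bulk_insert(ids int[], locs int[], v3s int[]) RETURNS VOID LANGUAGE plpgsql AS
$$
DECLARE
    i integer;
BEGIN
    FOR i IN array_lower(ids, 1) .. array_upper(ids, 1) LOOP
        INSERT INTO ciu_data_type (id, loc_id, value3) VALUES (ids[i], locs[i], v3s[i]);
    END LOOP;
END;
$$;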
Re: [PERFORM] insert vs select into performance
Michael Stone wrote:
> On Tue, Jul 17, 2007 at 11:01:15PM +0200, Thomas Finneid wrote:
>> Arjen van der Meijden wrote:
>>> Have you also tried the COPY-statement? Afaik select into is similar to what happens in there.
>> No, because it only works on file to db or vice versa, not table to table.
> I don't understand how the insert you described is table to table?

SELECT INTO is table to table, so is INSERT INTO ... SELECT FROM.

regards

thomas
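For readers skimming the thread, the two table-to-table statements being contrasted look like this; the table names are placeholders:

-- Creates the destination table and fills it in one statement:
SELECT * INTO batch_copy FROM batch_staging;

-- Fills an already-existing destination table from a query:
INSERT INTO batch_target SELECT * FROM batch_staging;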
Re: [PERFORM] improvement suggestions for performance design
Kalle Hallivuori wrote:
> Hi. 2007/7/8, Thomas Finneid [EMAIL PROTECTED]:
>> Kalle Hallivuori wrote:
>>> COPY is plentitudes faster than INSERT: http://www.postgresql.org/docs/8.1/interactive/sql-copy.html If you can't just push the data straight into the final table with COPY, push it into a temporary table that you go through with the database procedure. Shameless plug: If you use Java and miss COPY functionality in the driver, it's available at http://kato.iki.fi/sw/db/postgresql/jdbc/copy/ I was able to practically nullify time spent inserting with that.
>> Interesting, I will definitely have a look at it. What is the maturity level of the code at this point? And what is potentially missing to bring it up to production quality? (Stability is of the utmost importance in my usage scenario.)
> It's my third implementation, based on earlier work by Kris Jurka, a maintainer of the JDBC driver. (It is really quite short so it's easy to keep it clear.) I consider it mature enough to have accommodated it as part of an upcoming large application, but I'd really like to hear others' opinions. Documentation I should add one of these days, maybe even rewrite the javadoc.

Hi

I have tested your COPY patch (actually I tested postgresql-jdbc-8.2-505-copy-20070716.jdbc3.jar) and it is really fast, actually just as fast as server-side COPY (both tests were performed on a local machine). This means I am interested in using it in my project, but I have some concerns that need to be addressed (and I am prepared to help in any way I can). They are the following:

- While testing I got some errors, which need to be fixed (detailed below).
- The patch must be of production-grade quality.
- I would like the patch to be part of the official pg JDBC driver.

The error I got the most is shown below. This command runs a single run, single thread and generates 1 row of data:

[EMAIL PROTECTED]:~/svn/pores$ java -server -Xms20m -Xmx256m -cp /usr/java/jdk1.5.0_06/jre/lib/rt.jar:.:src/:test/:conf/:lib/postgresql-jdbc-8.2-505-copy-20070716.jdbc3.jar wg.daemon.Main -m SINGLE_WRITE -t 1 -r 1 -c 1 -p CPBulk
Initialising connection...
Performing insert...
Build bulk data time: 0s 211ms
toString() bulk data time: 0s 4ms
time: 0s 205ms
org.postgresql.util.PSQLException: Unexpected command completion while copying: COPY
        at org.postgresql.core.v3.QueryExecutorImpl.executeCopy(QueryExecutorImpl.java:706)
        at org.postgresql.copy.CopyManager.copyIntoDB(CopyManager.java:50)
        at org.postgresql.copy.CopyManager.copyIntoDB(CopyManager.java:37)
        at wg.storage.pores1.CPBulk.addAttributes(CPBulk.java:72)
        at wg.daemon.Daemon.run(Daemon.java:57)
[EMAIL PROTECTED]:~/svn/pores$ ls -al lib/

regards

thomas
Re: [PERFORM] insert vs select into performance
Adriaan van Os wrote:
> So, how does one (temporarily) disable WAL logging? Or, for example, disable WAL logging for a temporary table?

Operations on temporary tables are never WAL-logged. Operations on other tables are, and there's no way to disable it.

-- Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
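A small illustration of the distinction Heikki draws; the table names and columns are placeholders:

-- Never WAL-logged (and dropped automatically at the end of the session):
CREATE TEMPORARY TABLE scratch (id int, payload text);
INSERT INTO scratch SELECT id, payload FROM source_table;

-- The same insert into an ordinary table is WAL-logged, and there is no per-table switch to turn that off:
CREATE TABLE ordinary_copy (id int, payload text);
INSERT INTO ordinary_copy SELECT id, payload FROM source_table;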
[PERFORM] Optimal tags design?
I am planning to add a tags (as in the web 2.0 thing) feature to my web based application. I would like some feedback from the experts here on what the best database design for that would be.

The possibilities I have come up with are:

* A tags table containing the tag and the id number of what it links to:

  select pid from tags where tag='bla'
  select tag from tags where pid=xxx

* A tags table where each tag exists only once, and a table with the tag ID and picture ID to link them together:

  select pid from tags inner join picture_tags using(tag_id) where tag='bla'
  select tag from tags inner join picture_tags using(tag_id) where pid='xxx'

* A full text index in the picture table containing the tags:

  select pid from pictures where tags @@ to_tsquery('bla')
  (or the non-fti version)
  select pid from pictures where tags ~* '.*bla.*'
  select tags from pictures where pid=xxx
Re: [PERFORM] Optimal tags design?
We store tags on our items like this:

Tag.ID INT NOT NULL PRIMARY KEY
Tag.Value TEXT LCASE NOT NULL UNIQUE

Item.ID INT NOT NULL PRIMARY KEY

ItemTagBinding.ItemID INT NOT NULL REFERENCES Item.ID
ItemTagBinding.TagID INT NOT NULL REFERENCES Tag.ID
ItemTagBinding.ItemID + ItemTagBinding.TagID UNIQUE

with appropriate indexes on the columns we need to frequently query.

We have about 3 million tag bindings right now, and have not run into any performance issues related to tagging other than generating tag clouds (which we pre-calculate anyway). I'll have to get back to you when we get up to 10's, or even 100's, of millions and let you know how it scaled.

Bryan

On 7/18/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> I am planning to add a tags (as in the web 2.0 thing) feature to my web based application. I would like some feedback from the experts here on what the best database design for that would be. ...
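One way Bryan's layout might be written as PostgreSQL DDL, offered as a sketch: the names are carried over from his description, and the lower-casing implied by "LCASE" (not a PostgreSQL column attribute) is expressed here as a CHECK constraint:

CREATE TABLE tag (
    id    serial PRIMARY KEY,
    value text   NOT NULL UNIQUE CHECK (value = lower(value))
);

CREATE TABLE item (
    id serial PRIMARY KEY
);

CREATE TABLE item_tag_binding (
    item_id integer NOT NULL REFERENCES item(id),
    tag_id  integer NOT NULL REFERENCES tag(id),
    PRIMARY KEY (item_id, tag_id)   -- doubles as the unique constraint and the (item_id, tag_id) index
);

-- Look-ups by tag ("which items carry tag X?") want the reverse ordering indexed too:
CREATE INDEX item_tag_binding_tag_idx ON item_tag_binding (tag_id);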