[PERFORM] Insert performance vs Table size
Hi all, I'm wondering if and how the size of a table affects the speed of inserts into it. If the table has indexes, does that alter the answer? Thanks
Re: [PERFORM] Insert performance vs Table size
Praveen Raja: I don't think the size of a table affects the speed of inserts into it, because PostgreSQL just appends to the data files. But indexes do slow inserts down, because PostgreSQL has to maintain each index on every insert. Hope this is useful for your question.

=== 2005-06-27 19:24:06 you wrote: ===
> Hi all, I'm wondering if and how the size of a table affects the speed of inserts into it? If the table has indexes, does that alter the answer? Thanks

Best regards!
李江华 (Seamus Dean)
Alibaba.com
TEL: 0571-85022088-2287
Re: [PERFORM] Insert performance vs Table size
Hi,

At 13:24 27/06/2005, Praveen Raja wrote:
> I'm wondering if and how the size of a table affects speed of inserts into it? What if the table has indexes, does that alter the answer?

Many parameters will affect the result:
- whether there are any indexes (including the primary key, unique constraints...) to update or not
- whether there are any foreign keys from or to that table
- the size of the rows
- whether the table (or at least the bits being updated) fits in RAM or not
- whether the table has holes (due to former updates/deletes and vacuum) and how they are placed
- and probably a bunch of other things...

Obviously, if you have an append-only (no updates, no deletes) table with no indexes and no foreign keys, the size of the table should not matter much. As soon as one of those conditions is not met, table size will have an impact: probably small as long as whatever is needed can be held in RAM, a lot bigger once that's not the case.

Hope that helps,

Jacques.
Re: [PERFORM] Insert performance vs Table size
Just to clear things up a bit, the scenario I'm interested in is a table with a large number of indexes on it (maybe 7-8). In this scenario, other than the overhead of having to maintain the indexes (which I'm guessing is the same regardless of the size of the table), does the size of the table play a role in determining insert performance (and I mean only insert performance)?
Re: [PERFORM] Insert performance vs Table size
Hi,

At 13:50 27/06/2005, Praveen Raja wrote:
> Just to clear things up a bit, the scenario that I'm interested in is a table with a large number of indexes on it (maybe 7-8).

If you're after performance you'll want to carefully consider which indexes are really useful, and/or redesign your schema so that you can have fewer indexes on that table. 7 or 8 indexes is quite a lot, and that really has a cost.

> In this scenario other than the overhead of having to maintain the indexes (which I'm guessing is the same regardless of the size of the table)

Definitely not: indexes grow with the size of the table. Depending on what columns you index (and their types), the indexes may be a fraction of the size of the table, or they may be very close in size (in extreme cases they may even be larger). With 7 or 8 indexes, that can be quite a large volume of data to manipulate, especially if the values of the inserted columns span the whole range of each index (rather than being solely id- or time-based, for instance, in which case index updates are concentrated in a small area of each index): it means you'll need to keep a majority of the indexes in RAM if you want to maintain decent performance.

> does the size of the table play a role in determining insert performance (and I mean only insert performance)?

In this case it's really the indexes that'll cause you trouble, though heavily fragmented tables (due to lots of deletes or updates) will also incur a penalty just for the data part of the inserts.

Also, don't forget the usual hints if you are going to do lots of inserts:
- batch them in large transactions, don't do them one at a time
- better yet, use COPY rather than INSERT
- in some situations, you might be better off dropping the indexes, doing large batch inserts, then re-creating the indexes. YMMV depending on the existing/new ratio, whether you need to maintain indexed access to the tables, etc.
- pay attention to foreign keys

Jacques.
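A minimal sketch of the batch-load pattern described in the hints above (table, index, and file names are invented for illustration; in this era, COPY FROM a server-side file requires superuser rights):

    BEGIN;
    -- drop nonessential indexes so the bulk load doesn't have to maintain them
    DROP INDEX idx_orders_customer;
    -- one COPY beats thousands of individual INSERTs
    COPY orders FROM '/tmp/orders.dat';
    -- rebuild the index once, against the full data set
    CREATE INDEX idx_orders_customer ON orders (customer_id);
    COMMIT;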
Re: [PERFORM] Performance - moving from oracle to postgresql
> There are some immediate questions from our engineers about performance: Oracle has one particular performance enhancement that Postgres is missing. If you do a select that returns 100,000 rows in a given order, and all you want are rows 99101 to 99200, then Oracle can do that very efficiently. With Postgres, it has to read the first 99200 rows and then discard the first 99100. But... if we really want to look at performance, then we ought to put together a set of benchmarks of some typical tasks.

I agree with Rod: you are correct, but this is a very odd objection. You are declaring a set but are only interested in a tiny subset of it, based on arbitrary criteria. You can do this with cursors or with clever querying (not without materializing the full set, however), but why?

Merlin
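For readers wondering what the cursor approach Merlin mentions looks like, here is a hedged sketch (table and column names are invented); note that internally PostgreSQL still walks past the skipped rows:

    BEGIN;
    DECLARE c CURSOR FOR SELECT * FROM big_table ORDER BY sort_col;
    MOVE FORWARD 99100 IN c;   -- skip rows 1..99100 without returning them
    FETCH 100 FROM c;          -- rows 99101..99200
    COMMIT;

The same result can be had in one statement with OFFSET 99100 LIMIT 100, with the same skip-and-discard cost.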
[PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application
Hi!

My company is evaluating making our system (developed in C++) compatible with PostgreSQL. Our programmer ran a lot of tests and told me that performance using ODBC is very similar to using libpq, even with a big number of simultaneous connections/queries. Of course, for us it is simpler to use ODBC, because it will be easier to maintain: we already support a lot of other databases using ODBC (MySQL, DB2, etc).

Has anyone already had this experience? What are the key benefits of using libpq instead of ODBC? Our application has a heavy load and around 150 concurrent users.

Regards,

Rodrigo Carvalhaes
Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application
> Has anyone already had this experience? What are the key benefits of using libpq instead of ODBC? Our application has a heavy load and around 150 concurrent users.

The ODBC driver for PostgreSQL implements its own protocol stack. Unfortunately, it is still on protocol revision 2 (out of 3). Also, IMO libpq is a little better tested and more durable than the ODBC driver. This naturally follows from the fact that libpq is more widely used and more actively developed than ODBC.

If you are heavily invested in C++ you can consider wrapping libpq yourself if you want absolute maximum performance. If you happen to be developing on the Borland platform, give strong consideration to the Zeos connection library, which is very well designed (it wraps libpq).

You might want to consider posting your question to the odbc list.

Merlin
Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application
I would take a peek at the psqlodbc-8.0 drivers; I wouldn't battle with other versions you might find, such as the unixODBC ones.

-elz
Re: [PERFORM] [HACKERS] How to perform TPC-H test on postgresql-8.0.2
[moved to pgsql-performance]

> Currently I want to run a TPC-H test on postgresql-8.0.2. I have downloaded DBGEN and QGEN from the TPC homepage, but I encountered many problems which forced me to request some help.
> 1. How do I load the data from the flat files generated by the dbgen tool? To the best of my knowledge, there is a SQL*Loader in Oracle.
> 2. How do I simulate the concurrency environment? Where can I download a client which connects to the DB server through ODBC?

>> Get DBT3 from SourceForge (search on osdldbt). This is OSDL's TPCH-like test. However, given your knowledge of PostgreSQL you're unlikely to get any kind of result you can use -- TPCH requires significant database tuning knowledge.

I don't necessarily agree. In fact, I remember reading the standards for one of the TPC benchmarks and it said you were not supposed to tune specifically for the test. Any submission, including one with stock settings, should be given consideration (and the .conf settings should be submitted along with the benchmark results). This can only help to increase the body of knowledge on configuring the database.

Merlin
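On the data-loading question: PostgreSQL's COPY command is the rough equivalent of Oracle's SQL*Loader for flat files. A hedged sketch (path is illustrative; note that dbgen emits '|'-delimited lines with a trailing delimiter, which may need stripping before COPY will accept them):

    COPY lineitem FROM '/path/to/lineitem.tbl' WITH DELIMITER '|';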
Re: [PERFORM] ETL optimization
Dennis Bjorklund wrote:
> On Thu, 23 Jun 2005, Bricklen Anderson wrote:
>> iii. UNIQUE constraint on table t1. This didn't seem to perform too badly with fewer rows (preliminary tests), but as you'd expect, on error the whole transaction would roll back. Is it possible to skip a row if it causes an error, as opposed to aborting the transaction altogether?
> You don't need to roll back the whole transaction if you use savepoints or the exception features in pl/pgsql. Take a look at this example:
> http://developer.postgresql.org/docs/postgres/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE

Hmmm... forgot about savepoints. That's an interesting idea that I'll have to check out. I earlier mentioned that I was going to test the delete + insert version, and it works pretty well: I got it down to about 3 minutes using that method. I'll test the savepoint and exception versions that you listed as well. Thanks!
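For reference, the cited documentation example boils down to an exception-based merge loop like this sketch (table and column names here are illustrative, not from the thread):

    CREATE FUNCTION merge_row(k integer, v text) RETURNS void AS $$
    BEGIN
        LOOP
            -- first try to update an existing row
            UPDATE t1 SET val = v WHERE key = k;
            IF found THEN
                RETURN;
            END IF;
            -- not there: try to insert; a concurrent insert raises
            -- unique_violation, in which case we loop and retry the UPDATE
            BEGIN
                INSERT INTO t1 (key, val) VALUES (k, v);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- do nothing, loop again
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

Each failed INSERT rolls back only its inner block, not the whole transaction.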
Re: [PERFORM] Performance analysis of plpgsql code
On Tue, Jun 28, 2005 at 01:54:08AM +0000, Karl O. Pinc wrote:
> On 06/27/2005 06:33:03 PM, Michael Fuhr wrote:
>> See timeofday().
> That only gives you the time at the start of the transaction, so you get no indication of how long anything in the transaction takes.

Did you read the documentation or try it? Perhaps you're thinking of now(), current_timestamp, and friends, which don't advance during a transaction; but as the documentation states, timeofday() returns the wall-clock time and does advance during transactions.

I just ran tests on versions of PostgreSQL going back to 7.2.8 and in all of them timeofday() advanced during a transaction. Does it not work on your system? If not then something's broken -- what OS and version of PostgreSQL are you using?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
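A quick way to see the difference for yourself: run the two SELECTs a moment apart inside one transaction.

    BEGIN;
    SELECT now(), timeofday();
    -- wait a moment, then run it again inside the same transaction:
    SELECT now(), timeofday();
    -- now() returns the same value both times; timeofday() has advanced
    COMMIT;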
Re: [PERFORM] Poor index choice -- multiple indexes of the same
On 06/27/2005 05:37:41 PM, Josh Berkus wrote:
> Karl,
>> Seems to me that when there's a constant value in the query and an = comparison it will always be faster to use the (b-tree) index that's ordered first by the constant value, as then all further blocks are guaranteed to have a higher relevant information density. At least when compared with another index that has the same columns in it.
> That really depends on the stats. Such a choice would *not* be appropriate if the comparison was expected to return 1- rows while the = condition applied to 15% of the table.

We're talking internals here, so I don't know what I'm talking about, but: when the = comparison returns 15% of the table you can find your way straight to the 1- (sic) relevant rows, because that 15% is further sorted by the second column of the index. So that's one disk read, and after that, when you scan the rest of the blocks, every datum is relevant/returned, so your scan will pass through fewer disk blocks. The only case where it would make sense to consider the other index is if the planner knew it could get the answer in 1 disk read; but in that case it should be able to get the answer out of either index in one disk read, as both indexes are on the same columns.

> What is your STATISTICS_TARGET for the relevant columns set to?

STATISTICS_TARGET is the default, which I read in the docs as 10.

> When's the last time you ran analyze?

I'm doing this in a torture-test script, loading data. Every Fibonacci number of rows * 100 I VACUUM ANALYZE. So: 100, 200, 300, 500, 800, etc.

Just for grins I've created the index I'd like it to use, run VACUUM ANALYZE, and shown the EXPLAIN ANALYZE below.

> If this is all updated, you want to post the pg_stats rows for the relevant columns?

pg_stats rows below. (I've tried to wrap the lines short so as not to mess up anybody's mailer.)
# create index census_sname_date on census (sname, date);
CREATE INDEX
# vacuum analyze census;
VACUUM
# explain analyze select 1 from census where date > '1975-9-21' and sname = 'RAD' and status != 'A';

 Index Scan using census_date_sname on census (cost=0.00..2169.51 rows=1437 width=0) (actual time=40.610..40.610 rows=0 loops=1)
   Index Cond: ((date > '1975-09-21'::date) AND (sname = 'RAD'::bpchar))
   Filter: (status <> 'A'::bpchar)
 Total runtime: 40.652 ms
(4 rows)

Compare with:

# drop index census_date_sname;
DROP INDEX
# explain analyze select date from census where sname = 'RAD' and date > '1975-9-21' and status != 'A' limit 1;

 Limit (cost=0.00..3.37 rows=1 width=4) (actual time=0.097..0.097 rows=0 loops=1)
   ->  Index Scan using census_sname_date on census (cost=0.00..5203.95 rows=1544 width=4) (actual time=0.094..0.094 rows=0 loops=1)
         Index Cond: ((sname = 'RAD'::bpchar) AND (date > '1975-09-21'::date))
         Filter: (status <> 'A'::bpchar)
 Total runtime: 0.133 ms
(5 rows)

# select * from pg_stats where tablename = 'census' and (attname = 'sname' or attname = 'date');

schemaname        | babase
tablename         | census
attname           | date
null_frac         | 0
avg_width         | 4
n_distinct        | 4687
most_common_vals  | {1979-02-01,1976-06-16,1977-03-23,1978-08-25,1979-09-20,1971-06-28,1972-04-28,1972-08-27,1974-04-06,1975-03-19}
most_common_freqs | {0.002,0.0017,0.0017,0.0017,0.0017,0.0013,0.0013,0.0013,0.0013,0.0013}
histogram_bounds  | {1959-07-15,1966-02-18,1969-02-22,1971-01-10,1972-07-26,1974-02-09,1975-05-27,1976-07-28,1977-08-19,1978-08-07,1979-10-02}
correlation       | 1

schemaname        | babase
tablename         | census
attname           | sname
null_frac         | 0
avg_width         | 7
n_distinct        | 177
most_common_vals  | {MAX,ALT,PRE,COW,EST,JAN,RIN,ZUM,DUT,LUL}
most_common_freqs | {0.017,0.015,0.015,0.0146667,0.014,0.014,0.0136667,0.0136667,0.013,0.013}
histogram_bounds  | {ALI,BUN,FAN,IBI,LER,NDO,PET,RUS,SLM,TOT,XEN}
correlation       | 0.0446897

(2 rows)

Thanks.

Karl
Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein
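Since the default statistics target came up above, here is a sketch of raising it per column (the target value 100 is an arbitrary example, not from the thread); finer-grained frequency data can help the planner choose between such indexes:

    ALTER TABLE census ALTER COLUMN sname SET STATISTICS 100;
    ANALYZE census;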
Re: [PERFORM] Performance analysis of plpgsql code
On 06/27/2005 08:34:19 PM, Michael Fuhr wrote:
> On Tue, Jun 28, 2005 at 01:54:08AM +0000, Karl O. Pinc wrote:
>> On 06/27/2005 06:33:03 PM, Michael Fuhr wrote:
>>> See timeofday().
>> That only gives you the time at the start of the transaction, so you get no indication of how long anything in the transaction takes.
> Did you read the documentation or try it? Perhaps you're thinking of now(), current_timestamp, and friends, which don't advance during a transaction; but as the documentation states, timeofday() returns the wall-clock time and does advance during transactions.

Very sorry. I did not read through the complete documentation.

> I just ran tests on versions of PostgreSQL going back to 7.2.8 and in all of them timeofday() advanced during a transaction.

For all your work, a documentation patch is appended that I think is easier to read and might avoid this problem in the future. If you don't read all the way through the current CVS version, you might think, as I did, that timeofday() is a CURRENT_TIMESTAMP-related function.

Sorry, but 3 lines wrap in the patch in my email client. :(

Karl
Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein

--- func.sgml	2005-06-26 17:05:35.0 -0500
+++ func.sgml.new	2005-06-27 21:51:05.301097896 -0500
@@ -5787,15 +5787,6 @@
    </para>

    <para>
-    There is also the function <function>timeofday()</function>, which for historical
-    reasons returns a <type>text</type> string rather than a <type>timestamp</type> value:
-<screen>
-SELECT timeofday();
-<lineannotation>Result: </lineannotation><computeroutput>Sat Feb 17 19:07:32.000126 2001 EST</computeroutput>
-</screen>
-   </para>
-
-   <para>
     It is important to know that
     <function>CURRENT_TIMESTAMP</function> and related functions return
     the start time of the current transaction; their values do not
@@ -5803,8 +5794,7 @@
     the intent is to allow a single transaction to have a consistent
     notion of the <quote>current</quote> time, so that multiple
     modifications within the same transaction bear the same
-    time stamp. <function>timeofday()</function>
-    returns the wall-clock time and does advance during transactions.
+    time stamp.
    </para>

    <note>
@@ -5815,6 +5805,18 @@
    </note>

    <para>
+    There is also the function <function>timeofday()</function> which
+    returns the wall-clock time and advances during transactions.  For
+    historical reasons <function>timeofday()</function> returns a
+    <type>text</type> string rather than a <type>timestamp</type>
+    value:
+<screen>
+SELECT timeofday();
+<lineannotation>Result: </lineannotation><computeroutput>Sat Feb 17 19:07:32.000126 2001 EST</computeroutput>
+</screen>
+   </para>
+
+   <para>
     All the date/time data types also accept the special literal value
     <literal>now</literal> to specify the current date and time.  Thus,
     the following three all return the same result:
Re: [PERFORM] Poor index choice -- multiple indexes of the same
On 06/27/2005 09:36:51 PM, Karl O. Pinc wrote:
> I'm doing this in a torture-test script, loading data. Every Fibonacci number of rows * 100 I VACUUM ANALYZE. So: 100, 200, 300, 500, 800, etc.

(And of course I disconnect my client and re-connect so as to use the new statistics. Sure would be nice if I didn't have to do this.)

Karl
Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein
[PERFORM] How can I speed up this function?
We have the following function in our home-grown mirroring package, but it isn't running as fast as we would like. We need to select statements from the pending_statement table, and we want to select all the statements for a single transaction (pending_trans) in one go (that is, we either select all the statements for a transaction, or none of them). We select as many blocks of statements as it takes to top the 100-statement limit (so if the last transaction we pull has enough statements to put our count at 110, we'll still take it, but then we're done). Here is our function:

CREATE OR REPLACE FUNCTION dbmirror.get_pending()
  RETURNS SETOF dbmirror.pending_statement AS
$BODY$
DECLARE
    count INT4;
    transaction RECORD;
    statement dbmirror.pending_statement;
BEGIN
    count := 0;

    FOR transaction IN SELECT t.trans_id as ID
        FROM pending_trans AS t WHERE fetched = false
        ORDER BY trans_id LIMIT 50
    LOOP
        UPDATE pending_trans SET fetched = true
            WHERE trans_id = transaction.id;

        FOR statement IN SELECT s.id, s.transaction_id, s.table_name, s.op, s.data
            FROM dbmirror.pending_statement AS s
            WHERE s.transaction_id = transaction.id
            ORDER BY s.id ASC
        LOOP
            count := count + 1;
            RETURN NEXT statement;
        END LOOP;

        IF count > 100 THEN
            EXIT;
        END IF;
    END LOOP;

    RETURN;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;

Table schemas:

CREATE TABLE dbmirror.pending_trans (
    trans_id oid NOT NULL,
    fetched bool DEFAULT false,
    CONSTRAINT pending_trans_pkey PRIMARY KEY (trans_id)
) WITHOUT OIDS;

CREATE TABLE dbmirror.pending_statement (
    id oid NOT NULL DEFAULT nextval('dbmirror.statement_id_seq'::text),
    transaction_id oid NOT NULL,
    table_name text NOT NULL,
    op char NOT NULL,
    data text NOT NULL,
    CONSTRAINT pending_statement_pkey PRIMARY KEY (id)
) WITHOUT OIDS;

CREATE UNIQUE INDEX idx_stmt_tran_id_id
    ON dbmirror.pending_statement USING btree (transaction_id, id);

Postgres 8.0.1 on Linux.

Any help would be greatly appreciated. Regards

--
David Mitchell
Software Engineer
Telogis
Re: [PERFORM] Performance analysis of plpgsql code
On Tue, Jun 28, 2005 at 03:03:06AM +0000, Karl O. Pinc wrote:
> For all your work a documentation patch is appended that I think is easier to read and might avoid this problem in the future.

Patches should go to the pgsql-patches list -- the people who review and apply patches might not be following this thread.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
Re: [PERFORM] How can I speed up this function?
What's wrong with Slony?

David Mitchell wrote:
> We have the following function in our home grown mirroring package, but it isn't running as fast as we would like. [...]
Re: [PERFORM] How can I speed up this function?
Christopher Kings-Lynne wrote:
> What's wrong with Slony?

Because it's not multi-master. Our mirroring package is.

--
David Mitchell
Software Engineer
Telogis
Re: [PERFORM] How can I speed up this function?
>> What's wrong with Slony?
>
> Because it's not multi-master. Our mirroring package is.

I'm curious: how did you write a multi-master replication package in pgsql, when pgsql doesn't have two-phase commit or any kind of distributed syncing or conflict resolution in a release version?

Chris
Re: [PERFORM] How can I speed up this function?
David Mitchell wrote:
> We have the following function in our home grown mirroring package, but it isn't running as fast as we would like. [...]

David, I'm still a newbie and it may not affect performance, but why are you aliasing the tables? Can you not simply use

FOR transaction IN SELECT trans_id
    FROM pending_trans
    WHERE fetched = false
    ORDER BY trans_id
    LIMIT 50

and

FOR statement IN SELECT id, transaction_id, table_name, op, data
    FROM dbmirror.pending_statement
    WHERE pending_statement.transaction_id = transaction.trans_id
    ORDER BY pending_statement.id

I am pretty sure that the ORDER BY is slowing down both of these queries. Since you are going to go through the whole table eventually, do you really need to sort the data at this point?

--
Kind Regards,
Keith
Re: [PERFORM] How can I speed up this function?
Christopher Kings-Lynne wrote:
> I'm curious: how did you write a multi-master replication package in pgsql, when pgsql doesn't have two-phase commit or any kind of distributed syncing or conflict resolution in a release version?

We didn't write it entirely in pgsql; there is a worker process that takes care of actually committing to the database.

Cheers

--
David Mitchell
Software Engineer
Telogis
Re: [PERFORM] How can I speed up this function?
Hi Keith,

Unfortunately, we must have those sorts. The statements within a transaction must be executed on the slave in the same order as they were on the master, and similarly, transactions must also go in the same order. As for aliasing the tables, that is just a remnant from previous versions of the code.

Thanks

David

Keith Worthington wrote:
> I am pretty sure that the ORDER BY is slowing down both of these queries. Since you are going to go through the whole table eventually do you really need to sort the data at this point? [...]

--
David Mitchell
Software Engineer
Telogis
Re: [PERFORM] How can I speed up this function?
Merge the two select statements like this and try:

SELECT t.trans_id as ID, s.id, s.transaction_id, s.table_name, s.op, s.data
FROM pending_trans AS t
JOIN dbmirror.pending_statement AS s ON (s.transaction_id = t.trans_id)
WHERE t.fetched = false
ORDER BY t.trans_id, s.id
LIMIT 100;

(Note: the join condition must use t.trans_id; pending_trans has no column named id.) If the above query works in the way you want, then you can also do the update using the same.

with regards,
S.Gnanavel

> -----Original Message-----
> From: David Mitchell
> Subject: [PERFORM] How can I speed up this function?
>
> We have the following function in our home grown mirroring package, but it isn't running as fast as we would like. [...]
[PERFORM] LIKE search with ending % not optimized in v8
Hello. I believe that in earlier versions, a query of the form

select attrib from ttt where attrib like 'foo%'

would be able to take advantage of an index; I have seen this in the past. Currently I am using v8.0.3, and from what I can see, the execution plan uses a seq scan and totally ignores the index. Is this the case?

-Aditya
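One well-known cause of this symptom (a hedged guess, since the poster's locale isn't shown): a plain btree index can only serve anchored LIKE when the database uses the C locale. In other locales, an index built with the pattern operator class (available since 7.4) can serve it:

    -- names taken from the question; use varchar_pattern_ops for a varchar column
    CREATE INDEX ttt_attrib_like_idx ON ttt (attrib text_pattern_ops);
    EXPLAIN SELECT attrib FROM ttt WHERE attrib LIKE 'foo%';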
Re: [PERFORM] How can I speed up this function?
Hi Gnanavel,

Thanks, but that will return at most 100 statements. If there is a transaction with 110 statements, this will not return all of them, and we need the function to return all the statements for a transaction.

Cheers

David

Gnanavel Shanmugam wrote:
> Merge the two select statements like this and try: [...]

--
David Mitchell
Software Engineer
Telogis
[PERFORM] select distinct on varchar -- wild performance differences!
Hey, all. I've bounced this around in #postgres for an hour or so, and it was suggested that I post it here as well. Hopefully someone can help me out.

I have three machines. All have 512MB of RAM.

Machine A is a 2.0GHz Celeron, running Debian, pg version 7.4.6.
Machine B is a 1.8GHz Celeron, running CentOS 3.4, pg version 8.0.3 (7.3.9 also exhibited the behaviour below, by the way).
Machine C is a 1.0GHz Athlon, running CentOS 4.0, pg version 7.4.7.

The SAME data and schema is loaded (from a pg_dump, default parameters) onto all three machines. With the same query, select distinct model from exif_common, machines A and C return results quickly (1/4 second). Machine B chews on it for 30ish seconds! Note, this column is a VARCHAR(40). Here's an explain analyze for it.

Machine A (fast):

photos=# explain analyze select distinct model from exif_common;

 Unique (cost=2629.74..2732.11 rows=5 width=15) (actual time=211.358..265.049 rows=6 loops=1)
   ->  Sort (cost=2629.74..2680.93 rows=20473 width=15) (actual time=211.351..242.296 rows=20473 loops=1)
         Sort Key: model
         ->  Seq Scan on exif_common (cost=0.00..1163.73 rows=20473 width=15) (actual time=0.022..58.635 rows=20473 loops=1)
 Total runtime: 265.928 ms
(5 rows)

Machine B (slow):

photos=# explain analyze select distinct model from exif_common;

 Unique (cost=2640.74..2743.11 rows=6 width=15) (actual time=27939.231..32914.134 rows=6 loops=1)
   ->  Sort (cost=2640.74..2691.93 rows=20473 width=15) (actual time=27939.222..27983.784 rows=20473 loops=1)
         Sort Key: model
         ->  Seq Scan on exif_common (cost=0.00..1174.73 rows=20473 width=15) (actual time=0.071..97.772 rows=20473 loops=1)
 Total runtime: 32915.031 ms
(5 rows)

(Yes, I know, six distinct rows out of 20,000. But holy moly! 1/4 sec vs 32.9 sec?!?!)

Now, if I do a similar query against an INT column, the speeds are more in line with each other:

Machine A:

photos=# explain analyze select distinct imagewidth from exif_common;

 Unique (cost=2629.74..2732.11 rows=36 width=4) (actual time=179.899..225.934 rows=107 loops=1)
   ->  Sort (cost=2629.74..2680.93 rows=20473 width=4) (actual time=179.891..207.632 rows=20473 loops=1)
         Sort Key: imagewidth
         ->  Seq Scan on exif_common (cost=0.00..1163.73 rows=20473 width=4) (actual time=0.024..62.946 rows=20473 loops=1)
 Total runtime: 226.707 ms
(5 rows)

Machine B:

photos=# explain analyze select distinct imagewidth from exif_common;

 Unique (cost=2640.74..2743.11 rows=24 width=4) (actual time=209.394..287.131 rows=107 loops=1)
   ->  Sort (cost=2640.74..2691.93 rows=20473 width=4) (actual time=209.384..251.693 rows=20473 loops=1)
         Sort Key: imagewidth
         ->  Seq Scan on exif_common (cost=0.00..1174.73 rows=20473 width=4) (actual time=0.074..94.574 rows=20473 loops=1)
 Total runtime: 288.411 ms
(5 rows)

Machine C exhibits the same behaviour as A for all queries. This weird slow behaviour on machine B also appeared in 7.3.9; upgrading didn't seem to help. neilc from IRC thought it might be a qsort(2) quirk, but a sample C program I whipped up, testing different-sized data sets with a similar distribution, gave very similar sort timings between the three machines. Therefore, I don't think qsort(2) is to blame.

Anyone have any ideas as to what may be up with machine B?

Thanks, -Elliott
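The symptom (only varchar sorts slow, and only on one machine) is a classic signature of locale-aware string comparison: under a non-C locale, every comparison in the sort goes through strcoll(), which on some platform/locale combinations is dramatically slower than plain byte comparison. A cheap thing to check on each machine (a diagnostic suggestion, not from the thread):

    SHOW lc_collate;  -- 'C' means byte-wise comparison; something like 'en_US.UTF-8' means strcoll()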
[PERFORM] parameterized LIKE does not use index
Hi,

Consider the where-clauses:

WHERE lower(col) LIKE 'abc';
WHERE lower(col) LIKE 'abc%';

These will both use a b-tree functional index on lower(col) if one exists. The clause

WHERE lower(col) LIKE '%abc';

can't use the index, as you would expect, because of the wildcard at the front (as mentioned in the manual). Thus it has to do a seqscan, on what in my case is a very large table. But still that's not too bad, because I expect an overwhelming majority of the simple cases, and only very few that start with a percentage sign.

Now, what's problematic is if I replace the literal with a parameter, like this:

WHERE lower(col) LIKE ?

It seems that the parameterized query gets compiled once, and because the parameter is not yet known, one cannot be sure it doesn't start with a percentage sign. Using the parameterized version causes ALL cases to use a seqscan. Of course, I could modify the application and send different SQL depending on which case we're in, or just construct a query with a literal each time, but is there a way to add a hint to the SQL that would cause the query to be re-planned if it's a case that could use the index? Or can I convince the (Perl) driver to do so?

kurt.
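The generic-plan behaviour Kurt describes can be seen directly with a prepared statement (table and column names here are illustrative):

    -- literal: the planner sees the anchored prefix and can pick the index
    EXPLAIN SELECT * FROM t WHERE lower(col) LIKE 'abc%';

    -- parameter: one plan must serve any pattern, including '%abc',
    -- so the always-safe seqscan is chosen
    PREPARE q(text) AS SELECT * FROM t WHERE lower(col) LIKE $1;
    EXPLAIN EXECUTE q('abc%');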
[PERFORM] Postgresql7.4.5 running slow on plpgsql function
Hi, I need some help from the experts here. Currently we have a function that is used together with a temp table. It calls a search-result function, and every time the function is called, it goes through some filters before coming out with a result. Now we have a major problem: the first time the function executes, it takes about 13 seconds; the second time, about 17 seconds; and with every further execution the time taken grows by about 4 seconds. May I know what is going on here? Since we use the function with a temp table, every statement related to the temp table uses the EXECUTE command.

regards

ivan
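For context on the EXECUTE remark: plpgsql caches the plan of each ordinary statement for the life of the session, and a cached plan pins the specific temp table it was first planned against, so a function that re-creates its temp table on every call must route those statements through EXECUTE. A minimal sketch of the pattern in the quoted-string syntax of this era (all names are invented):

    CREATE OR REPLACE FUNCTION search_results() RETURNS integer AS '
    DECLARE
        rec record;
        n   integer := 0;
    BEGIN
        EXECUTE ''CREATE TEMP TABLE tmp_result (id integer)'';
        EXECUTE ''INSERT INTO tmp_result SELECT id FROM items WHERE active'';
        -- EXECUTE cannot return into a variable before 8.1, hence the loop
        FOR rec IN EXECUTE ''SELECT count(*) AS c FROM tmp_result'' LOOP
            n := rec.c;
        END LOOP;
        EXECUTE ''DROP TABLE tmp_result'';
        RETURN n;
    END;
    ' LANGUAGE 'plpgsql';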
[PERFORM] index selection by query planner
Hi, I have a table with two indexes on the same column, one of which is a partial index. I would like the query planner to use the partial index whenever the query condition lies within the range of the partial index, as it would yield better performance. Is there any way to enforce an ordering for the indexes? How does the query planner decide which index to use when a particular query is fired? 'EXPLAIN ANALYZE' showed the full index being used in a situation that could be satisfied by the partial index.

Thanks, Rohit
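A sketch of the situation described, with invented names; the planner compares estimated costs rather than honouring any fixed ordering, so EXPLAIN on the actual query is the way to see which index wins:

    CREATE INDEX events_ts_idx     ON events (ts);
    CREATE INDEX events_recent_idx ON events (ts) WHERE ts >= '2005-01-01';

    -- the condition lies entirely inside the partial index's range:
    EXPLAIN SELECT * FROM events WHERE ts >= '2005-06-01';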
[PERFORM] Too slow querying a table of 15 million records
Hello!

I use FreeBSD 4.11 with PostgreSQL 7.3.8. I have a huge database with roughly 15 million records. There is just one table, with a time field, a few ints and a few strings:

table test
fields: time (timestamp), source (string), destination (string), p1 (int), p2 (int)

I have run VACUUM ANALYZE. I have created indexes on every field, but for some reason my PostgreSQL server wants to use a seqscan, even though I know an indexed scan would be much faster:

create index test_time_idx on test (time);
create index test_source_idx on test (source);
create index test_destination_idx on test (destination);
create index test_p1_idx on test (p1);
create index test_p2_idx on test (p2);

What is really strange is that when I query a count(*) on one of the int fields (p1), which has a very low count, PostgreSQL uses a seqscan. In another count on the same int field (p1), where I know it gives about 2.2 million hits, it suddenly uses a seqscan too, instead of an indexed one. Isn't the whole idea of indexing to increase performance in large queries -- to make a sort of phonebook for the values, so it is faster to look up whatever you need? This just seems the opposite.

Here is an EXPLAIN of my query:

database=# explain select date_trunc('hour', time), count(*) as total from test where p1=53 and time > now() - interval '24 hours' group by date_trunc order by date_trunc;

 Aggregate (cost=727622.61..733143.23 rows=73608 width=8)
   ->  Group (cost=727622.61..731303.02 rows=736083 width=8)
         ->  Sort (cost=727622.61..729462.81 rows=736083 width=8)
               Sort Key: date_trunc('hour'::text, "time")
               ->  Seq Scan on test (cost=0.00..631133.12 rows=736083 width=8)
                     Filter: ((p1 = 53) AND ("time" > (now() - '1 day'::interval)))
(6 rows)

database=# drop INDEX test_<TAB>
test_source_idx  test_destination_idx  test_p1_idx  test_p2_idx  test_time_idx

After all this, I tried setting enable_seqscan to off and enable_nestloop to on. This didn't help much either; the time to run the query is still in minutes. My results are the number of elements for each hour, and it gives about 1000-2000 hits per hour.

I have read somewhere that PostgreSQL can easily handle 100-200 million records, and that a rightly tuned system gives great performance. I would like to learn how. :)

I also found this article (http://techdocs.postgresql.org/techdocs/pgsqladventuresep3.php):

> Tip #11: Don't bother indexing columns with huge numbers of records and a small range of values, such as BOOLEAN columns. This tip, regretfully, is perhaps the only tip where I cannot provide a good, real-world example from my work. So I'll give you a hypothetical situation instead: imagine that you have a database table with a list of every establishment vending ice cream in the US -- a table with almost 1 million rows but, due to simplistic data entry, only three possible values for type (1-SUPERMARKET, 2-BOUTIQUE, and 3-OTHER) which are relatively evenly distributed. In this hypothetical situation, you might find (with testing using EXPLAIN) that an index on type is ignored and the planner uses a seq scan (or table scan) instead. This is because a table scan can actually be faster than an index scan in this situation. Thus, any index on type should be dropped. Certainly, the boolean column (active) requires no indexing, as it has only two possible values and no index will be faster than a table scan.

Then I ask: what is the use of indexing, when I can't use it on a VERY large database?
On my 15-million-record database it takes forever to do seqscans over and over again. This is probably why, as the quote above explains, it chooses a full scan and not an indexed one. So what do I do?

I've used SQL for years, but never at such a big scale, and thus never had to learn how to deal with large numbers of records -- usually a maximum of 1000 records. Now, with millions, I need to learn a way to make my sucky queries better. I'm trying to learn more about tuning my system, making better queries and such. I've found some documents on the Internet, but far from the best.

Feedback most appreciated!

Regards, a learning PostgreSQL user
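A hedged suggestion for this query shape (not from the thread; the index name is invented): an equality on p1 plus a range on time is exactly what a composite index serves, whereas each single-column index matches far too many rows on its own to beat a seqscan:

    CREATE INDEX test_p1_time_idx ON test (p1, "time");
    EXPLAIN select date_trunc('hour', "time"), count(*) as total
      from test
     where p1 = 53 and "time" > now() - interval '24 hours'
     group by 1 order by 1;

On planners of this era it can also help to compute the cutoff timestamp in the client and pass it as a literal instead of now() - interval, since the planner estimates non-constant comparisons poorly.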