Re: [PERFORM] Postgresql optimisation
-----Original Message-----
From: Denis BUCHER

> And each morning huge tables are DELETED and all data is INSERTed new
> from a script. (Well, "huge" is very relative, it's only 400'000 records)

If you are deleting ALL rows in the tables, then I would suggest using
TRUNCATE instead of DELETE. Truncate will be faster at deleting, and it
will not accumulate dead tuples. Also, if you switch to truncate then you
should ANALYSE the tables after you finish inserting. Note that VACUUM
ANALYSE is not necessary after a truncate/insert because there should be no
dead tuples to vacuum.

Dave

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
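The daily reload described above could be sketched like this (the table
name and load method are made up for illustration):

```sql
BEGIN;
TRUNCATE daily_data;                 -- instant, leaves no dead tuples behind
COPY daily_data FROM '/path/to/export.csv' WITH CSV;  -- or a series of INSERTs
COMMIT;

ANALYZE daily_data;  -- refresh planner statistics; no VACUUM needed here
```

Running ANALYZE after the load matters because TRUNCATE plus COPY leaves
the planner with statistics from the old contents (or none at all).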
Re: [PERFORM] Postgresql optimisation
From: Matthew Wakeling

> Perhaps reading the other replies in the thread before replying yourself
> might be advisable, because this previous reply directly contradicts you:
>
> On Wed, 28 Oct 2009, Kevin Grittner wrote:
>> I recommend VACUUM ANALYZE of the table(s) after this step. Without
>> that, the first query to read each tuple sets its hint bits and
>> rewrites it, causing a surprising delay at unpredictable times
>> (although heavier near the start of the day).
>
> There *is* a benefit of running VACUUM ANALYSE rather than just ANALYSE.
>
> Matthew

I did read the other replies first; I guess I just missed Kevin Grittner's
somehow. I noticed several people were worried the OP had problems with
bloat, which is why I suggested TRUNCATE if possible. That was my main
point.

I guess I made the other comment because I feel beginners with Postgres
quite often don't understand the difference between VACUUM and ANALYSE, and
for large tables an ANALYSE alone can take much less time. I didn't think
about hint bits because I've never noticed a big impact from them, but that
is probably just because of my particular situation. Now that it has been
pointed out to me, I agree it is good advice for the OP to use VACUUM
ANALYSE.

Dave
Re: [PERFORM] What is the role of #fsync and #synchronous_commit in configuration file?
From: keshav upadhyaya
Subject: [PERFORM] What is the role of #fsync and #synchronous_commit in configuration file?

> Hi,
> I want to improve the performance of inserting huge amounts of data into
> my table. I have only one index on the table.
>
> First question - I want to know the roles played by #fsync = on and
> #synchronous_commit = on. I want to understand in more detail what
> exactly happens if one turns them off. Is it dangerous to do this, as it
> will not sync the data on each commit?

The settings are described in the docs:
http://www.postgresql.org/docs/8.4/interactive/runtime-config-wal.html

If you turn fsync off, you risk data loss in case of power or hardware
failure.

Dave
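One point worth adding: unlike fsync, synchronous_commit can be turned off
for a single session, which can speed up a bulk load while only risking the
loss of the most recent commits after a crash (not database corruption).
A minimal sketch, with a made-up table name:

```sql
-- Affects only this session; other connections still commit synchronously.
SET synchronous_commit = off;

INSERT INTO my_table (id, payload) VALUES (1, 'a');  -- hypothetical table
COMMIT;  -- returns before the WAL is flushed to disk
```

This is often a safer first experiment than touching fsync in
postgresql.conf.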
Re: [PERFORM] High CPU load on Postgres Server during Peak times!!!!
From: Shiva Raman
Subject: Re: [PERFORM] High CPU load on Postgres Server during Peak times

> Andy Colson wrote:
>> Eww. I think that's bad. A connection that has a transaction open will
>> cause lots of row versions, which use up ram, and make it slower to
>> step through the table (even with an index). You really need to fix up
>> your code and make sure you commit transactions. (any statement
>> (select, insert, update) will start a new transaction that you need to
>> explicitly commit).
>
> With reference to this suggestion by Andy Colson, we checked the
> application code and found that only INSERT and UPDATE have COMMIT;
> SELECT has no commit. We are using a lot of Ajax Suggest in all the
> forms for fetching the data using SELECT statements, which are not
> explicitly committed. We have started updating the code on this.

You need a COMMIT for every BEGIN. If you just run a SELECT statement
without first beginning a transaction, then you should not end up with a
connection that is "idle in transaction". If you are beginning a
transaction, doing a select, and then not committing, then yes, that is a
bug.

Dave
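The problem pattern looks like this as a psql-style session (the table and
query are made up):

```sql
BEGIN;
SELECT name FROM customers WHERE name LIKE 'Sm%';  -- Ajax suggest lookup
-- No COMMIT/ROLLBACK here: the connection now sits "idle in transaction",
-- pinning old row versions and preventing VACUUM from reclaiming them.

-- The fix is to close the transaction as soon as the result is fetched:
COMMIT;  -- for a read-only statement, ROLLBACK works just as well
```

Most client libraries that open transactions implicitly have some form of
commit/rollback or autocommit setting that does this for you.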
Re: [PERFORM] select max() much slower than select min()
-----Original Message-----
From: Brian Cox
Subject: [PERFORM] select max() much slower than select min()

> seems like max() shouldn't take any longer than min() and certainly not
> 10 times as long. Any ideas on how to determine the max more quickly?

That is odd. It seems like max should actually have to scan fewer rows than
min should. It might still be bloat in the table, because unless you did
VACUUM FULL there could still be dead rows. A VACUUM VERBOSE would show
whether there is bloat or not.

Also, maybe you could try a two-column index like this:

create index test_index on ts_stats_transet_user_interval
    (ts_interval_start_time, ts_id);

Dave
Re: [PERFORM] Performance issue - 2 linux machines, identical configs, different performance
> We have two machines. Both running Linux Redhat, both running postgres
> 8.2.5. Both have nearly identical 125 GB databases. In fact we use PITR
> Recovery to replicate from one to the other.

I have to ask the obvious question. Do you regularly ANALYZE the machine
you replicate to?

Dave
Re: [PERFORM] Yet another slow nested loop
-----Original Message-----
From: Alexander Staubo

> -> Nested Loop (cost=0.00..5729774.95 rows=10420 width=116)
>      (actual time=262614.470..262614.470 rows=0 loops=1)
>      Join Filter: ((photos.taken_at > (event_instances.time + '-01:00:00'::interval))
>        AND (photos.taken_at < (event_instances.time + '07:00:00'::interval)))
>      -> Nested Loop (cost=0.00..2055574.35 rows=11869630 width=120)
>           (actual time=21.750..121838.012 rows=14013998 loops=1)

Do you have any of the other enable_* options set to false? What do you
have random_page_cost set to? I ask because I'm surprised to see Postgres
choose to loop when it knows it will have to loop 11 million times.

Dave
Re: [PERFORM] Unexpected query plan results
-----Original Message-----
From: Anne Rosset
Subject: Re: [PERFORM] Unexpected query plan results

> SELECT SUM(1) FROM item WHERE is_deleted = 'f';
>    sum
> ---------
>  1824592
> (1 row)
>
> SELECT SUM(1) FROM item WHERE folder_id = 'tracker3641';
>   sum
> --------
>  122412
> (1 row)
>
> SELECT SUM(1) FROM item WHERE folder_id = 'tracker3641'
>   AND is_deleted = 'f';
>  sum
> -----
>   71
> (1 row)
>
> SELECT SUM(1) FROM item WHERE folder_id = 'tracker3641'
>   AND is_deleted = 't';
>   sum
> --------
>  122341
> (1 row)
>
> > Something's not right here. If the whole table has only 468173 rows,
> > you can't have 1.8 million deleted rows where is_deleted = false.
> > ...Robert
>
> The item table has 2324829 rows. The artifact table has 468173 rows.
> Thanks,
> Anne

I'd been thinking about the sort, but I hadn't thought yet about whether
that index scan on item could be made faster. Could you post the table
definition of item, including the indexes on it?

Dave
Re: [PERFORM] Unexpected query plan results
From: Anne Rosset
Subject: [PERFORM] Unexpected query plan results

> Hi,
> We have one query which has a left join. If we run this query without
> the left join, it runs slower than with the left join.
> [snip]
> I am having a hard time understanding why the query runs faster with the
> left join.

It looks like the query plan for the query without the left join is less
than optimal. Adding the left join just seemed to shake things up enough
that Postgres picked a better plan.

The slow step in the query without the left join appears to be sorting the
result of a hash join so it can be used in a merge join:

-> Sort (cost=47640.91..47808.10 rows=66876 width=70)
     (actual time=4273.919..4401.387 rows=168715 loops=1)
     Sort Key: (artifact.id)::text
     -> Hash Join (cost=9271.96..42281.07 rows=66876 width=70)
          (actual time=124.119..794.667 rows=184378 loops=1)

The plan might be sped up by removing the sort or by making the sort
faster.

Postgres thinks the hash join will only produce 66,876 rows, but it
produces 184,378 rows. If it made a better estimate of the results of the
hash join, it might not choose this plan. I don't really know if there is a
way to improve the estimate on a join when the estimates of the inputs look
pretty good.

As a test, you might try disabling sorts by setting enable_sort to false,
then run the EXPLAIN ANALYZE again to see what you get.

You might be able to make the sort faster by increasing work_mem. What do
you have work_mem set to now, and what version of Postgres are you using?

Dave
Re: [PERFORM] Unexpected query plan results
From: Anne Rosset
Subject: Re: [PERFORM] Unexpected query plan results

> Thanks Dave. We are using postgresql-server-8.2.4-1PGDG and have
> work_mem set to 20MB. What value would you advise?
> thanks,
> Anne

work_mem is kind of tricky because the right setting depends on how much
RAM your machine has, whether the machine is dedicated to Postgres, and how
many simultaneous connections you have. If this is a test server, and not
used in production, you could just play around with the setting and see if
your query gets any faster.

Here are the docs on work_mem:
http://www.postgresql.org/docs/8.2/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY
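Since work_mem can be changed per session, the experiment doesn't require
editing postgresql.conf at all. A sketch (the value and the query are just
illustrations, not recommendations):

```sql
-- Only affects the current session; nothing is written to postgresql.conf.
SET work_mem = '64MB';

-- Re-run the slow query and check whether the Sort node now reports an
-- in-memory quicksort instead of an external (on-disk) merge.
EXPLAIN ANALYZE
SELECT a.id FROM artifact a JOIN item i ON i.id = a.id ORDER BY a.id;
```

If the sort fits in memory at some value, that value (times the expected
number of concurrent sorts) gives a rough upper bound for the global
setting.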
Re: [PERFORM] Any better plan for this query?..
> What I don't understand is the part where you are talking about disabling
> hash joins:
>
> * result: planner replaced hash join by merge join
> * execution time: 0.84ms !
> * NOTE: curiously the planner is expecting to execute this query in
>   0.29ms - so it's supposed from its logic to be faster, so why is this
>   plan not used from the beginning???...
>
> Sort (cost=4562.83..4568.66 rows=2329 width=176) (actual time=0.237..0.237 rows=20 loops=1)
>   Sort Key: h.horder
>   Sort Method: quicksort Memory: 30kB
>   -> Merge Join (cost=4345.89..4432.58 rows=2329 width=176) (actual time=0.065..0.216 rows=20 loops=1)
>        Merge Cond: (s.ref = h.ref_stat)
>        -> Index Scan using stat_ref_idx on stat s (cost=0.00..49.25 rows=1000 width=45) (actual time=0.018..0.089 rows=193 loops=1)
>        -> Sort (cost=4345.89..4351.72 rows=2329 width=135) (actual time=0.042..0.043 rows=20 loops=1)
>             Sort Key: h.ref_stat
>             Sort Method: quicksort Memory: 30kB
>             -> Index Scan using history_ref_idx on history h (cost=0.00..4215.64 rows=2329 width=135) (actual time=0.012..0.025 rows=20 loops=1)
>                  Index Cond: (ref_object = '01'::bpchar)
> Total runtime: 0.288 ms
> (12 rows)

The EXPLAIN ANALYZE ran the query in 0.288 ms. That is the actual time it
took to run the query on the server; it is not an estimate of the time. You
measured 0.84 ms to run the query, which seems to imply either a problem in
one of the timing methods, or that 66% of your query execution time is
spent sending the results to the client. I'm curious how you did your
execution time measurements.

Dave
Re: [PERFORM] Deleting millions of rows
-----Original Message-----
From: Brian Cox
Subject: [PERFORM] Deleting millions of rows

> I'm using 8.3.5. Table ts_defects has 48M rows. Through psql:
> delete from ts_defects;
> Result: out of memory/Can't allocate size: 32
> I then did 10 or so deletes to get rid of the rows. Afterwards, inserts
> into or queries on this table performed significantly slower. I tried a
> vacuum analyze, but this didn't help. To fix this, I dumped and restored
> the database.
>
> 1) why can't postgres delete all rows in a table if it has millions of
>    rows?
> 2) is there any other way to restore performance other than restoring
>    the database?
>
> Thanks, Brian

If you are deleting an entire table, then the TRUNCATE command is the way
to go. TRUNCATE is very fast and leaves no dead rows behind.

The problem with a normal delete is that the rows are not actually removed
from the file. Once the table is VACUUMed, the dead space is marked as
available to be reused, but plain VACUUM doesn't remove any space either. A
VACUUM FULL or CLUSTER will actually remove dead space, but they can take a
while to run. (I've heard CLUSTER is supposed to be faster than VACUUM
FULL.)

Another way is to create a new table with the same definition as the old
table, select the rows you want to keep into the new table, drop the old
table, and then rename the new table to have the old table's name.

Dave
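The copy-and-rename approach from the last paragraph, as a sketch (the
filter column is made up; indexes, constraints, and grants on the old
table have to be re-created on the new one by hand):

```sql
BEGIN;
CREATE TABLE ts_defects_new AS
    SELECT * FROM ts_defects
    WHERE closed = false;            -- hypothetical "rows to keep" filter
DROP TABLE ts_defects;
ALTER TABLE ts_defects_new RENAME TO ts_defects;
COMMIT;

ANALYZE ts_defects;                  -- the new table has no statistics yet
```

Because the new table is written from scratch, it contains no dead space,
which is what the dump/restore achieved the hard way.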
Re: [PERFORM] postgresql performance
-----Original Message-----
From: SPMLINGAM
Subject: [PERFORM] postgresql performance

> Dear Friends,
> I have a table with 50 lakhs records, the table has more than 10 fields,
> i have primary key, i have select query with count(*) without any
> condition, it takes 17 seconds.

17 seconds to scan 5 million records doesn't sound that bad to me.
Postgresql does not store a count of records, and so it has to actually
scan the table to count all the records. This was a design choice, because
select count(*) isn't usually used in a production system.

> I have another one query which will do joins with other small tables, it
> takes 47 seconds to give output, the result has 2 lakhs records. the
> indexing is not used. I have created one index with one field (which i
> used in this query, the field value has duplicates also).

You should post which version of Postgresql you are using, your table
definition, and the output of EXPLAIN ANALYZE run on your query.

If you have a lot of IO wait, you are most likely IO bound. When Postgresql
is using a lot of CPU, it is likely performing a sort or hashing.

Pulling a large number of rows out of an even larger table can be difficult
to do extremely quickly, but if you post the EXPLAIN ANALYZE output we
would know whether things could be improved or not.

Dave
Re: [PERFORM] Query only slow on first run
-----Original Message-----
From: tmp

> We have primarily two tables of interest here: questions (~100k rows)
> and posts (~400k rows). Each post refers to a question, but only the
> posts rows for which the corresponding question.status = 1 are relevant.
> This reduces the number of relevant question rows to about 10k.

Earlier you said only a small subset of questions have a status of 1, so I
assumed you meant something like 100, not 10k. :)

According to the EXPLAIN ANALYZE there are only 646 rows in posts which
match your criteria, so it does seem like scanning posts first might be the
right thing to do.
Re: [PERFORM] Query only slow on first run
-----Original Message-----
From: cluster

> > Probably by buying much faster disk hardware. Or buy more RAM, so that
> > the data can stay cached.
>
> So the only problem here is lack of RAM and/or disk speed?

I don't think you can reach that conclusion yet. Like everybody said, the
reason the query was faster the second time was that the disk pages were
cached in RAM, and pulling the data out of RAM is way faster than disk.

If I were you, I would try to optimize the query for when the disk pages
aren't in RAM. In order to test the query without having anything cached,
you need to clear out Postgres's shared buffers and the OS cache. That can
be tricky, but it may be as easy as running a big select on another table.

As for optimizing the query, I noticed that all three joins are done by
nested loops. I wonder if another join method would be faster. Have you
analyzed all the tables? You aren't disabling hash joins or merge joins,
are you? If you aren't, then as a test I would try disabling nested loops
by doing "set enable_nestloop=false" and see if the query is any faster for
you. If it is faster without nested loops, then you might need to look into
changing some settings.

Dave
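The test from the last paragraph, spelled out as a session (the query is a
hypothetical stand-in for the slow one; column names are guesses based on
the thread):

```sql
-- Session-local: discourages (does not forbid) nested-loop join plans.
SET enable_nestloop = false;

EXPLAIN ANALYZE
SELECT q.id, p.body
FROM questions q JOIN posts p ON p.question_id = q.id
WHERE q.status = 1;

RESET enable_nestloop;   -- back to the default when done testing
```

Because the setting only lasts for the session, nothing needs to be undone
on other connections.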
Re: [PERFORM] Query only slow on first run
-----Original Message-----
From: cluster

> If I disable the nested loops, the query becomes *much* slower.
>
> A thing that strikes me is the following. As you can see, I have the
> constraint: q.status = 1. Only a small subset of the data set has this
> status. I have an index on q.status, but for some reason this is not
> used. Instead the constraint is enforced with a Filter: (q.status = 1)
> in an index scan for the primary key on the q table. If the small subset
> having q.status = 1 could be isolated quickly using an index, I would
> expect the query to perform better. I just don't know why the planner
> doesn't use the index on q.status.

What version of Postgres are you using? Do you know what your
join_collapse_limit is set to?

You might be able to force it to scan for questions with a status of 1
first, to see if it helps, by changing the FROM clause to:

FROM posts p, question_tags qt,
     (SELECT * FROM questions WHERE status = 1 OFFSET 0) q

Dave
Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD
-----Original Message-----
From: Ow Mun Heng
Subject: Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

> Even for Postgresql, nested loops are still evil and hamper performance.

I don't know about that. There are times when it is the right plan:

explain analyze select * from table1 t1 inner join table2 t2
    on t1.f_id = t2.id where t1.id = 'xyzzy';

 Nested Loop (cost=0.00..17.65 rows=1 width=344) (actual time=0.080..0.096 rows=1 loops=1)
   -> Index Scan using table1_pkey on table1 t (cost=0.00..9.18 rows=1 width=238) (actual time=0.044..0.048 rows=1 loops=1)
        Index Cond: ((id)::text = 'xyzzy'::text)
   -> Index Scan using table2_pkey on table2 i (cost=0.00..8.46 rows=1 width=106) (actual time=0.019..0.023 rows=1 loops=1)
        Index Cond: (t.f_id = i.id)
 Total runtime: 0.224 ms

set enable_nestloop=off;
SET

explain analyze select * from table1 t1 inner join table2 t2
    on t1.f_id = t2.id where t1.id = 'xyzzy';

 Hash Join (cost=9.18..72250.79 rows=1 width=344) (actual time=13493.572..15583.049 rows=1 loops=1)
   Hash Cond: (i.id = t.f_id)
   -> Seq Scan on table2 i (cost=0.00..61297.40 rows=2188840 width=106) (actual time=0.015..8278.347 rows=2188840 loops=1)
   -> Hash (cost=9.18..9.18 rows=1 width=238) (actual time=0.056..0.056 rows=1 loops=1)
        -> Index Scan using table1_pkey on table1 t (cost=0.00..9.18 rows=1 width=238) (actual time=0.040..0.045 rows=1 loops=1)
             Index Cond: ((id)::text = 'xyzzy'::text)
 Total runtime: 15583.212 ms

(I changed the table names, but everything else is real.)
Re: [PERFORM] Optimizing PostgreSQL for Windows
From: Christian Rengstl
Subject: [PERFORM] Optimizing PostgreSQL for Windows

> Hi list,
> I have the following query:
> select t.a1, t.a2 from table1 t inner join table2 s using(id) where
> t.pid='xyz' and s.chromosome=9 order by s.pos;
>
> -> Bitmap Heap Scan on table1 t (cost=388.25..27357.57 rows=22286 width=23) (actual time=112.595..32989.663 rows=22864 loops=1)
>      Recheck Cond: ((pid)::text = 'xyz'::text)
>      -> Bitmap Index Scan on idx_table1 (cost=0.00..382.67 rows=22286 width=0) (actual time=103.790..103.790 rows=22864 loops=1)
>           Index Cond: ((pid)::text = 'xyz'::text)

The bitmap heap scan on table1 seems very slow. What version of Postgres
are you using? There were performance enhancements in 8.1 and 8.2. What
kind of hard drive are you using? I would guess a single SATA drive would
give you better performance than that, but I don't know for sure.

Do you regularly vacuum the table? Not vacuuming enough can lead to tables
filled with dead rows, which can increase the amount of data needing to be
scanned considerably.

Dave
Re: [PERFORM] select count(*) performance (vacuum did not help)
-----Original Message-----
From: Gábor Farkas

> i see.
> will i achieve the same thing by simply dropping that table and
> re-creating it?

Yes. Or even easier (if you don't need the data anymore), you can use the
TRUNCATE command, which deletes everything in the table, including dead
rows.

Dave
Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!
From: smiley2211
Subject: Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!

> -> Seq Scan on encounters_questions_answers eqa
>      (cost=100000000.00..100017608.66 rows=464766 width=8)
>      (actual time=0.003..735.934 rows=464766 loops=7430)

It looks like enable_seqscan is set to false. For some reason that might
have worked on 7.4, but I would try turning it back on for 8.1. Sequential
scans aren't always bad; sometimes they are faster than index scans. I
would first try running the system with all the enable_* settings on.

If you can't turn on logging, it's going to be very hard to track down the
problem. The easiest way to track down a problem normally is to set
log_min_duration_statement to something like 2000ms. Then Postgres will log
all slow queries, and you can run EXPLAIN ANALYZE on the slow queries to
find the problem.

I think Carlos had a good idea when he asked about the encoding on the new
server vs the old. Does your application use the LIKE keyword to compare
text fields? If so, you might need to create indexes which use the
text_pattern_ops operator classes. With unicode, Postgres cannot use an
index scan for a query like

SELECT * FROM foo WHERE name LIKE 'Bob%'

unless there is an index like

CREATE INDEX name_index ON foo (name text_pattern_ops);

However, if you are not using LIKE queries, then this is not your problem.

More on operator classes:
http://www.postgresql.org/docs/8.1/interactive/indexes-opclass.html

Dave
Re: [PERFORM] Low CPU Usage
From: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Low CPU Usage

> I have no idea how to continue researching this problem. Now I'm going
> to do some network tests.

I would go back to the slow program and try to capture the slow queries in
the log file. Once you have some queries which are running slow, you can
run EXPLAIN ANALYZE on them to see what the bottleneck is.

It seems like you've found that pgAdmin is slow sending across the network,
but we don't know if that has anything to do with your original problems.

Just my 2 cents,
Dave
Re: [PERFORM] Query works when kludged, but would prefer best practice solution
I think Tom is talking about something like this:

explain select * from foo();
                         QUERY PLAN
------------------------------------------------------------
 Function Scan on foo (cost=0.00..12.50 rows=1000 width=50)

The planner is estimating the function will return 1000 rows.

explain select * from foo() where id > 0;
                         QUERY PLAN
------------------------------------------------------------
 Function Scan on foo (cost=0.00..15.00 rows=333 width=50)
   Filter: (id > 0)

In the second case I am asking for all ids greater than zero, but my ids
are all positive integers. The planner doesn't know that, so it assumes the
WHERE clause will decrease the number of results.

I would still say this is a kludge, and since you already found a kludge
that works, this may not help you at all.

Dave

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Carlo Stonebanks
Sent: Tuesday, September 18, 2007 1:29 AM
To: 'Tom Lane'; 'Merlin Moncure'
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query works when kludged, but would prefer best practice solution

Hi Tom,

Thanks for the suggestion - this concept is pretty new to me. Can you
expand a bit on the idea of how to place such a "dummy" constraint on a
function, and the conditions under which it affects the planner? Would this
require that constraint_exclusion be set on?

(When I go to sleep, I have a dream -- and in this dream Tom writes a
brilliant three-line code sample that makes it all clear to me, and I wake
up a PostgreSQL guru.) ;-)

Carlo

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: September 17, 2007 11:30 PM
To: Merlin Moncure
Cc: Carlo Stonebanks; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query works when kludged, but would prefer best practice solution

Merlin Moncure [EMAIL PROTECTED] writes:
> On 9/17/07, Carlo Stonebanks [EMAIL PROTECTED] wrote:
>> Please see the section marked as "PROBLEM" in "ORIGINAL QUERY" plan
>> below.
> This looks like it might be the problem Tom caught and rigged a solution
> to:
> http://people.planetpostgresql.org/dfetter/index.php?/archives/134-PostgreSQL-Weekly-News-September-03-2007.html
> (look for "band-aid")

No, 'fraid not; that was about misestimation of outer joins, and I see no
outer join here. What I do see is misestimation of a
set-returning-function's output:

-> Function Scan on zips_in_mile_range (cost=0.00..12.50 rows=1000
width=40) (actual time=149.850..149.920 rows=66 loops=1)

There's not any very nice way to improve that in existing releases :-(.
In 8.3 it will be possible to add a ROWS option to function definitions to
replace the default 1000-rows estimate with some other number, but that
still helps little if the number of result rows is widely variable.

As far as kluges go: rather than kluging conditions affecting unrelated
tables, maybe you could put in a dummy constraint on the function's output
--- ie, a condition you know is always true, but the planner won't know
that, and will scale down its result-rows estimate accordingly.

regards, tom lane
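The 8.3 ROWS option Tom mentions ends up looking like this (the function is
a made-up stand-in for zips_in_mile_range; the body is just
generate_series for illustration):

```sql
-- PostgreSQL 8.3+: the ROWS clause replaces the default 1000-row estimate
-- for a set-returning function.
CREATE FUNCTION small_set(n integer)
RETURNS SETOF integer
AS $$ SELECT generate_series(1, $1) $$
LANGUAGE sql
ROWS 50;   -- planner now assumes ~50 result rows instead of 1000

-- The Function Scan node in EXPLAIN should now show rows=50.
EXPLAIN SELECT * FROM small_set(40);
```

As Tom notes, a fixed estimate still helps little when the real row count
varies widely from call to call.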
Re: [PERFORM] update, truncate and vacuum
From: Scott Feldstein
Subject: [PERFORM] update, truncate and vacuum

> Hi,
> I have a couple questions about how update, truncate and vacuum would
> work together.
>
> 1) If I update a table foo (id int, value numeric(20, 6)) with
>    update foo set value = 100 where id = 1
>    would a vacuum be necessary after this type of operation, since the
>    updated value is a numeric? (as opposed to a sql type where its size
>    could potentially change, i.e. varchar)

Yes, a vacuum is still necessary. The type doesn't really matter: Postgres
effectively does a delete and insert on all updates.

> 2) After several updates/deletes to a table, if I truncate it, would it
>    be necessary to run vacuum in order to reclaim the space?

No, a vacuum is not necessary after a truncate, because the whole data file
is deleted once a truncate commits. There aren't any dead rows because
there aren't any rows.

Dave
Re: [PERFORM] test / live environment, major performance difference
From: Christo Du Preez
Sent: Tuesday, June 12, 2007 2:38 AM

> Where do I set the planner settings, or are you referring to settings in
> postgres.conf that may affect the planner?

Yes, I'm referring to settings in postgres.conf. I'm wondering if
enable_indexscan or something got turned off on the server for some reason.
Here is a description of those settings:
http://www.postgresql.org/docs/8.2/interactive/runtime-config-query.html

When you move data from the laptop to the server, I see that your script
correctly runs an ANALYZE after the load, but have you run ANALYZE on the
fast laptop lately? Hopefully running ANALYZE wouldn't make the planner
choose a worse plan on the laptop, but if we are trying to get things
consistent between the laptop and the server, that is something I would
try.

If the consistency problem really is a problem of the planner not using
index scans on the server, then, if you can, please post the table
definition for the table with a million rows and an EXPLAIN ANALYZE of a
query which selects a few rows from it.

Dave
Re: [PERFORM] How can fixed and variable width columns perform similarly?
I think the manual is implying that if you store a value like "Sid" in a
field either of type varchar(128) or of type text, there is no performance
difference. The manual is not saying that you get the same performance
storing a 500k text field as when you store the value "Sid".

Dave

-----Original Message-----
From: Siddharth Anand
Sent: Friday, April 27, 2007 10:32 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] How can fixed and variable width columns perform similarly?

Hi!
I read the link below and am puzzled by, or curious about, something.
http://www.postgresql.org/docs/8.1/interactive/datatype-character.html

The Tip below is intriguing:

"Tip: There are no performance differences between these three types, apart
from the increased storage size when using the blank-padded type. While
character(n) has performance advantages in some other database systems, it
has no such advantages in PostgreSQL. In most situations text or character
varying should be used instead."

How can a field that doesn't have a limit, like text, perform similarly to
char varying(128), for example? At some point, we need to write data to
disk. The more data that needs to be written, the longer the disk write
will take, especially when it requires finding free sectors to write to.

Another interesting quote from the same page is the following:

"Long values are also stored in background tables so they do not interfere
with rapid access to the shorter column values."

If the long values are stored in a separate table, on a different part of
the disk, doesn't this imply an extra disk seek? Won't it therefore take
longer?

Sid
Re: [PERFORM] not using indexes on large table
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeroen Kleijer

> The problem comes when I try to do a query without using a where clause,
> because by then it completely discards the indexes and does a complete
> table scan, which takes over half an hour! (40.710.725 rows, 1110258
> pages, 1715 seconds)
>
> I've tried several things but doing a query like:
> select distinct volume from project_access_times
> or
> select distinct qtree from project_access_times
> always results in a full sequential table scan, even after a 'vacuum'
> and 'vacuum analyze'.

To my knowledge, Postgres doesn't use indexes for DISTINCT queries or for
grouping.

Also, you are getting horrible IO performance. Our old slow test machine
can scan a table of 12 million rows in 100 seconds, and our production
server can do the same in 20 seconds. If possible, I would try running the
same thing on your local hard drive; that way you can see how much the
NetApp and NFS are slowing you down.

Although in the end, if you need very fast distinct queries, you will need
to maintain a separate table.

Dave
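One way to maintain that separate table is with a trigger that records each
new value as it arrives. A sketch (the side table and function names are
made up, and for brevity this ignores the race where two concurrent inserts
see the value as missing at the same time):

```sql
CREATE TABLE distinct_volumes (volume text PRIMARY KEY);

CREATE FUNCTION record_volume() RETURNS trigger AS $$
BEGIN
    -- Insert the value only if it is not already known.
    INSERT INTO distinct_volumes (volume)
    SELECT NEW.volume
    WHERE NOT EXISTS
        (SELECT 1 FROM distinct_volumes WHERE volume = NEW.volume);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER track_volume BEFORE INSERT ON project_access_times
    FOR EACH ROW EXECUTE PROCEDURE record_volume();

-- "SELECT volume FROM distinct_volumes" then replaces the slow
-- "SELECT DISTINCT volume FROM project_access_times".
```

The trade-off is a small cost on every insert in exchange for an instant
distinct-values lookup.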
Re: [PERFORM] Basic Q on superfluous primary keys
I think a database with all natural keys is unrealistic. For example if you have a table that refers to people, are you going to use their name as a primary key? Names change all the time due to things like marriage, divorce, or trouble with the law. We have tables with 20 million rows which reference back to a table of people, and if I used the person's name as key, it would be a major pain when somebody's name changes. Even if there is referential integrity, one person might be referred to by 25% of the 20 million rows, so the update would take quite a long time. Also the table will be filled with dead rows and the indexes will likely be bloated. If I want to clean that up, it will take a vacuum full or a cluster which will lock the whole table and run for hours. If I use a surrogate key, I can change their name in one row and be done with it. Just my 2 cents. Dave
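The surrogate-key approach described above could be sketched like this (hypothetical tables, for illustration only):

```sql
-- Surrogate key: an id that never changes, so a name change is one UPDATE.
CREATE TABLE person (
    person_id serial PRIMARY KEY,
    name      text NOT NULL
);

CREATE TABLE trade (
    trade_id  serial PRIMARY KEY,
    person_id integer NOT NULL REFERENCES person (person_id),
    amount    numeric(22,9) NOT NULL
);

-- A name change touches exactly one row, no matter how many of the
-- 20 million trade rows reference this person.
UPDATE person SET name = 'New Name' WHERE person_id = 42;
```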
Re: [PERFORM] Beginner Question
In your first post you said that the query is taking much longer than a second, and in your second post you say the performance is horrible, but explain analyze shows the query runs in 219 milliseconds, which doesn't seem too bad to me. I wonder if the slow part for you is returning all the rows to the client? How are you running this query? (JDBC, ODBC, C library?) Do you really need all the rows? Maybe you could use a cursor to page through the rows? Dave -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of s d Sent: Monday, April 09, 2007 7:46 PM To: Jan de Visser Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Beginner Question Hi Jan, Adding this Index slowed down things by a factor of 4. Also, the performance is so horrible (example below) that I am certain I am doing something wrong. Does the following explain give any ideas? Thanks =# EXPLAIN ANALYZE select * from word_association where (word1 ='the' or word2='the') and count > 10; QUERY PLAN ---------------------------------------------------------------- Bitmap Heap Scan on word_association (cost=250.86..7256.59 rows=4624 width=22) (actual time=13.461..211.568 rows=6601 loops=1) Recheck Cond: (((word1)::text = 'the'::text) OR ((word2)::text = 'the'::text)) Filter: (count > 10) -> BitmapOr (cost=250.86..250.86 rows=12243 width=0) (actual time=9.052..9.052 rows=0 loops=1) -> Bitmap Index Scan on word_association_index1_1 (cost=0.00..153.20 rows=7579 width=0) (actual time=5.786..5.786 rows=7232 loops=1) Index Cond: ((word1)::text = 'the'::text) -> Bitmap Index Scan on word_association_index2_1 (cost=0.00..95.34 rows=4664 width=0) (actual time=3.253..3.253 rows=4073 loops=1) Index Cond: ((word2)::text = 'the'::text) Total runtime: 219.987 ms (9 rows)
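The cursor suggestion above could look like this (query taken from the thread; the batch size of 1000 is an arbitrary assumption):

```sql
BEGIN;
DECLARE word_cur CURSOR FOR
    SELECT * FROM word_association
    WHERE (word1 = 'the' OR word2 = 'the') AND count > 10;

-- Pull rows in batches instead of materializing all 6601 at once.
FETCH 1000 FROM word_cur;
-- ...process the batch on the client, then fetch the next one...
FETCH 1000 FROM word_cur;

CLOSE word_cur;
COMMIT;
```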
Re: [PERFORM] Weird performance drop
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Vincenzo Romano Is there any workaround? In my opinion the later the query planner decisions are taken the more effective they can be. It could be an option for the function (body) to delay any query planner decision. I think a possible workaround is to use a plpgsql function and the EXECUTE statement. The docs will have more info. Dave
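A minimal sketch of the plpgsql/EXECUTE workaround, using the pre-8.4 FOR-IN-EXECUTE loop (the table `a_table` and its column are hypothetical):

```sql
CREATE OR REPLACE FUNCTION lookup(p text) RETURNS SETOF a_table AS $$
DECLARE
    r a_table%ROWTYPE;
BEGIN
    -- The query string is planned each time EXECUTE runs, so the
    -- planner sees the actual value of p instead of an opaque parameter.
    FOR r IN EXECUTE 'SELECT * FROM a_table WHERE a_column LIKE '
                     || quote_literal(p || '%') LOOP
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
```

The price of the delayed planning is that the query is re-planned on every call rather than once at function creation.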
Re: [PERFORM] scalablility problem
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Xiaoning Ding Hi all, When I run multiple TPC-H queries (DBT3) on postgresql, I found the system is not scalable. My machine has 8GB memory, and 4 Xeon Dual Core processors (8 cores in total). OS kernel is linux 2.6.9. Postgresql is 7.3.18. Is there any way you can upgrade to 8.2? There have been a lot of performance and scalability enhancements.
Re: [PERFORM] Weird performance drop
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Vincenzo Romano I thought that the query planner usually did a bad job on function bodies because they'd appear opaque to it. In this case it seems to me that the body is opaque only if I use the like operator. If you run explain on a query that looks like select * from a_table where a_column like 'foo%' (and you have the appropriate index) you will see that postgres rewrites the where clause as a_column >= 'foo' and a_column < 'fop'. I think your problem is that the query is planned when the function is created, and at that time postgres doesn't know the value you are comparing against when you use the like operator, so postgres can't rewrite the query using >= and <. The problem doesn't happen for plain equality because postgres doesn't need to know anything about what you are comparing against in order to use equality. Somebody else can correct me if I'm wrong. Dave
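The rewrite described above can be observed directly (hypothetical table; the exact plan text depends on version and locale, so the comment is illustrative rather than actual output):

```sql
CREATE TABLE a_table (a_column text);
CREATE INDEX a_table_idx ON a_table (a_column);

EXPLAIN SELECT * FROM a_table WHERE a_column LIKE 'foo%';
-- With a C locale (or a *_pattern_ops index) the plan shows the
-- prefix condition rewritten into a range, e.g.:
--   Index Cond: ((a_column >= 'foo') AND (a_column < 'fop'))
```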
Re: [PERFORM] Planner doing seqscan before indexed join
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dan Harris After some digging, I've found that the planner is choosing to apply a necessary seq scan to the table. Unfortunately, it's scanning the whole table, when it seems that it could have joined it to a smaller table first and reduce the amount of rows it would have to scan dramatically (70 million to about 5,000). Joining will reduce the amount of rows to scan for the filter, but performing the join is non-trivial. If postgres is going to join two tables together without applying any filter first then it will have to do a seqscan of one of the tables, and if it chooses the table with 5000 rows, then it will have to do 5000 index scans on a table with 70 million records. I don't know which way would be faster. I wonder if you could find a way to use an index to do the text filter. Maybe tsearch2? I haven't used anything like that myself, maybe someone else has more input.
Re: [PERFORM] Nested Loop
-Original Message- From: [EMAIL PROTECTED] On Behalf Of Gauri Kanekar Subject: Re: [PERFORM] Nested Loop join_collapse_limit = 1 # JOINs Is there a reason you have this set to 1? Postgres can't consider multiple join orders when you do that. I would try setting that back to the default and seeing if this query is any faster. Other than that it looked like the problems with the query might be bad estimates of rows. One is that postgres expects there to be 1 matching row from rd when there are actually 30. You might try increasing the statistics targets on rd.sd and rd.sdt, reanalyzing, and seeing if that helps. Also postgres expects the join of rd and rm to return about 27205 rows when it actually returns 10 million. I'm not sure what you can do about that. Maybe if Postgres gets a better estimate for rd it would then estimate the join better. Dave
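Concretely, the suggestions above might look like this (table rd and columns sd/sdt come from the thread; the target value 100 is an arbitrary assumption):

```sql
-- Give the planner finer-grained statistics on the join columns,
-- then re-gather them.
ALTER TABLE rd ALTER COLUMN sd  SET STATISTICS 100;
ALTER TABLE rd ALTER COLUMN sdt SET STATISTICS 100;
ANALYZE rd;

-- And let the planner consider different join orders again.
SET join_collapse_limit = 8;  -- the default
```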
Re: [PERFORM] Execution plan changed after upgrade from 7.3.9 to 8.2.3
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Subject: Re: [PERFORM] Execution plan changed after upgrade from 7.3.9 to 8.2.3 The following did not change anything in the execution plan ALTER TABLE lm05_t_tarif_panneau ALTER COLUMN lrg_min SET STATISTICS 1000 ALTER TABLE lm05_t_tarif_panneau ALTER COLUMN lrg_max SET STATISTICS 1000 ANALYZE lm05_t_tarif_panneau I was able to improve response time by creating indexes, but I would like to avoid changing the database structure because it is not maintained by ourselves, but by a third-party vendor. I would actually try increasing the statistics on table lm05_t_couleur_panneau columns ht_min, ht_max, cod_aspect, and cod_gamme_panneau, because I think the planner is thrown off by the sequential scan on lm05_t_couleur_panneau returning 280 rows when it expects 1. Maybe to start you could just SET default_statistics_target=1000, analyze everything, and see if that makes any difference. Dave
Re: [PERFORM] Execution plan changed after upgrade from 7.3.9 to 8.2.3
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Subject: Re: [PERFORM] Execution plan changed after upgrade from 7.3.9 to 8.2.3 Increasing the default_statistics_target to 1000 did not help. It just made the vacuum full analyze take longer to complete. Just FYI, when you change statistics you only need to run ANALYZE, not VACUUM ANALYZE, and definitely not VACUUM FULL ANALYZE. I don't know what else to suggest for this query since you can't change the SQL. I would talk to the vendor and ask them to add indexes if you know that helps. Dave
Re: [PERFORM] Bad Row Count Estimate on View with 8.2
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane I took another look and think I found the problem: 8.2's new code for flattening UNION ALL subqueries into append relations is failing to initialize all the fields of the appendrel, which confuses estimate_num_groups (and perhaps other places). I think this will fix it for you. I gave this a try on our test machine yesterday and it worked. The planner was estimating that the group by on the union would return about 300 rows which is very similar to what 8.1.2 thought. Actually it returned about 3000 rows, but still it is a good enough estimate to pick a plan which takes 100ms instead of a plan which takes 100 seconds. Thanks, Dave
Re: [PERFORM] Bad Row Count Estimate on View with 8.2
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane In fact, since there isn't any parent relation in a UNION, I'm not sure that this patch actually changed your results ... but I'm not seeing what else would've ... Thanks for looking into it. I thought I might actually test if it was the patch you mentioned which changed my results, but I haven't had time. Because you mentioned it was grouping on the results of a UNION ALL which was throwing off the row estimate, I changed my query from a UNION ALL/GROUP BY to a GROUP BY/FULL OUTER JOIN. The view runs a hair slower by itself, but the better estimate of rows makes it work much better for joining with. If anybody is curious, this is what I changed to:

SELECT coalesce(pos_set.fund_id, trade_set.fund_id) as fund_id,
       coalesce(pos_set.owner_trader_id, trade_set.owner_trader_id) as owner_trader_id,
       coalesce(pos_set.strategy_id, trade_set.strategy_id) as strategy_id,
       coalesce(pos_set.cf_account_id, trade_set.cf_account_id) as cf_account_id,
       coalesce(pos_set.instrument_id, trade_set.instrument_id) as instrument_id,
       coalesce(pos_set.pos, 0) + coalesce(trade_set.pos, 0) as pos,
       coalesce(pos_set.cost, 0) + coalesce(trade_set.cost, 0) as cost
FROM (
    SELECT om_position.fund_id, om_position.owner_trader_id, om_position.strategy_id,
           om_position.cf_account_id, om_position.instrument_id,
           om_position.pos, om_position.cost
    FROM om_position
    WHERE om_position.as_of_date = date(now())
) as pos_set
full outer join (
    SELECT om_trade.fund_id, om_trade.owner_trader_id, om_trade.strategy_id,
           om_trade.cf_account_id, om_trade.instrument_id,
           sum(om_trade.qty::numeric(22,9)) AS pos, sum(om_trade.cost) as cost
    FROM om_trade
    WHERE om_trade.process_state = 0 OR om_trade.process_state = 2
    GROUP BY om_trade.fund_id, om_trade.owner_trader_id, om_trade.strategy_id,
             om_trade.cf_account_id, om_trade.instrument_id
) as trade_set
ON pos_set.fund_id = trade_set.fund_id
   and pos_set.owner_trader_id = trade_set.owner_trader_id
   and pos_set.strategy_id = trade_set.strategy_id
   and pos_set.cf_account_id = trade_set.cf_account_id
   and pos_set.instrument_id = trade_set.instrument_id;
[PERFORM] Bad Row Count Estimate on View with 8.2
on om_trade_partial_process_state_index (cost=0.00..4.26 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1) Index Cond: (process_state = 2) Total runtime: 27.055 ms Thanks, Dave Dutcher Telluride Asset Management 952.653.6411
Re: [PERFORM] Problem with grouping, uses Sort and GroupAggregate, HashAggregate is better(?)
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rolf Østvik (HA/EXA) Have you tried set enable_sort=off with 8.1.2? I'm not sure if that will change anything because it has to do at least one sort. It's just a lot faster to do a hashagg + small sort than one big sort in this case. (I wonder if there should be an enable_groupagg?)
Re: [PERFORM] Large table performance
Have you run vacuum and analyze on the table? What version of Postgres are you running? What OS are you using? This looks like a straightforward query. With any database, the first time you run the query it's going to be slower because it actually has to read off disk. The second time it's faster because some or all of the data/indexes will be cached. However, 10 seconds sounds like a long time for pulling 10,000 records out of a table of 3 million. If you post an EXPLAIN ANALYZE, it might give us a clue. Dave -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark Dobbrow Sent: Friday, January 12, 2007 6:31 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] Large table performance Hello - I have a fairly large table (3 million records), and am fetching 10,000 non-contiguous records doing a simple select on an indexed column ie select grades from large_table where teacher_id = X This is a test database, so the number of records is always 10,000 and I have 300 different teacher ids. The problem is, sometimes fetching un-cached records takes 0.5 secs and sometimes (more often) it takes more like 10.0 seconds (fetching the same records for a given teacher_id a second time takes about 0.25 secs) Has anyone seen similar behavior or know what the solution might be? any help much appreciated, Mark ps. My shared_buffers is set at 5000 (kernel max), and work_mem=8192
Re: [PERFORM] Slow Query on Postgres 8.2
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane [ fools around with it for awhile... ] I think this is already fixed for 8.2.1. Note the costs of the two related index scans: I installed 8.2.1 this morning and it works much better. The query that was taking 3411.429ms on 8.2.0 now takes 9.3ms. Thanks for your help.
Re: [PERFORM] Slow Query on Postgres 8.2
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane Um ... what indexes has this table got exactly? It's very unclear what alternatives the planner is being faced with. Here is the table definition. Thanks.

Table public.om_transaction
     Column      |          Type          |               Modifiers
-----------------+------------------------+-----------------------------------------
 transaction_id  | character varying(20)  | not null default '0'::character varying
 type            | character varying(20)  | not null default ''::character varying
 fund_id         | character varying(10)  | not null default ''::character varying
 owner_trader_id | character varying(10)  | not null default ''::character varying
 strategy_id     | character varying(30)  | not null default ''::character varying
 instrument_id   | integer                | default 0
 cf_account_id   | integer                | not null default 0
 as_of_date      | date                   | not null default '0001-01-01'::date
 insert_date     | date                   | not null default '0001-01-01'::date
 amount          | numeric(22,9)          | not null default 0.0
 currency_id     | integer                | not null default 0
 process_state   | integer                | not null
 comment         | character varying(256) | default ''::character varying
Indexes:
    om_transaction_pkey PRIMARY KEY, btree (transaction_id)
    cf_account_id_om_transaction_index btree (cf_account_id)
    currency_id_om_transaction_index btree (currency_id)
    fund_id_om_transaction_index btree (fund_id)
    instrument_id_om_transaction_index btree (instrument_id)
    om_transaction_om_transaction_index btree (as_of_date, fund_id, strategy_id, owner_trader_id, cf_account_id, instrument_id, type)
    om_transaction_partial_process_state_index btree (process_state) WHERE process_state = 0
    owner_trader_id_om_transaction_index btree (owner_trader_id)
    strategy_id_om_transaction_index btree (strategy_id)
Foreign-key constraints:
    $1 FOREIGN KEY (owner_trader_id) REFERENCES om_trader(trader_id)
    $2 FOREIGN KEY (fund_id) REFERENCES om_fund(fund_id)
    $3 FOREIGN KEY (strategy_id) REFERENCES om_strategy(strategy_id)
    $4 FOREIGN KEY (cf_account_id) REFERENCES om_cf_account(id)
    $5 FOREIGN KEY (instrument_id) REFERENCES om_instrument(id)
    $6 FOREIGN KEY (currency_id) REFERENCES om_instrument(id)
Re: [PERFORM] Slow Query on Postgres 8.2
The source data is a little different. The fast query was on our production 8.1 server, and the other was a test 8.2 server with day-old data. The production server has like 3.84 million rows vs 3.83 million rows in test, so the statistics might be a little different, but I would figure the comparison is still valid. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Adam Rich Sent: Thursday, January 04, 2007 7:19 PM To: 'Dave Dutcher'; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Slow Query on Postgres 8.2 Dave, Is it me or are the two examples you attached returning different row counts? That means either the source data is different, or your queries are. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Dutcher Sent: Thursday, January 04, 2007 5:32 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] Slow Query on Postgres 8.2 Hello, I am looking at upgrading from 8.1.2 to 8.2.0, and I've found a query which runs a lot slower. Here is the query: select type, currency_id, instrument_id, sum(amount) as total_amount from om_transaction where strategy_id in ('BASKET1','BASKET2','BASKET3','BASKET4','BASKET5','BASKET6','BASKET7','BASKET8','BASKET9','BASKET10','BASKET11') and owner_trader_id in ('dave','sam','bob','tad','tim','harry','frank','bart','lisa','homer','marge','maggie','apu','milhouse','disco stu') and cf_account_id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,29) and as_of_date > '2006-12-04' and as_of_date <= '2006-12-05' group by type, currency_id, instrument_id; I changed the values in the in statements to fake ones, but it still takes over three seconds on 8.2, where 8.1 only takes 26 milliseconds. When I increase the number of values in the IN clauses, the query rapidly gets worse. I tried increasing my stats target to 1000 and analyzing, but that didn't help so I put that back to 10. While the query is running the CPU is at 100%.
Is there a more efficient way to write a query like this? I've attached the output from EXPLAIN ANALYZE in a file because it is somewhat large. Thanks, Dave Dutcher Telluride Asset Management 952.653.6411
Re: [PERFORM] Regex performance issue
-Original Message- From: [EMAIL PROTECTED] On Behalf Of Alexandru Coseru asterisk=> explain analyze SELECT * FROM destlist WHERE '0039051248787' ~ prefix AND id_ent='-2' AND dir=0 ORDER by length(prefix) DESC; QUERY PLAN ---------------------------------------------------------------- Sort (cost=7925.07..7925.15 rows=31 width=67) (actual time=857.715..857.716 rows=2 loops=1) Sort Key: length((prefix)::text) -> Bitmap Heap Scan on destlist (cost=60.16..7924.30 rows=31 width=67) (actual time=2.156..857.686 rows=2 loops=1) Recheck Cond: ((id_ent = -2) AND (dir = 0)) Filter: ('0039051248787'::text ~ (prefix)::text) -> Bitmap Index Scan on destlist_indx2 (cost=0.00..60.16 rows=6193 width=0) (actual time=1.961..1.961 rows=5205 loops=1) Index Cond: ((id_ent = -2) AND (dir = 0)) Total runtime: 857.804 ms (8 rows) mmumu btree (prefix varchar_pattern_ops) I'm surprised Postgres isn't using the index on prefix, seeing as the index uses the varchar_pattern_ops operator class. It could be that the index isn't selective enough, or is Postgres not able to use an index with POSIX regular expressions? The docs seem to say that it can, but I'd be curious to see what happens if you use LIKE instead of ~. Dave
Re: [PERFORM] Query plan for heavy SELECT with lite sub-SELECTs
-Original Message- From: [EMAIL PROTECTED] Nikolay Samokhvalov What should I do to make Postgres work properly in such cases (I have a lot of similar queries; surely, they are executed w/o seqscans, but the overall picture is the same - I see that starting from sub-selects dramatically decreases performance)? How about this: explain analyze select (select typname from pg_type where pg_type.oid=mainq.prorettype limit 1) from (select * from pg_proc offset 1500 limit 1) mainq; QUERY PLAN ----------------------------------------------------------------- Subquery Scan mainq (cost=50.99..56.85 rows=1 width=4) (actual time=13.646..13.659 rows=1 loops=1) -> Limit (cost=50.99..51.02 rows=1 width=310) (actual time=13.575..13.579 rows=1 loops=1) -> Seq Scan on pg_proc (cost=0.00..62.34 rows=1834 width=310) (actual time=0.014..7.297 rows=1501 loops=1) SubPlan -> Limit (cost=0.00..5.82 rows=1 width=64) (actual time=0.038..0.043 rows=1 loops=1) -> Index Scan using pg_type_oid_index on pg_type (cost=0.00..5.82 rows=1 width=64) (actual time=0.028..0.028 rows=1 loops=1) Index Cond: (oid = $0) Total runtime: 13.785 ms I would expect you to get closer to 2 ms on that query. My machine takes 13 ms to do just the seq scan of pg_proc. Dave
Re: [PERFORM] Vacuum and Memory Loss
Hello friends, I am responsible for maintaining a high volume website using postgresql 8.1.4. Given the amount of reads and writes, I vacuum full the server a few times a week around 1, 2 AM shutting down the site for a few minutes. The next day morning around 10 - 11 AM the server slows down to death. It used to be that the error 'Too many clients' would be recorded, until I increased the number of clients it can handle, and now it simply slows down to death having lots and lots of postmaster processes running: If you are saying that running the vacuum full helps your performance, then you want to make sure you are running plain vacuum and analyze frequently enough. If you have a database which has lots of update and delete statements, and you do not run vacuum regularly enough, you can end up with lots of dead blocks slowing down database scans. If you do lots of updates and deletes you should schedule vacuum and analyze more often, or you might want to look into running auto vacuum: http://www.postgresql.org/docs/8.1/interactive/maintenance.html#AUTOVACUUM If you aren't doing lots of updates and deletes, then maybe you just have a busy database. Lots of postmaster processes implies you have lots of clients connecting to your database. You can turn on stats_command_string and then check the pg_stat_activity table to see what these connections are doing. If they are running queries, you can try to optimize them. Try turning on logging of long running queries with log_min_duration_statement. Then use EXPLAIN ANALYZE to see why the query is slow and if anything can be done to speed it up.
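For 8.1, enabling autovacuum and slow-query logging as suggested above might look like this in postgresql.conf (the threshold values are illustrative assumptions):

```
# postgresql.conf (8.1)
stats_start_collector = on
stats_row_level = on        # row-level stats are required by autovacuum in 8.1
autovacuum = on

stats_command_string = on   # lets pg_stat_activity show the running query
log_min_duration_statement = 1000   # log statements taking over 1 second (ms)
```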
Re: [PERFORM] PostgreSQL Caching
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Adnan DURSUN I want to be able to read an execution plan when I look at it. So, is there any doc about how it should be read? You are asking how to read the output from EXPLAIN? This page is a good place to start: http://www.postgresql.org/docs/8.1/interactive/performance-tips.html
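As a quick illustration of reading a plan (hypothetical table and query; the commented plan is a sketch, not real output):

```sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- A plan might look like:
--   Index Scan using orders_customer_idx on orders
--       (cost=0.00..8.27 rows=1 width=60)
--       (actual time=0.031..0.033 rows=3 loops=1)
-- Read the most deeply indented nodes first; "cost"/"rows" before the
-- "actual" parenthesis are planner estimates, the "actual" figures are
-- measurements, and a large gap between estimated and actual rows
-- usually points at stale or missing statistics.
```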
Re: [PERFORM] simple case using index on windows but not on linux
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of lc_collate is C, as are all the other lc settings. I have run the analyze commands. Still the same. That is strange. I figured it had to be related to the locale and the LIKE operator. I'm not an expert on these locale issues, but I'd be curious to see if it would start using an index if you added an index like this: CREATE INDEX test_index ON t_order (c_number varchar_pattern_ops); Dave
Re: [PERFORM] Performace Optimization for Dummies
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Carlo Stonebanks Subject: [PERFORM] Performace Optimization for Dummies At this early stage in the project, we are initializing our portal's database with millions of rows of imported data in over 50 different flattened tables; each table's structure is unique to the data provider. This requires a pretty complex import program, because the data must be matched semantically, not literally. Even with all of the expression matching and fuzzy logic in the code, our performance statistics show that the program spends over 75% of its time in SQL queries looking for matching and/or duplicate data. The import is slow - and degrades as the tables grow. So your program first transforms the data and then inserts it? And it is the transforming process which is running select statements that is slow? If that is the case you could use duration logging to find the slow select statement, and then you could post an EXPLAIN ANALYZE of the select. One question off the top of my head is: are you using regular expressions for your fuzzy logic, and if so do your indexes have the right operator classes? (see http://www.postgresql.org/docs/8.1/static/indexes-opclass.html) Dave
Re: [PERFORM] Forcing the use of particular execution plans
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tim Truman Hi, I have the following query which has been running very slowly and after a lot of testing/trial and error I found an execution plan that ran the query in a fraction of the time (and then lost the statistics that produced it). What I wish to know is how to force the query to use the faster execution plan. It would be a bit easier to diagnose the problem if you posted EXPLAIN ANALYZE rather than just EXPLAIN. The two plans you posted looked very similar except for the order of the nested loop in subquery 1 and an index scan rather than a seq scan in subquery 2. My guess would be that the order of the nested loop is determined mostly by estimates of matching rows. If you ran an EXPLAIN ANALYZE you could tell if the planner is estimating correctly. If it is not, you could try increasing your statistics target and running ANALYZE. To make the planner prefer an index scan over a seq scan, I would first check the statistics again, and then you can try setting enable_seqscan to false (enable_seqscan is meant more for testing than production) or, you could try reducing random_page_cost, but you should test that against a range of queries before putting it in production. Dave
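The tuning steps above, written out as session-level experiments (the table and column names are hypothetical):

```sql
-- 1. Better statistics first.
ALTER TABLE big_table ALTER COLUMN join_col SET STATISTICS 100;
ANALYZE big_table;

-- 2. See whether the planner picks the index plan when seq scans
--    are penalized (testing only, not for production).
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM big_table WHERE join_col = 123;
SET enable_seqscan = on;

-- 3. Or nudge the cost model; test against many queries before
--    making this permanent.
SET random_page_cost = 2;  -- default is 4
```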
Re: [PERFORM] running benchmark test on a 50GB database
I would start by reading this web page: http://powerpostgresql.com/PerfList There are probably some other web pages out there with similar information, or you can check the mailing list archives for a lot of info. If those places don't help, then you should try to identify which queries are slow, and post an EXPLAIN ANALYZE of the slow queries along with the relevant schema info (i.e. table definitions and indexes). -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nuno Alves Sent: Wednesday, September 20, 2006 10:28 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] running benchmark test on a 50GB database Hi, I am running benchmark tests on a 50 GB postgresql database. I have the postgresql.conf with all parameters by default. In this configuration the database is very, very slow. Could you please tell me which is the best configuration? My system: Pentium D 3.0Ghz RAM: 1GB HD: 150GB SATA Thanks in advance, Nuno
Re: [PERFORM] High CPU Load
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jérôme BENOIS explain analyze select distinct INTEGER_VALUE,DATE_VALUE,EI_ID,VALUE_TYPE,FLOAT_VALUE,ID,TEXT_VALUE,CATEGORY_ID,STRING_VALUE,CATEGORYATTR_ID,NAME from ((( select distinct ei_id as EIID from mpng2_ei_attribute as reqin1 where reqin1.CATEGORYATTR_ID = 0 AND reqin1.TEXT_VALUE ilike '' and ei_id in ( select distinct ei_id as EIID from mpng2_ei_attribute as reqin2 where reqin2.CATEGORYATTR_ID = 0 AND reqin2.TEXT_VALUE ilike '' and ei_id in ( select distinct ei_id as EIID from mpng2_ei_attribute as reqin3 where reqin3.NAME = '' AND reqin3.STRING_VALUE = '' ) ) ) ) ) as req0 join mpng2_ei_attribute on req0.eiid = mpng2_ei_attribute.ei_id order by ei_id asc; That is a lot of DISTINCTs. Sorts are one thing that can really use up CPU. This query is doing lots of sorts, so it's not surprising the CPU usage is high. In the subqueries you have a couple of cases where you say ... in (select distinct ...). I don't think the distinct clause is necessary in that case. I'm not a hundred percent sure, but you might want to try removing them and see if the query results are the same and maybe the query will execute faster.
Re: [Fwd: Re: [PERFORM] Performance problem with Sarge compared
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Piñeiro TIP 6: explain analyze is your friend Well, yes, it is a friend, but as the select at the postgre Sarge version never finished I can't use an explain analyze. I show you the explain, with the hope that someone has any idea, but I think that this is almost indecipherable (if you want the Woody ones I can post the explain analyze). Thanks in advance. Does the machine run out of disk space every time? Is it possible to try the query on a different machine with more hard drive room? An explain analyze of the slow plan will be much more helpful than an explain, even if it's from a different machine. If it's generating a large temp file, it is another sign that the query is doing some kind of large cross product.
Re: [PERFORM] Performance problem with Sarge compared with Woody
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Piñeiro Subject: [PERFORM] Performance problem with Sarge compared with Woody

A week ago we migrated a Woody (PostgreSQL 7.2.1) server to Sarge (PostgreSQL 7.4.7). To migrate the database we used a dump, running pg_dump with these options: pg_dump -U username -c -F p -O -v -f filename DBname. We have a search that on Woody took about 1-2 minutes, but on Sarge it runs for about 2 hours and finally crashes, with a message about a temporary file and no more disk space (I have more than a GB of free disk space). Any idea?

The first question is: did you run ANALYZE on the new database after importing your data?
Re: [PERFORM] [PATCHES] Template0 age is increasing speedily.
I would expect that the age of template0 is increasing at the same rate as every other database in your cluster. Transaction IDs are global across all databases in the cluster, so as I understand it, executing a transaction in any database will increase the age of all databases by 1.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nimesh Satam Sent: Thursday, September 07, 2006 5:49 AM To: Chris Mair Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] [PATCHES] Template0 age is increasing speedily.

Hi, Postgres version used is 8.1.3. OS: Linux.

SELECT datname, age(datfrozenxid) FROM pg_database;

 postgres  |       1575
 xyz       | 1073743934
 template1 |       1632
 template0 |   61540256

This is the command I tried, with the above output, and the number is increasing pretty fast for template0. Please let me know if this is a problem. Regards, Nimesh.

On 9/7/06, Chris Mair [EMAIL PROTECTED] wrote: On Thu, 2006-09-07 at 16:01 +0530, Nimesh Satam wrote: I noticed that the age of template0 is increasing very rapidly. Can you please let me know how we can control this and what causes such problems. We also noticed that the database slows down heavily at a particular time. Can you suggest any tools which will help in diagnosing the root cause behind the data load.

Hi, first of all: there is no need to cross-post on 4 lists. If you have a performance problem, post on pgsql-performance. Second, please tell us which version of PostgreSQL on which operating system you're using. Diagnosing your problem might depend on which OS you use... Finally, explain what you mean by "the age of template0 is increasing very rapidly", you mean "the size is increasing"? Bye, Chris. -- Chris Mair http://www.1006.org
Re: [PERFORM] PostgreSQL performance issues
That's an interesting situation. Your CPUs are pegged, and you're hardly doing any IO. I wonder if there is some inefficient query, or if it's just very high query volume. Maybe you could try setting log_min_duration_statement to track down the slowest of the queries, then post the slow queries with an explain analyze to the list. Here is some info on setting up logging: http://www.postgresql.org/docs/8.1/interactive/runtime-config-logging.html Are your queries standard SQL, or do you call functions you wrote in PL/pgSQL or PL/Python or anything?
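A sketch of the logging setup suggested above; the threshold value is just an example, not a recommendation:

```sql
-- Log every statement that runs longer than 1000 ms.
-- Can be set per session for a quick test (superuser on 8.1),
-- or in postgresql.conf followed by a reload.
SET log_min_duration_statement = 1000;
-- -1 disables the feature; 0 logs every statement with its duration.
```

Once the slow statements show up in the server log, each one can be fed to EXPLAIN ANALYZE individually.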
Re: [PERFORM] Query tuning
It seems to me that what would work best is an index scan backward on the eventtime index. I don't see why that wouldn't work for you; maybe the planner is just estimating that the seq scan and sort is faster for some reason. What does EXPLAIN say if you use a small limit and offset like 10? Or what does EXPLAIN say if you first run set enable_seqscan=false; (If you get the same plan, then I wouldn't bother running EXPLAIN ANALYZE, but if you get a different plan I would run EXPLAIN ANALYZE to see if the new plan is any faster.)

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Subbiah, Stalin Sent: Wednesday, August 23, 2006 1:03 PM To: Chris Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Query tuning

I get the same plan after running vacuum analyze. Nope, I don't have indexes on objdomainid, objid and userdomainid. Only eventtime has one.

-Original Message- From: Chris [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 22, 2006 8:06 PM To: Subbiah, Stalin Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Query tuning

Subbiah, Stalin wrote: Actually these servers will be upgraded to 8.1.4 in a couple of months.

Even so, you could get some bad data in there. http://www.postgresql.org/docs/8.0/static/release.html. Go through the old release notes and you'll find various race conditions, crashes etc. Here you go with explain analyze.
# explain analyze SELECT * FROM EVENTLOG WHERE EVENTTIME > '07/23/06 16:00:00' AND EVENTTIME < '08/22/06 16:00:00' AND (OBJDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA' OR OBJID='tzRh39d0d91luNGT1weIUjLvFIcA' OR USERDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA') ORDER BY EVENTTIME DESC, SEQUENCENUM DESC LIMIT 500 OFFSET 500;

QUERY PLAN
Limit (cost=15583110.14..15583111.39 rows=500 width=327) (actual time=427771.568..427772.904 rows=500 loops=1)
  -> Sort (cost=15583108.89..15618188.88 rows=14031998 width=327) (actual time=427770.504..427771.894 rows=1000 loops=1)
       Sort Key: eventtime, sequencenum
       -> Seq Scan on eventlog (cost=0.00..2334535.17 rows=14031998 width=327) (actual time=10.370..190038.764 rows=7699388 loops=1)
            Filter: ((eventtime > '2006-07-23 16:00:00'::timestamp without time zone) AND (eventtime < '2006-08-22 16:00:00'::timestamp without time zone) AND (((objdomainid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((objid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((userdomainid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text)))
Total runtime: 437884.134 ms
(6 rows)

If you analyze the table and then run this again, what plan does it come back with? I can't read explain output properly, but I suspect (and I'm sure I'll be corrected if need be) that the sort step is way out of whack, and so is the seq scan, because the stats aren't up to date enough. Do you have an index on objdomainid, objid and userdomainid (one index per field)? I wonder if that would help much. -- Postgresql php tutorials http://www.designmagick.com/
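If the OP wants to try the backward index scan idea, a two-column index matching the ORDER BY would let the planner read just OFFSET + LIMIT rows from the end of the index instead of sorting ~7.7 million. A sketch; the index name is invented:

```sql
-- ORDER BY eventtime DESC, sequencenum DESC can be satisfied by
-- walking this index backward and stopping after 1000 rows:
CREATE INDEX eventlog_time_seq ON eventlog (eventtime, sequencenum);
ANALYZE eventlog;
```

Whether the planner actually picks the backward scan would still need to be verified with EXPLAIN.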
Re: [PERFORM] VACUUM FULL needed sometimes to prevent transaction ID wraparound?
I would guess that you are not running vacuumdb as a user with permission to vacuum the postgres or template1 databases. Try telling vacuumdb to log in as postgres or whatever your superuser account is called.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marinos Yannikos Sent: Tuesday, August 22, 2006 1:11 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] VACUUM FULL needed sometimes to prevent transaction ID wraparound?

Hello, we're looking into the reason why we are getting warnings about transaction ID wraparound despite a daily vacuumdb -qaz. Someone is claiming that VACUUM without FULL cannot reassign XIDs properly when max_fsm_pages was set too low (it says so here too, but this is rather old: http://www.varlena.com/GeneralBits/Tidbits/perf.html#maxfsmp). Is this true, or do we have a different issue here? We're using 8.1.3 with a database generated on 8.1.3 (i.e. not migrated from 7.x or anything like that). Thanks, Marinos
Re: [PERFORM] Big diference in response time (query plan question)
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Luiz K. Matsumura

Well, in this case the queries with LEFT OUTER join and with inner join return the same result set. I don't have sufficient knowledge to affirm it, but I suspect that if the query plan used for fk_clifor = 352 with the left outer join were applied to the first query (fk_clifor = 243 with left outer join), we would have a better total runtime. Is there some way to test this?

It looks like Postgres used a nested loop join for the fast query and a merge join for the slow query. I don't think the left join is causing any problems. On the slower query the cost estimate of the nested loop must have been higher than the cost estimate of the merge join because of more rows. You could try disabling merge joins with the command set enable_mergejoin=false. Then run the explain analyze again to see if it is faster. If it is faster without the merge join, then you could try to change your settings to make the planner prefer the nested loop. I'm not sure what the best way to do that is. Maybe you could try reducing random_page_cost, which should make index scans cheaper. Dave
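The test described above, spelled out as a session-local experiment; the SELECT is a stand-in for the slow LEFT JOIN query:

```sql
SET enable_mergejoin = false;  -- affects only this session
EXPLAIN ANALYZE
SELECT 1;                      -- substitute the slow query here
SET enable_mergejoin = true;   -- restore the default afterwards
```

Disabling planner methods this way is a diagnostic tool, not a production setting; if the nested loop wins, adjusting cost parameters such as random_page_cost is the more durable fix.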
Re: [PERFORM] Big diference in response time (query plan question)
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Luiz K. Matsumura

Where can I see the current random_page_cost value? Is there any hint about what value I should set? Thanks in advance. Luiz

On Linux, random_page_cost is set in the postgresql.conf file. You can see what it is set to by typing SHOW random_page_cost; This page has some guidelines on random_page_cost and other server settings: http://www.powerpostgresql.com/PerfList/ As it says on the page, make sure you test a variety of queries.
Re: [PERFORM] Speeding up query, Joining 55mil and 43mil records.
Could you post an explain analyze of the query? Just FYI, if you do an explain analyze of the insert statement, it will actually do the insert. If you don't want that, just post an explain analyze of the select part. To me it would be interesting to compare just the select parts of the query between Postgres and MSSQL. That way you would know whether your Postgres install is slower at the query or slower at the insert.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of nicky Sent: Wednesday, June 21, 2006 8:47 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Speeding up query, Joining 55mil and 43mil records.

Hello people, I'm trying to solve what I feel is a performance/configuration/query error on my side. I'm fairly new to configuring PostgreSQL, so I might be completely wrong with my configuration. My database consists of 44 tables, about 20 GB. Two of those tables are big/huge. Table src.src_faktuur_verricht contains 43 million records (9 GB) and table src.src_faktuur_verrsec contains 55 million records (6 GB). Below is the slow query.
INSERT INTO rpt.rpt_verrichting (verrichting_id, verrichting_secid, fout_status, patientnr, verrichtingsdatum, locatie_code, afdeling_code, uitvoerder_code, aanvrager_code, verrichting_code, dbcnr, aantal_uitgevoerd, kostenplaats_code, vc_patientnr, vc_verrichting_code, vc_dbcnr)
SELECT t1.id, t0.secid, t1.status, t1.patientnr, t1.datum, t1.locatie, t1.afdeling, t1.uitvoerder, t1.aanvrager, t0.code, t1.casenr, t0.aantal, t0.kostplaats, null, null, null
FROM src.src_faktuur_verrsec t0
JOIN src.src_faktuur_verricht t1 ON t0.id = t1.id
WHERE substr(t0.code,1,2) not in ('14','15','16','17')
AND (substr(t0.correctie,4,1) <> '1' OR t0.correctie is null)
AND EXTRACT(YEAR from t1.datum) > 2004;

Output from explain:

Hash Join (cost=1328360.12..6167462.76 rows=7197568 width=118)
  Hash Cond: (("outer".id)::text = ("inner".id)::text)
  -> Seq Scan on src_faktuur_verrsec t0 (cost=0.00..2773789.90 rows=40902852 width=52)
       Filter: ((substr((code)::text, 1, 2) <> '14'::text) AND (substr((code)::text, 1, 2) <> '15'::text) AND (substr((code)::text, 1, 2) <> '16'::text) AND (substr((code)::text, 1, 2) <> '17'::text) AND ((substr((correctie)::text, 4, 1) <> '1'::text) OR (correctie IS NULL)))
  -> Hash (cost=1188102.97..1188102.97 rows=8942863 width=80)
       -> Bitmap Heap Scan on src_faktuur_verricht t1 (cost=62392.02..1188102.97 rows=8942863 width=80)
            Recheck Cond: (date_part('year'::text, datum) > 2004::double precision)
            -> Bitmap Index Scan on src_faktuur_verricht_idx1 (cost=0.00..62392.02 rows=8942863 width=0)
                 Index Cond: (date_part('year'::text, datum) > 2004::double precision)

The db server runs PostgreSQL 8.1.4 on FreeBSD 6.1-Stable, with 2 GB of RAM. It contains two SATA150 disks; one holds PostgreSQL and the rest of the operating system, and the other holds the pg_xlog directory.

Changed lines from my postgresql.conf file:
shared_buffers = 8192
temp_buffers = 4096
work_mem = 65536
maintenance_work_mem = 1048576
max_fsm_pages = 4
fsync = off
wal_buffers = 64
effective_cache_size = 174848

The query above takes around 42 minutes.
However, I also have a wimpy desktop machine with 1 GB of RAM running Windows with MSSQL 2000 (default installation), same database structure, same indexes, same query, etc., and it takes 17 minutes. The big difference makes me think that I've made an error in my PostgreSQL configuration. I just can't seem to figure it out. Could someone perhaps give me some pointers or advice? Thanks in advance. Nicky
Re: [PERFORM] OT - select + must have from - sql standard syntax?
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Greg Stark Sent: Tuesday, June 13, 2006 11:16 PM Subject: Re: [PERFORM] OT - select + must have from - sql standard syntax? [SNIP] Well you could always create a dual; it was always just a regular table. We used to joke about what would happen to Oracle if you inserted an extra row in it...

I've never used Oracle, so I don't understand why it's called dual when it only has one row? Shouldn't it be called single? :\ Dave
Re: [PERFORM] Posrgres speed problem
Do you run analyze on the production server regularly?

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ruben Rubio Rey Sent: Monday, June 12, 2006 9:39 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Posrgres speed problem

Hi, I'm having a problem with postgres 8.1.3 on a Fedora Core 3 (kernel 2.6.9-1.667smp). I have two similar servers, one in production and another for testing purposes. The databases are equal (with a difference of some hours). On the testing server, an SQL statement takes around 1 sec. On the production server (low server load) it takes around 50 secs, and uses too many resources. Explain analyze causes too much load; I had to cancel it! Could it be a bug? Any ideas? Thanks in advance
Re: [PERFORM] Why date index is not used
Actually, it looks to me like the sorting is the slow part of this query. Maybe if you create an index on both kuupaev and kellaaeg it will make the sorting faster. Or maybe you could try increasing the server's work_mem. The sort will be much slower if the server can't do the whole thing in RAM.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tomas Vondra Sent: Thursday, June 08, 2006 2:20 PM To: Andrus Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Why date index is not used

More precisely, Postgres could use the index to speed up the sorting, but in this case the sorting is very fast (less than one second according to the output), so Postgres probably decided not to use the index because it would be slower. Btw. have you run ANALYZE on the table recently? What is the number of distinct values in the 'kuupaev' column? Tomas

Why does Postgres 8.1 not use the makse_kuupaev_idx index in the following query? How can I speed this query up?

explain analyze select * from makse order by kuupaev desc, kellaaeg desc limit 100

Limit (cost=62907.94..62908.19 rows=100 width=876) (actual time=33699.551..33701.001 rows=100 loops=1)
  -> Sort (cost=62907.94..63040.49 rows=53022 width=876) (actual time=33699.534..33700.129 rows=100 loops=1)
       Sort Key: kuupaev, kellaaeg
       -> Seq Scan on makse (cost=0.00..2717.22 rows=53022 width=876) (actual time=0.020..308.502 rows=53028 loops=1)
Total runtime: 37857.177 ms

CREATE TABLE makse (
  kuupaev date,
  kellaaeg char(6) NOT NULL DEFAULT ''::bpchar,
  guid char(36) NOT NULL,
  CONSTRAINT makse_pkey PRIMARY KEY (guid)
);

CREATE INDEX makse_kuupaev_idx ON makse USING btree (kuupaev);

Andrus.
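The two suggestions above as a sketch; the index name is invented, and the work_mem value is only an example (on 8.1 the unit is kilobytes):

```sql
-- An index on both sort columns lets a backward index scan
-- replace the sort for ORDER BY kuupaev DESC, kellaaeg DESC:
CREATE INDEX makse_kuupaev_kellaaeg_idx ON makse (kuupaev, kellaaeg);

-- Alternatively, give this session a bigger sort buffer:
SET work_mem = 65536;  -- 64 MB; example value only
```

With the two-column index in place, EXPLAIN should show an "Index Scan Backward" with no Sort node above it.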
Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?
What I do when I'm feeling lazy is execute a delete statement and then an insert. I only do it when I'm inserting/updating a very small number of rows, so I've never worried whether it's optimal for performance. Besides, I've heard that an update in postgres is similar in performance to a delete/insert.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of PFC Sent: Tuesday, May 30, 2006 5:35 PM To: Jonah H. Harris; Waldomiro Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?

PostgreSQL does not support MERGE at the moment, sorry. Issue an UPDATE, and watch the rowcount; if the rowcount is 0, issue an INSERT. Be prepared to retry if another transaction has inserted the row meanwhile, though. MERGE would be really useful.
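The update-then-insert pattern described above, as a sketch with an invented counters(id, hits) table; note the race the next reply warns about, since another transaction can insert between the two statements:

```sql
-- Hypothetical table: counters(id integer PRIMARY KEY, hits integer)
UPDATE counters SET hits = hits + 1 WHERE id = 42;
-- If the client-reported rowcount was 0, the row did not exist yet:
INSERT INTO counters (id, hits) VALUES (42, 1);
-- Be prepared to retry on a unique-violation error from the INSERT.
```

The delete-then-insert variant avoids checking the rowcount but always rewrites the row, so it generates a dead tuple even when nothing changed.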
Re: [PERFORM] Getting even more insert performance (250m+rows/day)
If you can live with possible database corruption, you could try turning fsync off. For example, if you could just reinsert the data on the off chance a hardware failure corrupts the database, you might get a decent improvement. Also, have you tried creating the index after you have inserted all your data? (Or maybe copy already disables the indexes while inserting?)

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel J. Luke Sent: Wednesday, May 24, 2006 2:45 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] Getting even more insert performance (250m+rows/day)

I have a system that currently inserts ~250 million rows per day (I have about 10k more raw data than that, but I'm at the limit of my ability to get useful insert performance out of postgres). Things I've already done that have made a big difference: modified the postgresql.conf shared_buffers value, converted to COPY from individual insert statements, and changed BLCKSZ to 32768. I currently get ~35k/sec inserts on a table with one index (~70k/sec inserts if I don't have any indexes). The indexed field is basically a time_t (seconds since the epoch); autovacuum is running (or postgres would stop choosing to use the index). The other fields have relatively lower cardinality. Each day's worth of data gets inserted into its own table so that I can expire the data without too much effort (since drop table is much faster than running a delete and then vacuum). I would really like to be able to have 1 (or 2) more indexes on the table, since it takes a while for a sequential scan of 250 million rows to complete, but CPU time goes way up. In fact, it looks like I'm not currently IO bound, but CPU-bound. I think some sort of lazy-index generation (especially if it could be parallelized to use the other processors/cores that currently sit mostly idle) would be a solution. Is anyone working on something like this? Any other ideas?
Where should I look if I want to start to think about creating a new index that would work this way (or am I just crazy)? Thanks for any insight!

-- Daniel J. Luke | [EMAIL PROTECTED] | http://www.geeklair.net | Opinions expressed are mine and do not necessarily reflect the opinions of my employer.
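On the index question raised in the reply: COPY does not disable indexes, so for a one-shot bulk load it is often faster to drop the index, load, and rebuild. A sketch with invented table, index, and file names; for a table receiving continuous inserts all day, as in the OP's case, dropping the index may not be an option:

```sql
DROP INDEX rawdata_ts_idx;                    -- hypothetical index
COPY rawdata FROM '/tmp/batch.csv' WITH CSV;  -- hypothetical file
CREATE INDEX rawdata_ts_idx ON rawdata (ts);  -- rebuilt in one pass
ANALYZE rawdata;                              -- refresh statistics
```

Building the index once over the finished table is a single sort rather than millions of incremental btree insertions.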
Re: [PERFORM] Question about explain-command...
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Clemens Eisserer Sent: Wednesday, May 10, 2006 6:50 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Question about explain-command...

What do the hash lines mean? Does that mean my query does not use the indices at all? Why are some table names and some column names surrounded by quotes? Are they treated as text columns? I have to admit that the tables are just filled with test data, so the analyzer may take a very simple path since almost no data is in...

For small tables, it is faster to do a sequential scan than an index scan. You probably don't have enough test data to make the planner choose an index scan. I don't think the quotes really mean anything; they are just used as delimiters. The hash lines mean your tables are being joined by hash joins. You should read this page for more info: http://www.postgresql.org/docs/8.1/interactive/performance-tips.html
Re: [PERFORM] in memory views
Are you using the Postgres JDBC driver, or a JDBC-ODBC bridge driver? The Postgres-specific driver is usually faster.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Thomas Vatter Sent: Wednesday, May 10, 2006 3:54 PM To: Scott Marlowe Cc: Tino Wildenhain; pgsql-performance@postgresql.org Subject: Re: [PERFORM] in memory views

Scott Marlowe wrote: On Wed, 2006-05-10 at 10:41, Thomas Vatter wrote: Scott Marlowe wrote: What happens if you do this by declaring it as a cursor and then fetching the first row?

I do executeQuery(), and on the resultSet I do next() and return one row, but wait, I have to review the logic in this area; I can tell you tomorrow.

A good short test is to run explain analyze on the query from the psql command line. If it shows an execution time of significantly less than what you get from your application, then it is likely that the real problem is that your application is receiving the whole result set via libpq and waiting for that. A cursor will solve that problem.

Yes, the difference between the psql command line and the application is 6 seconds to 40 seconds. It is exactly the step resultSet = executeQuery() that needs 40 seconds. I use next() as a cursor through the resultSet, but I fear this is not enough. Do I have to use createStatement(resultSetType, resultSetConcurrency) or prepareStatement(resultSetType, resultSetConcurrency) to achieve the cursor behaviour? regards, tom
Re: [PERFORM] Memory and/or cache issues?
For a standard config, most of the memory used by Postgres is the shared buffers. The shared buffers are a cache to store blocks read from the disk, so if you do a query, Postgres will allocate and fill the shared buffers up to the max amount you set in your postgresql.conf file. Postgres doesn't release that memory between queries because the point is to be able to pull data from RAM instead of the disk on the next query. Are you sure your settings in postgresql.conf are standard? What are your settings for shared_buffers and work_mem?

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of mcelroy, tim Sent: Friday, May 05, 2006 8:58 AM To: 'Tom Lane' Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Memory and/or cache issues?

Are you saying the kernel's disk cache may be getting whacked? No, I understand that PG should use as much memory as it can, and the system as well. The main problem here is that with almost all of the 8 GB of RAM 'in use', when I try to do a pg_dump or vacuumdb I run out of memory and the system crashes. I well understand that unused memory is not a good thing; it's just that when you have none and can't do the maintenance work, bad stuff happens. For example, I just created a benchdb on my DEV box with 1,000,000 tuples. As this ran, the memory in use jumped up 1 GB and it hasn't gone down. Once the PG process has finished its task, shouldn't it release the memory it used? Thanks, Tim

-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED]] Sent: Friday, May 05, 2006 9:44 AM To: mcelroy, tim Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Memory and/or cache issues?

"mcelroy, tim" [EMAIL PROTECTED] writes: I see this on all the postgres installations. No matter what I set the postgresql.conf settings to regarding memory allocation, once postgres starts up 95% of the memory on the box is used. Is there a way within Linux to 'see' what or who is actually using this memory?

Probably kernel disk cache.
Are you under the misimpression that unused memory is a good thing? If a Unix-ish system *isn't* showing near zero free memory under load, the kernel is wasting valuable resources. regards, tom lane
Re: [PERFORM] Lot'sa joins - performance tip-up, please?
-> Nested Loop (cost=0.00..176144.30 rows=57925 width=26) (actual time=68.322..529472.026 rows=57925 loops=1)
     -> Seq Scan on ticketing_codes_played (cost=0.00..863.25 rows=57925 width=8) (actual time=0.042..473.881 rows=57925 loops=1)
     -> Index Scan using ticketing_codes_pk on ticketing_codes (cost=0.00..3.01 rows=1 width=18) (actual time=9.102..9.108 rows=1 loops=57925)
          Index Cond: (ticketing_codes.code_id = "outer".code_id)
Total runtime: 542000.093 ms
(27 rows)

I'll be more than happy to provide any additional information that I may be able to gather. I'd be most happy if someone would scream something like "four joins, smells like a poor design", because the design is poor, but the system is in production and I have to bear with it.

It looks like the nested loop which joins ticketing_codes_played to ticketing_codes is the slow part. I'm curious how many rows are in the ticketing_codes table. Four or five joins does not seem like a lot to me, but it can be slow if you are joining big tables with other big tables.
Re: [PERFORM] query performance question
You are pulling a fair amount of data from the database and doing a lot of computation in the SQL. I'm not sure how fast this query could be expected to run, but I had one idea. If you've inserted and deleted a lot into this table, you will need to run vacuum occasionally. If you haven't been doing that, I would try a VACUUM FULL ANALYZE on the table. (That will take a lock on the table and prevent clients from reading data while it is running.)

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of gulsah Sent: Friday, April 28, 2006 6:31 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] query performance question

Hi, I have a performance problem with Postgresql version 8.1 installed on a Fedora Core release 4 (Stentz) with kernel version 2.6.11. The machine I am working on has 512 MB of RAM and a Pentium III 800 MHz CPU. I have only one table in the database, which consists of 256 columns and 1 rows. Each column is of float type and each row corresponds to a vector in my application. What I want to do is to compute the distance between a predefined vector in hand and the ones in the database. The computation proceeds according to the following pseudocode:

for (i = 1; i <= 256; i++) {
    distance += abs(x1_i - x2_i);
}

where x1_i denotes the i-th coordinate of the vector in hand and x2_i denotes the i-th coordinate of the vector in the database. The distance computation has to be done for all the vectors in the database by means of a query, and the result set should be sorted by the computed distances. When I implement the query and measure the time spent on it in an application, I see that the query takes more than 8 seconds, which is undesirable in my application. What I want to ask you all is: is this normal performance for a computer with the properties I have mentioned above?
Is there any solution in your mind to increase the performance of my query? To make it more understandable, I give the query for vectors of size 3, but in my case their size is 256.

select
  id as vectorid,
  abs(40.9546-x2_1) + abs(-72.9964-x2_2) + abs(53.5348-x2_3) as distance
from vectordb
order by distance

Thank you all for your help. -gulsah
Re: [PERFORM] planner not using index for like operator
If you are using a locale other than the C locale, you need to create the index with an operator class to get index scans with like. See here for details: http://www.postgresql.org/docs/8.1/interactive/indexes-opclass.html

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Sriram Dandapani Sent: Tuesday, April 25, 2006 12:08 PM To: Pgsql-Performance (E-mail) Subject: [PERFORM] planner not using index for like operator

For the query

Select col1 from table1 Where col1 like '172.%'

The table has 133 million unique IP addresses. Col1 is indexed. The optimizer is using a sequential scan. This is the explain analyze output:

Seq Scan on table1 (cost=0.00..2529284.80 rows=1 width=15) (actual time=307591.339..565251.775 rows=524288 loops=1)
  Filter: ((col1)::text ~~ '172.%'::text)
Total runtime: 565501.873 ms

The number of affected rows (500K) is a small fraction of the total row count.
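A sketch of the opclass fix for the OP's table, assuming col1 is a varchar (a text column would use text_pattern_ops instead); the index name is invented:

```sql
-- Lets left-anchored patterns like '172.%' use an index scan
-- even under a non-C locale:
CREATE INDEX table1_col1_like_idx ON table1 (col1 varchar_pattern_ops);

-- Note the pattern must be a quoted string literal:
SELECT col1 FROM table1 WHERE col1 LIKE '172.%';
```

The pattern-ops index only helps LIKE and regex matches; if ordinary `<`/`>` comparisons on col1 also need an index, the original locale-aware index has to be kept alongside it.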
Re: [PERFORM] Takes too long to fetch the data from database
I've never used a cursor in Postgres, but I don't think it will help you a lot. In theory cursors make it easier to do paging, but your main problem is that getting the first page is slow. A cursor isn't going to be any faster at getting the first page than OFFSET/LIMIT is. Did you try Bruno's suggestion of:

SELECT * FROM wan ORDER BY stime DESC OFFSET 0 LIMIT 50;

You should run an EXPLAIN ANALYZE on that query to see if it is using an index scan. Also, what version of Postgres are you using? You can run select version(); to check.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of soni de Sent: Thursday, April 20, 2006 11:42 PM To: Merlin Moncure Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Takes too long to fetch the data from database

I don't want to query exactly 81900 rows into a set. I just want to fetch 50 or 100 rows at a time, in decreasing order of stime (i.e. 50 or 100 rows starting from the last and going backward). If we fetch sequentially, there is also a problem fetching all the records (select * from wan where kname='pluto' order by stime): it takes more than 4-5 minutes. I tried it on the same table with more than 326054 records.

On 4/20/06, Merlin Moncure [EMAIL PROTECTED] wrote:

SELECT * FROM wan ORDER BY stime LIMIT 50 OFFSET 81900;

you need to try and solve the problem without using 'offset'. you could do:

BEGIN;
DECLARE crs CURSOR FOR SELECT * FROM wan ORDER BY stime;
FETCH ABSOLUTE 81900 IN crs;
FETCH 49 IN crs;
CLOSE crs;
COMMIT;

this may be a bit faster but will not solve the fundamental problem. the more interesting question is why you want to query exactly 81900 rows into a set. This type of thinking will always get you into trouble; absolute positioning will not really work in a true sql sense. if you are browsing a table sequentially, there are much better methods. merlin
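Another option besides cursors is keyset ("seek") paging, which avoids both the big OFFSET and the deep fetch by remembering where the previous page ended. A sketch; the timestamp literal is a placeholder, and an index on (kname, stime) would be needed for this to be fast:

```sql
-- First page:
SELECT * FROM wan WHERE kname = 'pluto'
ORDER BY stime DESC LIMIT 50;

-- Next page: pass in the smallest stime shown on the previous page:
SELECT * FROM wan WHERE kname = 'pluto' AND stime < '2006-04-01 00:00:00'
ORDER BY stime DESC LIMIT 50;
```

Unlike OFFSET, every page costs the same because the index scan starts directly at the boundary value instead of skipping over the rows before it.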
Re: [PERFORM] Little use of CPU ( 5%)
Maybe you could post the query and an EXPLAIN ANALYZE of the query. That would give more information for trying to decide what is wrong. So your question is basically why you get a slower read rate on this query than on other queries? If I had to guess, maybe it could be that you are scanning an index with a low correlation (the order of the records in the index is very different from the order of the records on the disk), causing your drives to do a lot of seeking. A possible fix for this might be to cluster the table on the index, but I would check out the explain analyze first to see which step is really the slow one. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of luchot Sent: Friday, April 21, 2006 4:33 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Little use of CPU (< 5%) Hello, I have a problem of performance with a query. I use PostgreSQL 8.1.3. The distribution of Linux is Red Hat Enterprise Linux ES release 4 (Nahant Update 2) and the server is a dual-processor Xeon 2.4GHz with 1 GB of RAM, and the size of the database files is about 60 GB. The problem is that this query uses only a small percentage of the CPU, as seen with the top command: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3342 postgres 18 0 140m 134m 132m D 5.9 13.3 17:04.06 postmaster The vmstat command: procs ---memory-- ---swap-- -io --system-- cpu r b swpd free buff cache si so bi bo in cs us sy id wa 0 1 184 16804 38104 933516 0 0 3092 55 667 145 12 4 71 14 0 1 184 16528 38140 933480 0 0 2236 0 1206 388 2 1 50 47 0 1 184 15008 38188 935252 0 0 2688 92 1209 396 2 0 49 48 The config of PostgreSQL is: shared_buffers = 16384 (128 MB) work_mem = 65536 (64 MB) maintenance_work_mem = 98304 (96 MB) effective_cache_size = 84000 I think that the problem is there is too much %wait, caused by the really bad read rate (bi), which is only 3 MB/s.
It is this value I do not understand, because with other queries this rate is about 120 MB/s. I use SCSI disks and a hardware RAID 0 system. This is the query plan of the query: QUERY PLAN Aggregate (cost=24582205.20..24582205.22 rows=1 width=13) -> Nested Loop (cost=2.11..24582054.88 rows=60129 width=13) Join Filter: ("inner".l_quantity < (subplan)) -> Seq Scan on part (cost=0.00..238744.00 rows=6013 width=4) Filter: ((p_brand = 'Brand#51'::bpchar) AND (p_container = 'MED JAR'::bpchar)) -> Bitmap Heap Scan on lineitem (cost=2.11..126.18 rows=31 width=27) Recheck Cond: ("outer".p_partkey = lineitem.l_partkey) -> Bitmap Index Scan on id_partkey_lineitem (cost=0.00..2.11 rows=31 width=0) Index Cond: ("outer".p_partkey = lineitem.l_partkey) SubPlan -> Aggregate (cost=126.50..126.51 rows=1 width=10) -> Index Scan using id_partkey_lineitem on lineitem (cost=0.00..126.42 rows=31 width=10) Index Cond: (l_partkey = $0) (13 rows) The number of tuples in lineitem is 180,000,000. So my question is: what do I have to do to increase the read rate and improve the execution of the query? I add that the server is dedicated to PostgreSQL. Regards,
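One way to test the low-correlation guess from the reply above. This is a sketch; pg_stats and the 8.1-era CLUSTER syntax are standard, but whether correlation is actually the bottleneck here depends on the EXPLAIN ANALYZE timings:

```sql
-- Correlation near 0 for l_partkey would mean index order and heap
-- order disagree badly, so the index/bitmap scans cause heavy seeking.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'lineitem' AND attname = 'l_partkey';

-- Rewrite the table in index order (takes an exclusive lock and a
-- full rewrite, so run it in a maintenance window):
CLUSTER id_partkey_lineitem ON lineitem;
ANALYZE lineitem;
```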
Re: [PERFORM] Problem with LIKE-Performance
-Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Tarabas (Manuel Rorarius) Subject: [PERFORM] Problem with LIKE-Performance Hi! I am having trouble with like statements on one of my tables. It looks like you are getting a sequential scan instead of an index scan. What is your locale setting? As far as I know Postgres doesn't support using indexes with LIKE unless you are using the C locale. Also, in the future you only need to post EXPLAIN ANALYZE not EXPLAIN ANALYZE VERBOSE. Dave ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
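To check the setting in question (a standard command):

```sql
-- If this shows anything other than C (or POSIX), plain b-tree
-- indexes will not be used for LIKE prefix matches.
SHOW lc_collate;
```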
Re: [PERFORM] Query planner is using wrong index.
-----Original Message----- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Brian Herlihy Sent: Thursday, April 06, 2006 6:56 PM To: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Query planner is using wrong index. [Snip] I am really surprised that I have to go through such contortions just to use the primary key! This area of Postgres needs improvement. Of course you mentioned that you are using 7.4.7. You might want to try upgrading to 8.1.3. There have been a lot of improvements to performance since 7.4. I don't know if your specific problem was fixed, but it's worth a try. Also you might want to at least upgrade to 7.4.12 for the bug fixes.
Re: [PERFORM] [Solved] Slow performance on Windows .NET and OleDb
I use Npgsql, and the connection string I use is real simple: Server=192.168.0.36;Database=mydb;User Id=myuserid;Password=123456 Hope that helps, Dave -----Original Message----- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Greg Quinn Sent: Wednesday, March 29, 2006 11:57 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] [Solved] Slow performance on Windows .NET and OleDb This problem was caused by the OleDb driver. I used a 3rd party .NET provider and it worked, 8000 rows in just over 100ms! Can somebody send me a sample connection string for the PostgreSQL native .NET driver please? I'm battling to find a valid connection string. Thanks
Re: [PERFORM] simple join uses indexes, very slow
-----Original Message----- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of george young Sent: Monday, March 27, 2006 12:48 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] simple join uses indexes, very slow [Snip] Indexes: parameters_idx btree (run, opset_num, step_num, opset, opset_ver, step, step_ver, name, split, wafers) parameters_opset_idx btree (opset, step, name) parameters_step_idx btree (step, name) Have you tried creating some different indexes on parameters? I don't know if it should matter or not, but I would try some indexes like: (run, opset_num) //Without all the other columns (opset_num, run) //Backwards (opset_num) I don't really know Postgres internals all that well. It just seems to me that parameters_idx has a lot of columns this query is not interested in. I'd just be curious to see what happens.
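Spelled out as DDL, the experiment proposed above would look like this (the index names are made up for illustration):

```sql
-- Narrower indexes to compare against the ten-column parameters_idx.
CREATE INDEX parameters_run_opset_idx ON parameters (run, opset_num);
CREATE INDEX parameters_opset_run_idx ON parameters (opset_num, run);
CREATE INDEX parameters_opset_only_idx ON parameters (opset_num);
```

EXPLAIN ANALYZE on the original query after each CREATE INDEX would show which, if any, the planner picks up.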
Re: [PERFORM] simple join uses indexes, very slow
-----Original Message----- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Steinar H. Gunderson Sent: Tuesday, March 28, 2006 10:29 AM An index on (A,B,C) can be used for a query on (A,B) or (A), so it doesn't really matter. It isn't usable for a query on (B), (C) or (B,C), though. (The index rows will get bigger, of course, so you'll need more I/O if you want to scan large parts of it, but I guess that's beside the point.) I guess what I am really curious about is why the OP was getting an expensive sort when the planner tried a merge join? Most of the time was spent sorting the parameters table by opset_num even though opset_num is indexed. Isn't Postgres able to walk the index instead of sorting? I was wondering if maybe Postgres wasn't recognizing that it could just walk the index because the opset_num column isn't the first in the index.
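The prefix rule described above, in miniature (table and column names are hypothetical):

```sql
CREATE TABLE t (a int, b int, c int);
CREATE INDEX t_abc_idx ON t (a, b, c);

-- These can use t_abc_idx, because they constrain a leading prefix:
SELECT * FROM t WHERE a = 1;
SELECT * FROM t WHERE a = 1 AND b = 2;

-- These cannot use t_abc_idx for an index scan (no constraint on a):
SELECT * FROM t WHERE b = 2;
SELECT * FROM t WHERE b = 2 AND c = 3;
```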
Re: [PERFORM] simple join uses indexes, very slow
-----Original Message----- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Steinar H. Gunderson A merge join requires sorted inputs. Most of the time was spent sorting the parameters table by opset_num even though opset_num is indexed. Isn't Postgres able to walk the index instead of sorting? The time of an index scan vs. a sequential scan + sort depends on several factors, so it's not just a matter of walking the index whenever there is one. I was just looking this over again and I realized I misread the query plan. The slowest step was the Bitmap Heap Scan, not the sort. (The sort was relatively fast.)
Re: [PERFORM] Problem with query, server totally unresponsive
From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Jim C. Nasby Subject: Re: [PERFORM] Problem with query, server totally unresponsive On Thu, Mar 23, 2006 at 01:12:08PM +0100, Bendik Rognlien Johansen wrote: Hello, I have a big problem with one of my databases. When I run my query, after a few minutes, the postmaster shows 99% mem in top, and the server becomes totally unresponsive. You've got a bunch of sorts going on; could you be pushing the machine into swapping? I get this message when I try to cancel the query: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Did you send a kill of some kind to the backend? The machine has 2x Intel dual core processors (3GHz) and 2 gigs of RAM. Unless I missed some big news recently, no such CPU exists. Hyperthreading is absolutely not the same as dual core, and many people have found that it's best to disable hyperthreading on database servers. Maybe I'm confused by the marketing, but I think those CPUs do exist. According to Newegg, the Pentium D 830 and the Pentium D 930 are both dual-core Pentiums that run at 3GHz. It also specifically says these processors don't support hyper-threading, so I believe they really have two cores. Maybe you are thinking he was talking about a 3GHz Core Duo. http://www.newegg.com/Product/ProductList.asp?Category=34N=200034+50001157+1302820275+1051007392Submit=ENE Dave
Re: [PERFORM] import performance
[Snip] shared_buffers = 256 Make this higher too. If this is a dedicated machine with 512 MB of ram, set it to something like 125000. You may need to adjust shared memory settings for your operating system. See the manual for details. Whoa. Maybe I'm wrong, but isn't each buffer 8192 bytes? So you are suggesting that he set his shared buffers to a gigabyte on a machine with 512 MB of ram? Or was that just a miscalculation? Dave
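The arithmetic behind the objection: shared buffers are 8192-byte pages, so the suggested setting works out to roughly a gigabyte:

```sql
-- 125000 buffers x 8192 bytes = 1,024,000,000 bytes, about 976 MB,
-- nearly twice the 512 MB of RAM on the machine in question.
SELECT 125000 * 8192 AS bytes,
       125000 * 8192 / (1024 * 1024) AS megabytes;
```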
Re: [PERFORM] Help understanding indexes, explain, and optimizing
Actually I think LEFT OUTER JOIN is equivalent to LEFT JOIN. The Postgres manual says that the word OUTER is optional. Either way you get "...all rows in the qualified Cartesian product (i.e., all combined rows that pass its join condition), plus one copy of each row in the left-hand table for which there was no right-hand row that passed the join condition." It sounds like the original poster's problem was a less than optimal join order, and from what I understand Postgres can't reorder left joins. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Sent: Monday, March 06, 2006 6:40 PM To: i.v.r. Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Help understanding indexes, explain, and optimizing i.v.r. wrote: Hi everyone, [Snip] So I'm wondering what I'm doing wrong. I migrated this database from MySQL, and on there it ran pretty fast. Have you done an 'analyze' or 'vacuum analyze' over these tables? A left outer join gets *everything* from the second table: LEFT OUTER JOIN groups ON groups.id = locations.group_id LEFT OUTER JOIN schools ON schools.location_id = locations.id So they will load everything from groups and schools. Maybe they should be left joins, not left outer joins? -- Postgresql php tutorials http://www.designmagick.com/
Re: [PERFORM] joining two tables slow due to sequential scan
What version of postgres are you using? Can you post the output from EXPLAIN ANALYZE? -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tim Jones Sent: Friday, February 10, 2006 4:07 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] joining two tables slow due to sequential scan I am trying to join two tables and keep getting a sequential scan in the plan even though there is an index on the columns I am joining on. Basically this is the deal ... I have two tables with docid in them, which is what I am using for the join. ClinicalDocs ... (no primary key) though it does not help if I make docid the primary key docid integer (index) patientid integer (index) visitid integer (index) ... Documentversions docid integer (index) docversionnumber (index) docversionidentifier (primary key) It seems to do an index scan if I make docid the primary key. This is what occurs when I link on the patid from ClinicalDocs to the patient table. However I cannot make docid the primary key because it gets repeated depending on how many versions of a document I have. I have tried using a foreign key on documentversions with no success. In addition this query select * from documentversions join clinicaldocuments on documentversions.documentidentifier = clinicaldocuments.dssdocumentidentifier where documentversions.documentstatus = 'AC'; does an index scan, but if I change the order, e.g. select * from clinicaldocuments join documentversions on clinicaldocuments.dssdocumentidentifier = documentversions.documentidentifier where clinicaldocuments.patientidentifier = 123; does a sequential scan ... what I need is the bottom query; it is extremely slow ... Any ideas? Tim Jones Healthcare Project Manager Optio Software, Inc. (770) 576-3555
Re: [PERFORM] joining two tables slow due to sequential scan
OK, if I'm reading this correctly, it looks like the planner is choosing a sequential scan because it expects 48,000 rows for that patientidentifier, but it's actually only getting 3. The planner has the number of rows right for the sequential scan, so it seems like the stats are up to date. I would try increasing the stats for the patientidentifier column with 'alter table ... alter column ... set statistics ...' or increasing the default_statistics_target for the whole DB. Once you have changed the stats I believe you need to run analyze again. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tim Jones Sent: Friday, February 10, 2006 4:59 PM To: Tom Lane Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] joining two tables slow due to sequential scan ok here is the real db. the first query I had seems to make no sense, because it is only fast if I limit the rows, since almost all rows have status = 'AC'. second query: both tables have about 10 million rows and it takes a long time as you can see, but this person only has approx 160 total documents QUERY PLAN --- Hash Join (cost=84813.14..1510711.97 rows=48387 width=555) (actual time=83266.854..91166.315 rows=3 loops=1) Hash Cond: ("outer".documentidentifier = "inner".dssdocumentidentifier) -> Seq Scan on documentversions (cost=0.00..269141.98 rows=9677398 width=415) (actual time=0.056..49812.459 rows=9677398 loops=1) -> Hash (cost=83660.05..83660.05 rows=48036 width=140) (actual time=10.833..10.833 rows=3 loops=1) -> Bitmap Heap Scan on clinicaldocuments (cost=301.13..83660.05 rows=48036 width=140) (actual time=0.243..0.258 rows=3 loops=1) Recheck Cond: (patientidentifier = 690193) -> Bitmap Index Scan on ix_cdocpid (cost=0.00..301.13 rows=48036 width=0) (actual time=0.201..0.201 rows=3 loops=1) Index Cond: (patientidentifier = 690193) Total runtime: 91166.540 ms Tim Jones Healthcare Project Manager Optio Software, Inc.
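The suggested fix, spelled out against the table and column from the plan above (the target value of 1000 is just an example, and was the maximum in that era; re-running ANALYZE afterwards is required for the new statistics to take effect):

```sql
-- Raise the per-column statistics target, then re-analyze so the
-- planner sees a finer-grained histogram for patientidentifier.
ALTER TABLE clinicaldocuments
  ALTER COLUMN patientidentifier SET STATISTICS 1000;
ANALYZE clinicaldocuments;
```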
(770) 576-3555 -----Original Message----- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: Friday, February 10, 2006 5:52 PM To: Tim Jones Cc: Scott Marlowe; Dave Dutcher; pgsql-performance@postgresql.org Subject: Re: [PERFORM] joining two tables slow due to sequential scan Tim Jones [EMAIL PROTECTED] writes: QUERY PLAN 'Hash Join (cost=899.83..4384.17 rows=482 width=1350) (actual time=0.203..0.203 rows=0 loops=1)' ... 'Total runtime: 0.392 ms' Hardly seems like evidence of a performance problem ... regards, tom lane
Re: [PERFORM] [PERFORMANCE] Stored Procedures
I don't think pgpool is what you need. If I understand pgpool correctly, pgpool lets you pool multiple postgres servers together. You are just looking for database connection pooling. A simple connection pool is basically just an application-wide list of connections. When a client needs a connection, you just request a connection from the pool. If there is an unused connection in the pool, it is given to the client and removed from the unused pool. If there is no unused connection in the pool, then a new connection is opened. When the client is done with it, the client releases it back into the pool. You can google for 'database connection pool' and you should find a bunch of stuff. It's probably a good idea to find one already written. If you write your own you have to make sure it can deal with things like dead connections, synchronization, and maximum numbers of open connections. Dave -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marcos Sent: Monday, January 23, 2006 7:27 AM To: Markus Schaber Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] [PERFORMANCE] Stored Procedures Hi Markus You really should consider using a connection pool (most web application servers provide pooling facilities) or some other means to keep the connection between several http requests. Yes. I'm looking for a connection pool. I found pgpool but don't yet understand how it works; I'll read more about it. Thanks Marcos
[PERFORM] Showing Column Statistics Number
Hi, I've looked around through the docs, but can't seem to find an answer to this. If I change a column's statistics with "ALTER TABLE ... ALTER COLUMN ... SET STATISTICS n", is there a way I can later go back and see what the number is for that column? I want to be able to tell which columns I've changed the statistics on, and which ones I haven't. Thanks, Dave
[PERFORM] Left Join Performance vs Inner Join Performance
Hello, I have an inner join query that runs fast, but when I change it to a left join the query runs 96 times slower. I wish I could always do an inner join, but there are rare times when there isn't data in the right-hand table. I could expect a small performance hit, but the difference is so large I figure I must be doing something wrong. What I think is strangest is how similar the two query plans are. Query (inner join version; just replace inner with left for the other version): select p.owner_trader_id, p.strategy_id, m.last, m.bid, m.ask from om_position p inner join om_instrument_mark m on m.instrument_id = p.instrument_id and m.data_source_id = 5 and m.date = '2005-02-03' where p.as_of_date = '2005-02-03' and p.fund_id = 'TRIDE' and p.owner_trader_id = 'tam4' and p.strategy_id = 'BASKET1' Query plan for inner join: Nested Loop (cost=0.00..176.99 rows=4 width=43) (actual time=0.234..14.182 rows=193 loops=1) -> Index Scan using as_of_date_om_position_index on om_position p (cost=0.00..68.26 rows=19 width=20) (actual time=0.171..5.210 rows=193 loops=1) Index Cond: (as_of_date = '2005-02-03'::date) Filter: (((fund_id)::text = 'TRIDE'::text) AND ((owner_trader_id)::text = 'tam4'::text) AND ((strategy_id)::text = 'BASKET1'::text)) -> Index Scan using om_instrument_mark_pkey on om_instrument_mark m (cost=0.00..5.71 rows=1 width=31) (actual time=0.028..0.032 rows=1 loops=193) Index Cond: ((m.instrument_id = "outer".instrument_id) AND (m.data_source_id = 5) AND (m.date = '2005-02-03'::date)) Total runtime: 14.890 ms Query plan for left join: Nested Loop Left Join (cost=0.00..7763.36 rows=19 width=43) (actual time=3.005..1346.308 rows=193 loops=1) -> Index Scan using as_of_date_om_position_index on om_position p (cost=0.00..68.26 rows=19 width=20) (actual time=0.064..6.654 rows=193 loops=1) Index Cond: (as_of_date = '2005-02-03'::date) Filter: (((fund_id)::text = 'TRIDE'::text) AND ((owner_trader_id)::text = 'tam4'::text) AND ((strategy_id)::text = 'BASKET1'::text)) ->
Index Scan using om_instrument_mark_pkey on om_instrument_mark m (cost=0.00..404.99 rows=1 width=31) (actual time=3.589..6.919 rows=1 loops=193) Index Cond: (m.instrument_id = outer.instrument_id) Filter: ((data_source_id = 5) AND (date = '2005-02-03'::date)) Total runtime: 1347.159 ms Table Definitions: CREATE TABLE om_position ( fund_id varchar(10) NOT NULL DEFAULT ''::character varying, owner_trader_id varchar(10) NOT NULL DEFAULT ''::character varying, strategy_id varchar(30) NOT NULL DEFAULT ''::character varying, instrument_id int4 NOT NULL DEFAULT 0, as_of_date date NOT NULL DEFAULT '0001-01-01'::date, pos numeric(22,9) NOT NULL DEFAULT 0.0, cf_account_id int4 NOT NULL DEFAULT 0, cost numeric(22,9) NOT NULL DEFAULT 0.0, CONSTRAINT om_position_pkey PRIMARY KEY (fund_id, owner_trader_id, strategy_id, cf_account_id, instrument_id, as_of_date), CONSTRAINT $1 FOREIGN KEY (strategy_id) REFERENCES om_strategy (strategy_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT $2 FOREIGN KEY (fund_id) REFERENCES om_fund (fund_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT $3 FOREIGN KEY (cf_account_id) REFERENCES om_cf_account (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT $4 FOREIGN KEY (owner_trader_id) REFERENCES om_trader (trader_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION ) WITH OIDS; CREATE INDEX as_of_date_om_position_index ON om_position USING btree (as_of_date); CREATE TABLE om_instrument_mark ( instrument_id int4 NOT NULL DEFAULT 0, data_source_id int4 NOT NULL DEFAULT 0, date date NOT NULL DEFAULT '0001-01-01'::date, last numeric(22,9) NOT NULL DEFAULT 0.0, bid numeric(22,9) NOT NULL DEFAULT 0.0, ask numeric(22,9) NOT NULL DEFAULT 0.0, comment varchar(150) NOT NULL DEFAULT ''::character varying, trader_id varchar(10) NOT NULL DEFAULT 'auto'::character varying, CONSTRAINT om_instrument_mark_pkey PRIMARY KEY (instrument_id, data_source_id, date), CONSTRAINT $1 FOREIGN KEY (instrument_id) 
REFERENCES om_instrument (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT $2 FOREIGN KEY (data_source_id) REFERENCES om_data_source (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION, CONSTRAINT om_instrument_mark_trader_id_fkey FOREIGN KEY (trader_id) REFERENCES om_trader (trader_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION ) WITH OIDS; Thanks for any help