[PERFORM] BUG: endless lseek(.., SEEK_END) from select queries on x64 builds
hello, i have a strange and reproducible bug with some select queries and 64bit postgresql builds (it works fine on 32bit builds). The postgres process runs at 100% cpu load (no io-wait) and strace shows endless lseek(..., SEEK_END) calls on one table for minutes:

lseek(28, 0, SEEK_END) = 26697728
lseek(28, 0, SEEK_END) = 26697728
lseek(28, 0, SEEK_END) = 26697728
...

File descriptor 28 points to the file for the webapps_base.Acquisition table (see query/plan below). Now the details.

The query:

select count(ac.ID) as col_0_0_
from webapps_base.Acquisition ac, webapps_base.SalesPartnerStructure struc
where struc.fk_SalesPartner_child = ac.fk_SalesPartner_ID
  and struc.fk_SalesPartner_parent = 200
  and (ac.CreationDate between '2012-02-01' and '2013-01-31')
  and ac.acquisitiondepot = 'STANDARD'
  and ('2013-01-31' between struc.ValidFrom and coalesce(struc.ValidTo, '2013-01-31'))

The plan:

Aggregate (cost=32617.11..32617.12 rows=1 width=8) (actual time=204.180..204.180 rows=1 loops=1)
  -> Merge Join (cost=32232.01..32598.26 rows=7543 width=8) (actual time=172.882..202.218 rows=2 loops=1)
       Merge Cond: (ac.fk_salespartner_id = struc.fk_salespartner_child)
       -> Sort (cost=5582.60..5635.69 rows=21235 width=16) (actual time=28.920..31.121 rows=21204 loops=1)
            Sort Key: ac.fk_salespartner_id
            Sort Method: quicksort  Memory: 1763kB
            -> Bitmap Heap Scan on acquisition ac (cost=395.26..4056.43 rows=21235 width=16) (actual time=3.064..15.868 rows=21223 loops=1)
                 Recheck Cond: ((creationdate >= '2012-02-01'::date) AND (creationdate <= '2013-01-31'::date))
                 Filter: ((acquisitiondepot)::text = 'STANDARD'::text)
                 -> Bitmap Index Scan on index_acquistion_creationdate (cost=0.00..389.95 rows=21267 width=0) (actual time=2.890..2.890 rows=21514 loops=1)
                      Index Cond: ((creationdate >= '2012-02-01'::date) AND (creationdate <= '2013-01-31'::date))
       -> Sort (cost=26648.60..26742.61 rows=37606 width=8) (actual time=143.952..152.808 rows=131713 loops=1)
            Sort Key: struc.fk_salespartner_child
            Sort Method: quicksort  Memory: 8452kB
            -> Bitmap Heap Scan on salespartnerstructure struc (cost=3976.80..23790.79 rows=37606 width=8) (actual time=13.279..64.681 rows=114772 loops=1)
                 Recheck Cond: (fk_salespartner_parent = 200)
                 Filter: (('2013-01-31'::date >= validfrom) AND ('2013-01-31'::date <= COALESCE(validto, '2013-01-31'::date)))
                 -> Bitmap Index Scan on index_parent_salespartner (cost=0.00..3967.39 rows=114514 width=0) (actual time=13.065..13.065 rows=116479 loops=1)
                      Index Cond: (fk_salespartner_parent = 200)
Total runtime: 205.397 ms

as you can see, the query runs fine. I can run this query endlessly from a bash/psql while loop or a jdbc cli tool without any problems. so far so good. But now i run the same query from JBoss EAP 5.1.2 with connection pooling and an xa-datasource/two-phase commit (transactions on multiple datasources are needed) *and* <prepared-statement-cache-size>1000</prepared-statement-cache-size>. i can run the query four times with good performance, and after that postgresql starts with the strange lseek() behavior. The query then needs more than a minute to complete, and during execution the postgres process runs at 100% cpu load with lseek calls (straced). If i flush the connection pool (close all open connections from the jboss jmx-console) it works again for four calls.

This problem applies only to 64bit builds. If i run a 32bit postgresql server it works fine. We have tested the following environments:

- Debian Squeeze 64bit with Postgresql 9.1.[5,6,7] - Bad behavior
- Debian Wheezy 64bit with Postgresql 9.1.8 64bit - Bad behavior
- Ubuntu 12.04 LTS 64bit with Postgresql 9.1.8 64bit - Bad behavior
- Windows 7 x64 with Postgresql 9.1.8 64bit - Bad behavior
- Debian Wheezy 64bit with EnterpriseDB 9.2 64bit - Bad behavior
- Debian Wheezy 64bit with Postgresql 9.1.8 32bit - Good behavior
- Debian Wheezy 32bit with Postgresql 9.1.8 32bit - Good behavior

as you can see, all 64bit builds of postgresql are affected (independent of the os arch). If i disable the prepared statement cache (remove <prepared-statement-cache-size> from the -ds.xml) it works on 64bit builds too.

regards,
msc

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] BUG: endless lseek(.., SEEK_END) from select queries on x64 builds
On 22.02.2013 10:25, Markus Schulz wrote:
> i can run the query four times with good performance and after that
> postgresql starts with the strange lseek() behavior.

By default, the JDBC driver re-plans the prepared statement for the first 4 invocations of the query. On the fifth invocation, it switches to using a generic plan, which will be reused on subsequent invocations. See http://jdbc.postgresql.org/documentation/head/server-prepare.html.

The generic plan seems to perform much worse in this case. You can disable that mechanism and force re-planning of the query every time by setting the prepareThreshold=0 parameter on the data source.

You could check what the generic plan looks like by taking the query used in the java program, with the parameter markers, and running EXPLAIN on that.

PostgreSQL version 9.2 might work better in this case. It has some smarts in the server to generate parameter-specific plans even when prepared statements are used, if the planner thinks a specific plan will be faster.

- Heikki
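[Editorially, Heikki's EXPLAIN suggestion can be reproduced server-side with PREPARE. A sketch only: which positions the application actually binds as parameters is an assumption here, and on 9.1 EXPLAIN EXECUTE always shows the generic plan for a prepared statement.]

```sql
-- Prepare the statement with the same parameter markers the JDBC
-- driver would send (parameter positions are assumed, not confirmed):
PREPARE acq_count(int, date, date) AS
  SELECT count(ac.ID)
    FROM webapps_base.Acquisition ac,
         webapps_base.SalesPartnerStructure struc
   WHERE struc.fk_SalesPartner_child = ac.fk_SalesPartner_ID
     AND struc.fk_SalesPartner_parent = $1
     AND ac.CreationDate BETWEEN $2 AND $3
     AND ac.acquisitiondepot = 'STANDARD'
     AND $3 BETWEEN struc.ValidFrom AND coalesce(struc.ValidTo, $3);

-- Show the plan used for the prepared (generic) form:
EXPLAIN EXECUTE acq_count(200, '2012-02-01', '2013-01-31');

DEALLOCATE acq_count;
```

The datasource-side alternative Heikki mentions is appending prepareThreshold=0 to the JDBC connection URL, which keeps the driver from ever switching to the server-prepared generic plan.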
Re: [PERFORM] BUG: endless lseek(.., SEEK_END) from select queries on x64 builds
Markus,

* Markus Schulz (m...@antzsystem.de) wrote:
> as you can see the query runs fine. I can run this query from a
> bash-psql-while-loop/jdbc-cli-tool endless without any problems.
> so far so good.
[...]
> JBoss EAP 5.1.2 with connection pooling and xa-datasource/two-phase-commits
> (transactions on multiple datasources needed) *and*
> <prepared-statement-cache-size>1000</prepared-statement-cache-size>
> i can run the query four times with good performance and after that
> postgresql starts with the strange lseek() behavior.

It sounds like your bash script and JBoss are doing something different. Would it be possible for you to turn on log_statement = 'all' in PG, see what's different, and then update the bash/psql script to do exactly what JBoss does, and see if you can reproduce it that way?

It certainly looks like a PG bug, but it'd be a lot easier to debug with a simple, well-defined test case which shows the failure.

Thanks!

Stephen
[PERFORM] Avoiding Recheck Cond when using Select Distinct
The following query produces a Recheck Cond and a costly Bitmap Heap Scan even though I have a composite index that covers both columns being filtered and selected. I believe this is because the initial bitmap scan produces 2912 rows, which is too many for the available bitmap space. I've tried rewriting the command as SELECT ... GROUP BY, but it still uses the BHS. Is there a way to rewrite this command that would improve performance by avoiding the costly Bitmap Heap Scan?

SELECT DISTINCT store_id, book_id
FROM sales_points
WHERE sales_points.store_id IN (1, 2, 3, 4, 5, 6, 199, 201, 202)
AND sales_points.book_id IN (421, 422, 419, 420)

Here is the explain/analyze output:

HashAggregate (cost=5938.72..5939.01 rows=97 width=8) (actual time=10.837..10.854 rows=32 loops=1)
  -> Bitmap Heap Scan on sales_points (cost=47.03..5936.53 rows=2191 width=8) (actual time=0.547..5.296 rows=4233 loops=1)
       Recheck Cond: (book_id = ANY ('{421,422,419,420}'::integer[]))
       Filter: (store_id = ANY ('{1,2,3,4,5,6,199,201,202}'::integer[]))
       -> Bitmap Index Scan on index_sales_points_on_book_id (cost=0.00..46.92 rows=4430 width=0) (actual time=0.469..0.469 rows=4233 loops=1)
            Index Cond: (book_id = ANY ('{421,422,419,420}'::integer[]))
Total runtime: 10.935 ms

Actual runtime is more like 15ms when tested against a development database (which gave an est. total runtime of 6ms). Under load in production, the command takes 10,158 ms. Tuning Postgres is not an option, as the instance is provided by Heroku and as far as I know cannot be tuned by me.
Re: [PERFORM] Are bitmap index scans slow to start?
Hi Jeff, thanks for the reply.

> What is going on during the interregnum? Whatever it is, it seems to be
> driving the log_2013_01_session_idx index out of the cache, but not the
> log_2013_01 table. (Or perhaps the table visit is getting the benefit of
> effective_io_concurrency?)
>
> Rebuilding the index might help, as it would put all the leaf pages holding
> values for session_id=27 adjacent to each other, so they would read from
> disk faster. But with a name like session_id, I don't know how long such
> clustering would last though.

Technically, nothing should be happening. We used to keep one massive audit log, and it was impossible to manage due to its size. We then changed to a strategy where every month a new audit log is spawned, and since log_2013_01 represents January, the log should be closed and nothing should have changed (it is technically possible that a long-running process would spill over into February, but not by this much). So, assuming that it's stable, it should be a very good candidate for reindexing, no?

Our effective_io_concurrency is 1, and last I heard the PG host was a Linux 4-drive RAID10, so I don't know if there is any benefit to raising this number - and if there was any benefit, it would be to the Bitmap Scan, and the problem is the data building before the fact.

> the bitmap itself doesn't get cached. But the data needed to construct the
> bitmap does get cached. It gets cached by the generic caching methods of
> PG and the OS, not through something specific to bitmaps.

This has always been a problem for me. I spend hours trying different strategies and think I've solved the problem, when in fact it seems like a cache has spun up, and then something else expires it and the problem is back. Is there a way around this problem - can I force the expiration of a cache?

Carlo
Re: [PERFORM] Are bitmap index scans slow to start?
A cool idea, but if I understand it correctly, very specific and fussy. New DBs are spawned on this model, and all the developers would have to be aware of this non-standard behaviour, and DBAs would have to create these indexes every month, for every DB (as the log tables are created every month). There are 89 session_id values in the January log (log_2013_01), so this would quickly get out of control. But - like I said - an interesting idea for more specific challenges.

From: Marc Mamin [mailto:m.ma...@intershop.de]
Sent: February 21, 2013 2:41 PM
To: Jeff Janes; Carlo Stonebanks
Cc: pgsql-performance@postgresql.org
Subject: AW: [PERFORM] Are bitmap index scans slow to start?

> Rebuilding the index might help, as it would put all the leaf pages
> holding values for session_id=27 adjacent to each other, so they would
> read from disk faster. But with a name like session_id, I don't know how
> long such clustering would last though.
>
> If I'm right about the index disk-read time, then switching to a plain
> index scan rather than a bitmap index scan would make no difference--either
> way the data has to come off the disk. I'd prefer a strategy that allowed
> fast performance the first time, rather than slow the first time and
> extremely fast subsequently.

Hello,

if the index is only used to locate rows for a single session_id, you may consider splitting it into a set of partial indexes, e.g.:

create index i_0 on foo (session_id) where session_id%4 = 0;
create index i_1 on foo (session_id) where session_id%4 = 1;
create index i_2 on foo (session_id) where session_id%4 = 2;
create index i_3 on foo (session_id) where session_id%4 = 3;

(they can be built in parallel using separate threads)

Then you will have to ensure that all your WHERE clauses also contain the index condition:

WHERE session_id = 27 AND session_id%4 = 27%4

regards,
Marc Mamin
Re: [PERFORM] Avoiding Recheck Cond when using Select Distinct
2013/2/22 jackrg <j...@groundbreakingsoftware.com>
> Tuning Postgres is not an option, as the instance is provided by Heroku
> and as far as I know cannot be tuned by me.

Most tuning parameters can be set on a per-query basis, so you can issue "alter database ... set param=value" to have the same effect as if it was set through postgresql.conf.
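[Editorially, a sketch of what Vitalii describes; the database name and the parameter/value shown are placeholders, not recommendations for this workload.]

```sql
-- Persistent per-database override (applies to new sessions of mydb):
ALTER DATABASE mydb SET work_mem = '16MB';

-- Or scope a change to a single transaction around one query:
BEGIN;
SET LOCAL work_mem = '16MB';
-- ... run the query here ...
COMMIT;
```

SET LOCAL reverts automatically at COMMIT/ROLLBACK, so it is the safer way to experiment on a hosted instance where postgresql.conf cannot be edited.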
Re: [PERFORM] BUG: endless lseek(.., SEEK_END) from select queries on x64 builds
On Friday, 22 February 2013 at 14:35:25, Heikki Linnakangas wrote:
> On 22.02.2013 10:25, Markus Schulz wrote:
>> i can run the query four times with good performance and after that
>> postgresql starts with the strange lseek() behavior.
>
> By default, the JDBC driver re-plans the prepared statement for the first
> 4 invocations of the query. On the fifth invocation, it switches to using
> a generic plan, which will be reused on subsequent invocations.

that sounds really interesting and i would try to change my java-jdbc-test-cli program according to that, but ...

> See http://jdbc.postgresql.org/documentation/head/server-prepare.html.
> The generic plan seems to perform much worse in this case. You can
> disable that mechanism and force re-planning of the query every time by
> setting the prepareThreshold=0 parameter on the data source.

it wouldn't explain why the same jboss runs fine with a 32bit postgresql server (i switched only the datasource to another server with exactly the same database).

> You could check what the generic plan looks like by taking the query used
> in the java program, with the parameter markers, and running EXPLAIN on
> that.

how can i do this? I've tried the following in my ejb-test-function:

String query = ...
entitymanager.createNativeQuery(query)...;
entitymanager.createNativeQuery("EXPLAIN ANALYZE " + query)...;

but the second createNativeQuery call runs fast every time and shows the same plan, and the first hangs after the fourth call to this function.

> PostgreSQL version 9.2 might work better in this case. It has some smarts
> in the server to generate parameter-specific plans even when prepared
> statements are used, if the planner thinks a specific plan will be faster.

this wouldn't help; we tried postgresql 9.2 too:

- Debian Wheezy 64bit with EnterpriseDB 9.2 64bit - Bad behavior

> - Heikki

regards,
msc
[PERFORM] Bad query plan with high-cardinality column
I have a problem with a query that is planned wrong. This is my schema:

create table comments (
  id serial primary key,
  conversation_id integer,
  created_at timestamp
);
create index comments_conversation_id_index on comments (conversation_id);
create index comments_created_at_index on comments (created_at);

The table has 3.5M rows, and 650k unique values for conversation_id, where the histogram goes up to 54000 rows for the most frequent ID, with a long tail. There are only 20 values with a frequency of 1000 or higher. The created_at column has 3.5M distinct values.

Now, I have this query:

select comments.id from comments
where conversation_id = 3975979
order by created_at limit 13

This filters about 5000 rows and returns the oldest 13 rows. But the query is consistently planned wrong:

Limit (cost=0.00..830.53 rows=13 width=12) (actual time=3174.862..3179.525 rows=13 loops=1)
  Buffers: shared hit=2400709 read=338923 written=21
  -> Index Scan using comments_created_at_index on comments (cost=0.00..359938.52 rows=5634 width=12) (actual time=3174.860..3179.518 rows=13 loops=1)
       Filter: (conversation_id = 3975979)
       Rows Removed by Filter: 2817751
       Buffers: shared hit=2400709 read=338923 written=21
Total runtime: 3179.553 ms

It takes anywhere between 3 seconds and several minutes to run, depending on how warm the disk cache is. This is the correct plan and index usage:

Limit (cost=6214.34..6214.38 rows=13 width=12) (actual time=25.471..25.473 rows=13 loops=1)
  Buffers: shared hit=197 read=4510
  -> Sort (cost=6214.34..6228.02 rows=5471 width=12) (actual time=25.469..25.470 rows=13 loops=1)
       Sort Key: created_at
       Sort Method: top-N heapsort  Memory: 25kB
       Buffers: shared hit=197 read=4510
       -> Index Scan using comments_conversation_id_index on comments (cost=0.00..6085.76 rows=5471 width=12) (actual time=1.163..23.955 rows=5834 loops=1)
            Index Cond: (conversation_id = 3975979)
            Buffers: shared hit=197 read=4510
Total runtime: 25.500 ms

The problem for Postgres is obviously to estimate how many rows have a given conversation_id value, but I have confirmed that the value is correctly tracked in the histogram. I'm at a loss how to explain why the planner thinks scanning a huge index that covers the entire table will ever beat scanning a small index that has 17% of the table's values. Even if the entire database were in RAM, this would hit way too many buffers unnecessarily. (I have determined that the planner will consistently use the bad plan for higher-frequency values, and the good plan for lower-frequency values.) It will *always* be better to branch off the comments_conversation_id_index index.

Another curious thing: if I run ANALYZE repeatedly, the planner will sometimes, oddly enough, choose the correct plan. This behaviour actually seems related to effective_cache_size; if it's set small (128MB), the planner will sometimes favour the good plan, but if large (>= 2GB) it will consistently use the bad plan. Not sure if ANALYZE is changing anything or if it's just bad timing.

Things I have tried: I have bumped the statistics target up to 1, but it does not help. I have also tried setting n_distinct for the column manually, since Postgres guesses it's 285k instead of 650k, but
Re: [PERFORM] Are bitmap index scans slow to start?
I can't really help, but I can make it more clear why postgres is choosing a _bitmap_ index scan rather than a regular index scan.

With a regular index scan it pumps the index for the locations of the rows that it points to and loads those rows as it finds them. This works great if the rows in the index are sorta sorted - that way it isn't jumping around the table randomly. Random io is slow. In a bitmap index scan pg pumps the index and buffers the results by shoving them in a big bitmap. Then, it walks the bitmap in order to produce in-order io. PG makes the choice based on a measure of the index's correlation.

The problem comes down to you inserting the sessions concurrently with one another. My instinct would be to lower the FILLFACTOR on newly created indices so they can keep their entries more in order. I'm not sure why I have that instinct but it feels right. Also, you might try clustering newly created tables on session_id and setting the fillfactor down so rows with the same session id will stick together on disk. Now that I look stuff up on the internet I'm not sure where I saw that pg tries to maintain a cluster using empty space from FILLFACTOR but I _think_ it does. I'm not sure what is going on with my google foo today.

Nik

On Fri, Feb 22, 2013 at 12:50 PM, Carlo Stonebanks <stonec.regis...@sympatico.ca> wrote:
> A cool idea, but if I understand it correctly very specific and fussy. New
> DBs are spawned on this model, and all the developers would have to be
> aware of this non-standard behaviour, and DBAs would have to create these
> indexes every month, for every DB (as the log tables are created every
> month). There are 89 session_id values in the January log (log_2013_01) so
> this would quickly get out of control. But - like I said - an interesting
> idea for more specific challenges.
[...]
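[Editorially, Nik's FILLFACTOR/cluster ideas as DDL. A sketch only: the February table/index names are hypothetical, 70 is an arbitrary starting value, and whether lowered fillfactor actually preserves session_id locality here is exactly the open question in the thread.]

```sql
-- Leave 30% free space in index leaf pages so concurrently inserted
-- session_id entries have room to land near their neighbours:
CREATE INDEX log_2013_02_session_idx
    ON log_2013_02 (session_id)
  WITH (fillfactor = 70);

-- One-time physical reordering of a closed month's table. CLUSTER takes
-- an exclusive lock and the ordering is not maintained afterwards:
CLUSTER log_2013_01 USING log_2013_01_session_idx;
```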
Re: [PERFORM] Avoiding Recheck Cond when using Select Distinct
On Fri, Feb 22, 2013 at 9:59 AM, Vitalii Tymchyshyn <tiv...@gmail.com> wrote:
>> Tuning Postgres is not an option, as the instance is provided by Heroku
>> and as far as I know cannot be tuned by me.
>
> Most tuning parameters can be set on a per-query basis, so you can issue
> "alter database set param=value" to have the same effect as if it was set
> through postgresql.conf.

Jack,

Jeff brought up some great points, and what Vitalii suggested should let you tweak most knobs, but if you're running into limitations of the platform or you find default settings which seem outright incorrect, please file a support ticket and we'll be happy to work with you.

Thanks,
Maciek
Heroku Postgres
Re: [PERFORM] Are bitmap index scans slow to start?
> Also, you might could try clustering newly created tables on session_id
> and setting the fillfactor down so rows with the same session id will
> stick together on disk.

My understanding of PG's CLUSTER is that it is a one-time command that creates a re-ordered table and doesn't maintain the clustered order until the command is issued again. During the CLUSTER, the table is read- and write-locked. So, in order for me to use this, I would need to set up a timed event to CLUSTER occasionally.

> I can't really help, but I can make it more clear why postgres is choosing
> a _bitmap_ index scan rather than a regular index scan

The EXPLAIN ANALYZE is showing it is taking a long time to prepare the bitmap, i.e.:

Bitmap Index Scan on log_2013_01_session_idx (cost=0.00..63186.52 rows=2947664 width=0) (actual time=32611.918..32611.918 rows=2772042 loops=1)
  Index Cond: (session_id = 27)

The bitmap scan itself is actually very fast. Jeff says that the bitmap is not cached, so I will assume the PG general caches being created are of general use.

I think what I need to figure out is:

1) Why does it take 36 seconds to set up the general index caches?
2) What can I do about it (what stats do I need to look at)?
3) How can I force these caches to expire so I can tell if the strategy worked?

From: Nikolas Everett [mailto:nik9...@gmail.com]
Sent: February 22, 2013 2:05 PM
To: Carlo Stonebanks
Cc: Marc Mamin; Jeff Janes; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Are bitmap index scans slow to start?

[...]
Re: [PERFORM] Bad query plan with high-cardinality column
Alexander Staubo <a...@bengler.no> writes:
> select comments.id from comments where conversation_id = 3975979
> order by created_at limit 13

> I'm at a loss how to explain why the planner thinks scanning a huge
> index that covers the entire table will ever beat scanning a small index
> that has 17% of the table's values.

The reason is that the LIMIT may stop the query before it's scanned all of the index. The planner estimates on the assumption that the desired rows are roughly uniformly distributed within the created_at index, and on that assumption, it looks like this query will stop fairly soon ... but evidently, that's wrong. On the other hand, it knows quite well that the other plan will require pulling out 5000-some rows and then sorting them before it can return anything, so that's not going to be exactly instantaneous either.

In this example, I'll bet that conversation_id and created_at are pretty strongly correlated, and that most or all of the rows with that specific conversation_id are quite far down the created_at ordering, so that the search through the index takes a long time to run. OTOH, with another conversation_id the same plan might run almost instantaneously.

If you're concerned mostly with this type of query then a 2-column index on (conversation_id, created_at) would serve your purposes nicely. You could likely even dispense with the separate index on conversation_id alone.

regards, tom lane
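[Editorially, Tom's suggestion spelled out as DDL; the index name is illustrative, and dropping the single-column index is the optional step he mentions.]

```sql
-- Serves WHERE conversation_id = ? ORDER BY created_at LIMIT n directly:
-- entries for one conversation are stored in created_at order inside
-- the index, so the planner needs neither a sort nor a full-index walk.
CREATE INDEX index_comments_on_conversation_id_and_created_at
    ON comments (conversation_id, created_at);

-- The single-column index likely becomes redundant, since the composite
-- index can also answer plain conversation_id lookups:
DROP INDEX comments_conversation_id_index;
```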
Re: [PERFORM] Bad query plan with high-cardinality column
Alexander Staubo <a...@bengler.no> wrote:
> This is my schema:
>
> create table comments (
>   id serial primary key,
>   conversation_id integer,
>   created_at timestamp
> );
> create index comments_conversation_id_index on comments (conversation_id);
> create index comments_created_at_index on comments (created_at);

I suspect you would be better off without those two indexes, and instead having an index on (conversation_id, created_at). Not just for the query you show, but in general.

> select comments.id from comments
> where conversation_id = 3975979
> order by created_at limit 13
>
> This filters about 5000 rows and returns the oldest 13 rows. But
> the query is consistently planned wrong:

[planner thinks it will be cheaper to read the index in ORDER BY sequence and filter rows until it has 13 than to read 5471 rows and sort them to pick the top 13 after the sort.]

In my experience these problems come largely from the planner not knowing the cost of dealing with each tuple. I see a lot less of this if I raise cpu_tuple_cost to something in the 0.03 to 0.05 range.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] Bad query plan with high-cardinality column
On Friday, February 22, 2013 at 21:33, Tom Lane wrote:
> The reason is that the LIMIT may stop the query before it's scanned all
> of the index. The planner estimates on the assumption that the desired
> rows are roughly uniformly distributed within the created_at index, and
> on that assumption, it looks like this query will stop fairly soon ...
> but evidently, that's wrong. On the other hand, it knows quite well that
> the other plan will require pulling out 5000-some rows and then sorting
> them before it can return anything, so that's not going to be exactly
> instantaneous either.
>
> In this example, I'll bet that conversation_id and created_at are pretty
> strongly correlated, and that most or all of the rows with that specific
> conversation_id are quite far down the created_at ordering, so that the
> search through the index takes a long time to run. OTOH, with another
> conversation_id the same plan might run almost instantaneously.

That's right. So I created a composite index, and not only does this make the plan correct, but the planner now chooses a much more efficient plan than the previous index that indexed only on conversation_id:

Limit (cost=0.00..30.80 rows=13 width=12) (actual time=0.042..0.058 rows=13 loops=1)
  Buffers: shared hit=8
  -> Index Scan using index_comments_on_conversation_id_and_created_at on comments (cost=0.00..14127.83 rows=5964 width=12) (actual time=0.039..0.054 rows=13 loops=1)
       Index Cond: (conversation_id = 3975979)
       Buffers: shared hit=8
Total runtime: 0.094 ms

Is this because it can get the value of created_at from the index, or is it because it knows that the index is pre-sorted, or both? Very impressed that Postgres can use a multi-column index for this. I just assumed, wrongly, that it couldn't. I will have to go review my other tables now and see if they can benefit from multi-column indexes. Thanks!
Re: [PERFORM] Bad query plan with high-cardinality column
On Friday, February 22, 2013 at 21:47, Kevin Grittner wrote:
> I suspect you would be better off without those two indexes, and
> instead having an index on (conversation_id, created_at). Not just
> for the query you show, but in general.

Indeed, that solved it, thanks!

> In my experience these problems come largely from the planner not
> knowing the cost of dealing with each tuple. I see a lot less of this
> if I raise cpu_tuple_cost to something in the 0.03 to 0.05 range.

Is this something I can just frob a bit without worrying about it adversely impacting database performance across the board, or should I be very careful and do lots of testing on a staging box first?
Re: [PERFORM] Bad query plan with high-cardinality column
Alexander Staubo <a...@bengler.no> wrote:
> On Friday, February 22, 2013 at 21:47, Kevin Grittner wrote:
>> In my experience these problems come largely from the planner not
>> knowing the cost of dealing with each tuple. I see a lot less of this
>> if I raise cpu_tuple_cost to something in the 0.03 to 0.05 range.
>
> Is this something I can just frob a bit without worrying about it
> adversely impacting database performance across the board, or should I
> be very careful and do lots of testing on a staging box first?

If possible, I would recommend trying it with the old indexes and seeing whether it causes it to choose the better plan. (Of course, you're not going to beat the plan you get with the two-column index for this query, but it might help it better cost the other alternatives, which would be a clue that it makes your overall costing model more accurate and would have a more general benefit.)

You can play with settings like this in a single session without affecting any other sessions. I always recommend testing a change like this in staging and closely monitoring after deploying to production, to confirm the overall benefit and look for any odd cases which might suffer a performance regression. For this particular change, I have never seen a negative effect, but I'm sure that it's possible to have a scenario where it isn't helpful.

Personally, I have changed this setting many times and have often noted that 0.02 was not enough to cause choice of an optimal plan, 0.03 was always enough to do it if adjusting this setting was going to help at all, and boosting it to 0.05 never caused further plan changes in the cases I tested. I have never seen such increases cause less optimal plan choice. If you try this, please post your results.
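[Editorially, the single-session experiment Kevin describes might look like this; 0.03 is from his suggested range, and the query is the one from earlier in the thread.]

```sql
-- Affects only the current session; nothing persists after disconnect.
SET cpu_tuple_cost = 0.03;

-- With the original two single-column indexes still in place, see
-- whether the planner now picks the conversation_id index:
EXPLAIN ANALYZE
  SELECT comments.id FROM comments
   WHERE conversation_id = 3975979
   ORDER BY created_at LIMIT 13;

-- Return to the default before comparing further:
RESET cpu_tuple_cost;
```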
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company