Re: [PERFORM] Expected performance of querying 5k records from 4 million records?
Thanks for the help, Jeff and Josh. It looks like reclustering on the multi-column index might solve things. Since my query returns a range of records, it makes sense that reclustering benefits me if I have a slow disk, even though I had expected the indexes alone to be sufficient. I now need to make sure that the speedup I'm seeing is not just because things have been cached.

That being said, here's what I have:

2 CPUs, 12 physical cores, hyperthreaded (24 virtual cores), 2.67GHz
96GB RAM, 80GB available to dom0
CentOS 5.8, Xen
3Gbps SATA (7200 RPM, Hitachi ActiveStar Enterprise Class)

So, I have lots of RAM, but not necessarily the fastest disk.

default_statistics_target = 50 # pgtune wizard 2011-03-16
maintenance_work_mem = 1GB # pgtune wizard 2011-03-16
constraint_exclusion = on # pgtune wizard 2011-03-16
checkpoint_completion_target = 0.9 # pgtune wizard 2011-03-16
effective_cache_size = 24GB # pgtune wizard 2011-03-16
work_mem = 192MB # pgtune wizard 2011-03-16
wal_buffers = 8MB # pgtune wizard 2011-03-16
checkpoint_segments = 128 # pgtune wizard 2011-03-16, amended by am, 30may2011
shared_buffers = 4GB # pgtune wizard 2011-03-16
max_connections = 100 # pgtune wizard 2011-03-16: 80, bumped up to 100
max_locks_per_transaction = 1000

I didn't know about explain (analyze, buffers). Very cool.
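For anyone else who hadn't seen it, buffers is just an extra option alongside analyze. A minimal sketch of the invocation I used (table and column names are from my schema):

```sql
-- EXPLAIN (ANALYZE, BUFFERS) reports shared-buffer hits vs. reads,
-- so you can tell whether a run was served from cache or from disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT data_id, dataset_id, stat
FROM data_part_213
WHERE dataset_id = 213
  AND stat_id = 6
  AND data_id >= 50544630
  AND data_id <= 50549979;
```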
So, based on your advice, I ran it and here's what I found.

1st time I ran the query:

                                 QUERY PLAN
---
 Bitmap Heap Scan on data_part_213  (cost=113.14..13725.77 rows=4189 width=16) (actual time=69.807..2763.174 rows=5350 loops=1)
   Recheck Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
   Filter: ((dataset_id = 213) AND (stat_id = 6))
   Buffers: shared read=4820
   ->  Bitmap Index Scan on data_unq_213  (cost=0.00..112.09 rows=5142 width=0) (actual time=51.918..51.918 rows=5350 loops=1)
         Index Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
         Buffers: shared read=19
 Total runtime: 2773.099 ms
(8 rows)

The second time I ran the query it was very fast, since all the buffered read counts have turned into hit counts, showing I'm reading from cache (as I expected):

                                 QUERY PLAN
 Bitmap Heap Scan on data_part_213  (cost=113.14..13725.77 rows=4189 width=16) (actual time=1.661..14.376 rows=5350 loops=1)
   Recheck Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
   Filter: ((dataset_id = 213) AND (stat_id = 6))
   Buffers: shared hit=4819
   ->  Bitmap Index Scan on data_unq_213  (cost=0.00..112.09 rows=5142 width=0) (actual time=0.879..0.879 rows=5350 loops=1)
         Index Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
         Buffers: shared hit=18
 Total runtime: 20.232 ms
(8 rows)

Next, I tried reclustering a partition with the multi-column index. The big thing is that the read count has dropped dramatically!
 Index Scan using data_part_214_dataset_stat_data_idx on data_part_214  (cost=0.00..7223.05 rows=4265 width=16) (actual time=0.093..7.251 rows=5350 loops=1)
   Index Cond: ((dataset_id = 214) AND (data_id >= 50544630) AND (data_id <= 50549979) AND (stat_id = 6))
   Buffers: shared hit=45 read=24
 Total runtime: 12.929 ms
(4 rows)

second time:
--
 Index Scan using data_part_214_dataset_stat_data_idx on data_part_214  (cost=0.00..7223.05 rows=4265 width=16) (actual time=0.378..7.696 rows=5350 loops=1)
   Index Cond: ((dataset_id = 214) AND (data_id >= 50544630) AND (data_id <= 50549979) AND (stat_id = 6))
   Buffers: shared hit=68
 Total runtime: 13.511 ms
(4 rows)

So, it looks like clustering the index appropriately fixes things! Also, I'll recreate the index switching the order to (dataset_id, stat_id, data_id).

thanks!

On Fri, Jun 15, 2012 at 11:20 AM, Jeff Janes jeff.ja...@gmail.com wrote:
> On Fri, Jun 15, 2012 at 9:17 AM, Anish Kejariwal anish...@gmail.com wrote:
>> Below are the tables, queries, and execution plans with my questions with more detail. (Since I have 250 partitions, I can query one partition after the other to ensure that I'm not pulling results from the cache)
>
> Doesn't that explain why it is slow? If you have 15000 rpm drives and each row is in a different block and uncached, it would take 20 seconds to read them all in. You are getting 10 times better than that, either due to caching or because your rows are clustered, or because effective_io_concurrency is doing its thing.
>
>> explain analyze select data_id, dataset_id, stat from
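For the archives, the recreate-and-recluster step I have in mind looks roughly like this (the index name with the "2" suffix is hypothetical, following my partition naming scheme; note CLUSTER takes an exclusive lock and rewrites the table, so it needs a quiet window):

```sql
-- Equality columns first, range column (data_id) last, so the rows a
-- query wants are contiguous in the index and, after CLUSTER, on disk.
CREATE INDEX data_part_214_dataset_stat_data_idx2
    ON data_part_214 (dataset_id, stat_id, data_id);

-- Rewrite the partition in index order, then refresh statistics.
CLUSTER data_part_214 USING data_part_214_dataset_stat_data_idx2;
ANALYZE data_part_214;
```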
[PERFORM] Expected performance of querying 5k records from 4 million records?
I've been struggling with this issue for the last several days, and I feel like I'm running into a few different issues that I don't understand. I'm using postgres 9.0.8, and here's the OS I'm running this on:

Linux 2.6.18-308.4.1.el5xen #1 SMP Tue Apr 17 17:49:15 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

from show all:
 shared_buffers       | 4GB
 work_mem             | 192MB
 maintenance_work_mem | 1GB
 effective_cache_size | 24GB
 wal_sync_method      | fdatasync
 random_page_cost     | 4

My situation: I have an empty parent table that has 250 partitions. Each partition has 4 million records (250 megs). I'm querying 5k records directly from one partition (no joins), and it's taking ~2 seconds to get the results. This feels very slow to me for an indexed table of only 4 million records.

Quick overview of my questions:
1. expected performance? tips on what to look into to increase performance?
2. should multicolumn indices help?
3. does reindex table cache the table?

Below are the tables, queries, and execution plans with my questions in more detail. (Since I have 250 partitions, I can query one partition after the other to ensure that I'm not pulling results from the cache.)

Parent table:
# \d data
      Table public.data
   Column   |       Type       | Modifiers
------------+------------------+-----------
 data_id    | integer          | not null
 dataset_id | integer          | not null
 stat       | double precision | not null
 stat_id    | integer          | not null
Number of child tables: 254 (Use \d+ to list them.)
Child (partition) with ~4 million records:

genepool_1_11=# \d data_part_201
   Table public.data_part_201
   Column   |       Type       | Modifiers
------------+------------------+-----------
 data_id    | integer          | not null
 dataset_id | integer          | not null
 stat       | double precision | not null
 stat_id    | integer          | not null
Indexes:
    data_unq_201 UNIQUE, btree (data_id)
    data_part_201_dataset_id_idx btree (dataset_id)
    data_part_201_stat_id_idx btree (stat_id)
Check constraints:
    data_chk_201 CHECK (dataset_id = 201)
Inherits: data

explain analyze select data_id, dataset_id, stat from data_part_201 where dataset_id = 201 and stat_id = 6 and data_id >= 50544630 and data_id <= 50549979;

 Bitmap Heap Scan on data_part_201  (cost=115.79..14230.69 rows=4383 width=16) (actual time=36.103..1718.141 rows=5350 loops=1)
   Recheck Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
   Filter: ((dataset_id = 201) AND (stat_id = 6))
   ->  Bitmap Index Scan on data_unq_201  (cost=0.00..114.70 rows=5403 width=0) (actual time=26.756..26.756 rows=5350 loops=1)
         Index Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
 Total runtime: 1728.447 ms
(6 rows)

Time: 1743.535 ms

QUESTION 1: you can see that the query is very simple. Is this the optimal execution plan? Any tips on what to look into to increase performance?

I then tried adding the following multi-column index:
data_part_202_dataset_regionset_data_idx btree (dataset_id, data_id, stat_id)

The query now takes 27 seconds!

explain analyze select data_id, dataset_id, stat from data_part_202 where dataset_id = 202 and stat_id = 6 and data_id >= 50544630 and data_id <= 50549979;

 Index Scan using data_part_202_dataset_regionset_data_idx on data_part_202  (cost=0.00..7987.83 rows=4750 width=16) (actual time=39.152..27339.401 rows=5350 loops=1)
   Index Cond: ((dataset_id = 202) AND (data_id >= 50544630) AND (data_id <= 50549979) AND (stat_id = 6))
 Total runtime: 27349.091 ms
(3 rows)

QUESTION 2: why is a multicolumn index causing the query to run so much slower?
I had expected it to increase the performance.

QUESTION 3: If I do the following: reindex table data_part_204, the query now takes 50-70 milliseconds. Is this because the table is getting cached? How do I know if a particular query is coming from the cache? The reason why I think reindex table is caching the results is that select count(*) from the partition also causes the query to be fast. (And yes, vacuum analyze on the partition makes no difference.)
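Regarding QUESTION 3, one way I know of to look directly at what's cached is the contrib module pg_buffercache. It only shows postgres's own shared_buffers, not the OS page cache, so this is a partial answer at best; a sketch:

```sql
-- After installing contrib/pg_buffercache, count how many 8kB buffers
-- of a given partition are currently resident in shared_buffers.
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE c.relname LIKE 'data_part_204%'
GROUP BY c.relname
ORDER BY buffers DESC;
```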
Re: [PERFORM] Calculating statistic via function rather than with query is slowing my query
Hi Craig,

Fair point. For now, I mean just fast - which is 5-15 seconds, but I'd like to get it down to the 1-2 second range.

From the query I provided, I have approximately 30,000 unique keys (what I called primary_id) that I'm grouping by, and each key has a series of numerical values for each of the type_ids. I'm looking at averages, stddev and other statistics across a few hundred type_ids (where agg.type_id in ...). The part of the query that varies is the user-specified type_ids, which makes it impossible to precalculate my statistics.

I'd like this to eventually scale to a million unique keys and a thousand type_ids. For now Postgres has been great for modeling the data, understanding where I hit performance bottlenecks, and providing a fast enough user interface. But I'm definitely starting to think about whether I can cache my data (with millions of keys and thousands of type_ids, the data might be too large), and whether to look into distributed databases (even though I can't precompute the stats, my queries are easily distributable across multiple processors since each processor could take a batch of keys). I might even want to consider a column-oriented database - since my keys don't change often, I could potentially add new columns when there are new type_ids. I've been thinking of looking into memcached or hbase.

If you have any suggestions on which options I should explore, I'd greatly appreciate it. Sorry for veering off topic a bit from postgres.

thanks,
Anish

On Wed, Aug 17, 2011 at 10:32 PM, Craig Ringer ring...@ringerc.id.au wrote:
> On 18/08/2011 9:03 AM, Anish Kejariwal wrote:
>> Thanks for the help Pavel and Craig. I really appreciate it. I'm going to try a couple of these different options (write a C function, use a SQL function with case statements, and use plperl), so I can see which gives me the realtime performance that I need, and works best for clean code in my particular case.
>
> Do you really mean realtime? Or just fast?
> If you have strongly bounded latency requirements, any SQL-based, disk-based system is probably not for you. Especially not one that relies on a statistics-based query planner, caching, and periodic checkpoints. I'd be looking into in-memory databases designed for realtime environments where latency is critical.
>
> Hard realtime: if this system fails to respond within x milliseconds, all the time, every time, then something will go smash or boom expensively and unrecoverably.
>
> Soft realtime: if this system responds late, the late response is expensive or less useful. Frequent late responses are unacceptable but the occasional one might be endurable.
>
> Just needs to be fast: if it responds late, the user gets irritated because they're sitting and waiting for a response. Regular long stalls are unacceptable, but otherwise the user can put up with it. You're more concerned with average latency than maximum latency.
>
> --
> Craig Ringer
[PERFORM] Calculating statistic via function rather than with query is slowing my query
Hi everyone,

I'm using postgres 9.0.3, and here's the OS I'm running this on:
Linux 2.6.18-238.12.1.el5xen #1 SMP Tue May 31 14:02:29 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

I have a fairly straightforward query. I'm doing a group by on an ID, and then calculating a statistic on the resulting data. The problem I'm running into is that when I'm calculating the statistics via a function, it's twice as slow as when I'm calculating the statistics directly in my query. I want to be able to use a function, since I'll be using this particular calculation in many places. Any idea of what's going on?

Below, I've included my function and both queries (I removed the type_ids, and just wrote ...ids...).

Here's my function (I also tried stable):

CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c integer)
RETURNS double precision AS $$
BEGIN
    return a / b / c * 10::double precision;
END;
$$ LANGUAGE plpgsql immutable;

The query that takes 7.6 seconds, when I calculate the statistic from within the query:

explain analyze select agg.primary_id, avg(agg.a / agg.b / agg.c * 10::double precision) foo, stddev(agg.a / agg.b / agg.c * 10::double precision) bar from mytable agg where agg.type_id in (ids) group by agg.primary_id;

The execution plan:

 HashAggregate  (cost=350380.58..350776.10 rows=9888 width=20) (actual time=7300.414..7331.659 rows=20993 loops=1)
   ->  Bitmap Heap Scan on mytable agg  (cost=28667.90..337509.63 rows=1716127 width=20) (actual time=200.064..2861.600 rows=2309230 loops=1)
         Recheck Cond: (type_id = ANY ('{ids}'::integer[]))
         ->  Bitmap Index Scan on mytable_type_id_idx  (cost=0.00..28238.87 rows=1716127 width=0) (actual time=192.725..192.725 rows=2309230 loops=1)
               Index Cond: (type_id = ANY ('{ids}'::integer[]))
 Total runtime: 7358.337 ms
(6 rows)

The same query, but now I'm calling the function. When I call the function it's taking 15.5 seconds.
explain analyze select agg.primary_id, avg(calc_test(agg.a,agg.b,agg.c)) foo, stddev(calc_test(agg.a,agg.b,agg.c)) bar from mytable agg where agg.type_id in (ids) group by agg.primary_id;

and, here's the execution plan:

 HashAggregate  (cost=350380.58..355472.90 rows=9888 width=20) (actual time=13660.838..13686.618 rows=20993 loops=1)
   ->  Bitmap Heap Scan on mytable agg  (cost=28667.90..337509.63 rows=1716127 width=20) (actual time=170.385..2881.122 rows=2309230 loops=1)
         Recheck Cond: (type_id = ANY ('{ids}'::integer[]))
         ->  Bitmap Index Scan on mytable_type_id_idx  (cost=0.00..28238.87 rows=1716127 width=0) (actual time=162.834..162.834 rows=2309230 loops=1)
               Index Cond: (type_id = ANY ('{ids}'::integer[]))
 Total runtime: 13707.560 ms

Thanks!
Anish
Re: [PERFORM] Calculating statistic via function rather than with query is slowing my query
Thanks Pavel! That definitely solved it.

Unfortunately, the function I gave you was a simple/short version of what the actual function is going to be. The actual function is going to get parameters passed to it, and based on the parameters will go through some if...else conditions, and maybe even call another function. Based on that, I was definitely hoping to use plpgsql, and the overhead is unfortunate.

Is there any way to get around this overhead? Will I still have the same overhead if I use plperl, plpython, pljava, or write the function in C?

Anish

On Wed, Aug 17, 2011 at 11:27 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
> Hello
>
> 2011/8/17 Anish Kejariwal anish...@gmail.com:
>> Hi everyone,
>>
>> I'm using postgres 9.0.3, and here's the OS I'm running this on:
>> Linux 2.6.18-238.12.1.el5xen #1 SMP Tue May 31 14:02:29 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
>>
>> I have a fairly straightforward query. I'm doing a group by on an ID, and then calculating a statistic on the resulting data. The problem I'm running into is that when I'm calculating the statistics via a function, it's twice as slow as when I'm calculating the statistics directly in my query. I want to be able to use a function, since I'll be using this particular calculation in many places. Any idea of what's going on?
>>
>> Below, I've included my function and both queries (I removed the type_ids, and just wrote ...ids...).
>>
>> Here's my function (I also tried stable):
>>
>> CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c integer)
>> RETURNS double precision AS $$
>> BEGIN
>>     return a / b / c * 10::double precision;
>> END;
>> $$ LANGUAGE plpgsql immutable;
>
> this is overhead of plpgsql call.
> For simple functions like this, use a SQL function instead:
>
> CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c integer)
> RETURNS double precision AS $$
>     SELECT $1 / $2 / $3 * 10::double precision;
> $$ LANGUAGE sql;
>
> Regards
> Pavel Stehule
>
>> The query that takes 7.6 seconds, when I calculate the statistic from within the query:
>>
>> explain analyze select agg.primary_id, avg(agg.a / agg.b / agg.c * 10::double precision) foo, stddev(agg.a / agg.b / agg.c * 10::double precision) bar from mytable agg where agg.type_id in (ids) group by agg.primary_id;
>>
>> The execution plan:
>>
>>  HashAggregate  (cost=350380.58..350776.10 rows=9888 width=20) (actual time=7300.414..7331.659 rows=20993 loops=1)
>>    ->  Bitmap Heap Scan on mytable agg  (cost=28667.90..337509.63 rows=1716127 width=20) (actual time=200.064..2861.600 rows=2309230 loops=1)
>>          Recheck Cond: (type_id = ANY ('{ids}'::integer[]))
>>          ->  Bitmap Index Scan on mytable_type_id_idx  (cost=0.00..28238.87 rows=1716127 width=0) (actual time=192.725..192.725 rows=2309230 loops=1)
>>                Index Cond: (type_id = ANY ('{ids}'::integer[]))
>>  Total runtime: 7358.337 ms
>> (6 rows)
>>
>> The same query, but now I'm calling the function. When I call the function it's taking 15.5 seconds.
>>
>> explain analyze select agg.primary_id, avg(calc_test(agg.a,agg.b,agg.c)) foo, stddev(calc_test(agg.a,agg.b,agg.c)) bar from mytable agg where agg.type_id in (ids) group by agg.primary_id;
>>
>> and, here's the execution plan:
>>
>>  HashAggregate  (cost=350380.58..355472.90 rows=9888 width=20) (actual time=13660.838..13686.618 rows=20993 loops=1)
>>    ->  Bitmap Heap Scan on mytable agg  (cost=28667.90..337509.63 rows=1716127 width=20) (actual time=170.385..2881.122 rows=2309230 loops=1)
>>          Recheck Cond: (type_id = ANY ('{ids}'::integer[]))
>>          ->  Bitmap Index Scan on mytable_type_id_idx  (cost=0.00..28238.87 rows=1716127 width=0) (actual time=162.834..162.834 rows=2309230 loops=1)
>>                Index Cond: (type_id = ANY ('{ids}'::integer[]))
>>  Total runtime: 13707.560 ms
>>
>> Thanks!
>> Anish
Re: [PERFORM] Calculating statistic via function rather than with query is slowing my query
Thanks for the help Pavel and Craig. I really appreciate it. I'm going to try a couple of these different options (write a C function, use a SQL function with case statements, and use plperl), so I can see which gives me the realtime performance that I need, and works best for clean code in my particular case.

thanks!
Anish

On Wed, Aug 17, 2011 at 5:05 PM, Craig Ringer ring...@ringerc.id.au wrote:
> On 18/08/2011 3:00 AM, Anish Kejariwal wrote:
>> Thanks Pavel! That definitely solved it. Unfortunately, the function I gave you was a simple/short version of what the actual function is going to be. The actual function is going to get parameters passed to it, and based on the parameters will go through some if...else conditions, and maybe even call another function. Based on that, I was definitely hoping to use plpgsql, and the overhead is unfortunate. Is there any way to get around this overhead? Will I still have the same overhead if I use plperl, plpython, pljava, or write the function in C?
>
> You can probably still write it as an SQL function if you use CASE WHEN appropriately.
>
> --
> Craig Ringer
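For what it's worth, a sketch of what Craig is suggesting: keep the function in plain SQL (so the planner can inline it into the query) and express the if...else branches with CASE. The branch conditions here are invented purely for illustration:

```sql
CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c integer)
RETURNS double precision AS $$
    SELECT CASE
        WHEN $2 = 0 OR $3 = 0 THEN NULL            -- guard: avoid division by zero
        WHEN $1 < 0 THEN 0::double precision       -- hypothetical branch
        ELSE $1 / $2 / $3 * 10::double precision   -- the original calculation
    END;
$$ LANGUAGE sql IMMUTABLE;
```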
Re: [PERFORM] issue with query optimizer when joining two partitioned tables
select relpages from pg_class where relname = 'icecream';

 relpages
----------
        0
(1 row)

Ok, so the planner thinks that the parent table is empty. I then ran:

update pg_class set relpages = 1 where relname = 'icecream';

 HashAggregate  (cost=201199.27..201564.18 rows=11228 width=41) (actual time=277.195..304.620 rows=11028 loops=1)
   ->  Nested Loop  (cost=4.28..79942.45 rows=4041894 width=41) (actual time=0.227..231.181 rows=11028 loops=1)
         Join Filter: (r.widget_id = rc.widget_id)
         ->  Append  (cost=4.28..1163.69 rows=11228 width=21) (actual time=0.125..40.834 rows=11028 loops=1)
               ->  Bitmap Heap Scan on widget r  (cost=4.28..12.75 rows=1 width=48) (actual time=0.022..0.022 rows=0 loops=1)
                     Recheck Cond: (widgetset_id = 5)
                     Filter: (widget_type_id = 4)
                     ->  Bitmap Index Scan on widget_widgetset_id_idx  (cost=0.00..4.28 rows=4 width=0) (actual time=0.019..0.019 rows=0 loops=1)
                           Index Cond: (widgetset_id = 5)
               ->  Index Scan using widget_part_5_widget_widget_type_id_idx on widget_part_5 r  (cost=0.00..1150.94 rows=11227 width=21) (actual time=0.100..18.964 rows=11028 loops=1)
                     Index Cond: (widget_type_id = 4)
                     Filter: (widgetset_id = 5)
         ->  Append  (cost=0.00..6.99 rows=2 width=24) (actual time=0.008..0.012 rows=1 loops=11028)
               ->  Seq Scan on icecream rc  (cost=0.00..0.00 rows=1 width=24) (actual time=0.001..0.001 rows=0 loops=11028)
                     Filter: (rc.dataset_id = 283)
               ->  Index Scan using icecream_part_283_widget_id_idx on icecream_part_283 rc  (cost=0.00..6.99 rows=1 width=24) (actual time=0.004..0.006 rows=1 loops=11028)
                     Index Cond: (rc.widget_id = r.widget_id)
                     Filter: (rc.dataset_id = 283)
 Total runtime: 318.634 ms
(19 rows)

Wow! That fixes it. Thank you so much - I've been struggling with this issue for 2-3 days. (Also, in the past, I've seen inconsistent performance with this query, which may be the result of the planner sometimes choosing the wrong plan, but I'll chase that down later.)
Tom said:
> But maybe we should reconsider the heuristic for tables that are members of inheritance trees --- particularly parents of inheritance trees.

I agree. I think postgres should get updated to take this into account. I shouldn't have to set relpages to 1 for all the empty parent tables that I have partitioned. Should I file this as a bug/enhancement?

Also, do I need to worry about autoanalyze/autovacuum setting relpages back to zero for the parent icecream table?

thanks!!!
Anish

On Sat, Jul 9, 2011 at 10:43 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
>> On 09.07.2011 00:36, Anish Kejariwal wrote:
>>> My guess as to what happened:
>>> - because the icecream parent table has zero records, the query optimizer chooses the incorrect execution plan
>>> - when I do select * from icecream, the optimizer now knows how many records are really in the icecream table, by knowing that the icecream table has partitions.
>>
>> select * from icecream won't have any direct effect on the optimization of subsequent queries. What probably happened is that autoanalyze ran in the background while you ran that select, and analyzed some of the partitions. Simply waiting a while would've had the same effect.
>
> Yeah. Also, the reason that a manual vacuum on icecream changes things yet again is that in 9.0 and up, we have a notion of summary stats across the whole inheritance tree, but autoanalyze hasn't been taught to gather those. The manual command on the parent table does gather them, though. So what's happening here is that we suddenly have an accurate idea of the size of the join product as a result of having inheritance summary stats to estimate with, and that drives the estimated cost of the merge or hash join down out of the stratosphere. The estimated cost of the nestloop goes down a lot too, but not as much.
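In the meantime, I can at least watch for that reset happening. Something along these lines (the LIKE pattern follows my partition naming) shows what the planner currently believes about the parent and its partitions:

```sql
-- relpages/reltuples are what the planner uses for size estimates;
-- a parent table "re-zeroed" by autovacuum would show relpages = 0 again.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'icecream' OR relname LIKE 'icecream_part_%'
ORDER BY relname;
```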
> I experimented with a similar case here, and it seems like a lot of the remaining error in the nestloop estimate comes from this:
>
>   ->  Seq Scan on icecream rc  (cost=0.00..23.00 rows=5 width=24) (actual time=0.002..0.002 rows=0 loops=1)
>         Filter: (dataset_id = 281)
>
> The indexscan on the nonempty child partition is estimated at less than 10 cost units, so this is a *large* fraction of what the planner sees as the per-outer-row cost of a nestloop. And with more than 11000 rows on the other side of the join, that discourages it from using the nestloop. In reality of course this takes negligible time compared to examining the child partition.
>
> Now why is the seqscan cost estimate so large, when actually the parent icecream table is totally empty? It's because the planner has been taught to never believe that an empty table is empty
[PERFORM] issue with query optimizer when joining two partitioned tables
I have run into an issue where the query optimizer is choosing the wrong execution plan when I'm trying to join two large tables that have been partitioned. I would really appreciate it if someone could help me out with this. I don't know whether I've found a bug in the optimizer, or whether there is some parameter/option I need to set in postgres. Below, I've included my execution plans. I'm using postgres 9.0.3, and I'm running this on a pretty beefy Linux server.

My two tables:
- widget: has 4041866 records, and is broken up into 4 partitions (no records are in the parent table).
- icecream: I'm starting with zero records, but since there could be billions of ice-cream records, I will partition and will not have any records in the parent table.

So, I then create my first partition in the icecream table, and load 4041866 records into it.

Here is the query I'm using to join the two tables:

explain analyze SELECT r.widget_id, r.widget_type_id, avg(rc.cost)::double precision cost_avg FROM widget r, icecream rc WHERE r.widget_type_id = 4 and r.widgetset_id = 5 AND r.widget_id = rc.widget_id and rc.dataset_id = 281 group by r.widget_id, r.chromosome, r.start_pos, r.end_pos, r.widget_type_id;

Here is the corresponding execution plan:

                                 QUERY PLAN
--
 HashAggregate  (cost=147262.20..147299.12 rows=1136 width=41) (actual time=31876.290..31904.880 rows=11028 loops=1)
   ->  Merge Join  (cost=95574.83..112841.79 rows=1147347 width=41) (actual time=31130.870..31832.922 rows=11028 loops=1)
         Merge Cond: (r.widget_id = rc.widget_id)
         ->  Sort  (cost=1913.89..1942.27 rows=11352 width=21) (actual time=56.818..68.701 rows=11028 loops=1)
               Sort Key: r.widget_id
               Sort Method: quicksort Memory: 1246kB
               ->  Append  (cost=4.28..1149.30 rows=11352 width=21) (actual time=0.139..40.513 rows=11028 loops=1)
                     ->  Bitmap Heap Scan on widget r  (cost=4.28..12.75 rows=1 width=48) (actual time=0.030..0.030 rows=0 loops=1)
                           Recheck Cond: (widgetset_id = 5)
                           Filter: (widget_type_id = 4)
                           ->  Bitmap Index Scan on widget_widgetset_id_idx  (cost=0.00..4.28 rows=4 width=0) (actual time=0.023..0.023 rows=0 loops=1)
                                 Index Cond: (widgetset_id = 5)
                     ->  Index Scan using widget_part_5_widget_widget_type_id_idx on widget_part_5 r  (cost=0.00..1136.55 rows=11351 width=21) (actual time=0.106..18.489 rows=11028 loops=1)
                           Index Cond: (widget_type_id = 4)
                           Filter: (widgetset_id = 5)
         ->  Sort  (cost=93660.94..93711.47 rows=20214 width=24) (actual time=29730.522..30766.354 rows=946140 loops=1)
               Sort Key: rc.widget_id
               Sort Method: external sort Disk: 165952kB
               ->  Append  (cost=0.00..92215.33 rows=20214 width=24) (actual time=0.057..13731.204 rows=4041866 loops=1)
                     ->  Seq Scan on icecream rc  (cost=0.00..23.00 rows=5 width=24) (actual time=0.002..0.002 rows=0 loops=1)
                           Filter: (dataset_id = 281)
                     ->  Seq Scan on icecream_part_281 rc  (cost=0.00..92192.33 rows=20209 width=24) (actual time=0.051..5427.730 rows=4041866 loops=1)
                           Filter: (dataset_id = 281)
 Total runtime: 33182.945 ms
(24 rows)

The query is doing a merge join and is taking 33 seconds, but should take less than a second.

So, then I do:

select * from icecream;

Now, when I run the same query again, I get a different and correct execution plan (nested loop), and the query takes less than 1 second as I would expect.

                                 QUERY PLAN
---
 HashAggregate  (cost=7223611.41..7223648.33 rows=1136 width=41) (actual time=392.822..420.166 rows=11028 loops=1)
   ->  Nested Loop  (cost=4.28..341195.22 rows=229413873 width=41) (actual time=0.231..331.800 rows=11028 loops=1)
         Join Filter: (r.widget_id = rc.widget_id)
         ->  Append  (cost=4.28..1149.30 rows=11352 width=21) (actual time=0.051..50.181 rows=11028 loops=1)
               ->  Bitmap Heap Scan on widget r  (cost=4.28..12.75 rows=1 width=48) (actual time=0.013..0.013 rows=0 loops=1)
                     Recheck Cond: (widgetset_id = 5)
                     Filter: (widget_type_id = 4)
                     ->  Bitmap Index Scan on widget_widgetset_id_idx  (cost=0.00..4.28 rows=4 width=0) (actual time=0.007..0.007 rows=0 loops=1)
                           Index Cond: (widgetset_id = 5)
               ->  Index Scan using