details regarding buffer usage: for database 1:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1534125.08..1534125.09 rows=1 width=0) (actual time=9149.366..9149.366 rows=1 loops=1) Buffers: shared hit=137991 -> Merge Join (cost=2.48..1514765.90 rows=7743672 width=0) (actual time=0.091..9075.008 rows=998038 loops=1) Merge Cond: (pc.did = tc.did) Buffers: shared hit=137991 -> Index Only Scan using pc_did_idx on pc (cost=0.00..12987.04 rows=499616 width=4) (actual time=0.017..58.237 rows=499616 loops=1) Heap Fetches: 0 Buffers: shared hit=1369 -> Index Only Scan using tc_did_idx on tc (cost=0.00..1298125.32 rows=49987616 width=4) (actual time=0.015..5301.727 rows=49997291 loops=1) Heap Fetches: 0 Buffers: shared hit=136622 Total runtime: 9149.414 ms (12 rows) for database 2: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=35279895.52..35279895.53 rows=1 width=0) (actual time=2386.865..2386.865 rows=1 loops=1) Buffers: shared hit=2235978 read=208446 -> Nested Loop (cost=0.00..35276697.82 rows=1279080 width=0) (actual time=0.049..2292.338 rows=1000834 loops=1) Buffers: shared hit=2235978 read=208446 -> Index Only Scan using pc_did_idx on pc (cost=0.00..15224.56 rows=500384 width=4) (actual time=0.016..108.407 rows=500384 loops=1) Heap Fetches: 500384 Buffers: shared hit=6 read=3579 -> Index Only Scan using tc_did_idx on tc (cost=0.00..70.44 rows=3 width=4) (actual time=0.003..0.004 rows=2 loops=500384) Index Cond: (did = pc.did) Heap Fetches: 1000834 Buffers: shared hit=2235972 read=204867 Total runtime: 2386.914 ms (12 rows) On Wed, Jul 31, 2013 at 3:16 PM, Sandeep Gupta <gupta.sand...@gmail.com>wrote: > Hi Pavel, > > Yes. The postgresql.conf is exactly the same. The have the same index > and clustering and are on the same compute node as well but running on > different ports. > > -Sandeep > > > > On Wed, Jul 31, 2013 at 3:14 PM, Pavel Stehule <pavel.steh...@gmail.com>wrote: > >> Hello >> >> do you have same configuration? >> >> Regards >> >> Pavel >> >> 2013/7/31 Sandeep Gupta <gupta.sand...@gmail.com>: >> > I have two postgres instances each with a database of same schema. The >> > dataset in both is ''same'' but for randomness i.e. both contain two >> tables >> > pc(did) and tc(pid, did) that have almost >> > same number of rows and have been generate from same distribution. >> > >> > However the query plan for the join turns out to be completely >> different: on >> > one join takes 2.3 secs while on the other it takes 7 secs. >> > >> > >> > Here are the statistics: >> > >> > for database 1: >> > size of tc table: 49987585 >> > size of pc table: 499616 >> > >> > join plan: >> > >> > QUERY PLAN >> > >> ------------------------------------------------------------------------------------------------------------------------------------------------------- >> > Aggregate (cost=1534125.08..1534125.09 rows=1 width=0) (actual >> > time=8473.296..8473.296 rows=1 loops=1) >> > -> Merge Join (cost=2.48..1514765.90 rows=7743672 width=0) (actual >> > time=0.084..8409.065 rows=998038 loops=1) >> > Merge Cond: (pc.did = tc.did) >> > -> Index Only Scan using pc_did_idx on pc >> (cost=0.00..12987.04 >> > rows=499616 width=4) (actual time=0.016..58.202 rows=499616 loops=1) >> > Heap Fetches: 0 >> > -> Index Only Scan using tc_did_idx on tc >> (cost=0.00..1298125.32 >> > rows=49987616 width=4) (actual time=0.014..5141.809 rows=49997291 >> loops=1) >> > Heap Fetches: 0 >> > Total runtime: 8473.337 ms >> > ' >> > >> > Query Running time: 5135 >> > >> > >> > for database 2: >> > size of tc table: 50012415 >> > size of pc table: 500384 >> > >> > QUERY >> > PLAN >> > >> ------------------------------------------------------------------------------------------------------------------------------------------------ >> > Aggregate (cost=35279895.52..35279895.53 rows=1 width=0) (actual >> > time=2501.970..2501.970 rows=1 loops=1) >> > -> Nested Loop (cost=0.00..35276697.82 rows=1279080 width=0) >> (actual >> > time=0.038..2418.766 rows=1000834 loops=1) >> > -> Index Only Scan using pc_did_idx on pc >> (cost=0.00..15224.56 >> > rows=500384 width=4) (actual time=0.017..109.080 rows=500384 loops=1) >> > Heap Fetches: 500384 >> > -> Index Only Scan using tc_did_idx on tc (cost=0.00..70.44 >> > rows=3 width=4) (actual time=0.004..0.004 rows=2 loops=500384) >> > Index Cond: (did = pc.did) >> > Heap Fetches: 1000834 >> > Total runtime: 2502.017 ms >> > >> > Query running time: 2090.388 ms >> > >> > My question is why is the query plan so different for two datasets that >> are >> > really exactly the same. And how can i force the plan to be nested index >> > scan on >> > database 1 . >> > >> > >> > -Sandeep >> > >> > >