Re: [PERFORM] Expected performance of querying 5k records from 4 million records?

2012-06-18 Thread Anish Kejariwal
Thanks for the help, Jeff and Josh.  It looks like reclustering on the
multi-column index might solve things.  For my particular query, because I'm
getting a range of records back, it makes sense that reclustering will
benefit me if I have a slow disk, even if I had expected that the indices
would be sufficient.  I now need to make sure that the speedup I'm seeing is
not because things have been cached.
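
For reference, the reclustering step looks roughly like this (a sketch; note
that CLUSTER takes an exclusive lock while it rewrites the partition, and an
ANALYZE afterwards keeps the planner statistics current):

-- physically reorder the partition by its multicolumn index
CLUSTER data_part_214 USING data_part_214_dataset_stat_data_idx;
-- refresh planner statistics after the rewrite
ANALYZE data_part_214;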

That being said, here's what I have:
2 CPUs, 12 physical cores, hyperthreaded (24 virtual cores), 2.67GHz
96G RAM, 80G available to dom0
CentOS 5.8, Xen
3Gbps SATA (7200 RPM, Hitachi ActiveStar Enterprise Class)

So, I have lots of RAM, but not necessarily the fastest disk.

default_statistics_target = 50 # pgtune wizard 2011-03-16
maintenance_work_mem = 1GB # pgtune wizard 2011-03-16
constraint_exclusion = on # pgtune wizard 2011-03-16
checkpoint_completion_target = 0.9 # pgtune wizard 2011-03-16
effective_cache_size = 24GB # pgtune wizard 2011-03-16
work_mem = 192MB # pgtune wizard 2011-03-16
wal_buffers = 8MB # pgtune wizard 2011-03-16
checkpoint_segments = 128 # pgtune wizard 2011-03-16, amended by am,
30may2011
shared_buffers = 4GB # pgtune wizard 2011-03-16
max_connections = 100 # pgtune wizard 2011-03-16: 80, bumped up to 100
max_locks_per_transaction = 1000

I didn't know about explain (analyze, buffers).  Very cool.  So, based on
your advice, I ran it and here's what I found:
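
For anyone following along, the invocation is just the usual query with the
buffers option added (available since 9.0):

explain (analyze, buffers)
select data_id, dataset_id, stat from data_part_213
where dataset_id = 213 and stat_id = 6
and data_id >= 50544630 and data_id <= 50549979;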

1st time I ran the query:
QUERY PLAN

---
 Bitmap Heap Scan on data_part_213  (cost=113.14..13725.77 rows=4189
width=16) (actual time=69.807..2763.174 rows=5350 loops=1)
   Recheck Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
   Filter: ((dataset_id = 213) AND (stat_id = 6))
   Buffers: shared read=4820
   ->  Bitmap Index Scan on data_unq_213  (cost=0.00..112.09 rows=5142
width=0) (actual time=51.918..51.918 rows=5350 loops=1)
         Index Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
         Buffers: shared read=19
 Total runtime: 2773.099 ms
(8 rows)

The second time I run the query it's very fast, since all the buffered read
counts have turned into hit counts, showing that I'm reading from cache (as
I expected):
   QUERY PLAN


 Bitmap Heap Scan on data_part_213  (cost=113.14..13725.77 rows=4189
width=16) (actual time=1.661..14.376 rows=5350 loops=1)
   Recheck Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
   Filter: ((dataset_id = 213) AND (stat_id = 6))
   Buffers: shared hit=4819
   ->  Bitmap Index Scan on data_unq_213  (cost=0.00..112.09 rows=5142
width=0) (actual time=0.879..0.879 rows=5350 loops=1)
         Index Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
         Buffers: shared hit=18
 Total runtime: 20.232 ms
(8 rows)



Next, I tried reclustering a partition with the multicolumn index.  The big
thing is that the read count has dropped dramatically!
 Index Scan using data_part_214_dataset_stat_data_idx on data_part_214
 (cost=0.00..7223.05 rows=4265 width=16) (actual time=0.093..7.251
rows=5350 loops=1)
   Index Cond: ((dataset_id = 214) AND (data_id >= 50544630) AND (data_id
<= 50549979) AND (stat_id = 6))
   Buffers: shared hit=45 read=24
 Total runtime: 12.929 ms
(4 rows)


second time:
--
 Index Scan using data_part_214_dataset_stat_data_idx on data_part_214
 (cost=0.00..7223.05 rows=4265 width=16) (actual time=0.378..7.696
rows=5350 loops=1)
   Index Cond: ((dataset_id = 214) AND (data_id >= 50544630) AND (data_id
<= 50549979) AND (stat_id = 6))
   Buffers: shared hit=68
 Total runtime: 13.511 ms
(4 rows)

So, it looks like clustering on the appropriate index fixes things!  Also,
I'll recreate the index, switching the order to (dataset_id, stat_id,
data_id).
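
Roughly like this (a sketch; the replacement index name is hypothetical):

CREATE INDEX data_part_214_dataset_stat_data_idx2
    ON data_part_214 (dataset_id, stat_id, data_id);
-- then recluster on the new index and refresh statistics
CLUSTER data_part_214 USING data_part_214_dataset_stat_data_idx2;
ANALYZE data_part_214;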

thanks!

On Fri, Jun 15, 2012 at 11:20 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Fri, Jun 15, 2012 at 9:17 AM, Anish Kejariwal anish...@gmail.com
 wrote:
 
  Below are the tables, queries, and execution plans with my questions in
  more detail.  (Since I have 250 partitions, I can query one partition after
  the other to ensure that I'm not pulling results from the cache.)

 Doesn't that explain why it is slow?  If you have 15000 rpm drives and
 each row is in a different block and uncached, it would take 20
 seconds to read them all in.  You are getting 10 times better than
 that, either due to caching or because your rows are clustered, or
 because effective_io_concurrency is doing its thing.
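 (Roughly: at 15000 rpm, seek plus rotational latency is on the order of 4
 ms per random read, so 5000 uncached reads * 4 ms = 20 seconds.)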

 
  explain analyze select data_id, dataset_id, stat from

[PERFORM] Expected performance of querying 5k records from 4 million records?

2012-06-15 Thread Anish Kejariwal
I've been struggling with this issue for the last several days, and I feel
like I'm running into a few different issues that I don't understand.  I'm
using postgres 9.0.8, and here's the OS I'm running this on:
Linux  2.6.18-308.4.1.el5xen #1 SMP Tue Apr 17 17:49:15 EDT 2012 x86_64
x86_64 x86_64 GNU/Linux

from show all:
 shared_buffers       | 4GB
 work_mem             | 192MB
 maintenance_work_mem | 1GB
 effective_cache_size | 24GB
 wal_sync_method      | fdatasync
 random_page_cost     | 4

My situation: I have an empty parent table that has 250 partitions.  Each
partition has 4 million records (250 megs).  I'm querying 5k records
directly from one partition (no joins), and it's taking ~2 seconds to get
the results.  This feels very slow to me for an indexed table of only 4
million records.

Quick overview of my questions:
1. Expected performance? Tips on what to look into to increase performance?
2. Should multicolumn indices help?
3. Does reindex table cache the table?

Below are the tables, queries, and execution plans with my questions in more
detail.  (Since I have 250 partitions, I can query one partition after the
other to ensure that I'm not pulling results from the cache.)

Parent table:
# \d data
         Table "public.data"
   Column   |       Type       | Modifiers
------------+------------------+-----------
 data_id    | integer          | not null
 dataset_id | integer          | not null
 stat       | double precision | not null
 stat_id    | integer          | not null
Number of child tables: 254 (Use \d+ to list them.)


Child (partition) with ~4 million records:

genepool_1_11=# \d data_part_201
     Table "public.data_part_201"
   Column   |       Type       | Modifiers
------------+------------------+-----------
 data_id    | integer          | not null
 dataset_id | integer          | not null
 stat       | double precision | not null
 stat_id    | integer          | not null
Indexes:
    "data_unq_201" UNIQUE, btree (data_id)
    "data_part_201_dataset_id_idx" btree (dataset_id)
    "data_part_201_stat_id_idx" btree (stat_id)
Check constraints:
    "data_chk_201" CHECK (dataset_id = 201)
Inherits: data
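
For context, a partition like this is typically set up with inheritance plus
a CHECK constraint; a sketch of the likely DDL, reconstructed from the \d
output above:

CREATE TABLE data_part_201 (
    CONSTRAINT data_chk_201 CHECK (dataset_id = 201)
) INHERITS (data);
CREATE UNIQUE INDEX data_unq_201 ON data_part_201 (data_id);
CREATE INDEX data_part_201_dataset_id_idx ON data_part_201 (dataset_id);
CREATE INDEX data_part_201_stat_id_idx ON data_part_201 (stat_id);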

explain analyze select data_id, dataset_id, stat from data_part_201 where
dataset_id = 201
and stat_id = 6 and data_id >= 50544630 and data_id <= 50549979;

 Bitmap Heap Scan on data_part_201  (cost=115.79..14230.69 rows=4383
width=16) (actual time=36.103..1718.141 rows=5350 loops=1)
   Recheck Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
   Filter: ((dataset_id = 201) AND (stat_id = 6))
   ->  Bitmap Index Scan on data_unq_201  (cost=0.00..114.70 rows=5403
width=0) (actual time=26.756..26.756 rows=5350 loops=1)
         Index Cond: ((data_id >= 50544630) AND (data_id <= 50549979))
 Total runtime: 1728.447 ms
(6 rows)

Time: 1743.535 ms

QUESTION 1: You can see that the query is very simple.  Is this the optimal
execution plan?  Any tips on what to look into to increase performance?

I then tried adding the following multi-column index:
data_part_202_dataset_regionset_data_idx btree (dataset_id, data_id,
stat_id)
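
For reference, that corresponds to a statement along these lines:

CREATE INDEX data_part_202_dataset_regionset_data_idx
    ON data_part_202 (dataset_id, data_id, stat_id);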

The query now takes 27 seconds:
explain analyze select data_id, dataset_id, stat from data_part_202 where
dataset_id = 202
and stat_id = 6 and data_id >= 50544630 and data_id <= 50549979;

 Index Scan using data_part_202_dataset_regionset_data_idx on data_part_202
 (cost=0.00..7987.83 rows=4750 width=16) (actual time=39.152..27339.401
rows=5350 loops=1)
   Index Cond: ((dataset_id = 202) AND (data_id >= 50544630) AND (data_id
<= 50549979) AND (stat_id = 6))
 Total runtime: 27349.091 ms
(3 rows)

QUESTION 2: Why is a multicolumn index causing the query to run so much
slower?  I had expected it to increase performance.


QUESTION 3:
If I run reindex table data_part_204, the query now takes 50-70
milliseconds.  Is this because the table is getting cached?  How do I know
whether a particular query is being served from the cache?  The reason I
think reindex table is caching the results is that select count(*) from the
partition also causes the query to be fast.

(and yes, vacuum analyze on the partition makes no difference)
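
One way to tell cache hits apart from disk reads (a sketch, using the
buffers option mentioned elsewhere in this thread): 'shared read' counts are
blocks pulled in from outside shared_buffers, while 'shared hit' counts are
blocks that were already cached:

explain (analyze, buffers)
select count(*) from data_part_204;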


Re: [PERFORM] Calculating statistic via function rather than with query is slowing my query

2011-08-18 Thread Anish Kejariwal
Hi Craig,

Fair point.  For now, I mean just fast - which is 5-15 seconds, but I'd
like to get it down to the 1-2 second range.

From the query I provided, I have approximately 30,000 unique keys (what I
called primary_id) that I'm grouping by, and each key has a series of
numerical values for each of the type_ids.  I'm looking at averages, stddev,
and other statistics across a few hundred type_ids (where agg.type_id in
...).  The part of the query that varies is the user-specified type_ids,
which makes it impossible to precalculate my statistics.

I'd like this to eventually scale to a million unique keys, and a thousand
type_ids.

For now, Postgres has been great for modeling the data, understanding where
I hit performance bottlenecks, and providing a fast enough user interface.
But I'm definitely starting to think about whether I can cache my data (with
millions of keys and thousands of type_ids, the data might be too large),
and whether to look into distributed databases (even though I can't
precompute the stats, my queries are easily distributable across multiple
processors, since each processor could take a batch of keys).  I might even
want to consider a column-oriented database: since my keys don't change
often, I could potentially add new columns when there are new type_ids.

I've been thinking of looking into memcached or hbase.  If you have any
suggestions on which options I should explore, I'd greatly appreciate it.

Sorry, for veering off topic a bit from postgres.

thanks,
Anish







On Wed, Aug 17, 2011 at 10:32 PM, Craig Ringer ring...@ringerc.id.au wrote:

 On 18/08/2011 9:03 AM, Anish Kejariwal wrote:

 Thanks for the help Pavel and Craig.  I really appreciate it.  I'm going
 to try a couple of these different options (write a c function, use a sql
 function with case statements, and use plperl), so I can see which gives me
 the realtime performance that I need, and works best for clean code in my
 particular case.

 Do you really mean realtime? Or just fast?

 If you have strongly bounded latency requirements, any SQL-based,
 disk-based system is probably not for you. Especially not one that relies on
 a statistics-based query planner, caching, and periodic checkpoints. I'd be
 looking into in-memory databases designed for realtime environments where
 latency is critical.

 Hard realtime: If this system fails to respond within x milliseconds, all
 the time, every time, then something will go smash or boom expensively
 and unrecoverably.

 Soft realtime: If this system responds late, the late response is expensive
 or less useful. Frequent late responses are unacceptable but the occasional
 one might be endurable.

 Just needs to be fast: If it responds late, the user gets irritated because
 they're sitting and waiting for a response. Regular long stalls are
 unacceptable, but otherwise the user can put up with it. You're more
 concerned with average latency than maximum latency.

 --
 Craig Ringer



[PERFORM] Calculating statistic via function rather than with query is slowing my query

2011-08-17 Thread Anish Kejariwal
Hi everyone,

I'm using postgres 9.0.3, and here's the OS I'm running this on:
Linux 2.6.18-238.12.1.el5xen #1 SMP Tue May 31 14:02:29 EDT 2011 x86_64
x86_64 x86_64 GNU/Linux

I have a fairly straightforward query.  I'm doing a group by on an ID, and
then calculating a statistic on the resulting data.  The problem I'm
running into is that when I'm calculating the statistics via a function,
it's twice as slow as when I'm calculating the statistics directly in my
query.  I want to be able to use a function, since I'll be using this
particular calculation in many places.

Any idea of what's going on?  Below, I've included my function and both
queries (I removed the type_ids, and just wrote …ids…).

Here's my function (I also tried stable):
CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c
integer)
RETURNS double precision AS $$
BEGIN
return a/b/c * 10::double precision;
END;
$$ LANGUAGE plpgsql immutable;


The query that takes 7.6 seconds, when I calculate the statistic from within
the query:
explain analyze
select
   agg.primary_id,
   avg(agg.a / agg.b / agg.c * 10::double precision) foo,
   stddev(agg.a / agg.b / agg.c * 10::double precision) bar
from mytable agg
where agg.type_id in (ids)
group by agg.primary_id;

The execution plan:
 HashAggregate  (cost=350380.58..350776.10 rows=9888 width=20) (actual
time=7300.414..7331.659 rows=20993 loops=1)
   ->  Bitmap Heap Scan on mytable agg  (cost=28667.90..337509.63
rows=1716127 width=20) (actual time=200.064..2861.600 rows=2309230 loops=1)
         Recheck Cond: (type_id = ANY ('{ids}'::integer[]))
         ->  Bitmap Index Scan on mytable_type_id_idx  (cost=0.00..28238.87
rows=1716127 width=0) (actual time=192.725..192.725 rows=2309230 loops=1)
               Index Cond: (type_id = ANY ('{ids}'::integer[]))
 Total runtime: 7358.337 ms
(6 rows)




The same query, but now I'm calling the function.  When I call the function
it's taking 15.5 seconds.
explain analyze select
   agg.primary_id,
   avg(calc_test(agg.a,agg.b,agg.c)) foo,
   stddev(calc_test(agg.a,agg.b,agg.c)) bar
from mytable agg
where agg.type_id in (ids)
group by agg.primary_id;

and, here's the execution plan:

 HashAggregate  (cost=350380.58..355472.90 rows=9888 width=20) (actual
time=13660.838..13686.618 rows=20993 loops=1)
   ->  Bitmap Heap Scan on mytable agg  (cost=28667.90..337509.63
rows=1716127 width=20) (actual time=170.385..2881.122 rows=2309230 loops=1)
         Recheck Cond: (type_id = ANY ('{ids}'::integer[]))
         ->  Bitmap Index Scan on mytable_type_id_idx  (cost=0.00..28238.87
rows=1716127 width=0) (actual time=162.834..162.834 rows=2309230 loops=1)
               Index Cond: (type_id = ANY ('{ids}'::integer[]))
 Total runtime: 13707.560 ms


Thanks!

Anish


Re: [PERFORM] Calculating statistic via function rather than with query is slowing my query

2011-08-17 Thread Anish Kejariwal
Thanks Pavel! that definitely solved it.

Unfortunately, the function I gave you was a simple/short version of what
the actual function is going to be.  The actual function is going to get
parameters passed to it, and based on the parameters will go through some
if...else conditions, and maybe even call another function.  Based on that,
I was definitely hoping to use plpgsql, and the overhead is unfortunate.

Is there any way to get around this overhead?  Will I still have the same
overhead if I use plperl, plpython, pljava, or write the function in C?

Anish


On Wed, Aug 17, 2011 at 11:27 AM, Pavel Stehule pavel.steh...@gmail.com wrote:

 Hello

 2011/8/17 Anish Kejariwal anish...@gmail.com:
  Hi everyone,
  I'm using postgres 9.0.3, and here's the OS I'm running this on:
  Linux 2.6.18-238.12.1.el5xen #1 SMP Tue May 31 14:02:29 EDT 2011 x86_64
  x86_64 x86_64 GNU/Linux
  I have a fairly straightforward query.  I'm doing a group by on an ID, and
  then calculating a statistic on the resulting data.  The problem I'm
  running into is that when I'm calculating the statistics via a function,
  it's twice as slow as when I'm calculating the statistics directly in my
  query.  I want to be able to use a function, since I'll be using this
  particular calculation in many places.
  Any idea of what's going on?  Below, I've included my function and both
  queries (I removed the type_ids, and just wrote …ids…).
  Here's my function (I also tried stable):
  CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c
  integer)
  RETURNS double precision AS $$
  BEGIN
  return a/b/c* 10::double precision;
  END;
  $$ LANGUAGE plpgsql immutable;
 

 This is the overhead of a plpgsql call.  For simple functions like this,
 use a SQL function instead:

 CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c
 integer)
 RETURNS double precision AS $$
 SELECT $1/$2/$3 * 10::double precision;
 $$ LANGUAGE sql;

 Regards

 Pavel Stehule

  The query that takes 7.6 seconds, when I calculate the statistic from
 within
  the query:
  explain analyze
  select
 agg.primary_id,
 avg(agg.a / agg.b / agg.c * 10::double precision) foo,
 stddev(agg.a / agg.b / agg.c * 10::double precision) bar
  from mytable agg
  where agg.type_id in (ids)
  group by agg.primary_id;
  The execution plan:
   HashAggregate  (cost=350380.58..350776.10 rows=9888 width=20) (actual
  time=7300.414..7331.659 rows=20993 loops=1)
     ->  Bitmap Heap Scan on mytable agg  (cost=28667.90..337509.63
  rows=1716127 width=20) (actual time=200.064..2861.600 rows=2309230
 loops=1)
           Recheck Cond: (type_id = ANY ('{ids}'::integer[]))
           ->  Bitmap Index Scan on mytable_type_id_idx
  (cost=0.00..28238.87
  rows=1716127 width=0) (actual time=192.725..192.725 rows=2309230 loops=1)
                 Index Cond: (type_id = ANY ('{ids}'::integer[]))
   Total runtime: 7358.337 ms
  (6 rows)
  (6 rows)
 
 
 
  The same query, but now I'm calling the function.  When I call the
 function
  it's taking 15.5 seconds.
  explain analyze select
 agg.primary_id,
 avg(calc_test(agg.a,agg.b,agg.c)) foo,
 stddev(calc_test(agg.a,agg.b,agg.c)) bar
  from mytable agg
  where agg.type_id in (ids)
  group by agg.primary_id;
  and, here's the execution plan:
   HashAggregate  (cost=350380.58..355472.90 rows=9888 width=20) (actual
  time=13660.838..13686.618 rows=20993 loops=1)
     ->  Bitmap Heap Scan on mytable agg  (cost=28667.90..337509.63
  rows=1716127 width=20) (actual time=170.385..2881.122 rows=2309230
 loops=1)
           Recheck Cond: (type_id = ANY ('{ids}'::integer[]))
           ->  Bitmap Index Scan on mytable_type_id_idx
  (cost=0.00..28238.87
  rows=1716127 width=0) (actual time=162.834..162.834 rows=2309230 loops=1)
                 Index Cond: (type_id = ANY ('{ids}'::integer[]))
   Total runtime: 13707.560 ms
 
  Thanks!
  Anish



Re: [PERFORM] Calculating statistic via function rather than with query is slowing my query

2011-08-17 Thread Anish Kejariwal
Thanks for the help Pavel and Craig.  I really appreciate it.  I'm going to
try a couple of these different options (write a c function, use a sql
function with case statements, and use plperl), so I can see which gives me
the realtime performance that I need, and works best for clean code in my
particular case.

thanks!

Anish

On Wed, Aug 17, 2011 at 5:05 PM, Craig Ringer ring...@ringerc.id.au wrote:

  On 18/08/2011 3:00 AM, Anish Kejariwal wrote:

 Thanks Pavel! that definitely solved it.

  Unfortunately, the function I gave you was a simple/short version of what
 the actual function is going to be.  The actual function is going to get
 parameters passed to it, and based on the parameters will go through some
 if...else conditions, and maybe even call another function.  Based on that,
 I was definitely hoping to use plpgsql, and the overhead is unfortunate.

  Is there any way to get around this overhead?  Will I still have the same
 overhead if I use plperl, plpython, pljava, or write the function in C?


 You can probably still write it as an SQL function if you use CASE WHEN
 appropriately.
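
 For instance, a sketch of the pattern (the branch condition here is purely
 illustrative):

 CREATE OR REPLACE FUNCTION calc_test(a double precision, b integer, c
 integer)
 RETURNS double precision AS $$
 SELECT CASE WHEN $2 = 0 OR $3 = 0 THEN NULL
             ELSE $1/$2/$3 * 10::double precision
        END;
 $$ LANGUAGE sql IMMUTABLE;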

 --
 Craig Ringer



Re: [PERFORM] issue with query optimizer when joining two partitioned tables

2011-07-10 Thread Anish Kejariwal
select relpages from pg_class where relname = 'icecream';

 relpages
----------
        0
(1 row)

Ok, so the planner thinks that the parent table is empty.  I then ran:
update pg_class set relpages = 1 where relname = 'icecream';


 HashAggregate  (cost=201199.27..201564.18 rows=11228 width=41) (actual
time=277.195..304.620 rows=11028 loops=1)
   ->  Nested Loop  (cost=4.28..79942.45 rows=4041894 width=41) (actual
time=0.227..231.181 rows=11028 loops=1)
         Join Filter: (r.widget_id = rc.widget_id)
         ->  Append  (cost=4.28..1163.69 rows=11228 width=21) (actual
time=0.125..40.834 rows=11028 loops=1)
               ->  Bitmap Heap Scan on widget r  (cost=4.28..12.75 rows=1
width=48) (actual time=0.022..0.022 rows=0 loops=1)
                     Recheck Cond: (widgetset_id = 5)
                     Filter: (widget_type_id = 4)
                     ->  Bitmap Index Scan on widget_widgetset_id_idx
 (cost=0.00..4.28 rows=4 width=0) (actual time=0.019..0.019 rows=0 loops=1)
                           Index Cond: (widgetset_id = 5)
               ->  Index Scan using widget_part_5_widget_widget_type_id_idx
on widget_part_5 r  (cost=0.00..1150.94 rows=11227 width=21) (actual
time=0.100..18.964 rows=11028 loops=1)
                     Index Cond: (widget_type_id = 4)
                     Filter: (widgetset_id = 5)
         ->  Append  (cost=0.00..6.99 rows=2 width=24) (actual
time=0.008..0.012 rows=1 loops=11028)
               ->  Seq Scan on icecream rc  (cost=0.00..0.00 rows=1
width=24) (actual time=0.001..0.001 rows=0 loops=11028)
                     Filter: (rc.dataset_id = 283)
               ->  Index Scan using icecream_part_283_widget_id_idx on
icecream_part_283 rc  (cost=0.00..6.99 rows=1 width=24) (actual
time=0.004..0.006 rows=1 loops=11028)
                     Index Cond: (rc.widget_id = r.widget_id)
                     Filter: (rc.dataset_id = 283)
 Total runtime: 318.634 ms
(19 rows)


Wow!  That fixes it.  Thank you so much; I've been struggling with this
issue for 2-3 days.  (Also, in the past, I've seen inconsistent performance
with this query, which may be the result of the planner sometimes choosing
the wrong plan, but I'll chase that down later.)

Tom said: "But maybe we should reconsider the heuristic for tables that are
members of inheritance trees --- particularly parents of inheritance trees."

I agree.  I think postgres should be updated to take this into account.  I
shouldn't have to set relpages to 1 for all the empty parent tables that I
have partitioned.  Should I file this as a bug/enhancement?

Also, do I need to worry about autoanalyze/autovacuum setting relpages back
to zero for the parent icecream table?

thanks!!!
Anish



On Sat, Jul 9, 2011 at 10:43 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  On 09.07.2011 00:36, Anish Kejariwal wrote:
  My guess as to what happened:
  -because the icecream parent table has zero records, the query optimizer
  chooses the incorrect execution plan
  -when I do select * from icecream, the optimizer now knows how many
 records
  are really in the icecream table, by knowing that the icecream table has
  partitions.

  select * from icecream won't have any direct effect on the
  optimization of subsequent queries. What probably happened is that
  autoanalyze ran in the background while you ran that select, and
  analyzed some of the partitions. Simply waiting a while would've had the
  same effect.

 Yeah.  Also, the reason that a manual vacuum on icecream changes things
 yet again is that in 9.0 and up, we have a notion of summary stats
 across the whole inheritance tree, but autoanalyze hasn't been taught to
 gather those.  The manual command on the parent table does gather them,
 though.
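
 (Concretely: a manual ANALYZE icecream; on the parent is what gathers the
 inheritance summary stats that autoanalyze currently misses.)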

 So what's happening here is that we suddenly have an accurate idea of
 the size of the join product as a result of having inheritance summary
 stats to estimate with, and that drives the estimated cost of the merge
 or hash join down out of the stratosphere.  The estimated cost of the
 nestloop goes down a lot too, but not as much.

 I experimented with a similar case here, and it seems like a lot of the
 remaining error in the nestloop estimate comes from this:

  ->  Seq Scan on icecream rc  (cost=0.00..23.00 rows=5 width=24) (actual
 time=0.002..0.002 rows=0 loops=1)
        Filter: (dataset_id = 281)

 The indexscan on the nonempty child partition is estimated at less than
 10 cost units, so this is a *large* fraction of what the planner sees as
 the per-outer-row cost of a nestloop.  And with more than 11000 rows on
 the other side of the join, that discourages it from using the nestloop.
 In reality of course this takes negligible time compared to examining
 the child partition.

 Now why is the seqscan cost estimate so large, when actually the parent
 icecream table is totally empty?  It's because the planner has been
 taught to never believe that an empty table is empty ...

[PERFORM] issue with query optimizer when joining two partitioned tables

2011-07-08 Thread Anish Kejariwal
I have run into an issue where the query optimizer chooses the wrong
execution plan when I'm trying to join two large tables that have been
partitioned.  I would really appreciate it if someone could help me out with
this.  I don't know whether I've found a bug in the optimizer, or whether
there is some parameter/option I need to set in postgres.  Below, I've
included my execution plans.  I'm using postgres 9.0.3, and I'm running this
on a pretty beefy Linux server.

My two tables:
-widget: has 4041866 records, and is broken up into 4 partitions (no records
are in the parent table).
-icecream: I'm starting with zero records, but since there could be billions
of ice-cream records, I will partition it and will not have any records in
the parent table.

So I then create my first partition in the icecream table, and load 4041866
records into it.
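
A sketch of that first partition's DDL (names reconstructed from the plans
below; the column list is elided):

CREATE TABLE icecream_part_281 (
    CHECK (dataset_id = 281)
) INHERITS (icecream);
CREATE INDEX icecream_part_281_widget_id_idx
    ON icecream_part_281 (widget_id);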

Here is the query I'm using to join the two tables:
explain analyze
SELECT
  r.widget_id,
  r.widget_type_id,
  avg(rc.cost)::double precision cost_avg
FROM
  widget r,
  icecream rc
WHERE
r.widget_type_id = 4
and r.widgetset_id = 5
AND r.widget_id = rc.widget_id
and rc.dataset_id = 281
group by r.widget_id, r.chromosome, r.start_pos, r.end_pos, r.widget_type_id
;

Here is the corresponding execution plan:


  QUERY PLAN

--
 HashAggregate  (cost=147262.20..147299.12 rows=1136 width=41) (actual
time=31876.290..31904.880 rows=11028 loops=1)
   ->  Merge Join  (cost=95574.83..112841.79 rows=1147347 width=41) (actual
time=31130.870..31832.922 rows=11028 loops=1)
         Merge Cond: (r.widget_id = rc.widget_id)
         ->  Sort  (cost=1913.89..1942.27 rows=11352 width=21) (actual
time=56.818..68.701 rows=11028 loops=1)
               Sort Key: r.widget_id
               Sort Method:  quicksort  Memory: 1246kB
               ->  Append  (cost=4.28..1149.30 rows=11352 width=21) (actual
time=0.139..40.513 rows=11028 loops=1)
                     ->  Bitmap Heap Scan on widget r  (cost=4.28..12.75
rows=1 width=48) (actual time=0.030..0.030 rows=0 loops=1)
                           Recheck Cond: (widgetset_id = 5)
                           Filter: (widget_type_id = 4)
                           ->  Bitmap Index Scan on widget_widgetset_id_idx
 (cost=0.00..4.28 rows=4 width=0) (actual time=0.023..0.023 rows=0 loops=1)
                                 Index Cond: (widgetset_id = 5)
                     ->  Index Scan using
widget_part_5_widget_widget_type_id_idx on widget_part_5 r
 (cost=0.00..1136.55 rows=11351 width=21) (actual time=0.106..18.489
rows=11028 loops=1)
                           Index Cond: (widget_type_id = 4)
                           Filter: (widgetset_id = 5)
         ->  Sort  (cost=93660.94..93711.47 rows=20214 width=24) (actual
time=29730.522..30766.354 rows=946140 loops=1)
               Sort Key: rc.widget_id
               Sort Method:  external sort  Disk: 165952kB
               ->  Append  (cost=0.00..92215.33 rows=20214 width=24) (actual
time=0.057..13731.204 rows=4041866 loops=1)
                     ->  Seq Scan on icecream rc  (cost=0.00..23.00 rows=5
width=24) (actual time=0.002..0.002 rows=0 loops=1)
                           Filter: (dataset_id = 281)
                     ->  Seq Scan on icecream_part_281 rc
 (cost=0.00..92192.33 rows=20209 width=24) (actual time=0.051..5427.730
rows=4041866 loops=1)
                           Filter: (dataset_id = 281)
 Total runtime: 33182.945 ms
(24 rows)


The query is doing a merge join and taking 33 seconds, but it should take
less than a second.  So then I do: select * from icecream;

Now, when I run the same query again, I get a different and correct
execution plan (nested loop), and the query takes less than 1 second as I
would expect.



QUERY PLAN

---
 HashAggregate  (cost=7223611.41..7223648.33 rows=1136 width=41) (actual
time=392.822..420.166 rows=11028 loops=1)
   ->  Nested Loop  (cost=4.28..341195.22 rows=229413873 width=41) (actual
time=0.231..331.800 rows=11028 loops=1)
         Join Filter: (r.widget_id = rc.widget_id)
         ->  Append  (cost=4.28..1149.30 rows=11352 width=21) (actual
time=0.051..50.181 rows=11028 loops=1)
               ->  Bitmap Heap Scan on widget r  (cost=4.28..12.75 rows=1
width=48) (actual time=0.013..0.013 rows=0 loops=1)
                     Recheck Cond: (widgetset_id = 5)
                     Filter: (widget_type_id = 4)
                     ->  Bitmap Index Scan on widget_widgetset_id_idx
 (cost=0.00..4.28 rows=4 width=0) (actual time=0.007..0.007 rows=0 loops=1)
                           Index Cond: (widgetset_id = 5)
               ->  Index Scan using