This is a good one :)

Here is a "brief" description of our issue(Postgres 9.0): 

Tables:
main fact table:
Table "public.parent_fact"
        Column        |            Type             |  
----------------------+-----------------------------+-----------
 etime                | date | not null
 pcamp_id             | integer                     | 
 location_id          | integer                     | 
 impressions          | bigint                      | 
 clicks              | int

this table partitioned by etime.

We are trying to build a report, which has last week numbers alongside with 
this 
week numbers. For example: if today is Wednesday, I want to compare daily 
numbers from last week 3 days (mon through wed) with this week 3 days(mon 
through wed).

To accomplish that, we've decided to build a transformation table, which has 
two 
columns:

 Table "public.trans_last_week"
  Column  |            Type             | Modifiers 
----------+-----------------------------+-----------
 etime    | date | 
 lw_etime | date |

So for each date(etime), we have lw_etime, which is essentially etime-7 days.

Here is the first query, which performs fine:

select    a11.location_id AS location_id,
    a11.pcamp_id AS  pcamp_id,
    sum(a11.clicks)
from    parent_fact    a11
where    a11.etime between '2011-14-18' and '2011-04-20'
group  by    a11.location_id,
    a11.pcamp_id

everything is good there -  it calculates numbers from the current week and 
goes 
to only 3 partitions to aggregate numbers. 

Here is the second query:

select    a11.location_id AS location_id,
    a11.pcamp_id AS  pcamp_id,
    sum(a11.clicks)
from    parent_fact    a11
    join    trans_last_week    a12
      on     (a11.etime = a12.lw_etime)
where    a12.etime between '2011-14-18' and '2011-04-20'
group  by    a11.location_id,
    a11.pcamp_id


Here it scans through all partitions in the parent_fact table and runs 3-4 
times 
slower.

What was noticed, that the only case when Postgres is actually going to execute 
the query against the right partitions is query #1. 

Is that by design? Second query join, will also result in 3 days(3 partitions) 

This query (#3) also scans all partitions:

select    a11.location_id AS location_id,
    a11.pcamp_id AS  pcamp_id,
    sum(a11.clicks)
from    parent_fact    a11
where    a11.etime in (select a12.etime from trans_last_week    a12 
where a11.etime = a12.lw_etime)
group  by    a11.location_id,
    a11.pcamp_id


Thank you!

Reply via email to