Re: [HACKERS] star schema and the optimizer

Marc Cousin Sat, 28 Feb 2015 00:04:59 -0800

On 27/02/2015 20:01, Marc Cousin wrote:
> On 27/02/2015 19:45, Tom Lane wrote:
>>> I wrote:
>>>> I had actually thought that we'd fixed this type of problem in recent
>>>> versions, and that you should be able to get a plan that would look
>>>> like
>>
>>>> Nestloop
>>>>    -> scan dim1
>>>>    -> Nestloop
>>>>         -> scan dim2
>>>>         -> indexscan fact table using dim1.a and dim2.b
>>
>> After closer study, I think this is an oversight in commit
>> e2fa76d80ba571d4de8992de6386536867250474, which quoth
>>
>> +It can be useful for the parameter value to be passed down through
>> +intermediate layers of joins, for example:
>> +
>> +    NestLoop
>> +        -> Seq Scan on A
>> +        Hash Join
>> +            Join Condition: B.Y = C.W
>> +            -> Seq Scan on B
>> +            -> Index Scan using C_Z_IDX on C
>> +                Index Condition: C.Z = A.X
>> +
>> +If all joins are plain inner joins then this is unnecessary, because
>> +it's always possible to reorder the joins so that a parameter is used
>> +immediately below the nestloop node that provides it.  But in the
>> +presence of outer joins, join reordering may not be possible, and then
>> +this option can be critical.  Before version 9.2, Postgres used ad-hoc
>>
>> This reasoning overlooked the fact that if we need parameters from
>> more than one relation, and there's no way to join those relations
>> to each other directly, then we have to allow passing the dim1 parameter
>> down through the join to dim2.
>>
>> The attached patch seems to fix it (modulo the need for some updates
>> in the README, and maybe a regression test).  Could you see if this
>> produces satisfactory plans for you?
> 
> 
> From what I see, it's just perfect. I'll give it a more thorough look a
> bit later, but it seems to be exactly what I was waiting for.
> 
> Thanks a lot.
> 
> Regards


I gave it another look this morning. It works very well with the initial test 
schema. The situation is much improved for me.

I still have one issue: I've extended the test to more than 2 dimensions.

marc=# explain analyze SELECT * FROM dim1,dim2,dim3,dim4,facts WHERE 
facts.dim1=dim1.a and facts.dim2=dim2.a and facts.dim3=dim3.a and 
facts.dim4=dim4.a and dim1.b between 10 and 60 AND dim2.b between 30 and 50 and 
dim3.b=8526 and dim4.b between 70 and 90; 
                                                                      QUERY 
PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=15.00..957710.99 rows=1 width=200) (actual 
time=793.247..793.247 rows=0 loops=1)
   ->  Nested Loop  (cost=14.71..957704.75 rows=1 width=192) (actual 
time=793.245..793.245 rows=0 loops=1)
         Join Filter: (facts.dim3 = dim3.a)
         Rows Removed by Join Filter: 942
         ->  Index Scan using idxdim32 on dim3  (cost=0.29..3300.29 rows=10 
width=8) (actual time=49.498..67.923 rows=1 loops=1)
               Filter: (b = 8526)
               Rows Removed by Filter: 99999
         ->  Materialize  (cost=14.41..954262.56 rows=962 width=184) (actual 
time=0.552..724.988 rows=942 loops=1)
               ->  Nested Loop  (cost=14.41..954257.75 rows=962 width=184) 
(actual time=0.542..723.380 rows=942 loops=1)
                     ->  Index Scan using idxdim22 on dim2  (cost=0.29..3648.29 
rows=192 width=16) (actual time=0.074..47.445 rows=229 loops=1)
                           Filter: ((b >= 30) AND (b <= 50))
                           Rows Removed by Filter: 99771
                     ->  Nested Loop  (cost=14.12..4946.08 rows=501 width=168) 
(actual time=2.575..2.946 rows=4 loops=229)
                           ->  Bitmap Heap Scan on dim1  (cost=13.43..573.60 
rows=501 width=16) (actual time=0.170..0.647 rows=509 loops=229)
                                 Recheck Cond: ((b >= 10) AND (b <= 60))
                                 Heap Blocks: exact=78547
                                 ->  Bitmap Index Scan on idxdim11  
(cost=0.00..13.30 rows=501 width=0) (actual time=0.112..0.112 rows=509 
loops=229)
                                       Index Cond: ((b >= 10) AND (b <= 60))
                           ->  Index Scan using idxfacts on facts  
(cost=0.69..8.72 rows=1 width=152) (actual time=0.004..0.004 rows=0 
loops=116561)
                                 Index Cond: ((dim1 = dim1.a) AND (dim2 = 
dim2.a))
   ->  Index Scan using idxdim42 on dim4  (cost=0.29..6.24 rows=1 width=8) 
(never executed)
         Index Cond: (a = facts.dim4)
         Filter: ((b >= 70) AND (b <= 90))
 Planning time: 8.092 ms
 Execution time: 793.621 ms
(25 lignes)

marc=# \d facts 
      Table « public.facts »
 Colonne |  Type  | Modificateurs 
---------+--------+---------------
 dim1    | bigint | 
 dim2    | bigint | 
 dim3    | bigint | 
 dim4    | bigint | 
 dim5    | bigint | 
 dim6    | bigint | 
 dim7    | bigint | 
 dim8    | bigint | 
 dim9    | bigint | 
 dim10   | bigint | 
 data1   | text   | 
 data2   | text   | 
 data3   | text   | 
 data4   | text   | 
 data5   | text   | 
 data6   | text   | 
Index :
    "idxfacts" btree (dim1, dim2, dim3, dim4, dim5, dim6, dim7, dim8, dim9, 
dim10)

I hoped that the dim3=dim3.a condition could be used in the index scan on facts 
(I have chosen the 8526 value because it only has one matching row). Maybe 
that's not possible, or not efficient, but I thought I'd rather ask.


Regards.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] star schema and the optimizer

Reply via email to