On 13/2/2024 07:00, jian he wrote:
+ newa = makeNode(ArrayExpr);
+ /* array_collid will be set by parse_collate.c */
+ newa->element_typeid = scalar_type;
+ newa->array_typeid = array_type;
+ newa->multidims = false;
+ newa->elements = aexprs;
+ newa->location = -1;

I am confused by the comment `array_collid will be set by
parse_collate.c`; can you explain it further?
I wonder whether the second paragraph of the comment in commit b310b6e is enough to dive into the details.
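
For what it's worth, the effect of that later pass can also be seen from plain SQL. A tiny, hypothetical illustration (not from the patch or the thread): the collation of an array expression is derived from its elements by the usual collation-assignment rules, which is exactly what ends up in array_collid:

select pg_collation_for(array['a'::text, 'b'::text collate "C"]);
-- expected result: "C" (the explicit collation wins over the implicit one)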

If the right arm of an OR expression is not a plain Const but carries a
collation specification, e.g.
`where a  = 'a' collate "C" or a = 'b' collate "C";`

then the rightop is not a Const but a CollateExpr, and it will not be
used in the transformation.
Yes, it is done for simplicity right now. I'm not sure about corner cases of merging such expressions.
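
To illustrate (a hypothetical example, not taken from the thread; the table t and its text column a are made up), the difference should show up in the plans once enable_or_transformation is on:

set enable_or_transformation to on;

explain (costs off)
select * from t where a = 'a' or a = 'b';
-- expected: the two arms are merged into a single a = ANY('{a,b}') filter

explain (costs off)
select * from t where a = 'a' collate "C" or a = 'b' collate "C";
-- expected: the OR of two comparisons is kept as-is, because the
-- right-hand sides are CollateExpr nodes rather than plain Const nodes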


set enable_or_transformation to on;
explain(timing off, analyze, costs off)
select count(*) from test where (x = 1 or x = 2 or x = 3 or x = 4 or x
= 5 or x = 6 or x = 7 or x = 8 or x = 9 ) \watch i=0.1 c=10
35.376 ms

The time shown is the last result of the 10 iterations.
The reason here is parallel workers.
If you look into the plans, you will see parallel workers in the case without the optimization and their absence in the case with it:

Gather  (cost=1000.00..28685.37 rows=87037 width=12)
        (actual rows=90363 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test
         Filter: ((x = 1) OR (x = 2) OR (x = 3) OR (x = 4) OR (x = 5) OR (x = 6) OR (x = 7) OR (x = 8) OR (x = 9))

Seq Scan on test  (cost=0.02..20440.02 rows=90600 width=12)
                  (actual rows=90363 loops=1)
   Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))

With 90363 tuples actually returned, the estimate is 87037 (less precise) without the transformation and 90600 (more precise) with it. But if you play with parallel_tuple_cost and parallel_setup_cost, you will end up with the parallel workers back:

 Gather  (cost=0.12..11691.03 rows=90600 width=12)
         (actual rows=90363 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test
         Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))
         Rows Removed by Filter: 303212

And the gain is about 25% on my laptop.
I'm not sure about the origin of this behaviour, but it looks like an issue with parallel workers, not with this specific optimization.
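
For reference, one possible (hypothetical) way to bring the workers back for this query; the exact values I played with are not listed above:

set parallel_setup_cost to 0;
set parallel_tuple_cost to 0;
set enable_or_transformation to on;
explain (timing off, analyze, costs off)
select count(*) from test
where x = 1 or x = 2 or x = 3 or x = 4 or x = 5
   or x = 6 or x = 7 or x = 8 or x = 9;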

--
regards,
Andrei Lepikhov
Postgres Professional


